Records |
Author |
Santi Puch; Irina Sanchez; Aura Hernandez-Sabate; Gemma Piella; Vesna Prckovska |
Title |
Global Planar Convolutions for Improved Context Aggregation in Brain Tumor Segmentation |
Type |
Conference Article |
Year |
2018 |
Publication |
International MICCAI Brainlesion Workshop |
Abbreviated Journal |
|
Volume |
11384 |
Issue |
|
Pages |
393-405 |
Keywords |
Brain tumors; 3D fully-convolutional CNN; Magnetic resonance imaging; Global planar convolution |
Abstract |
In this work, we introduce the Global Planar Convolution module as a building block for fully-convolutional networks that aggregates global information and, therefore, enhances the context perception capabilities of segmentation networks in the context of brain tumor segmentation. We implement two baseline architectures (3D UNet and a residual version of 3D UNet, ResUNet) and present a novel architecture based on these two architectures, ContextNet, that includes the proposed Global Planar Convolution module. We show that the addition of such a module eliminates the need to build networks with several representation levels, which tend to be over-parametrized and to exhibit slow rates of convergence. Furthermore, we provide a visual demonstration of the behavior of GPC modules via visualization of intermediate representations. We finally participate in the 2018 edition of the BraTS challenge with our best performing models, which are based on ContextNet, and report the evaluation scores on the validation and the test sets of the challenge. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
MICCAIW |
Notes |
ADAS; 600.118 |
Approved |
no |
Call Number |
Admin @ si @ PSH2018 |
Serial |
3251 |
Permanent link to this record |
|
|
|
Author |
Raul Gomez; Lluis Gomez; Jaume Gibert; Dimosthenis Karatzas |
Title |
Learning from #Barcelona Instagram data what Locals and Tourists post about its Neighbourhoods |
Type |
Conference Article |
Year |
2018 |
Publication |
15th European Conference on Computer Vision Workshops |
Abbreviated Journal |
|
Volume |
11134 |
Issue |
|
Pages |
530-544 |
Keywords |
|
Abstract |
Massive tourism is becoming a big problem for some cities, such as Barcelona, due to its concentration in some neighborhoods. In this work we gather Instagram data related to Barcelona consisting of image-caption pairs and, using the text as a supervisory signal, we learn relations between images, words and neighborhoods. Our goal is to learn which visual elements appear in photos when people are posting about each neighborhood. We perform a language-separate treatment of the data and show that it can be extrapolated to a separate analysis of tourists and locals, and that tourism is reflected in Social Media at a neighborhood level. The presented pipeline allows analyzing the differences between the images that tourists and locals associate with the different neighborhoods. The proposed method, which can be extended to other cities or subjects, proves that Instagram data can be used to train multi-modal (image and text) machine learning models that are useful to analyze publications about a city at a neighborhood level. We publish the collected dataset, InstaBarcelona, and the code used in the analysis. |
Address |
Munich; Germany; September 2018 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
ECCVW |
Notes |
DAG; 600.129; 601.338; 600.121 |
Approved |
no |
Call Number |
Admin @ si @ GGG2018b |
Serial |
3176 |
Permanent link to this record |
|
|
|
Author |
Raul Gomez; Lluis Gomez; Jaume Gibert; Dimosthenis Karatzas |
Title |
Learning to Learn from Web Data through Deep Semantic Embeddings |
Type |
Conference Article |
Year |
2018 |
Publication |
15th European Conference on Computer Vision Workshops |
Abbreviated Journal |
|
Volume |
11134 |
Issue |
|
Pages |
514-529 |
Keywords |
|
Abstract |
In this paper we propose to learn a multimodal image and text embedding from Web and Social Media data, aiming to leverage the semantic knowledge learnt in the text domain and transfer it to a visual model for semantic image retrieval. We demonstrate that the pipeline can learn from images with associated text without supervision and perform a thorough analysis of five different text embeddings in three different benchmarks. We show that the embeddings learnt with Web and Social Media data have competitive performances over supervised methods in the text based image retrieval task, and we clearly outperform the state of the art in the MIRFlickr dataset when training in the target data. Further, we demonstrate how semantic multimodal image retrieval can be performed using the learnt embeddings, going beyond classical instance-level retrieval problems. Finally, we present a new dataset, InstaCities1M, composed of Instagram images and their associated texts that can be used for fair comparison of image-text embeddings. |
Address |
Munich; Germany; September 2018 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
ECCVW |
Notes |
DAG; 600.129; 601.338; 600.121 |
Approved |
no |
Call Number |
Admin @ si @ GGG2018a |
Serial |
3175 |
Permanent link to this record |
|
|
|
Author |
Esmitt Ramirez; Carles Sanchez; Agnes Borras; Marta Diez-Ferrer; Antoni Rosell; Debora Gil |
Title |
Image-Based Bronchial Anatomy Codification for Biopsy Guiding in Video Bronchoscopy |
Type |
Conference Article |
Year |
2018 |
Publication |
OR 2.0 Context-Aware Operating Theaters, Computer Assisted Robotic Endoscopy, Clinical Image-Based Procedures, and Skin Image Analysis |
Abbreviated Journal |
|
Volume |
11041 |
Issue |
|
Pages |
|
Keywords |
Biopsy guiding; Bronchoscopy; Lung biopsy; Intervention guiding; Airway codification |
Abstract |
Bronchoscopy examinations allow biopsy of pulmonary nodules with minimum risk for the patient. Even for experienced bronchoscopists, it is difficult to guide the bronchoscope to most distal lesions and obtain an accurate diagnosis. This paper presents an image-based codification of the bronchial anatomy for bronchoscopy biopsy guiding. The 3D anatomy of each patient is codified as a binary tree with nodes representing bronchial levels and edges labeled using their position on images projecting the 3D anatomy from a set of branching points. The paths from the root to leaves provide a codification of navigation routes with spatially consistent labels according to the anatomy observed in video bronchoscopy explorations. We evaluate our labeling approach as a guiding system in terms of the number of bronchial levels correctly codified, also in the number of labels-based instructions correctly supplied, using generalized mixed models and computer-generated data. Results obtained for three independent observers prove the consistency and reproducibility of our guiding system. We trust that our codification based on viewer’s projection might be used as a foundation for the navigation process in Virtual Bronchoscopy systems. |
Address |
Granada; September 2018 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
MICCAIW |
Notes |
IAM; 600.096; 600.075; 601.323; 600.145 |
Approved |
no |
Call Number |
Admin @ si @ RSB2018b |
Serial |
3137 |
Permanent link to this record |
|
|
|
Author |
Stefan Schurischuster; Beatriz Remeseiro; Petia Radeva; Martin Kampel |
Title |
A Preliminary Study of Image Analysis for Parasite Detection on Honey Bees |
Type |
Conference Article |
Year |
2018 |
Publication |
15th International Conference on Image Analysis and Recognition |
Abbreviated Journal |
|
Volume |
10882 |
Issue |
|
Pages |
465-473 |
Keywords |
|
Abstract |
Varroa destructor is a parasite harming bee colonies. As the worldwide bee population is in danger, beekeepers as well as researchers are looking for methods to monitor the health of bee hives. In this context, we present a preliminary study to detect parasites on bee videos by means of image analysis and machine learning techniques. For this purpose, each video frame is analyzed individually to extract bee image patches, which are then processed to compute image descriptors and finally classified into mite and no mite bees. The experimental results demonstrated the adequacy of the proposed method, which will be a perfect stepping stone for a further bee monitoring system. |
Address |
Povoa de Varzim; Portugal; June 2018 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
ICIAR |
Notes |
MILAB; no proj |
Approved |
no |
Call Number |
Admin @ si @ SRR2018a |
Serial |
3110 |
Permanent link to this record |
|
|
|
Author |
Joan Codina-Filba; Sergio Escalera; Joan Escudero; Coen Antens; Pau Buch-Cardona; Mireia Farrus |
Title |
Mobile eHealth Platform for Home Monitoring of Bipolar Disorder |
Type |
Conference Article |
Year |
2021 |
Publication |
27th ACM International Conference on Multimedia Modeling |
Abbreviated Journal |
|
Volume |
12573 |
Issue |
|
Pages |
330-341 |
Keywords |
|
Abstract |
People suffering from Bipolar Disorder (BD) experience changes in mood status, having depressive or manic episodes with normal periods in between. BD is a chronic disease with a high level of non-adherence to medication that requires continuous monitoring of patients to detect when they relapse into an episode, so that physicians can take care of them. Here we present MoodRecord, an easy-to-use, non-intrusive, multilingual, robust and scalable platform suitable for home monitoring of patients with BD, which allows physicians and relatives to track the patient state and get alarms when abnormalities occur.
MoodRecord takes advantage of the capabilities of smartphones as a communication and recording device to continuously monitor patients. It automatically records user activity, and asks the user to answer some questions or to record themselves on video, according to a predefined plan designed by physicians. The video is analysed, recognising the mood status from images, and bipolar assessment scores are extracted from speech parameters. The data obtained from the different sources are merged periodically to observe if a relapse may start and, if so, raise the corresponding alarm. The application got a positive evaluation in a pilot with users from three different countries. During the pilot, the predictions of the voice and image modules showed a coherent correlation with the diagnosis performed by clinicians. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
MMM |
Notes |
HUPBA; no proj |
Approved |
no |
Call Number |
Admin @ si @ CEE2021 |
Serial |
3659 |
Permanent link to this record |
|
|
|
Author |
Estefania Talavera; Nicolai Petkov; Petia Radeva |
Title |
Unsupervised Routine Discovery in Egocentric Photo-Streams |
Type |
Conference Article |
Year |
2019 |
Publication |
18th International Conference on Computer Analysis of Images and Patterns |
Abbreviated Journal |
|
Volume |
11678 |
Issue |
|
Pages |
576-588 |
Keywords |
Routine discovery; Lifestyle; Egocentric vision; Behaviour analysis |
Abstract |
The routine of a person is defined by the occurrence of activities throughout different days, and can directly affect the person’s health. In this work, we address the recognition of routine-related days. To do so, we rely on egocentric images, which are recorded by a wearable camera and allow monitoring the life of the user from a first-person view perspective. We propose an unsupervised model that identifies routine-related days, following an outlier detection approach. We test the proposed framework over a total of 72 days in the form of photo-streams covering around 2 weeks of the life of 5 different camera wearers. Our model achieves an average of 76% Accuracy and 68% Weighted F-Score for all the users. Thus, we show that our framework is able to recognise routine-related days and opens the door to the understanding of the behaviour of people. |
Address |
Salerno; Italy; September 2019 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
CAIP |
Notes |
MILAB; no proj |
Approved |
no |
Call Number |
Admin @ si @ TPR2019a |
Serial |
3367 |
Permanent link to this record |
|
|
|
Author |
Simone Balocco; Mauricio Gonzalez; Ricardo Ñancule; Petia Radeva; Gabriel Thomas |
Title |
Calcified Plaque Detection in IVUS Sequences: Preliminary Results Using Convolutional Nets |
Type |
Conference Article |
Year |
2018 |
Publication |
International Workshop on Artificial Intelligence and Pattern Recognition |
Abbreviated Journal |
|
Volume |
11047 |
Issue |
|
Pages |
34-42 |
Keywords |
Intravascular ultrasound images; Convolutional nets; Deep learning; Medical image analysis |
Abstract |
The manual inspection of intravascular ultrasound (IVUS) images to detect clinically relevant patterns is a difficult and laborious task performed routinely by physicians. In this paper, we present a framework based on convolutional nets for the quick selection of IVUS frames containing arterial calcification, a pattern whose detection plays a vital role in the diagnosis of atherosclerosis. Preliminary experiments on a dataset acquired from eighty patients show that convolutional architectures improve detections of a shallow classifier in terms of F1-measure, precision and recall. |
Address |
Cuba; September 2018 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
IWAIPR |
Notes |
MILAB; not mentioned |
Approved |
no |
Call Number |
Admin @ si @ BGÑ2018 |
Serial |
3237 |
Permanent link to this record |
|
|
|
Author |
Md. Mostafa Kamal Sarker; Hatem A. Rashwan; Farhan Akram; Syeda Furruka Banu; Adel Saleh; Vivek Kumar Singh; Forhad U. H. Chowdhury; Saddam Abdulwahab; Santiago Romani; Petia Radeva; Domenec Puig |
Title |
SLSDeep: Skin Lesion Segmentation Based on Dilated Residual and Pyramid Pooling Networks |
Type |
Conference Article |
Year |
2018 |
Publication |
21st International Conference on Medical Image Computing & Computer Assisted Intervention |
Abbreviated Journal |
|
Volume |
2 |
Issue |
|
Pages |
21-29 |
Keywords |
|
Abstract |
Skin lesion segmentation (SLS) in dermoscopic images is a crucial task for automated diagnosis of melanoma. In this paper, we present a robust deep learning SLS model, called SLSDeep, which is represented as an encoder-decoder network. The encoder network is constructed from dilated residual layers, while a pyramid pooling network followed by three convolution layers is used for the decoder. Unlike the traditional methods employing a cross-entropy loss, we investigated a loss function combining both Negative Log Likelihood (NLL) and End Point Error (EPE) to accurately segment the melanoma regions with sharp boundaries. The robustness of the proposed model was evaluated on two public databases: ISBI 2016 and 2017 for skin lesion analysis towards melanoma detection challenge. The proposed model outperforms the state-of-the-art methods in terms of segmentation accuracy. Moreover, it is capable of segmenting more than 100 images of size 384x384 per second on a recent GPU. |
Address |
Granada; Spain; September 2018 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
MICCAI |
Notes |
MILAB; no proj |
Approved |
no |
Call Number |
Admin @ si @ SRA2018 |
Serial |
3112 |
Permanent link to this record |
|
|
|
Author |
Asma Bensalah; Jialuo Chen; Alicia Fornes; Cristina Carmona-Duarte; Josep Llados; Miguel A. Ferrer |
Title |
Towards Stroke Patients' Upper-limb Automatic Motor Assessment Using Smartwatches |
Type |
Conference Article |
Year |
2020 |
Publication |
International Workshop on Artificial Intelligence for Healthcare Applications |
Abbreviated Journal |
|
Volume |
12661 |
Issue |
|
Pages |
476-489 |
Keywords |
|
Abstract |
Assessing the physical condition in rehabilitation scenarios is a challenging problem, since it involves Human Activity Recognition (HAR) and kinematic analysis methods. In addition, the difficulties increase in unconstrained rehabilitation scenarios, which are much closer to the real use cases. In particular, our aim is to design an upper-limb assessment pipeline for stroke patients using smartwatches. We focus on the HAR task, as it is the first part of the assessing pipeline. Our main target is to automatically detect and recognize four key movements inspired by the Fugl-Meyer assessment scale, which are performed in both constrained and unconstrained scenarios. In addition to the application protocol and dataset, we propose two detection and classification baseline methods. We believe that the proposed framework, dataset and baseline results will serve to foster this research field. |
Address |
Virtual; January 2021 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
ICPRW |
Notes |
DAG; 600.121; 600.140 |
Approved |
no |
Call Number |
Admin @ si @ BCF2020 |
Serial |
3508 |
Permanent link to this record |
|
|
|
Author |
Vacit Oguz Yazici; Joost Van de Weijer; Longlong Yu |
Title |
Visual Transformers with Primal Object Queries for Multi-Label Image Classification |
Type |
Conference Article |
Year |
2022 |
Publication |
26th International Conference on Pattern Recognition |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
|
Keywords |
|
Abstract |
Multi-label image classification is about predicting a set of class labels that can be considered as orderless sequential data. Transformers process the sequential data as a whole, therefore they are inherently good at set prediction. The first vision-based transformer model, which was proposed for the object detection task, introduced the concept of object queries. Object queries are learnable positional encodings that are used by attention modules in decoder layers to decode the object classes or bounding boxes using the regions of interest in an image. However, inputting the same set of object queries to different decoder layers hinders the training: it results in lower performance and delays convergence. In this paper, we propose the usage of primal object queries that are only provided at the start of the transformer decoder stack. In addition, we improve the mixup technique proposed for multi-label classification. The proposed transformer model with primal object queries improves the state-of-the-art class-wise F1 metric by 2.1% and 1.8%, and speeds up the convergence by 79.0% and 38.6% on the MS-COCO and NUS-WIDE datasets, respectively. |
Address |
Montreal; Quebec; Canada; August 2022 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
ICPR |
Notes |
LAMP; 600.147; 601.309 |
Approved |
no |
Call Number |
Admin @ si @ YWY2022 |
Serial |
3786 |
Permanent link to this record |
|
|
|
Author |
Ayan Banerjee; Palaiahnakote Shivakumara; Parikshit Acharya; Umapada Pal; Josep Llados |
Title |
TWD: A New Deep E2E Model for Text Watermark Detection in Video Images |
Type |
Conference Article |
Year |
2022 |
Publication |
26th International Conference on Pattern Recognition |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
|
Keywords |
Deep learning; U-Net; FCENet; Scene text detection; Video text detection; Watermark text detection |
Abstract |
Text watermark detection in video images is challenging because text watermark characteristics are different from caption and scene texts in the video images. Developing a successful model for detecting text watermark, caption, and scene texts is an open challenge. This study aims at developing a new Deep End-to-End model for Text Watermark Detection (TWD), caption and scene text in video images. To standardize non-uniform contrast, quality, and resolution, we explore the U-Net3+ model for enhancing poor quality text without affecting high-quality text. Similarly, to address the challenges of arbitrary orientation, text shapes and complex background, we explore Stacked Hourglass Encoded Fourier Contour Embedding Network (SFCENet) by feeding the output of the U-Net3+ model as input. Furthermore, the proposed work integrates enhancement and detection models as an end-to-end model for detecting multi-type text in video images. To validate the proposed model, we create our own dataset (named TW-866), which provides video images containing text watermark, caption (subtitles), as well as scene text. The proposed model is also evaluated on standard natural scene text detection datasets, namely, ICDAR 2019 MLT, CTW1500, Total-Text, and DAST1500. The results show that the proposed method outperforms the existing methods. This is the first work on text watermark detection in video images to the best of our knowledge. |
Address |
Montreal; Quebec; Canada; August 2022 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
ICPR |
Notes |
DAG |
Approved |
no |
Call Number |
Admin @ si @ BSA2022 |
Serial |
3788 |
Permanent link to this record |
|
|
|
Author |
Aitor Alvarez-Gila; Joost Van de Weijer; Yaxing Wang; Estibaliz Garrote |
Title |
MVMO: A Multi-Object Dataset for Wide Baseline Multi-View Semantic Segmentation |
Type |
Conference Article |
Year |
2022 |
Publication |
29th IEEE International Conference on Image Processing |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
|
Keywords |
multi-view; cross-view; semantic segmentation; synthetic dataset |
Abstract |
We present MVMO (Multi-View, Multi-Object dataset): a synthetic dataset of 116,000 scenes containing randomly placed objects of 10 distinct classes and captured from 25 camera locations in the upper hemisphere. MVMO comprises photorealistic, path-traced image renders, together with semantic segmentation ground truth for every view. Unlike existing multi-view datasets, MVMO features wide baselines between cameras and high density of objects, which lead to large disparities, heavy occlusions and view-dependent object appearance. Single view semantic segmentation is hindered by self and inter-object occlusions that could benefit from additional viewpoints. Therefore, we expect that MVMO will propel research in multi-view semantic segmentation and cross-view semantic transfer. We also provide baselines that show that new research is needed in such fields to exploit the complementary information of multi-view setups. |
Address |
Bordeaux; France; October 2022 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
ICIP |
Notes |
LAMP |
Approved |
no |
Call Number |
Admin @ si @ AWW2022 |
Serial |
3781 |
Permanent link to this record |
|
|
|
Author |
Guillem Martinez; Maya Aghaei; Martin Dijkstra; Bhalaji Nagarajan; Femke Jaarsma; Jaap van de Loosdrecht; Petia Radeva; Klaas Dijkstra |
Title |
Hyper-Spectral Imaging for Overlapping Plastic Flakes Segmentation |
Type |
Conference Article |
Year |
2022 |
Publication |
47th International Conference on Acoustics, Speech, and Signal Processing |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
|
Keywords |
Hyper-spectral imaging; plastic sorting; multi-label segmentation; bitfield encoding |
Abstract |
In this paper, we propose a deformable convolution-based generative adversarial network (DCNGAN) for perceptual quality enhancement of compressed videos. DCNGAN is also adaptive to the quantization parameters (QPs). Compared with optical flows, deformable convolutions are more effective and efficient to align frames. Deformable convolutions can operate on multiple frames, thus leveraging more temporal information, which is beneficial for enhancing the perceptual quality of compressed videos. Instead of aligning frames in a pairwise manner, the deformable convolution can process multiple frames simultaneously, which leads to lower computational complexity. Experimental results demonstrate that the proposed DCNGAN outperforms other state-of-the-art compressed video quality enhancement algorithms. |
Address |
Singapore; May 2022 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
ICASSP |
Notes |
MILAB; no proj |
Approved |
no |
Call Number |
Admin @ si @ MAD2022 |
Serial |
3767 |
Permanent link to this record |
|
|
|
Author |
Chengyi Zou; Shuai Wan; Marta Mrak; Marc Gorriz Blanch; Luis Herranz; Tiannan Ji |
Title |
Towards Lightweight Neural Network-based Chroma Intra Prediction for Video Coding |
Type |
Conference Article |
Year |
2022 |
Publication |
29th IEEE International Conference on Image Processing |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
|
Keywords |
Video coding; Quantization (signal); Computational modeling; Neural networks; Predictive models; Video compression; Syntactics |
Abstract |
In video compression the luma channel can be useful for predicting chroma channels (Cb, Cr), as has been demonstrated with the Cross-Component Linear Model (CCLM) used in the Versatile Video Coding (VVC) standard. More recently, it has been shown that neural networks can capture the relationship among different channels even better. In this paper, a new attention-based neural network is proposed for cross-component intra prediction. With the goal of simplifying neural network design, the new framework consists of four branches: a boundary branch and a luma branch for extracting features from reference samples, an attention branch for fusing the first two branches, and a prediction branch for computing the predicted chroma samples. The proposed scheme is integrated into the VVC test model together with one additional binary block-level syntax flag which indicates whether a given block makes use of the proposed method. Experimental results demonstrate 0.31%/2.36%/2.00% BD-rate reductions on Y/Cb/Cr components, respectively, on top of the VVC Test Model (VTM) 7.0 which uses CCLM. |
Address |
Bordeaux; France; October 2022 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
ICIP |
Notes |
MACO |
Approved |
no |
Call Number |
Admin @ si @ ZWM2022 |
Serial |
3790 |
Permanent link to this record |