|   | 
Details
   web
Records
Author Danna Xue; Fei Yang; Pei Wang; Luis Herranz; Jinqiu Sun; Yu Zhu; Yanning Zhang
Title SlimSeg: Slimmable Semantic Segmentation with Boundary Supervision Type Conference Article
Year 2022 Publication 30th ACM International Conference on Multimedia Abbreviated Journal
Volume Issue Pages 6539-6548
Keywords
Abstract Accurate semantic segmentation models typically require significant computational resources, inhibiting their use in practical applications. Recent works rely on well-crafted lightweight models to achieve fast inference. However, these models cannot flexibly adapt to varying accuracy and efficiency requirements. In this paper, we propose a simple but effective slimmable semantic segmentation (SlimSeg) method, which can be executed at different capacities during inference depending on the desired accuracy-efficiency tradeoff. More specifically, we employ parametrized channel slimming by stepwise downward knowledge distillation during training. Motivated by the observation that the differences between segmentation results of each submodel are mainly near the semantic borders, we introduce an additional boundary guided semantic segmentation loss to further improve the performance of each submodel. We show that our proposed SlimSeg with various mainstream networks can produce flexible models that provide dynamic adjustment of computational cost and better performance than independent models. Extensive experiments on semantic segmentation benchmarks, Cityscapes and CamVid, demonstrate the generalization ability of our framework.
Address Lisboa, Portugal, October 2022
Corporate Author Thesis
Publisher Association for Computing Machinery Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN (up) 978-1-4503-9203-7 Medium
Area Expedition Conference MM
Notes MACO; 600.161; 601.400 Approved no
Call Number Admin @ si @ XYW2022 Serial 3758
Permanent link to this record
 

 
Author Michael Teutsch; Angel Sappa; Riad I. Hammoud
Title Cross-Spectral Image Processing Type Book Chapter
Year 2022 Publication Computer Vision in the Infrared Spectrum. Synthesis Lectures on Computer Vision Abbreviated Journal
Volume Issue Pages 23-34
Keywords
Abstract Although this book is on IR computer vision and its main focus lies on IR image and video processing and analysis, a special attention is dedicated to cross-spectral image processing due to the increasing number of publications and applications in this domain. In these cross-spectral frameworks, IR information is used together with information from other spectral bands to tackle some specific problems by developing more robust solutions. Tasks considered for cross-spectral processing are for instance dehazing, segmentation, vegetation index estimation, or face recognition. This increasing number of applications is motivated by cross- and multi-spectral camera setups available already on the market like for example smartphones, remote sensing multispectral cameras, or multi-spectral cameras for automotive systems or drones. In this chapter, different cross-spectral image processing techniques will be reviewed together with possible applications. Initially, image registration approaches for the cross-spectral case are reviewed: the registration stage is the first image processing task, which is needed to align images acquired by different sensors within the same reference coordinate system. Then, recent cross-spectral image colorization approaches, which are intended to colorize infrared images for different applications are presented. Finally, the cross-spectral image enhancement problem is tackled by including guided super resolution techniques, image dehazing approaches, cross-spectral filtering and edge detection. Figure 3.1 illustrates cross-spectral image processing stages as well as their possible connections. Table 3.1 presents some of the available public cross-spectral datasets generally used as reference data to evaluate cross-spectral image registration, colorization, enhancement, or exploitation results.
Address
Corporate Author Thesis
Publisher Springer Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title SLCV
Series Volume Series Issue Edition
ISSN ISBN (up) 978-3-031-00698-2 Medium
Area Expedition Conference
Notes MSIAU; MACO Approved no
Call Number Admin @ si @ TSH2022b Serial 3805
Permanent link to this record
 

 
Author Michael Teutsch; Angel Sappa; Riad I. Hammoud
Title Detection, Classification, and Tracking Type Book Chapter
Year 2022 Publication Computer Vision in the Infrared Spectrum. Synthesis Lectures on Computer Vision Abbreviated Journal
Volume Issue Pages 35-58
Keywords
Abstract Automatic image and video exploitation or content analysis is a technique to extract higher-level information from a scene such as objects, behavior, (inter-)actions, environment, or even weather conditions. The relevant information is assumed to be contained in the two-dimensional signal provided in an image (width and height in pixels) or the three-dimensional signal provided in a video (width, height, and time). But also intermediate-level information such as object classes [196], locations [197], or motion [198] can help applications to fulfill certain tasks such as intelligent compression [199], video summarization [200], or video retrieval [201]. Usually, videos with their temporal dimension are a richer source of data compared to single images [202] and thus certain video content can be extracted from videos only such as object motion or object behavior. Often, machine learning or nowadays deep learning techniques are utilized to model prior knowledge about object or scene appearance using labeled training samples [203, 204]. After a learning phase, these models are then applied in real world applications, which is called inference.
Address
Corporate Author Thesis
Publisher Springer Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title SLCV
Series Volume Series Issue Edition
ISSN ISBN (up) 978-3-031-00698-2 Medium
Area Expedition Conference
Notes MSIAU; MACO Approved no
Call Number Admin @ si @ TSH2022c Serial 3806
Permanent link to this record
 

 
Author Jorge Charco; Angel Sappa; Boris X. Vintimilla; Henry Velesaca
Title Human Body Pose Estimation in Multi-view Environments Type Book Chapter
Year 2022 Publication ICT Applications for Smart Cities. Intelligent Systems Reference Library Abbreviated Journal
Volume 224 Issue Pages 79-99
Keywords
Abstract This chapter tackles the challenging problem of human pose estimation in multi-view environments to handle scenes with self-occlusions. The proposed approach starts by first estimating the camera pose—extrinsic parameters—in multi-view scenarios; due to few real image datasets, different virtual scenes are generated by using a special simulator, for training and testing the proposed convolutional neural network based approaches. Then, these extrinsic parameters are used to establish the relation between different cameras into the multi-view scheme, which captures the pose of the person from different points of view at the same time. The proposed multi-view scheme allows to robustly estimate human body joints’ position even in situations where they are occluded. This would help to avoid possible false alarms in behavioral analysis systems of smart cities, as well as applications for physical therapy, safe moving assistance for the elderly among other. The chapter concludes by presenting experimental results in real scenes by using state-of-the-art and the proposed multi-view approaches.
Address September 2022
Corporate Author Thesis
Publisher Springer Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title ISRL
Series Volume Series Issue Edition
ISSN ISBN (up) 978-3-031-06306-0 Medium
Area Expedition Conference
Notes MSIAU; MACO Approved no
Call Number Admin @ si @ CSV2022b Serial 3810
Permanent link to this record
 

 
Author Henry Velesaca; Patricia Suarez; Dario Carpio; Rafael E. Rivadeneira; Angel Sanchez; Angel Morera
Title Video Analytics in Urban Environments: Challenges and Approaches Type Book Chapter
Year 2022 Publication ICT Applications for Smart Cities Abbreviated Journal
Volume 224 Issue Pages 101-121
Keywords
Abstract This chapter reviews state-of-the-art approaches generally present in the pipeline of video analytics on urban scenarios. A typical pipeline is used to cluster approaches in the literature, including image preprocessing, object detection, object classification, and object tracking modules. Then, a review of recent approaches for each module is given. Additionally, applications and datasets generally used for training and evaluating the performance of these approaches are included. This chapter does not pretend to be an exhaustive review of state-of-the-art video analytics in urban environments but rather an illustration of some of the different recent contributions. The chapter concludes by presenting current trends in video analytics in the urban scenario field.
Address September 2022
Corporate Author Thesis
Publisher Springer Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title ISRL
Series Volume Series Issue Edition
ISSN ISBN (up) 978-3-031-06306-0 Medium
Area Expedition Conference
Notes MSIAU; MACO Approved no
Call Number Admin @ si @ VSC2022 Serial 3811
Permanent link to this record
 

 
Author Angel Sappa (ed)
Title ICT Applications for Smart Cities Type Book Whole
Year 2022 Publication ICT Applications for Smart Cities Abbreviated Journal
Volume 224 Issue Pages
Keywords Computational Intelligence; Intelligent Systems; Smart Cities; ICT Applications; Machine Learning; Pattern Recognition; Computer Vision; Image Processing
Abstract Part of the book series: Intelligent Systems Reference Library (ISRL)

This book is the result of four-year work in the framework of the Ibero-American Research Network TICs4CI funded by the CYTED program. In the following decades, 85% of the world's population is expected to live in cities; hence, urban centers should be prepared to provide smart solutions for problems ranging from video surveillance and intelligent mobility to the solid waste recycling processes, just to mention a few. More specifically, the book describes underlying technologies and practical implementations of several successful case studies of ICTs developed in the following smart city areas:

• Urban environment monitoring
• Intelligent mobility
• Waste recycling processes
• Video surveillance
• Computer-aided diagnose in healthcare systems
• Computer vision-based approaches for efficiency in production processes

The book is intended for researchers and engineers in the field of ICTs for smart cities, as well as to anyone who wants to know about state-of-the-art approaches and challenges on this field.
Address September 2022
Corporate Author Thesis
Publisher Springer Place of Publication Editor Angel Sappa
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title ISRL
Series Volume Series Issue Edition
ISSN ISBN (up) 978-3-031-06306-0 Medium
Area Expedition Conference
Notes MSIAU; MACO Approved no
Call Number Admin @ si @ Sap2022 Serial 3812
Permanent link to this record
 

 
Author Victoria Ruiz; Angel Sanchez; Jose F. Velez; Bogdan Raducanu
Title Waste Classification with Small Datasets and Limited Resources Type Book Chapter
Year 2022 Publication ICT Applications for Smart Cities. Intelligent Systems Reference Library Abbreviated Journal
Volume 224 Issue Pages 185-203
Keywords
Abstract Automatic waste recycling has become a very important societal challenge nowadays, raising people’s awareness for a cleaner environment and a more sustainable lifestyle. With the transition to Smart Cities, and thanks to advanced ICT solutions, this problem has received a new impulse. The waste recycling focus has shifted from general waste treating facilities to an individual responsibility, where each person should become aware of selective waste separation. The surge of the mobile devices, accompanied by a significant increase in computation power, has potentiated and facilitated this individual role. An automated image-based waste classification mechanism can help with a more efficient recycling and a reduction of contamination from residuals. Despite the good results achieved with the deep learning methodologies for this task, the Achille’s heel is that they require large neural networks which need significant computational resources for training and therefore are not suitable for mobile devices. To circumvent this apparently intractable problem, we will rely on knowledge distillation in order to transfer the network’s knowledge from a larger network (called ‘teacher’) to a smaller, more compact one, (referred as ‘student’) and thus making it possible the task of image classification on a device with limited resources. For evaluation, we considered as ‘teachers’ large architectures such as InceptionResNet or DenseNet and as ‘students’, several configurations of the MobileNets. We used the publicly available TrashNet dataset to demonstrate that the distillation process does not significantly affect system’s performance (e.g. classification accuracy) of the student network.
Address September 2022
Corporate Author Thesis
Publisher Springer Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title ISRL
Series Volume Series Issue Edition
ISSN ISBN (up) 978-3-031-06306-0 Medium
Area Expedition Conference
Notes LAMP Approved no
Call Number Admin @ si @ Serial 3813
Permanent link to this record
 

 
Author Sergi Garcia Bordils; George Tom; Sangeeth Reddy; Minesh Mathew; Marçal Rusiñol; C.V. Jawahar; Dimosthenis Karatzas
Title Read While You Drive-Multilingual Text Tracking on the Road Type Conference Article
Year 2022 Publication 15th IAPR International workshop on document analysis systems Abbreviated Journal
Volume 13237 Issue Pages 756–770
Keywords
Abstract Visual data obtained during driving scenarios usually contain large amounts of text that conveys semantic information necessary to analyse the urban environment and is integral to the traffic control plan. Yet, research on autonomous driving or driver assistance systems typically ignores this information. To advance research in this direction, we present RoadText-3K, a large driving video dataset with fully annotated text. RoadText-3K is three times bigger than its predecessor and contains data from varied geographical locations, unconstrained driving conditions and multiple languages and scripts. We offer a comprehensive analysis of tracking by detection and detection by tracking methods exploring the limits of state-of-the-art text detection. Finally, we propose a new end-to-end trainable tracking model that yields state-of-the-art results on this challenging dataset. Our experiments demonstrate the complexity and variability of RoadText-3K and establish a new, realistic benchmark for scene text tracking in the wild.
Address La Rochelle; France; May 2022
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN (up) 978-3-031-06554-5 Medium
Area Expedition Conference DAS
Notes DAG; 600.155; 611.022; 611.004 Approved no
Call Number Admin @ si @ GTR2022 Serial 3783
Permanent link to this record
 

 
Author Utkarsh Porwal; Alicia Fornes; Faisal Shafait (eds)
Title Frontiers in Handwriting Recognition. International Conference on Frontiers in Handwriting Recognition. 18th International Conference, ICFHR 2022 Type Book Whole
Year 2022 Publication Frontiers in Handwriting Recognition. Abbreviated Journal
Volume 13639 Issue Pages
Keywords
Abstract
Address ICFHR 2022, Hyderabad, India, December 4–7, 2022
Corporate Author Thesis
Publisher Springer Place of Publication Editor Utkarsh Porwal; Alicia Fornes; Faisal Shafait
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN (up) 978-3-031-21648-0 Medium
Area Expedition Conference ICFHR
Notes DAG Approved no
Call Number Admin @ si @ PFS2022 Serial 3809
Permanent link to this record
 

 
Author Andrea Gemelli; Sanket Biswas; Enrico Civitelli; Josep Llados; Simone Marinai
Title Doc2Graph: A Task Agnostic Document Understanding Framework Based on Graph Neural Networks Type Conference Article
Year 2022 Publication 17th European Conference on Computer Vision Workshops Abbreviated Journal
Volume 13804 Issue Pages 329–344
Keywords
Abstract Geometric Deep Learning has recently attracted significant interest in a wide range of machine learning fields, including document analysis. The application of Graph Neural Networks (GNNs) has become crucial in various document-related tasks since they can unravel important structural patterns, fundamental in key information extraction processes. Previous works in the literature propose task-driven models and do not take into account the full power of graphs. We propose Doc2Graph, a task-agnostic document understanding framework based on a GNN model, to solve different tasks given different types of documents. We evaluated our approach on two challenging datasets for key information extraction in form understanding, invoice layout analysis and table detection.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN (up) 978-3-031-25068-2 Medium
Area Expedition Conference ECCV-TiE
Notes DAG; 600.162; 600.140; 110.312 Approved no
Call Number Admin @ si @ GBC2022 Serial 3795
Permanent link to this record
 

 
Author Vacit Oguz Yazici
Title Towards Smart Fashion: Visual Recognition of Products and Attributes Type Book Whole
Year 2022 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Artificial intelligence is innovating the fashion industry by proposing new applications and solutions to the problems encountered by researchers and engineers working in the industry. In this thesis, we address three of these problems. In the first part of the thesis, we tackle the problem of multi-label image classification which is very related to fashion attribute recognition. In the second part of the thesis, we address two problems that are specific to fashion. Firstly, we address the problem of main product detection which is the task of associating correct image parts (e.g. bounding boxes) with the fashion product being sold. Secondly, we address the problem of color naming for multicolored fashion items. The task of multi-label image classification consists in assigning various concepts such as objects or attributes to images. Usually, there are dependencies that can be learned between the concepts to capture label correlations (chair and table classes are more likely to co-exist than chair and giraffe).
If we treat the multi-label image classification problem as an orderless set prediction problem, we can exploit recurrent neural networks (RNN) to capture label correlations. However, RNNs are trained to predict ordered sequences of tokens, so if the order of the predicted sequence is different than the order of the ground truth sequence, there will be penalization although the predictions are correct. Therefore, in the first part of the thesis, we propose an orderless loss function which will order the labels in the ground truth sequence dynamically in a way that the minimum loss is achieved. This results in a significant improvement of RNN models on multi-label image classification over the previous methods.
However, RNNs suffer from long term dependencies when the cardinality of set grows bigger. The decoding process might stop early if the current hidden state cannot find any object and outputs the termination token. This would cause the remaining classes not to be predicted and lower recall metric. Transformers can be used to avoid the long term dependency problem exploiting their selfattention modules that process sequential data simultaneously. Consequently, we propose a novel transformer model for multi-label image classification which surpasses the state-of-the-art results by a large margin.
In the second part of thesis, we focus on two fashion-specific problems. Main product detection is the task of associating image parts with the fashion product that is being sold, generally using associated textual metadata (product title or description). Normally, in fashion e-commerces, products are represented by multiple images where a person wears the product along with other fashion items. If all the fashion items in the images are marked with bounding boxes, we can use the textual metadata to decide which item is the main product. The initial work treated each of these images independently, discarding the fact that they all belong to the same product. In this thesis, we represent the bounding boxes from all the images as nodes in a fully connected graph. This allows the algorithm to learn relations between the nodes during training and take the entire context into account for the final decision. Our algorithm results in a significant improvement of the state-ofthe-art.
Moreover, we address the problem of color naming for multicolored fashion items, which is a challenging task due to the external factors such as illumination changes or objects that act as clutter. In the context of multi-label classification, the vaguely defined lines between the classes in the color space cause ambiguity. For example, a shade of blue which is very close to green might cause the model to incorrectly predict the color blue and green at the same time. Based on this, models trained for color naming are expected to recognize the colors and their quantities in both single colored and multicolored fashion items. Therefore, in this thesis, we propose a novel architecture with an additional head that explicitly estimates the number of colors in fashion items. This removes the ambiguity problem and results in better color naming performance.
Address January 2022
Corporate Author Thesis Ph.D. thesis
Publisher IMPRIMA Place of Publication Editor Joost Van de Weijer;Arnau Ramisa
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN (up) 978-84-122714-6-1 Medium
Area Expedition Conference
Notes LAMP Approved no
Call Number Admin @ si @ Ogu2022 Serial 3631
Permanent link to this record
 

 
Author Akhil Gurram
Title Monocular Depth Estimation for Autonomous Driving Type Book Whole
Year 2022 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract 3D geometric information is essential for on-board perception in autonomous driving and driver assistance. Autonomous vehicles (AVs) are equipped with calibrated sensor suites. As part of these suites, we can find LiDARs, which are expensive active sensors in charge of providing the 3D geometric information. Depending on the operational conditions for the AV, calibrated stereo rigs may be also sufficient for obtaining 3D geometric information, being these rigs less expensive and easier to install than LiDARs. However, ensuring a proper maintenance and calibration of these types of sensors is not trivial. Accordingly, there is an increasing interest on performing monocular depth estimation (MDE) to obtain 3D geometric information on-board. MDE is very appealing since it allows for appearance and depth being on direct pixelwise correspondence without further calibration. Moreover, a set of single cameras with MDE capabilities would still be a cheap solution for on-board perception, relatively easy to integrate and maintain in an AV.
Best MDE models are based on Convolutional Neural Networks (CNNs) trained in a supervised manner, i.e., assuming pixelwise ground truth (GT). Accordingly, the overall goal of this PhD is to study methods for improving CNN-based MDE accuracy under different training settings. More specifically, this PhD addresses different research questions that are described below. When we started to work in this PhD, state-of-theart methods for MDE were already based on CNNs. In fact, a promising line of work consisted in using image-based semantic supervision (i.e., pixel-level class labels) while training CNNs for MDE using LiDAR-based supervision (i.e., depth). It was common practice to assume that the same raw training data are complemented by both types of supervision, i.e., with depth and semantic labels. However, in practice, it was more common to find heterogeneous datasets with either only depth supervision or only semantic supervision. Therefore, our first work was to research if we could train CNNs for MDE by leveraging depth and semantic information from heterogeneous datasets. We show that this is indeed possible, and we surpassed the state-of-the-art results on MDE at the time we did this research. To achieve our results, we proposed a particular CNN architecture and a new training protocol.
After this research, it was clear that the upper-bound setting to train CNN-based MDE models consists in using LiDAR data as supervision. However, it would be cheaper and more scalable if we would be able to train such models from monocular sequences. Obviously, this is far more challenging, but worth to research. Training MDE models using monocular sequences is possible by relying on structure-from-motion (SfM) principles to generate self-supervision. Nevertheless, problems of camouflaged objects, visibility changes, static-camera intervals, textureless areas, and scale ambiguity, diminish the usefulness of such self-supervision. To alleviate these problems, we perform MDE by virtual-world supervision and real-world SfM self-supervision. We call our proposalMonoDEVSNet. We compensate the SfM self-supervision limitations by leveraging
virtual-world images with accurate semantic and depth supervision, as well as addressing the virtual-to-real domain gap. MonoDEVSNet outperformed previous MDE CNNs trained on monocular and even stereo sequences. We have publicly released MonoDEVSNet at <https://github.com/HMRC-AEL/MonoDEVSNet>.
Finally, since MDE is performed to produce 3D information for being used in downstream tasks related to on-board perception. We also address the question of whether the standard metrics for MDE assessment are a good indicator for future MDE-based driving-related perception tasks. By using 3D object detection on point clouds as proxy of on-board perception, we conclude that, indeed, MDE evaluation metrics give rise to a ranking of methods which reflects relatively well the 3D object detection results we may expect.
Address March, 2022
Corporate Author Thesis Ph.D. thesis
Publisher IMPRIMA Place of Publication Editor Antonio Lopez;Onay Urfalioglu
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN (up) 978-84-124793-0-0 Medium
Area Expedition Conference
Notes ADAS Approved no
Call Number Admin @ si @ Gur2022 Serial 3712
Permanent link to this record
 

 
Author Parichehr Behjati Ardakani
Title Towards Efficient and Robust Convolutional Neural Networks for Single Image Super-Resolution Type Book Whole
Year 2022 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Single image super-resolution (SISR) is an important task in image processing which aims to enhance the resolution of imaging systems. Recently, SISR has witnessed great strides with the rapid development of deep learning. Recent advances in SISR are mostly devoted to designing deeper and wider networks to enhance their representation learning capacity. However, as the depth of networks increases, deep learning-based methods are faced with the challenge of computational complexity in practice. Moreover, most existing methods rarely leverage the intermediate features and also do not discriminate the computation of features by their frequencial components, thereby achieving relatively low performance. Aside from the aforementioned problems, another desired ability is to upsample images to arbitrary scales using a single model. Most current SISR methods train a dedicated model for each target resolution, losing generality and increasing memory requirements. In this thesis, we address the aforementioned issues and propose solutions to them: i) We present a novel frequency-based enhancement block which treats different frequencies in a heterogeneous way and also models inter-channel dependencies, which consequently enrich the output feature. Thus it helps the network generate more discriminative representations by explicitly recovering finer details. ii) We introduce OverNet which contains two main parts: a lightweight feature extractor that follows a novel recursive framework of skip and dense connections to reduce low-level feature degradation, and an overscaling module that generates an accurate SR image by internally constructing an overscaled intermediate representation of the output features. Then, to solve the problem of reconstruction at arbitrary scale factors, we introduce a novel multi-scale loss, that allows the simultaneous training of all scale factors using a single model. iii) We propose a directional variance attention network which leverages a novel attention mechanism to enhance features in different channels and spatial regions. Moreover, we introduce a novel procedure for using attention mechanisms together with residual blocks to facilitate the preservation of finer details. Finally, we demonstrate that our approaches achieve considerably better performance than previous state-of-the-art methods, in terms of both quantitative and visual quality.
Address April, 2022
Corporate Author Thesis Ph.D. thesis
Publisher Place of Publication Editor Jordi Gonzalez;Xavier Roca;Pau Rodriguez
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN (up) 978-84-124793-1-7 Medium
Area Expedition Conference
Notes ISE Approved no
Call Number Admin @ si @ Beh2022 Serial 3713
Permanent link to this record
 

 
Author Kai Wang
Title Continual learning for hierarchical classification, few-shot recognition, and multi-modal learning Type Book Whole
Year 2022 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Deep learning has drastically changed computer vision in the past decades and achieved great success in many applications, such as image classification, retrieval, detection, and segmentation thanks to the emergence of neural networks. Typically, for most applications, these networks are presented with examples from all tasks they are expected to perform. However, for many applications, this is not a realistic
scenario, and an algorithm is required to learn tasks sequentially. Continual learning proposes theory and methods for this scenario.
The main challenge for continual learning systems is called catastrophic forgetting and refers to a significant drop in performance on previous tasks. To tackle this problem, three main branches of methods have been explored to alleviate the forgetting in continual learning. They are regularization-based methods, rehearsalbased methods, and parameter isolation-based methods. However, most of them are focused on image classification tasks. Continual learning of many computer vision fields has still not been well-explored. Thus, in this thesis, we extend the continual learning knowledge to meta learning, we propose a method for the incremental learning of hierarchical relations for image classification, we explore image recognition in online continual learning, and study continual learning for cross-modal learning.
In this thesis, we explore the usage of image rehearsal when addressing the incremental meta learning problem. Observing that existingmethods fail to improve performance with saved exemplars, we propose to mix exemplars with current task data and episode-level distillation to overcome forgetting in incremental meta learning. Next, we study a more realistic image classification scenario where each class has multiple granularity levels. Only one label is present at any time, which requires the model to infer if the provided label has a hierarchical relation with any already known label. In experiments, we show that the estimated hierarchy information can be beneficial in both the training and inference stage.
For the online continual learning setting, we investigate the usage of intermediate feature replay. In this case, the training samples are only observed by the model only one time. Here we fix thememory buffer for feature replay and compare the effectiveness of saving features from different layers. Finally, we investigate multi-modal continual learning, where an image encoder is cooperating with a semantic branch. We consider the continual learning of both zero-shot learning and cross-modal retrieval problems.
Address July, 2022
Corporate Author Thesis Ph.D. thesis
Publisher Place of Publication Editor Luis Herranz;Joost Van de Weijer
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN (up) 978-84-124793-2-4 Medium
Area Expedition Conference
Notes LAMP Approved no
Call Number Admin @ si @ Wan2022 Serial 3714
Permanent link to this record
 

 
Author Idoia Ruiz
Title Deep Metric Learning for re-identification, tracking and hierarchical novelty detection Type Book Whole
Year 2022 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Metric learning refers to the problem in machine learning of learning a distance or similarity measurement to compare data. In particular, deep metric learning involves learning a representation, also referred to as embedding, such that in the embedding space data samples can be compared based on the distance, directly providing a similarity measure. This step is necessary to perform several tasks in computer vision. It allows to perform the classification of images, regions or pixels, re-identification, out-of-distribution detection, object tracking in image sequences and any other task that requires computing a similarity score for their solution. This thesis addresses three specific problems that share this common requirement. The first one is person re-identification. Essentially, it is an image retrieval task that aims at finding instances of the same person according to a similarity measure. We first compare in terms of accuracy and efficiency, classical metric learning to basic deep learning based methods for this problem. In this context, we also study network distillation as a strategy to optimize the trade-off between accuracy and speed at inference time. The second problem we contribute to is novelty detection in image classification. It consists in detecting samples of novel classes, i.e. never seen during training. However, standard novelty detection does not provide any information about the novel samples besides they are unknown. Aiming at more informative outputs, we take advantage from the hierarchical taxonomies that are intrinsic to the classes. We propose a metric learning based approach that leverages the hierarchical relationships among classes during training, being able to predict the parent class for a novel sample in such hierarchical taxonomy. Our third contribution is in multi-object tracking and segmentation. This joint task comprises classification, detection, instance segmentation and tracking. Tracking can be formulated as a retrieval problem to be addressed with metric learning approaches. We tackle the existing difficulty in academic research that is the lack of annotated benchmarks for this task. To this matter, we introduce the problem of weakly supervised multi-object tracking and segmentation, facing the challenge of not having available ground truth for instance segmentation. We propose a synergistic training strategy that benefits from the knowledge of the supervised tasks that are being learnt simultaneously.
Address July, 2022
Corporate Author Thesis Ph.D. thesis
Publisher Place of Publication Editor Joan Serrat
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN (up) 978-84-124793-4-8 Medium
Area Expedition Conference
Notes ADAS Approved no
Call Number Admin @ si @ Rui2022 Serial 3717
Permanent link to this record