|   | 
Details
   web
Records
Author Raul Gomez
Title Exploiting the Interplay between Visual and Textual Data for Scene Interpretation Type Book Whole
Year 2020 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages (up)
Keywords
Abstract Machine learning experimentation under controlled scenarios and standard datasets is necessary to compare algorithms performance by evaluating all of them in the same setup. However, experimentation on how those algorithms perform on unconstrained data and applied tasks to solve real world problems is also a must to ascertain how that research can contribute to our society.
In this dissertation we experiment with the latest computer vision and natural language processing algorithms applying them to multimodal scene interpretation. Particularly, we research on how image and text understanding can be jointly exploited to address real world problems, focusing on learning from Social Media data.
We address several tasks that involve image and textual information, discuss their characteristics and offer our experimentation conclusions. First, we work on detection of scene text in images. Then, we work with Social Media posts, exploiting the captions associated to images as supervision to learn visual features, which we apply to multimodal semantic image retrieval. Subsequently, we work with geolocated Social Media images with associated tags, experimenting on how to use the tags as supervision, on location sensitive image retrieval and on exploiting location information for image tagging. Finally, we work on a specific classification problem of Social Media publications consisting on an image and a text: Multimodal hate speech classification.
Address
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Dimosthenis Karatzas;Lluis Gomez;Jaume Gibert
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-121011-7-1 Medium
Area Expedition Conference
Notes DAG; 600.121 Approved no
Call Number Admin @ si @ Gom20 Serial 3479
Permanent link to this record
 

 
Author Sounak Dey
Title Mapping between Images and Conceptual Spaces: Sketch-based Image Retrieval Type Book Whole
Year 2020 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages (up)
Keywords
Abstract This thesis presents several contributions to the literature of sketch based image retrieval (SBIR). In SBIR the first challenge we face is how to map two different domains to common space for effective retrieval of images, while tackling the different levels of abstraction people use to express their notion of objects around while sketching. To this extent we first propose a cross-modal learning framework that maps both sketches and text into a joint embedding space invariant to depictive style, while preserving semantics. Then we have also investigated different query types possible to encompass people's dilema in sketching certain world objects. For this we propose an approach for multi-modal image retrieval in multi-labelled images. A multi-modal deep network architecture is formulated to jointly model sketches and text as input query modalities into a common embedding space, which is then further aligned with the image feature space. This permits encoding the object-based features and its alignment with the query irrespective of the availability of the co-occurrence of different objects in the training set.

Finally, we explore the problem of zero-shot sketch-based image retrieval (ZS-SBIR), where human sketches are used as queries to conduct retrieval of photos from unseen categories. We importantly advance prior arts by proposing a novel ZS-SBIR scenario that represents a firm step forward in its practical application. The new setting uniquely recognises two important yet often neglected challenges of practical ZS-SBIR, (i) the large domain gap between amateur sketch and photo, and (ii) the necessity for moving towards large-scale retrieval. We first contribute to the community a novel ZS-SBIR dataset, QuickDraw-Extended. We also in this dissertation pave the path to the future direction of research in this domain.
Address
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Josep Llados;Umapada Pal
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-121011-8-8 Medium
Area Expedition Conference
Notes DAG; 600.121 Approved no
Call Number Admin @ si @ Dey20 Serial 3480
Permanent link to this record
 

 
Author Marc Masana
Title Lifelong Learning of Neural Networks: Detecting Novelty and Adapting to New Domains without Forgetting Type Book Whole
Year 2020 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages (up)
Keywords
Abstract Computer vision has gone through considerable changes in the last decade as neural networks have come into common use. As available computational capabilities have grown, neural networks have achieved breakthroughs in many computer vision tasks, and have even surpassed human performance in others. With accuracy being so high, focus has shifted to other issues and challenges. One research direction that saw a notable increase in interest is on lifelong learning systems. Such systems should be capable of efficiently performing tasks, identifying and learning new ones, and should moreover be able to deploy smaller versions of themselves which are experts on specific tasks. In this thesis, we contribute to research on lifelong learning and address the compression and adaptation of networks to small target domains, the incremental learning of networks faced with a variety of tasks, and finally the detection of out-of-distribution samples at inference time.

We explore how knowledge can be transferred from large pretrained models to more task-specific networks capable of running on smaller devices by extracting the most relevant information. Using a pretrained model provides more robust representations and a more stable initialization when learning a smaller task, which leads to higher performance and is known as domain adaptation. However, those models are too large for certain applications that need to be deployed on devices with limited memory and computational capacity. In this thesis we show that, after performing domain adaptation, some learned activations barely contribute to the predictions of the model. Therefore, we propose to apply network compression based on low-rank matrix decomposition using the activation statistics. This results in a significant reduction of the model size and the computational cost.

Like human intelligence, machine intelligence aims to have the ability to learn and remember knowledge. However, when a trained neural network is presented with learning a new task, it ends up forgetting previous ones. This is known as catastrophic forgetting and its avoidance is studied in continual learning. The work presented in this thesis extensively surveys continual learning techniques and presents an approach to avoid catastrophic forgetting in sequential task learning scenarios. Our technique is based on using ternary masks in order to update a network to new tasks, reusing the knowledge of previous ones while not forgetting anything about them. In contrast to earlier work, our masks are applied to the activations of each layer instead of the weights. This considerably reduces the number of parameters to be added for each new task. Furthermore, the analysis on a wide range of work on incremental learning without access to the task-ID, provides insight on current state-of-the-art approaches that focus on avoiding catastrophic forgetting by using regularization, rehearsal of previous tasks from a small memory, or compensating the task-recency bias.

Neural networks trained with a cross-entropy loss force the outputs of the model to tend toward a one-hot encoded vector. This leads to models being too overly confident when presented with images or classes that were not present in the training distribution. The capacity of a system to be aware of the boundaries of the learned tasks and identify anomalies or classes which have not been learned yet is key to lifelong learning and autonomous systems. In this thesis, we present a metric learning approach to out-of-distribution detection that learns the task at hand on an embedding space.
Address
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Joost Van de Weijer;Andrew Bagdanov
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-121011-9-5 Medium
Area Expedition Conference
Notes LAMP; 600.120 Approved no
Call Number Admin @ si @ Mas20 Serial 3481
Permanent link to this record
 

 
Author Lei Kang
Title Robust Handwritten Text Recognition in Scarce Labeling Scenarios: Disentanglement, Adaptation and Generation Type Book Whole
Year 2020 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages (up)
Keywords
Abstract Handwritten documents are not only preserved in historical archives but also widely used in administrative documents such as cheques and claims. With the rise of the deep learning era, many state-of-the-art approaches have achieved good performance on specific datasets for Handwritten Text Recognition (HTR). However, it is still challenging to solve real use cases because of the varied handwriting styles across different writers and the limited labeled data. Thus, both explorin a more robust handwriting recognition architectures and proposing methods to diminish the gap between the source and target data in an unsupervised way are
demanded.
In this thesis, firstly, we explore novel architectures for HTR, from Sequence-to-Sequence (Seq2Seq) method with attention mechanism to non-recurrent Transformer-based method. Secondly, we focus on diminishing the performance gap between source and target data in an unsupervised way. Finally, we propose a group of generative methods for handwritten text images, which could be utilized to increase the training set to obtain a more robust recognizer. In addition, by simply modifying the generative method and joining it with a recognizer, we end up with an effective disentanglement method to distill textual content from handwriting styles so as to achieve a generalized recognition performance.
We outperform state-of-the-art HTR performances in the experimental results among different scientific and industrial datasets, which prove the effectiveness of the proposed methods. To the best of our knowledge, the non-recurrent recognizer and the disentanglement method are the first contributions in the handwriting recognition field. Furthermore, we have outlined the potential research lines, which would be interesting to explore in the future.
Address
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Alicia Fornes;Marçal Rusiñol;Mauricio Villegas
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-122714-0-9 Medium
Area Expedition Conference
Notes DAG; 600.121 Approved no
Call Number Admin @ si @ Kan20 Serial 3482
Permanent link to this record
 

 
Author Manuel Carbonell
Title Neural Information Extraction from Semi-structured Documents A Type Book Whole
Year 2020 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages (up)
Keywords
Abstract Sectors as fintech, legaltech or insurance process an inflow of millions of forms, invoices, id documents, claims or similar every day. Together with these, historical archives provide gigantic amounts of digitized documents containing useful information that needs to be stored in machine encoded text with a meaningful structure. This procedure, known as information extraction (IE) comprises the steps of localizing and recognizing text, identifying named entities contained in it and optionally finding relationships among its elements. In this work we explore multi-task neural models at image and graph level to solve all steps in a unified way. While doing so we find benefits and limitations of these end-to-end approaches in comparison with sequential separate methods. More specifically, we first propose a method to produce textual as well as semantic labels with a unified model from handwritten text line images. We do so with the use of a convolutional recurrent neural model trained with connectionist temporal classification to predict the textual as well as semantic information encoded in the images. Secondly, motivated by the success of this approach we investigate the unification of the localization and recognition tasks of handwritten text in full pages with an end-to-end model, observing benefits in doing so. Having two models that tackle information extraction subsequent task pairs in an end-to-end to end manner, we lastly contribute with a method to put them all together in a single neural network to solve the whole information extraction pipeline in a unified way. Doing so we observe some benefits and some limitations in the approach, suggesting that in certain cases it is beneficial to train specialized models that excel at a single challenging task of the information extraction process, as it can be the recognition of named entities or the extraction of relationships between them. For this reason we lastly study the use of the recently arrived graph neural network architectures for the semantic tasks of the information extraction process, which are recognition of named entities and relation extraction, achieving promising results on the relation extraction part.
Address
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Alicia Fornes;Mauricio Villegas;Josep Llados
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-122714-1-6 Medium
Area Expedition Conference
Notes DAG; 600.121 Approved no
Call Number Admin @ si @ Car20 Serial 3483
Permanent link to this record
 

 
Author Gemma Rotger
Title Lifelike Humans: Detailed Reconstruction of Expressive Human Faces Type Book Whole
Year 2021 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages (up)
Keywords
Abstract Developing human-like digital characters is a challenging task since humans are used to recognizing our fellows, and find the computed generated characters inadequately humanized. To fulfill the standards of the videogame and digital film productions it is necessary to model and animate these characters the most closely to human beings. However, it is an arduous and expensive task, since many artists and specialists are required to work on a single character. Therefore, to fulfill these requirements we found an interesting option to study the automatic creation of detailed characters through inexpensive setups. In this work, we develop novel techniques to bring detailed characters by combining different aspects that stand out when developing realistic characters, skin detail, facial hairs, expressions, and microexpressions. We examine each of the mentioned areas with the aim of automatically recover each of the parts without user interaction nor training data. We study the problems for their robustness but also for the simplicity of the setup, preferring single-image with uncontrolled illumination and methods that can be easily computed with the commodity of a standard laptop. A detailed face with wrinkles and skin details is vital to develop a realistic character. In this work, we introduce our method to automatically describe facial wrinkles on the image and transfer to the recovered base face. Then we advance to facial hair recovery by resolving a fitting problem with a novel parametrization model. As of last, we develop a mapping function that allows transfer expressions and microexpressions between different meshes, which provides realistic animations to our detailed mesh. We cover all the mentioned points with the focus on key aspects as (i) how to describe skin wrinkles in a simple and straightforward manner, (ii) how to recover 3D from 2D detections, (iii) how to recover and model facial hair from 2D to 3D, (iv) how to transfer expressions between models holding both skin detail and facial hair, (v) how to perform all the described actions without training data nor user interaction. In this work, we present our proposals to solve these aspects with an efficient and simple setup. We validate our work with several datasets both synthetic and real data, prooving remarkable results even in challenging cases as occlusions as glasses, thick beards, and indeed working with different face topologies like single-eyed cyclops.
Address
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Felipe Lumbreras;Antonio Agudo
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-122714-3-0 Medium
Area Expedition Conference
Notes ADAS Approved no
Call Number Admin @ si @ Rot2021 Serial 3513
Permanent link to this record
 

 
Author Giovanni Maria Farinella; Petia Radeva; Jose Braz
Title Proceedings of the 15th International Joint Conference on Computer Vision; Imaging and Computer Graphics Theory and Applications Type Book Whole
Year 2020 Publication Proceedings of the 15th International Joint Conference on Computer Vision; Imaging and Computer Graphics Theory and Applications; VISIGRAPP 2020 Abbreviated Journal
Volume 4 Issue Pages (up)
Keywords
Abstract
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MILAB Approved no
Call Number Admin @ si @ FRB2020a Serial 3546
Permanent link to this record
 

 
Author Giovanni Maria Farinella; Petia Radeva; Jose Braz
Title Proceedings of the 15th International Joint Conference on Computer Vision; Imaging and Computer Graphics Theory and Applications Type Book Whole
Year 2020 Publication Proceedings of the 15th International Joint Conference on Computer Vision; Imaging and Computer Graphics Theory and Applications; VISIGRAPP 2020 Abbreviated Journal
Volume 5 Issue Pages (up)
Keywords
Abstract
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MILAB Approved no
Call Number Admin @ si @ FRB2020b Serial 3547
Permanent link to this record
 

 
Author Josep Llados; Daniel Lopresti; Seiichi Uchida (eds)
Title 16th International Conference, 2021, Proceedings, Part I Type Book Whole
Year 2021 Publication Document Analysis and Recognition – ICDAR 2021 Abbreviated Journal
Volume 12821 Issue Pages (up)
Keywords
Abstract This four-volume set of LNCS 12821, LNCS 12822, LNCS 12823 and LNCS 12824, constitutes the refereed proceedings of the 16th International Conference on Document Analysis and Recognition, ICDAR 2021, held in Lausanne, Switzerland in September 2021. The 182 full papers were carefully reviewed and selected from 340 submissions, and are presented with 13 competition reports.

The papers are organized into the following topical sections: historical document analysis, document analysis systems, handwriting recognition, scene text detection and recognition, document image processing, natural language processing (NLP) for document understanding, and graphics, diagram and math recognition.
Address Lausanne, Switzerland, September 5-10, 2021
Corporate Author Thesis
Publisher Springer Cham Place of Publication Editor Josep Llados; Daniel Lopresti; Seiichi Uchida
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN 978-3-030-86548-1 Medium
Area Expedition Conference ICDAR
Notes DAG Approved no
Call Number Admin @ si @ Serial 3725
Permanent link to this record
 

 
Author Josep Llados; Daniel Lopresti; Seiichi Uchida (eds)
Title 16th International Conference, 2021, Proceedings, Part II Type Book Whole
Year 2021 Publication Document Analysis and Recognition – ICDAR 2021 Abbreviated Journal
Volume 12822 Issue Pages (up)
Keywords
Abstract This four-volume set of LNCS 12821, LNCS 12822, LNCS 12823 and LNCS 12824, constitutes the refereed proceedings of the 16th International Conference on Document Analysis and Recognition, ICDAR 2021, held in Lausanne, Switzerland in September 2021. The 182 full papers were carefully reviewed and selected from 340 submissions, and are presented with 13 competition reports.

The papers are organized into the following topical sections: document analysis for literature search, document summarization and translation, multimedia document analysis, mobile text recognition, document analysis for social good, indexing and retrieval of documents, physical and logical layout analysis, recognition of tables and formulas, and natural language processing (NLP) for document understanding.
Address Lausanne, Switzerland, September 5-10, 2021
Corporate Author Thesis
Publisher Springer Cham Place of Publication Editor Josep Llados; Daniel Lopresti; Seiichi Uchida
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN 978-3-030-86330-2 Medium
Area Expedition Conference ICDAR
Notes DAG Approved no
Call Number Admin @ si @ Serial 3726
Permanent link to this record
 

 
Author Hassan Ahmed Sial
Title Estimating Light Effects from a Single Image: Deep Architectures and Ground-Truth Generation Type Book Whole
Year 2021 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages (up)
Keywords
Abstract In this thesis, we explore how to estimate the effects of the light interacting with the scene objects from a single image. To achieve this goal, we focus on recovering intrinsic components like reflectance, shading, or light properties such as color and position using deep architectures. The success of these approaches relies on training on large and diversified image datasets. Therefore, we present several contributions on this such as: (a) a data-augmentation technique; (b) a ground-truth for an existing multi-illuminant dataset; (c) a family of synthetic datasets, SID for Surreal Intrinsic Datasets, with diversified backgrounds and coherent light conditions; and (d) a practical pipeline to create hybrid ground-truths to overcome the complexity of acquiring realistic light conditions in a massive way. In parallel with the creation of datasets, we trained different flexible encoder-decoder deep architectures incorporating physical constraints from the image formation models.

In the last part of the thesis, we apply all the previous experience to two different problems. Firstly, we create a large hybrid Doc3DShade dataset with real shading and synthetic reflectance under complex illumination conditions, that is used to train a two-stage architecture that improves the character recognition task in complex lighting conditions of unwrapped documents. Secondly, we tackle the problem of single image scene relighting by extending both, the SID dataset to present stronger shading and shadows effects, and the deep architectures to use intrinsic components to estimate new relit images.
Address September 2021
Corporate Author Thesis Ph.D. thesis
Publisher IMPRIMA Place of Publication Editor Maria Vanrell;Ramon Baldrich
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-122714-8-5 Medium
Area Expedition Conference
Notes CIC; Approved no
Call Number Admin @ si @ Sia2021 Serial 3607
Permanent link to this record
 

 
Author Fei Yang
Title Towards Practical Neural Image Compression Type Book Whole
Year 2021 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages (up)
Keywords
Abstract Images and videos are pervasive in our life and communication. With advances in smart and portable devices, high capacity communication networks and high definition cinema, image and video compression are more relevant than ever. Traditional block-based linear transform codecs such as JPEG, H.264/AVC or the recent H.266/VVC are carefully designed to meet not only the rate-distortion criteria, but also the practical requirements of applications.
Recently, a new paradigm based on deep neural networks (i.e., neural image/video compression) has become increasingly popular due to its ability to learn powerful nonlinear transforms and other coding tools directly from data instead of being crafted by humans, as was usual in previous coding formats. While achieving excellent rate-distortion performance, these approaches are still limited mostly to research environments due to heavy models and other practical limitations, such as being limited to function on a particular rate and due to high memory and computational cost. In this thesis, we study these practical limitations, and designing more practical neural image compression approaches.
After analyzing the differences between traditional and neural image compression, our first contribution is the modulated autoencoder (MAE), a framework that includes a mechanism to provide multiple rate-distortion options within a single model with comparable performance to independent models. In a second contribution, we propose the slimmable compressive autoencoder (SlimCAE), which in addition to variable rate, can optimize the complexity of the model and thus reduce significantly the memory and computational burden.
Modern generative models can learn custom image transformation directly from suitable datasets following encoder-decoder architectures, task known as image-to-image (I2I) translation. Building on our previous work, we study the problem of distributed I2I translation, where the latent representation is transmitted through a binary channel and decoded in a remote receiving side. We also propose a variant that can perform both translation and the usual autoencoding functionality.
Finally, we also consider neural video compression, where the autoencoder is typically augmented with temporal prediction via motion compensation. One of the main bottlenecks of that framework is the optical flow module that estimates the displacement to predict the next frame. Focusing on this module, we propose a method that improves the accuracy of the optical flow estimation and a simplified variant that reduces the computational cost.
Key words: neural image compression, neural video compression, optical flow, practical neural image compression, compressive autoencoders, image-to-image translation, deep learning.
Address December 2021
Corporate Author Thesis Ph.D. thesis
Publisher IMPRIMA Place of Publication Editor Luis Herranz;Mikhail Mozerov;Yongmei Cheng
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-122714-7-8 Medium
Area Expedition Conference
Notes LAMP Approved no
Call Number Admin @ si @ Yan2021 Serial 3608
Permanent link to this record
 

 
Author Javad Zolfaghari Bengar
Title Reducing Label Effort with Deep Active Learning Type Book Whole
Year 2021 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages (up)
Keywords
Abstract Deep convolutional neural networks (CNNs) have achieved superior performance in many visual recognition applications, such as image classification, detection and segmentation. Training deep CNNs requires huge amounts of labeled data, which is expensive and labor intensive to collect. Active learning is a paradigm aimed at reducing the annotation effort by training the model on actively selected
informative and/or representative samples. In this thesis we study several aspects of active learning including video object detection for autonomous driving systems, image classification on balanced and imbalanced datasets and the incorporation of self-supervised learning in active learning. We briefly describe our approach in each of these areas to reduce the labeling effort.
In chapter two we introduce a novel active learning approach for object detection in videos by exploiting temporal coherence. Our criterion is based on the estimated number of errors in terms of false positives and false negatives. Additionally, we introduce a synthetic video dataset, called SYNTHIA-AL, specially designed to evaluate active
learning for video object detection in road scenes. Finally, we show that our
approach outperforms active learning baselines tested on two outdoor datasets.
In the next chapter we address the well-known problem of over confidence in the neural networks. As an alternative to network confidence, we propose a new informativeness-based active learning method that captures the learning dynamics of neural network with a metric called label-dispersion. This metric is low when the network consistently assigns the same label to the sample during the course of training and high when the assigned label changes frequently. We show that label-dispersion is a promising predictor of the uncertainty of the network, and show on two benchmark datasets that an active learning algorithm based on label-dispersion obtains excellent results.
In chapter four, we tackle the problem of sampling bias in active learning methods on imbalanced datasets. Active learning is generally studied on balanced datasets where an equal amount of images per class is available. However, real-world datasets suffer from severe imbalanced classes, the so called longtail distribution. We argue that this further complicates the active learning process, since the imbalanced data pool can result in suboptimal classifiers. To address this problem in the context of active learning, we propose a general optimization framework that explicitly takes class-balancing into account. Results on three datasets show that the method is general (it can be combined with most existing active learning algorithms) and can be effectively applied to boost the performance of both informative and representative-based active learning methods. In addition, we show that also on balanced datasets our method generally results in a performance gain.
Another paradigm to reduce the annotation effort is self-training that learns from a large amount of unlabeled data in an unsupervised way and fine-tunes on few labeled samples. Recent advancements in self-training have achieved very impressive results rivaling supervised learning on some datasets. In the last chapter we focus on whether active learning and self supervised learning can benefit from each other.
We study object recognition datasets with several labeling budgets for the evaluations. Our experiments reveal that self-training is remarkably more efficient than active learning at reducing the labeling effort, that for a low labeling budget, active learning offers no benefit to self-training, and finally that the combination of active learning and self-training is fruitful when the labeling budget is high.
Address December 2021
Corporate Author Thesis Ph.D. thesis
Publisher IMPRIMA Place of Publication Editor Joost Van de Weijer;Bogdan Raducanu
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-122714-9-2 Medium
Area Expedition Conference
Notes LAMP; Approved no
Call Number Admin @ si @ Zol2021 Serial 3609
Permanent link to this record
 

 
Author Edgar Riba
Title Geometric Computer Vision Techniques for Scene Reconstruction Type Book Whole
Year 2021 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages (up)
Keywords
Abstract From the early stages of Computer Vision, scene reconstruction has been one of the most studied topics leading to a wide variety of new discoveries and applications. Object grasping and manipulation, localization and mapping, or even visual effect generation are different examples of applications in which scene reconstruction has taken an important role for industries such as robotics, factory automation, or audio visual production. However, scene reconstruction is an extensive topic that can be approached in many different ways with already existing solutions that effectively work in controlled environments. Formally, the problem of scene reconstruction can be formulated as a sequence of independent processes which compose a pipeline. In this thesis, we analyse some parts of the reconstruction pipeline from which we contribute with novel methods using Convolutional Neural Networks (CNN) proposing innovative solutions that consider the optimisation of the methods in an end-to-end fashion. First, we review the state of the art of classical local features detectors and descriptors and contribute with two novel methods that inherently improve pre-existing solutions in the scene reconstruction pipeline.

It is a fact that computer science and software engineering are two fields that usually go hand in hand and evolve according to mutual needs making easier the design of complex and efficient algorithms. For this reason, we contribute with Kornia, a library specifically designed to work with classical computer vision techniques along with deep neural networks. In essence, we created a framework that eases the design of complex pipelines for computer vision algorithms so that can be included within neural networks and be used to backpropagate gradients throw a common optimisation framework. Finally, in the last chapter of this thesis we develop the aforementioned concept of designing end-to-end systems with classical projective geometry. Thus, we contribute with a solution to the problem of synthetic view generation by hallucinating novel views from high deformable cloths objects using a geometry aware end-to-end system. To summarize, in this thesis we demonstrate that with a proper design that combine classical geometric computer vision methods with deep learning techniques can lead to improve pre-existing solutions for the problem of scene reconstruction.
Address February 2021
Corporate Author Thesis Ph.D. thesis
Publisher Place of Publication Editor Daniel Ponsa
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MSIAU Approved no
Call Number Admin @ si @ Rib2021 Serial 3610
Permanent link to this record
 

 
Author Giovanni Maria Farinella; Petia Radeva; Jose Braz; Kadi Bouatouch
Title Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (Volume 4) Type Book Whole
Year 2021 Publication Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications. VISIGRAPP 2021 Abbreviated Journal
Volume 4 Issue Pages (up)
Keywords
Abstract This book contains the proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2021) which was organized and sponsored by the Institute for Systems and Technologies of Information, Control and Communication (INSTICC), endorsed by the International Association for Pattern Recognition (IAPR), and in cooperation with the ACM Special Interest Group on Graphics and Interactive Techniques (SIGGRAPH), the European Association for Computer Graphics (EUROGRAPHICS), the EUROGRAPHICS Portuguese Chapter, the VRVis Center for Virtual Reality and Visualization Forschungs-GmbH, the French Association for Computer Graphics (AFIG), and the Society for Imaging Science and Technology (IS&T). The proceedings here published demonstrate new and innovative solutions and highlight technical problems in each field that are challenging and worthy of being disseminated to the interested research audiences. VISIGRAPP 2021 was organized to promote a discussion forum about the conference’s research topics between researchers, developers, manufacturers and end-users, and to establish guidelines in the development of more advanced solutions. This year VISIGRAPP was, exceptionally, held as a web-based event, due to the COVID-19 pandemic, from 8 – 10 February. We received a high number of paper submissions for this edition of VISIGRAPP, 371 in total, with contributions from 52 countries. This attests to the success and global dimension of VISIGRAPP. To evaluate each submission, we used a hierarchical process of double-blind evaluation where each paper was reviewed by two to six experts from the International Program Committee (IPC). The IPC selected for oral presentation and for publication as full papers 12 papers from GRAPP, 8 from HUCAPP, 11 papers from IVAPP, and 56 papers from VISAPP, which led to a result for the full-paper acceptance ratio of 24% and a high-quality program. Apart from the above full papers, the conference program also features 118 short papers and 67 poster presentations. We hope that these conference proceedings, which are submitted for indexation by Thomson Reuters Conference Proceedings Citation Index, SCOPUS, DBLP, Semantic Scholar, Google Scholar, EI and Microsoft Academic, will help the Computer Vision, Imaging, Visualization, Computer Graphics and Human-Computer Interaction communities to find interesting research work. Moreover, we are proud to inform that the program also includes three plenary keynote lectures, given by internationally distinguished researchers, namely Federico Tombari (Google and Technical University of Munich, Germany), Dieter Schmalstieg (Graz University of Technology, Austria) and Nathalie Henry Riche (Microsoft Research, United States), thus contributing to increase the overall quality of the conference and to provide a deeper understanding of the conference’s interest fields. Furthermore, a short list of the presented papers will be selected to be extended into a forthcoming book of VISIGRAPP Selected Papers to be published by Springer during 2021 in the CCIS series. Moreover, a short list of presented papers will be selected for publication of extended and revised versions in a special issue of the Springer Nature Computer Science journal. All papers presented at this conference will be available at the SCITEPRESS Digital Library. Three awards are delivered at the closing session, to recognize the best conference paper, the best student paper and the best poster for each of the four conferences. There is also an award for best industrial paper to be delivered at the closing session for VISAPP. We would like to express our thanks, first of all, to the authors of the technical papers, whose work and dedication made it possible to put together a program that we believe to be very exciting and of high technical quality. Next, we would like to thank the Area Chairs, all the members of the program committee and auxiliary reviewers, who helped us with their expertise and time. We would also like to thank the invited speakers for their invaluable contribution and for sharing their vision in their talks. Finally, we gratefully acknowledge the professional support of the INSTICC team for all organizational processes, especially given the need to introduce online streaming, forum management, direct messaging facilitation and other web-based activities in order to make it possible for VISIGRAPP 2021 authors to present their work and share ideas with colleagues in spite of the logistic difficulties caused by the current pandemic situation. We wish you all an exciting conference. We hope to meet you again for the next edition of VISIGRAPP, details of which are available at http://www. visigrapp.org
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference VISIGRAPP
Notes MILAB Approved no
Call Number Admin @ si @ FRB2021a Serial 3627
Permanent link to this record