Publicacions CVC -- Query Results

<< 1 2 3 4 5 6 7 >>

Details

	Records
	Author	Idoia Ruiz
	Title	Deep Metric Learning for re-identification, tracking and hierarchical novelty detection			Type	Book Whole
	Year	2022	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Metric learning refers to the problem in machine learning of learning a distance or similarity measurement to compare data. In particular, deep metric learning involves learning a representation, also referred to as embedding, such that in the embedding space data samples can be compared based on the distance, directly providing a similarity measure. This step is necessary to perform several tasks in computer vision. It allows to perform the classification of images, regions or pixels, re-identification, out-of-distribution detection, object tracking in image sequences and any other task that requires computing a similarity score for their solution. This thesis addresses three specific problems that share this common requirement. The first one is person re-identification. Essentially, it is an image retrieval task that aims at finding instances of the same person according to a similarity measure. We first compare in terms of accuracy and efficiency, classical metric learning to basic deep learning based methods for this problem. In this context, we also study network distillation as a strategy to optimize the trade-off between accuracy and speed at inference time. The second problem we contribute to is novelty detection in image classification. It consists in detecting samples of novel classes, i.e. never seen during training. However, standard novelty detection does not provide any information about the novel samples besides they are unknown. Aiming at more informative outputs, we take advantage from the hierarchical taxonomies that are intrinsic to the classes. We propose a metric learning based approach that leverages the hierarchical relationships among classes during training, being able to predict the parent class for a novel sample in such hierarchical taxonomy. Our third contribution is in multi-object tracking and segmentation. This joint task comprises classification, detection, instance segmentation and tracking. Tracking can be formulated as a retrieval problem to be addressed with metric learning approaches. We tackle the existing difficulty in academic research that is the lack of annotated benchmarks for this task. To this matter, we introduce the problem of weakly supervised multi-object tracking and segmentation, facing the challenge of not having available ground truth for instance segmentation. We propose a synergistic training strategy that benefits from the knowledge of the supervised tasks that are being learnt simultaneously.
	Address	July, 2022
	Corporate Author				Thesis	Ph.D. thesis
	Publisher		Place of Publication		Editor	Joan Serrat
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-124793-4-8	Medium
	Area		Expedition		Conference
	Notes	ADAS			Approved	no
	Call Number	Admin @ si @ Rui2022			Serial	3717
Permanent link to this record



	Author	Jose Luis Gomez Zurita
	Title	Synth-to-real semi-supervised learning for visual tasks			Type	Book Whole
	Year	2023	Publication	Going beyond Classification Problems for the Continual Learning of Deep Neural Networks	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	The curse of data labeling is a costly bottleneck in supervised deep learning, where large amounts of labeled data are needed to train intelligent systems. In onboard perception for autonomous driving, this cost corresponds to the labeling of raw data from sensors such as cameras, LiDARs, RADARs, etc. Therefore, synthetic data with automatically generated ground truth (labels) has aroused as a reliable alternative for training onboard perception models. However, synthetic data commonly suffers from synth-to-real domain shift, i.e., models trained on the synthetic domain do not show their achievable accuracy when performing in the real world. This shift needs to be addressed by techniques falling in the realm of domain adaptation (DA). The semi-supervised learning (SSL) paradigm can be followed to address DA. In this case, a model is trained using source data with labels (here synthetic) and leverages minimal knowledge from target data (here the real world) to generate pseudo-labels. These pseudo-labels help the training process to reduce the gap between the source and the target domains. In general, we can assume accessing both, pseudo-labels and a few amounts of human-provided labels for the target-domain data. However, the most interesting and challenging setting consists in assuming that we do not have human-provided labels at all. This setting is known as unsupervised domain adaptation (UDA). This PhD focuses on applying SSL to the UDA setting, for onboard visual tasks related to autonomous driving. We start by addressing the synth-to-real UDA problem on onboard vision-based object detection (pedestrians and cars), a critical task for autonomous driving and driving assistance. In particular, we propose to apply an SSL technique known as co-training, which we adapt to work with deep models that process a multi-modal input. The multi-modality consists of the visual appearance of the images (RGB) and their monocular depth estimation. The synthetic data we use as the source domain contains both, object bounding boxes and depth information. This prior knowledge is the starting point for the co-training technique, which iteratively labels unlabeled real-world data and uses such pseudolabels (here bounding boxes with an assigned object class) to progressively improve the labeling results. Along this process, two models collaborate to automatically label the images, in a way that one model compensates for the errors of the other, so avoiding error drift. While this automatic labeling process is done offline, the resulting pseudolabels can be used to train object detection models that must perform in real-time onboard a vehicle. We show that multi-modal co-training improves the labeling results compared to single-modal co-training, remaining competitive compared to human labeling. Given the success of co-training in the context of object detection, we have also adapted this technique to a more crucial and challenging visual task, namely, onboard semantic segmentation. In fact, providing labels for a single image can take from 30 to 90 minutes for a human labeler, depending on the content of the image. Thus, developing automatic labeling techniques for this visual task is of great interest to the automotive industry. In particular, the new co-training framework addresses synth-to-real UDA by an initial stage of self-training. Intermediate models arising from this stage are used to start the co-training procedure, for which we have elaborated an accurate collaboration policy between the two models performing the automatic labeling. Moreover, our co-training seamlessly leverages datasets from different synthetic domains. In addition, the co-training procedure is agnostic to the loss function used to train the semantic segmentation models which perform the automatic labeling. We achieve state-of-the-art results on publicly available benchmark datasets, again, remaining competitive compared to human labeling. Finally, on the ground of our previous experience, we have designed and implemented a new SSL technique for UDA in the context of visual semantic segmentation. In this case, we mimic the labeling methodology followed by human labelers. In particular, rather than labeling full images at a time, categories of semantic classes are defined and only those are labeled in a labeling pass. In fact, different human labelers can become specialists in labeling different categories. Afterward, these per-category-labeled layers are combined to provide fully labeled images. Our technique is inspired by this methodology since we perform synth-to-real UDA per category, using the self-training stage previously developed as part of our co-training framework. The pseudo-labels obtained for each category are finally fused to obtain fully automatically labeled images. In this context, we have also contributed to the development of a new photo-realistic synthetic dataset based on path-tracing rendering. Our new SSL technique seamlessly leverages publicly available synthetic datasets as well as this new one to obtain state-of-the-art results on synth-to-real UDA for semantic segmentation. We show that the new dataset allows us to reach better labeling accuracy than previously existing datasets, at the same time that it complements well them when combined. Moreover, we also show that the new human-inspired SSL technique outperforms co-training.
	Address
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	IMPRIMA	Place of Publication		Editor	Antonio Lopez
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	ADAS			Approved	no
	Call Number	Admin @ si @ Gom2023			Serial	3961
Permanent link to this record



	Author	Yi Xiao
	Title	Advancing Vision-based End-to-End Autonomous Driving			Type	Book Whole
	Year	2023	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	In autonomous driving, artificial intelligence (AI) processes the traffic environment to drive the vehicle to a desired destination. Currently, there are different paradigms that address the development of AI-enabled drivers. On the one hand, we find modular pipelines, which divide the driving task into sub-tasks such as perception, maneuver planning, and control. On the other hand, we find end-to-end driving approaches that attempt to learn the direct mapping of raw data from input sensors to vehicle control signals. The latter are relatively less studied but are gaining popularity as they are less demanding in terms of data labeling. Therefore, in this thesis, our goal is to investigate end-to-end autonomous driving. We propose to evaluate three approaches to tackle the challenge of end-to-end autonomous driving. First, we focus on the input, considering adding depth information as complementary to RGB data, in order to mimic the human being’s ability to estimate the distance to obstacles. Notice that, in the real world, these depth maps can be obtained either from a LiDAR sensor, or a trained monocular depth estimation module, where human labeling is not needed. Then, based on the intuition that the latent space of end-to-end driving models encodes relevant information for driving, we use it as prior knowledge for training an affordancebased driving model. In this case, the trained affordance-based model can achieve good performance while requiring less human-labeled data, and it can provide interpretability regarding driving actions. Finally, we present a new pure vision-based end-to-end driving model termed CIL++, which is trained by imitation learning. CIL++ leverages modern best practices, such as a large horizontal field of view and a self-attention mechanism, which are contributing to the agent’s understanding of the driving scene and bringing a better imitation of human drivers. Using training data without any human labeling, our model yields almost expert performance in the CARLA NoCrash benchmark and could rival SOTA models that require large amounts of human-labeled data.
	Address
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	IMPRIMA	Place of Publication		Editor	Antonio Lopez
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-126409-4-6	Medium
	Area		Expedition		Conference
	Notes	ADAS			Approved	no
	Call Number	Admin @ si @ Xia2023			Serial	3964
Permanent link to this record



	Author	David Vazquez
	Title	Domain Adaptation of Virtual and Real Worlds for Pedestrian Detection			Type	Book Whole
	Year	2013	Publication	PhD Thesis, Universitat de Barcelona-CVC	Abbreviated Journal
	Volume	1	Issue	1	Pages	1-105
	Keywords	Pedestrian Detection; Domain Adaptation
	Abstract	Pedestrian detection is of paramount interest for many applications, e.g. Advanced Driver Assistance Systems, Intelligent Video Surveillance and Multimedia systems. Most promising pedestrian detectors rely on appearance-based classifiers trained with annotated data. However, the required annotation step represents an intensive and subjective task for humans, what makes worth to minimize their intervention in this process by using computational tools like realistic virtual worlds. The reason to use these kind of tools relies in the fact that they allow the automatic generation of precise and rich annotations of visual information. Nevertheless, the use of this kind of data comes with the following question: can a pedestrian appearance model learnt with virtual-world data work successfully for pedestrian detection in real-world scenarios?. To answer this question, we conduct different experiments that suggest a positive answer. However, the pedestrian classifiers trained with virtual-world data can suffer the so called dataset shift problem as real-world based classifiers does. Accordingly, we have designed different domain adaptation techniques to face this problem, all of them integrated in a same framework (V-AYLA). We have explored different methods to train a domain adapted pedestrian classifiers by collecting a few pedestrian samples from the target domain (real world) and combining them with many samples of the source domain (virtual world). The extensive experiments we present show that pedestrian detectors developed within the V-AYLA framework do achieve domain adaptation. Ideally, we would like to adapt our system without any human intervention. Therefore, as a first proof of concept we also propose an unsupervised domain adaptation technique that avoids human intervention during the adaptation process. To the best of our knowledge, this Thesis work is the first demonstrating adaptation of virtual and real worlds for developing an object detector. Last but not least, we also assessed a different strategy to avoid the dataset shift that consists in collecting real-world samples and retrain with them in such a way that no bounding boxes of real-world pedestrians have to be provided. We show that the generated classifier is competitive with respect to the counterpart trained with samples collected by manually annotating pedestrian bounding boxes. The results presented on this Thesis not only end with a proposal for adapting a virtual-world pedestrian detector to the real world, but also it goes further by pointing out a new methodology that would allow the system to adapt to different situations, which we hope will provide the foundations for future research in this unexplored area.
	Address	Barcelona
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication	Barcelona	Editor	Antonio Lopez;Daniel Ponsa
	Language	English	Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-940530-1-6	Medium
	Area		Expedition		Conference
	Notes	adas			Approved	yes
	Call Number	ADAS @ adas @ Vaz2013			Serial	2276
Permanent link to this record



	Author	Javier Marin; David Geronimo; David Vazquez; Antonio Lopez
	Title	Pedestrian Detection: Exploring Virtual Worlds			Type	Book Chapter
	Year	2012	Publication	Handbook of Pattern Recognition: Methods and Application	Abbreviated Journal
	Volume	5	Issue		Pages	145-162
	Keywords	Virtual worlds; Pedestrian Detection; Domain Adaptation
	Abstract	Handbook of pattern recognition will include contributions from university educators and active research experts. This Handbook is intended to serve as a basic reference on methods and applications of pattern recognition. The primary aim of this handbook is providing the community of pattern recognition with a readable, easy to understand resource that covers introductory, intermediate and advanced topics with equal clarity. Therefore, the Handbook of pattern recognition can serve equally well as reference resource and as classroom textbook. Contributions cover all methods, techniques and applications of pattern recognition. A tentative list of relevant topics might include: 1- Statistical, structural, syntactic pattern recognition. 2- Neural networks, machine learning, data mining. 3- Discrete geometry, algebraic, graph-based techniques for pattern recognition. 4- Face recognition, Signal analysis, image coding and processing, shape and texture analysis. 5- Document processing, text and graphics recognition, digital libraries. 6- Speech recognition, music analysis, multimedia systems. 7- Natural language analysis, information retrieval. 8- Biometrics, biomedical pattern analysis and information systems. 9- Other scientific, engineering, social and economical applications of pattern recognition. 10- Special hardware architectures, software packages for pattern recognition.
	Address
	Corporate Author				Thesis
	Publisher	iConcept Press	Place of Publication		Editor
	Language	English	Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-1-477554-82-1	Medium
	Area		Expedition		Conference
	Notes	ADAS			Approved	no
	Call Number	ADAS @ adas @ MGV2012			Serial	1979
Permanent link to this record



	Author	Jose Manuel Alvarez; Antonio Lopez
	Title	Photometric Invariance by Machine Learning			Type	Book Chapter
	Year	2012	Publication	Color in Computer Vision: Fundamentals and Applications	Abbreviated Journal
	Volume	7	Issue		Pages	113-134
	Keywords	road detection
	Abstract
	Address
	Corporate Author				Thesis
	Publisher	iConcept Press Ltd	Place of Publication		Editor	Theo Gevers, Arjan Gijsenij, Joost van de Weijer, Jan-Mark Geusebroek
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-0-470-89084-4	Medium
	Area		Expedition		Conference
	Notes	ADAS			Approved	no
	Call Number	Admin @ si @ AlL2012			Serial	2186
Permanent link to this record



	Author	Angel Sappa; Boris X. Vintimilla
	Title	Edge Point Linking by Means of Global and Local Schemes			Type	Book Chapter
	Year	2008	Publication	in Signal Processing for Image Enhancement and Multimedia Processing	Abbreviated Journal
	Volume	11	Issue		Pages	115–125
	Keywords
	Abstract
	Address
	Corporate Author				Thesis
	Publisher	Springer	Place of Publication		Editor	E. Damiani
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	ADAS			Approved	no
	Call Number	ADAS @ adas @ SaV2008			Serial	938
Permanent link to this record



	Author	German Ros; Laura Sellart; Gabriel Villalonga; Elias Maidanik; Francisco Molero; Marc Garcia; Adriana Cedeño; Francisco Perez; Didier Ramirez; Eduardo Escobar; Jose Luis Gomez; David Vazquez; Antonio Lopez
	Title	Semantic Segmentation of Urban Scenes via Domain Adaptation of SYNTHIA			Type	Book Chapter
	Year	2017	Publication	Domain Adaptation in Computer Vision Applications	Abbreviated Journal
	Volume	12	Issue		Pages	227-241
	Keywords	SYNTHIA; Virtual worlds; Autonomous Driving
	Abstract	Vision-based semantic segmentation in urban scenarios is a key functionality for autonomous driving. Recent revolutionary results of deep convolutional neural networks (DCNNs) foreshadow the advent of reliable classifiers to perform such visual tasks. However, DCNNs require learning of many parameters from raw images; thus, having a sufficient amount of diverse images with class annotations is needed. These annotations are obtained via cumbersome, human labour which is particularly challenging for semantic segmentation since pixel-level annotations are required. In this chapter, we propose to use a combination of a virtual world to automatically generate realistic synthetic images with pixel-level annotations, and domain adaptation to transfer the models learnt to correctly operate in real scenarios. We address the question of how useful synthetic data can be for semantic segmentation – in particular, when using a DCNN paradigm. In order to answer this question we have generated a synthetic collection of diverse urban images, named SYNTHIA, with automatically generated class annotations and object identifiers. We use SYNTHIA in combination with publicly available real-world urban images with manually provided annotations. Then, we conduct experiments with DCNNs that show that combining SYNTHIA with simple domain adaptation techniques in the training stage significantly improves performance on semantic segmentation.
	Address
	Corporate Author				Thesis
	Publisher	Springer	Place of Publication		Editor	Gabriela Csurka
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	ADAS; 600.085; 600.082; 600.076; 600.118			Approved	no
	Call Number	ADAS @ adas @ RSV2017			Serial	2882
Permanent link to this record



	Author	Angel Sappa; Niki Aifanti; Sotiris Malassiotis; Michael G. Strintzis
	Title	Prior Knowledge Based Motion Model Representation			Type	Book Chapter
	Year	2009	Publication	Progress in Computer Vision and Image Analysis	Abbreviated Journal
	Volume	16	Issue		Pages
	Keywords
	Abstract
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor	Horst Bunke; JuanJose Villanueva; Gemma Sanchez
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	ADAS			Approved	no
	Call Number	ADAS @ adas @ SAM2009			Serial	1235
Permanent link to this record



	Author	Fadi Dornaika; Angel Sappa
	Title	Real Time Image Registration for Planar Structure and 3D Sensor Pose Estimation			Type	Book Chapter
	Year	2008	Publication	Stereo Vision	Abbreviated Journal
	Volume	18	Issue		Pages	299–316
	Keywords
	Abstract
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor	Asim Bhatti
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	ADAS			Approved	no
	Call Number	ADAS @ adas @ DoS2008c			Serial	1057
Permanent link to this record

Select All Deselect All

<< 1 2 3 4 5 6 7 >>

List View

Citations

Details

All Found Records Selected Records:

Save Citations: Format:

Export Records: Format: