Author Jose Luis Gomez Zurita
  Title Synth-to-real semi-supervised learning for visual tasks Type Book Whole
  Year 2023 Publication Going beyond Classification Problems for the Continual Learning of Deep Neural Networks Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract The curse of data labeling is a costly bottleneck in supervised deep learning, where large amounts of labeled data are needed to train intelligent systems. In onboard perception for autonomous driving, this cost corresponds to labeling the raw data from sensors such as cameras, LiDARs, and RADARs. Therefore, synthetic data with automatically generated ground truth (labels) has arisen as a reliable alternative for training onboard perception models.
However, synthetic data commonly suffers from synth-to-real domain shift, i.e., models trained on the synthetic domain do not reach their achievable accuracy when deployed in the real world. This shift must be addressed by techniques falling in the realm of domain adaptation (DA).
The semi-supervised learning (SSL) paradigm can be followed to address DA. In this case, a model is trained using source data with labels (here synthetic) and leverages minimal knowledge from target data (here the real world) to generate pseudo-labels. These pseudo-labels help the training process reduce the gap between the source and the target domains. In general, we can assume access to both pseudo-labels and a small amount of human-provided labels for the target-domain data. However, the most interesting and challenging setting consists of assuming that we have no human-provided labels at all. This setting is known as unsupervised domain adaptation (UDA). This PhD focuses on applying SSL to the UDA setting, for onboard visual tasks related to autonomous driving. We start by addressing the synth-to-real UDA problem on onboard vision-based object detection (pedestrians and cars), a critical task for autonomous driving and driving assistance. In particular, we propose to apply an SSL technique known as co-training, which we adapt to work with deep models that process a multi-modal input. The multi-modality consists of the visual appearance of the images (RGB) and their monocular depth estimation. The synthetic data we use as the source domain contains both object bounding boxes and depth information. This prior knowledge is the starting point for the co-training technique, which iteratively labels unlabeled real-world data and uses such pseudo-labels (here bounding boxes with an assigned object class) to progressively improve the labeling results. Along this process, two models collaborate to automatically label the images, in a way that one model compensates for the errors of the other, thus avoiding error drift. While this automatic labeling process is done offline, the resulting pseudo-labels can be used to train object detection models that must perform in real time onboard a vehicle. We show that multi-modal co-training improves the labeling results compared to single-modal co-training, remaining competitive with human labeling.
Given the success of co-training in the context of object detection, we have also adapted this technique to a more crucial and challenging visual task, namely onboard semantic segmentation. In fact, providing labels for a single image can take from 30 to 90 minutes for a human labeler, depending on the content of the image. Thus, developing automatic labeling techniques for this visual task is of great interest to the automotive industry. In particular, the new co-training framework addresses synth-to-real UDA with an initial stage of self-training. Intermediate models arising from this stage are used to start the co-training procedure, for which we have devised an accurate collaboration policy between the two models performing the automatic labeling. Moreover, our co-training seamlessly leverages datasets from different synthetic domains. In addition, the co-training procedure is agnostic to the loss function used to train the semantic segmentation models which perform the automatic labeling. We achieve state-of-the-art results on publicly available benchmark datasets, again remaining competitive with human labeling.
Finally, building on our previous experience, we have designed and implemented a new SSL technique for UDA in the context of visual semantic segmentation. In this case, we mimic the labeling methodology followed by human labelers. In particular, rather than labeling full images at once, categories of semantic classes are defined and only those are labeled in a given labeling pass. In fact, different human labelers can become specialists in labeling different categories. Afterward, these per-category-labeled layers are combined to provide fully labeled images. Our technique is inspired by this methodology, since we perform synth-to-real UDA per category, using the self-training stage previously developed as part of our co-training framework. The pseudo-labels obtained for each category are finally fused to obtain fully automatically labeled images. In this context, we have also contributed to the development of a new photo-realistic synthetic dataset based on path-tracing rendering. Our new SSL technique seamlessly leverages publicly available synthetic datasets as well as this new one to obtain state-of-the-art results on synth-to-real UDA for semantic segmentation. We show that the new dataset allows us to reach better labeling accuracy than previously existing datasets, while also complementing them well when combined. Moreover, we show that the new human-inspired SSL technique outperforms co-training.
 
  Address  
  Corporate Author Thesis Ph.D. thesis  
  Publisher IMPRIMA Place of Publication Editor Antonio Lopez  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes ADAS Approved no  
  Call Number Admin @ si @ Gom2023 Serial 3961  
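
The multi-modal co-training cycle described in the abstract above can be summarized in a short sketch. The Python outline below is illustrative only: the detector objects, their predict()/train() methods, the confidence threshold, and the number of rounds are assumptions made for the example, not the thesis implementation.

    # Hedged sketch of multi-modal co-training for pseudo-labeling (illustrative only).
    # `det_rgb` / `det_depth` are hypothetical detectors exposing predict() and train().
    def cotraining_cycle(det_rgb, det_depth, unlabeled, conf_thr=0.8, rounds=5):
        for _ in range(rounds):
            pseudo_for_rgb, pseudo_for_depth = [], []
            for sample in unlabeled:
                boxes_rgb = [b for b in det_rgb.predict(sample.rgb) if b.score >= conf_thr]
                boxes_depth = [b for b in det_depth.predict(sample.depth) if b.score >= conf_thr]
                # Each model pseudo-labels data for the *other* model, so one model can
                # compensate for the errors of the other and error drift is limited.
                if boxes_rgb:
                    pseudo_for_depth.append((sample, boxes_rgb))
                if boxes_depth:
                    pseudo_for_rgb.append((sample, boxes_depth))
            det_rgb.train(pseudo_for_rgb)
            det_depth.train(pseudo_for_depth)
        return det_rgb, det_depth

The resulting pseudo-labels (bounding boxes with an assigned class) would then be used offline to train the final onboard detector, as the abstract describes.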
 

 
Author Yi Xiao
  Title Advancing Vision-based End-to-End Autonomous Driving Type Book Whole
  Year 2023 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract In autonomous driving, artificial intelligence (AI) processes the traffic environment to drive the vehicle to a desired destination. Currently, there are different paradigms that address the development of AI-enabled drivers. On the one hand, we find modular pipelines, which divide the driving task into sub-tasks such as perception, maneuver planning, and control. On the other hand, we find end-to-end driving approaches that attempt to learn a direct mapping from raw input sensor data to vehicle control signals. The latter are relatively less studied but are gaining popularity, as they are less demanding in terms of data labeling. Therefore, in this thesis, our goal is to investigate end-to-end autonomous driving.
We propose to evaluate three approaches to tackle the challenge of end-to-end autonomous driving. First, we focus on the input, adding depth information as a complement to RGB data in order to mimic the human ability to estimate the distance to obstacles. Notice that, in the real world, these depth maps can be obtained either from a LiDAR sensor or from a trained monocular depth estimation module, neither of which requires human labeling. Then, based on the intuition that the latent space of end-to-end driving models encodes relevant information for driving, we use it as prior knowledge for training an affordance-based driving model. In this case, the trained affordance-based model can achieve good performance while requiring less human-labeled data, and it can provide interpretability regarding driving actions. Finally, we present a new pure vision-based end-to-end driving model termed CIL++, which is trained by imitation learning. CIL++ leverages modern best practices, such as a large horizontal field of view and a self-attention mechanism, which contribute to the agent's understanding of the driving scene and yield a better imitation of human drivers. Using training data without any human labeling, our model achieves almost expert performance on the CARLA NoCrash benchmark and rivals SOTA models that require large amounts of human-labeled data.
 
  Address  
  Corporate Author Thesis Ph.D. thesis  
  Publisher IMPRIMA Place of Publication Editor Antonio Lopez  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-126409-4-6 Medium  
  Area Expedition Conference  
  Notes ADAS Approved no  
  Call Number Admin @ si @ Xia2023 Serial 3964  
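
As a complement to the abstract above, the snippet below sketches one simple way to feed RGB plus monocular depth to a driving network via early fusion (channel concatenation). It is a minimal PyTorch example written for illustration; it is not the CIL++ architecture described in the thesis, which relies on a large horizontal field of view and self-attention.

    import torch
    import torch.nn as nn

    class RGBDDrivingNet(nn.Module):
        """Minimal RGB+depth early-fusion driving network (illustrative, not CIL++)."""
        def __init__(self):
            super().__init__()
            # 3 RGB channels + 1 depth channel stacked into a 4-channel input.
            self.encoder = nn.Sequential(
                nn.Conv2d(4, 32, kernel_size=5, stride=2), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            )
            self.head = nn.Linear(64, 3)  # e.g. steering, throttle, brake

        def forward(self, rgb, depth):
            x = torch.cat([rgb, depth], dim=1)        # early fusion of the two modalities
            feats = self.encoder(x).mean(dim=(2, 3))  # global average pooling
            return self.head(feats)

    # Usage: controls = RGBDDrivingNet()(torch.rand(1, 3, 88, 200), torch.rand(1, 1, 88, 200))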
 

 
Author Lluis Pere de las Heras; Ernest Valveny; Gemma Sanchez
  Title Unsupervised and Notation-Independent Wall Segmentation in Floor Plans Using a Combination of Statistical and Structural Strategies Type Book Chapter
  Year 2014 Publication Graphics Recognition. Current Trends and Challenges Abbreviated Journal  
  Volume 8746 Issue Pages 109-121  
  Keywords Graphics recognition; Floor plan analysis; Object segmentation  
  Abstract In this paper we present a wall segmentation approach for floor plans that works independently of the graphical notation, does not need any pre-annotated data for learning, and is able to segment walls of multiple shapes, such as beams and curved walls. The method results from the combination of the wall segmentation approaches [3, 5] recently presented by the authors. First, potential straight wall segments are extracted in an unsupervised way similar to [3], but further restricting the wall candidates considered in the original approach. Then, based on [5], these segments are used to learn the texture pattern of walls and spot the missed instances. The presented combination of both methods has been tested on 4 available datasets with different notations and compared qualitatively and quantitatively to the state of the art applied to these collections. Additionally, some qualitative results on floor plans downloaded directly from the Internet are reported in the paper. The overall performance of the method demonstrates its adaptability both to different wall notations and shapes and to different document qualities and resolutions.  
  Address  
  Corporate Author Thesis  
  Publisher Springer Berlin Heidelberg Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN 0302-9743 ISBN 978-3-662-44853-3 Medium  
  Area Expedition Conference  
  Notes DAG; ADAS; 600.076; 600.077 Approved no  
  Call Number Admin @ si @ HVS2014 Serial 2535  
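
To make the two-stage idea in the abstract above concrete, here is a deliberately simplified sketch in the same spirit: high-confidence wall candidates seed a statistical appearance model (here reduced to ink density), which is then used to recover missed wall regions. The function name, block size, and tolerance are assumptions made for illustration, not the authors' implementation.

    import numpy as np

    def recover_missed_walls(ink, candidate_mask, block=8, tol=0.15):
        """ink: 2-D array in [0, 1] (1 = drawn pixel); candidate_mask: boolean mask of
        unsupervised straight-wall candidates (structural stage). Returns an extended
        wall mask after a density-based statistical stage."""
        seed_density = ink[candidate_mask].mean() if candidate_mask.any() else 0.0
        walls = candidate_mask.copy()
        h, w = ink.shape
        for y in range(0, h - block + 1, block):
            for x in range(0, w - block + 1, block):
                patch = ink[y:y + block, x:x + block]
                # Accept blocks whose ink density resembles the seed wall texture.
                if abs(patch.mean() - seed_density) <= tol:
                    walls[y:y + block, x:x + block] = True
        return walls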
 

 
Author Lluis Pere de las Heras; David Fernandez; Alicia Fornes; Ernest Valveny; Gemma Sanchez; Josep Llados
  Title Runlength Histogram Image Signature for Perceptual Retrieval of Architectural Floor Plans Type Book Chapter
  Year 2014 Publication Graphics Recognition. Current Trends and Challenges Abbreviated Journal  
  Volume 8746 Issue Pages 135-146  
  Keywords Graphics recognition; Graphics retrieval; Image classification  
  Abstract This paper proposes a runlength histogram signature as a perceptual descriptor of architectural plans in a retrieval scenario. The style of an architectural drawing is characterized by the perception of lines, shapes, and texture. Such visual stimuli are the basis for defining semantic concepts such as space properties, symmetry, density, etc. We propose runlength histograms extracted in the vertical, horizontal, and diagonal directions as a characterization of line and space properties in floor plans, so the signature can be roughly associated with a description of walls and room structure. A retrieval application illustrates the performance of the proposed approach, where, given a plan as a query, similar ones are retrieved from a database. A ground truth based on human observation has been constructed to validate the hypothesis. Additional retrieval results on sketched building facades are reported qualitatively in the paper. Its good descriptive power and its adaptability to two different kinds of sketch drawings, despite its simplicity, show the interest of the proposed approach and open a challenging research line in graphics recognition.  
  Address  
  Corporate Author Thesis  
  Publisher Springer Berlin Heidelberg Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN 0302-9743 ISBN 978-3-662-44853-3 Medium  
  Area Expedition Conference  
  Notes DAG; ADAS; 600.045; 600.056; 600.061; 600.076; 600.077 Approved no  
  Call Number Admin @ si @ HFF2014 Serial 2536  
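
The descriptor in the abstract above can be approximated with a few lines of NumPy. The sketch below histograms background (white) run lengths along rows, columns, and one diagonal direction and concatenates the normalized histograms; the bin count, maximum run length, and the choice of white runs are assumptions made for illustration rather than the paper's exact settings.

    import numpy as np

    def _runs(line):
        """Lengths of consecutive background (True) runs in a 1-D boolean line."""
        runs, count = [], 0
        for v in line:
            if v:
                count += 1
            elif count:
                runs.append(count)
                count = 0
        if count:
            runs.append(count)
        return runs

    def runlength_signature(binary_img, bins=16, max_len=256):
        bg = ~binary_img.astype(bool)          # background pixels (white space)
        scans = [
            list(bg),                          # horizontal rows
            list(bg.T),                        # vertical columns
            [np.diagonal(bg, k) for k in range(-bg.shape[0] + 1, bg.shape[1])],  # one diagonal
        ]
        hists = []
        for lines in scans:
            lengths = [r for line in lines for r in _runs(line)]
            h, _ = np.histogram(lengths, bins=bins, range=(1, max_len))
            hists.append(h / max(h.sum(), 1))  # normalize so plans of different size are comparable
        return np.concatenate(hists)

Retrieval then amounts to ranking database plans by a distance (e.g. L1) between their signatures and the query's signature.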
 

 
Author Hanne Kause; Aura Hernandez-Sabate; Patricia Marquez; Andrea Fuster; Luc Florack; Hans van Assen; Debora Gil
  Title Confidence Measures for Assessing the HARP Algorithm in Tagged Magnetic Resonance Imaging Type Book Chapter
  Year 2015 Publication Statistical Atlases and Computational Models of the Heart. Revised selected papers of Imaging and Modelling Challenges 6th International Workshop, STACOM 2015, Held in Conjunction with MICCAI 2015 Abbreviated Journal  
  Volume 9534 Issue Pages 69-79  
  Keywords  
  Abstract Cardiac deformation and changes therein have been linked to pathologies. Both can be extracted in detail from tagged Magnetic Resonance Imaging (tMRI) using harmonic phase (HARP) images. Although point tracking algorithms have been shown to achieve high accuracy on HARP images, this accuracy varies with position. Detecting and discarding areas with unreliable results is crucial for use in clinical support systems. This paper assesses the capability of two confidence measures (CMs), based on energy and image structure, for detecting locations with reduced accuracy in motion tracking results. These CMs were tested on a database of simulated tMRI images containing the most common artifacts that may affect tracking accuracy. CM performance is assessed based on its capability for bounding the HARP tracking error and compared in terms of significant differences detected using a multi-comparison analysis of variance that takes into account the most influential factors on HARP tracking performance. Results showed that the CM based on image structure was better suited to detect unreliable optical flow vectors. In addition, it was shown that CMs can be used to detect optical flow vectors with large errors in order to improve the optical flow obtained with the HARP tracking algorithm.  
  Address Munich; Germany; January 2015  
  Corporate Author Thesis  
  Publisher Springer International Publishing Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN 0302-9743 ISBN 978-3-319-28711-9 Medium  
  Area Expedition Conference STACOM  
  Notes ADAS; IAM; 600.075; 600.076; 600.060; 601.145 Approved no  
  Call Number Admin @ si @ KHM2015 Serial 2734  
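
As an illustration of what an image-structure confidence measure can look like, the sketch below computes the smaller eigenvalue of the local gradient structure tensor of a harmonic phase image; low values indicate weak local structure, where tracking is more likely to fail. This is a generic formulation chosen for illustration and is not necessarily the exact CM evaluated in the chapter; the window size is an arbitrary choice.

    import numpy as np
    from scipy.ndimage import uniform_filter

    def structure_confidence(phase_img, window=5):
        """Smaller eigenvalue of the local gradient structure tensor (illustrative CM)."""
        gy, gx = np.gradient(phase_img.astype(float))
        # Locally averaged products of derivatives: the structure tensor entries.
        jxx = uniform_filter(gx * gx, size=window)
        jyy = uniform_filter(gy * gy, size=window)
        jxy = uniform_filter(gx * gy, size=window)
        trace = jxx + jyy
        det = jxx * jyy - jxy * jxy
        # Closed-form smaller eigenvalue of the 2x2 tensor [[jxx, jxy], [jxy, jyy]].
        return trace / 2.0 - np.sqrt(np.maximum(trace * trace / 4.0 - det, 0.0))

Thresholding such a confidence map is one way to discard the unreliable optical flow vectors discussed in the abstract.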
 

 
Author Angel Sappa; David Geronimo; Fadi Dornaika; Mohammad Rouhani; Antonio Lopez
  Title Moving object detection from mobile platforms using stereo data registration Type Book Chapter
  Year 2012 Publication Computational Intelligence paradigms in advanced pattern classification Abbreviated Journal  
  Volume 386 Issue Pages 25-37  
  Keywords pedestrian detection  
  Abstract This chapter describes a robust approach for detecting moving objects from on-board stereo vision systems. It relies on a feature-point quaternion-based registration, which avoids common problems that appear when computationally expensive iterative algorithms are used in dynamic environments. The proposed approach consists of three main stages. Initially, feature points are extracted and tracked through consecutive 2D frames. Then, a RANSAC-based approach is used to register two point sets with known correspondences in 3D space. The computed 3D rigid displacement is used to map two consecutive 3D point clouds into the same coordinate system by means of the quaternion method. Finally, moving objects correspond to those areas with large 3D registration errors. Experimental results show the viability of the proposed approach for detecting moving objects such as vehicles or pedestrians in different urban scenarios.  
  Address  
  Corporate Author Thesis  
  Publisher Springer Berlin Heidelberg Place of Publication Editor Marek R. Ogiela; Lakhmi C. Jain  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN 1860-949X ISBN 978-3-642-24048-5 Medium  
  Area Expedition Conference  
  Notes ADAS Approved no  
  Call Number Admin @ si @ SGD2012 Serial 2061  
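
The quaternion method mentioned in the abstract above has a well-known closed form (Horn's absolute orientation). The sketch below recovers the rigid displacement between two corresponding 3-D point sets; the RANSAC loop and the thresholding of registration errors that yields the moving-object mask are omitted, and the function name is ours.

    import numpy as np

    def quaternion_rigid_transform(P, Q):
        """Rotation R and translation t such that Q[i] ~ R @ P[i] + t, computed with
        Horn's quaternion-based absolute orientation; P, Q are (N, 3) arrays of
        corresponding 3-D points (e.g. tracked feature points of two frames)."""
        p0, q0 = P.mean(axis=0), Q.mean(axis=0)
        X, Y = P - p0, Q - q0
        S = X.T @ Y                                   # 3x3 cross-covariance matrix
        A = S - S.T
        delta = np.array([A[1, 2], A[2, 0], A[0, 1]])
        N = np.zeros((4, 4))
        N[0, 0] = np.trace(S)
        N[0, 1:] = N[1:, 0] = delta
        N[1:, 1:] = S + S.T - np.trace(S) * np.eye(3)
        # The optimal unit quaternion is the eigenvector of N with the largest eigenvalue.
        vals, vecs = np.linalg.eigh(N)
        w, x, y, z = vecs[:, np.argmax(vals)]
        R = np.array([
            [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
            [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
            [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
        ])
        return R, q0 - R @ p0

Points with large residuals ||Q[i] - (R @ P[i] + t)|| would then correspond to the areas flagged as moving objects in the abstract.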
 

 
Author David Vazquez; Antonio Lopez; Daniel Ponsa; David Geronimo
  Title Interactive Training of Human Detectors Type Book Chapter
  Year 2013 Publication Multimodal Interaction in Image and Video Applications Abbreviated Journal  
  Volume 48 Issue Pages 169-182  
  Keywords Pedestrian Detection; Virtual World; AdaBoost; Domain Adaptation  
  Abstract Image-based human detection remains a challenging problem. The most promising detectors rely on classifiers trained with labeled samples. However, labeling is a labor-intensive manual step. To overcome this problem, we propose to collect images of pedestrians from a virtual city, i.e., with automatic labels, and train a pedestrian detector with them, which works well when such virtual-world data are similar to the testing data, i.e., real-world pedestrians in urban areas. When the testing data are acquired under different conditions than the training data, e.g., human detection in personal photo albums, dataset shift appears. In previous work, we cast this problem as one of domain adaptation and solved it with an active learning procedure. In this work, we focus on the same problem but evaluate a different set of faster-to-compute features, i.e., Haar, EOH, and their combination. In particular, we train a classifier with virtual-world data, using such features and Real AdaBoost as the learning machine. This classifier is applied to real-world training images. Then, a human oracle interactively corrects the wrong detections, i.e., a few missed detections are manually annotated and some false positives are pointed out too. A low amount of manual annotation is fixed as a restriction. Difficult real- and virtual-world samples are combined within what we call the cool world, and we retrain the classifier with these data. Our experiments show that this adapted classifier is equivalent to the one trained with only real-world data but requires 90% fewer manual annotations.  
  Address Springer Heidelberg New York Dordrecht London  
  Corporate Author Thesis  
  Publisher Springer Berlin Heidelberg Place of Publication Editor  
  Language English Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN 1868-4394 ISBN 978-3-642-35931-6 Medium  
  Area Expedition Conference  
  Notes ADAS; 600.057; 600.054; 605.203 Approved no  
  Call Number VLP2013; ADAS @ adas @ vlp2013 Serial 2193  
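
The interactive adaptation loop summarized in the abstract above roughly follows the outline below. The object interfaces (classifier, oracle) and the annotation budget are illustrative assumptions, not the chapter's code.

    def interactive_adaptation(classifier, virtual_set, real_images, oracle, budget=100):
        """Outline of virtual-to-real adaptation with a human oracle (illustrative)."""
        classifier.train(virtual_set)                       # train on automatically labeled virtual data
        detections = [(img, classifier.detect(img)) for img in real_images]
        # The oracle annotates a few missed pedestrians and flags some false positives,
        # limited by a small annotation budget.
        corrections = oracle.review(detections, max_annotations=budget)
        cool_world = list(virtual_set) + list(corrections)  # mix difficult samples of both domains
        classifier.train(cool_world)                        # retrain on the 'cool world'
        return classifier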
 

 
Author Angel Sappa; Jordi Vitria
  Title Multimodal Interaction in Image and Video Applications Type Book Whole
  Year 2013 Publication Multimodal Interaction in Image and Video Applications Abbreviated Journal  
  Volume 48 Issue Pages  
  Keywords  
  Abstract Book Series Intelligent Systems Reference Library  
  Address  
  Corporate Author Thesis  
  Publisher Springer Berlin Heidelberg Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN 1868-4394 ISBN 978-3-642-35931-6 Medium  
  Area Expedition Conference  
  Notes ADAS; OR; MV Approved no  
  Call Number Admin @ si @ SaV2013 Serial 2199  