Publicacions CVC -- Query Results

	586–600 of 3413 records found matching your query:

	Author	Cesar de Souza
	Title	Action Recognition in Videos: Data-efficient approaches for supervised learning of human action classification models for video			Type	Book Whole
	Year	2018	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	In this dissertation, we explore different ways to perform human action recognition in video clips. We focus on data efficiency, proposing new approaches that alleviate the need for laborious and time-consuming manual data annotation. In the first part of this dissertation, we start by analyzing previous state-of-the-art models, comparing their differences and similarities in order to pinpoint where their real strengths come from. Leveraging this information, we then proceed to boost the classification accuracy of shallow models to levels that rival deep neural networks. We introduce hybrid video classification architectures based on carefully designed unsupervised representations of handcrafted spatiotemporal features classified by supervised deep networks. We show in our experiments that our hybrid model combines the best of both worlds: it is data efficient (trained on 150 to 10,000 short clips) and yet significantly improves on the state of the art, including deep models trained on millions of manually labeled images and videos. In the second part of this research, we investigate the generation of synthetic training data for action recognition, as it has recently shown promising results for a variety of other computer vision tasks. We propose an interpretable parametric generative model of human action videos that relies on procedural generation and other computer graphics techniques of modern game engines. We generate a diverse, realistic, and physically plausible dataset of human action videos, called PHAV for “Procedural Human Action Videos”. It contains a total of 39,982 videos, with more than 1,000 examples for each action of 35 categories. Our approach is not limited to existing motion capture sequences, and we procedurally define 14 synthetic actions. We then introduce deep multi-task representation learning architectures to mix synthetic and real videos, even if the action categories differ.
Our experiments on the UCF-101 and HMDB-51 benchmarks suggest that combining our large set of synthetic videos with small real-world datasets can boost recognition performance, outperforming fine-tuning state-of-the-art unsupervised generative models of videos.
	Address	April 2018
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Antonio Lopez;Naila Murray
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	ADAS; 600.118			Approved	no
	Call Number	Admin @ si @ Sou2018			Serial	3127



	Author	Adrien Gaidon; Antonio Lopez; Florent Perronnin
	Title	The Reasonable Effectiveness of Synthetic Visual Data			Type	Journal Article
	Year	2018	Publication	International Journal of Computer Vision	Abbreviated Journal	IJCV
	Volume	126	Issue	9	Pages	899–901
	Keywords
	Abstract
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	ADAS; 600.118			Approved	no
	Call Number	Admin @ si @ GLP2018			Serial	3180



	Author	Antonio Lopez
	Title	Pedestrian Detection Systems			Type	Book Chapter
	Year	2018	Publication	Wiley Encyclopedia of Electrical and Electronics Engineering	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Pedestrian detection is a highly relevant topic for both advanced driver assistance systems (ADAS) and autonomous driving. In this entry, we review the ideas behind pedestrian detection systems from the point of view of perception based on computer vision and machine learning.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	ADAS; 600.118			Approved	no
	Call Number	Admin @ si @ Lop2018			Serial	3230



	Author	Santi Puch; Irina Sanchez; Aura Hernandez-Sabate; Gemma Piella; Vesna Prckovska
	Title	Global Planar Convolutions for Improved Context Aggregation in Brain Tumor Segmentation			Type	Conference Article
	Year	2018	Publication	International MICCAI Brainlesion Workshop	Abbreviated Journal
	Volume	11384	Issue		Pages	393-405
	Keywords	Brain tumors; 3D fully-convolutional CNN; Magnetic resonance imaging; Global planar convolution
	Abstract	In this work, we introduce the Global Planar Convolution module as a building block for fully-convolutional networks that aggregates global information and, therefore, enhances the context perception capabilities of segmentation networks for brain tumor segmentation. We implement two baseline architectures (3D UNet and a residual version of 3D UNet, ResUNet) and present a novel architecture based on these two architectures, ContextNet, that includes the proposed Global Planar Convolution module. We show that the addition of such a module eliminates the need for building networks with several representation levels, which tend to be over-parametrized and to converge slowly. Furthermore, we provide a visual demonstration of the behavior of GPC modules via visualization of intermediate representations. Finally, we participate in the 2018 edition of the BraTS challenge with our best-performing models, which are based on ContextNet, and report the evaluation scores on the validation and test sets of the challenge.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title	LNCS
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	MICCAIW
	Notes	ADAS; 600.118			Approved	no
	Call Number	Admin @ si @ PSH2018			Serial	3251



	Author	Spyridon Bakas; Mauricio Reyes; Andras Jakab; Stefan Bauer; Markus Rempfler; Alessandro Crimi; Russell Takeshi Shinohara; Christoph Berger; Sung Min Ha; Martin Rozycki; Marcel Prastawa; Esther Alberts; Jana Lipkova; John Freymann; Justin Kirby; Michel Bilello; Hassan Fathallah-Shaykh; Roland Wiest; Jan Kirschke; Benedikt Wiestler; Rivka Colen; Aikaterini Kotrotsou; Pamela Lamontagne; Daniel Marcus; Mikhail Milchenko; Arash Nazeri; Marc-Andre Weber; Abhishek Mahajan; Ujjwal Baid; Dongjin Kwon; Manu Agarwal; Mahbubul Alam; Alberto Albiol; Antonio Albiol; Varghese Alex; Tuan Anh Tran; Tal Arbel; Aaron Avery; Subhashis Banerjee; Thomas Batchelder; Kayhan Batmanghelich; Enzo Battistella; Martin Bendszus; Eze Benson; Jose Bernal; George Biros; Mariano Cabezas; Siddhartha Chandra; Yi-Ju Chang; Joseph Chazalon; Shengcong Chen; Wei Chen; Jefferson Chen; Kun Cheng; Meinel Christoph; Roger Chylla; Albert Clérigues; Anthony Costa; Xiaomeng Cui; Zhenzhen Dai; Lutao Dai; Eric Deutsch; Changxing Ding; Chao Dong; Wojciech Dudzik; Theo Estienne; Hyung Eun Shin; Richard Everson; Jonathan Fabrizio; Longwei Fang; Xue Feng; Lucas Fidon; Naomi Fridman; Huan Fu; David Fuentes; David G Gering; Yaozong Gao; Evan Gates; Amir Gholami; Mingming Gong; Sandra Gonzalez-Villa; J Gregory Pauloski; Yuanfang Guan; Sheng Guo; Sudeep Gupta; Meenakshi H Thakur; Klaus H Maier-Hein; Woo-Sup Han; Huiguang He; Aura Hernandez-Sabate; Evelyn Herrmann; Naveen Himthani; Winston Hsu; Cheyu Hsu; Xiaojun Hu; Xiaobin Hu; Yan Hu; Yifan Hu; Rui Hua
	Title	Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the BRATS challenge			Type	Miscellaneous
	Year	2018	Publication	arXiv	Abbreviated Journal
	Volume		Issue		Pages
	Keywords	BraTS; challenge; brain; tumor; segmentation; machine learning; glioma; glioblastoma; radiomics; survival; progression; RECIST
	Abstract	Gliomas are the most common primary brain malignancies, with different degrees of aggressiveness, variable prognosis and various heterogeneous histologic sub-regions, i.e., peritumoral edematous/invaded tissue, necrotic core, active and non-enhancing core. This intrinsic heterogeneity is also portrayed in their radio-phenotype, as their sub-regions are depicted by varying intensity profiles disseminated across multiparametric magnetic resonance imaging (mpMRI) scans, reflecting varying biological properties. Their heterogeneous shape, extent, and location are some of the factors that make these tumors difficult to resect, and in some cases inoperable. The amount of resected tumor is also a factor considered in longitudinal scans, when evaluating the apparent tumor for potential diagnosis of progression. Furthermore, there is mounting evidence that accurate segmentation of the various tumor sub-regions can offer the basis for quantitative image analysis towards prediction of patient overall survival. This study assesses the state-of-the-art machine learning (ML) methods used for brain tumor image analysis in mpMRI scans during the last seven instances of the International Brain Tumor Segmentation (BraTS) challenge, i.e., 2012–2018. Specifically, we focus on i) evaluating segmentations of the various glioma sub-regions in preoperative mpMRI scans, ii) assessing potential tumor progression by virtue of longitudinal growth of tumor sub-regions, beyond use of the RECIST criteria, and iii) predicting the overall survival from preoperative mpMRI scans of patients who underwent gross total resection. Finally, we investigate the challenge of identifying the best ML algorithms for each of these tasks, considering that, apart from being diverse in each instance of the challenge, the multi-institutional mpMRI BraTS dataset has also been a continuously evolving/growing dataset.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	ADAS; 600.118			Approved	no
	Call Number	Admin @ si @ BRJ2018			Serial	3252
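BraTS ranks segmentations of the glioma sub-regions using overlap measures such as the Dice similarity coefficient, Dice(A, B) = 2|A ∩ B| / (|A| + |B|). The minimal sketch below computes it on flattened binary masks; the function name `dice` and the convention of returning 1.0 when both masks are empty are assumptions for illustration, not the challenge's official evaluation code.

```python
from typing import Sequence


def dice(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice coefficient between two equal-length binary masks (sequences of 0/1)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total


# 2 overlapping voxels, 3 predicted, 3 in the ground truth: 2*2/(3+3), about 0.667.
print(dice([1, 1, 1, 0, 0], [0, 1, 1, 1, 0]))
```

In practice the same function would be applied per sub-region (e.g. one binary mask per tumor label) on the flattened 3D volume.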



	Author	Cesar de Souza; Adrien Gaidon; Eleonora Vig; Antonio Lopez
	Title	System and method for video classification using a hybrid unsupervised and supervised multi-layer architecture			Type	Patent
	Year	2018	Publication	US9946933B2	Abbreviated Journal
	Volume		Issue		Pages
	Keywords	US9946933B2
	Abstract	A computer-implemented video classification method and system are disclosed. The method includes receiving an input video including a sequence of frames. At least one transformation of the input video is generated, each transformation including a sequence of frames. For the input video and each transformation, local descriptors are extracted from the respective sequence of frames. The local descriptors of the input video and each transformation are aggregated to form an aggregated feature vector with a first set of processing layers learned using unsupervised learning. An output classification value is generated for the input video, based on the aggregated feature vector with a second set of processing layers learned using supervised learning.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	ADAS; 600.118			Approved	no
	Call Number	Admin @ si @ SGV2018			Serial	3255
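The claimed pipeline (transform the input video, extract local descriptors from every version, aggregate them with unsupervised layers, classify with supervised layers) can be sketched in plain Python. Everything concrete here is an assumption for illustration, not the patented implementation: frames are flattened to one row of pixels, the only transformation is a horizontal flip, descriptors are (intensity, temporal change) pairs, the unsupervised stage is VLAD-style residual aggregation against a codebook that would normally be learned with k-means, and the supervised stage is a single linear layer whose weights would normally be trained on labeled clips.

```python
import math
from typing import Callable, List, Sequence, Tuple

Frame = List[float]          # one row of pixels (toy simplification)
Video = List[Frame]
Descriptor = Tuple[float, float]


def hflip(video: Video) -> Video:
    """Example transformation: mirror every frame horizontally."""
    return [frame[::-1] for frame in video]


def local_descriptors(video: Video) -> List[Descriptor]:
    """Toy local descriptor per pixel and frame pair: (intensity, temporal change)."""
    descs = []
    for prev, cur in zip(video, video[1:]):
        for p, c in zip(prev, cur):
            descs.append((c, c - p))
    return descs


def vlad(descs: Sequence[Descriptor], codebook: Sequence[Descriptor]) -> List[float]:
    """Unsupervised stage: sum residuals to the nearest codeword, then L2-normalize."""
    dim = len(codebook[0])
    agg = [[0.0] * dim for _ in codebook]
    for d in descs:
        k = min(range(len(codebook)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(d, codebook[i])))
        for j in range(dim):
            agg[k][j] += d[j] - codebook[k][j]
    flat = [v for row in agg for v in row]
    norm = math.sqrt(sum(v * v for v in flat)) or 1.0
    return [v / norm for v in flat]


def classify(video: Video,
             transforms: Sequence[Callable[[Video], Video]],
             codebook: Sequence[Descriptor],
             weights: Sequence[Sequence[float]],
             biases: Sequence[float]) -> int:
    """Supervised stage: linear scores on the aggregated vector; returns the argmax class."""
    descs: List[Descriptor] = []
    for version in [video] + [t(video) for t in transforms]:
        descs.extend(local_descriptors(version))
    feature = vlad(descs, codebook)
    scores = [sum(w * f for w, f in zip(row, feature)) + b
              for row, b in zip(weights, biases)]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy usage: 2 frames of 2 pixels, 2 codewords, 2 classes.
video = [[0.0, 0.0], [1.0, 1.0]]
codebook = [(0.0, 0.0), (1.0, 0.0)]
weights = [[0.0, 0.0, 0.0, -1.0], [0.0, 0.0, 0.0, 1.0]]
print(classify(video, [hflip], codebook, weights, [0.0, 0.0]))
```

Pooling the descriptors of all versions before aggregation is one reading of "aggregated to form an aggregated feature vector"; concatenating one vector per transformation would be an equally plausible alternative.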



	Author	Chris Bahnsen; David Vazquez; Antonio Lopez; Thomas B. Moeslund
	Title	Learning to Remove Rain in Traffic Surveillance by Using Synthetic Data			Type	Conference Article
	Year	2019	Publication	14th International Conference on Computer Vision Theory and Applications	Abbreviated Journal
	Volume		Issue		Pages	123-130
	Keywords	Rain Removal; Traffic Surveillance; Image Denoising
	Abstract	Rainfall is a problem in automated traffic surveillance. Rain streaks occlude road users and degrade overall visibility, which in turn decreases object detection performance. One way of alleviating this is by artificially removing the rain from the images. This requires pairs of corresponding rainy and rain-free images. Such images are often produced by overlaying synthetic rain on top of rain-free images. However, this method fails to account for the fact that rain falls throughout the entire three-dimensional volume of the scene. To overcome this, we introduce training data from the SYNTHIA virtual world that models rain streaks in the entirety of a scene. We train a conditional Generative Adversarial Network for rain removal and apply it to traffic surveillance images from the SYNTHIA and AAU RainSnow datasets. To measure the applicability of the rain-removed images in a traffic surveillance context, we run the YOLOv2 object detection algorithm on the original and rain-removed frames. The results on SYNTHIA show an 8% increase in detection accuracy compared to the original rainy images. Interestingly, we find that high PSNR or SSIM scores do not imply good object detection performance.
	Address	Prague; Czech Republic; February 2019
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	VISIGRAPP
	Notes	ADAS; 600.118			Approved	no
	Call Number	Admin @ si @ BVL2019			Serial	3256
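The abstract's last observation concerns PSNR and SSIM, which measure pixel-level fidelity to a rain-free reference rather than anything a detector relies on. For reference, a minimal sketch of PSNR, 10 · log10(MAX² / MSE), for 8-bit images; representing images as flat pixel lists and the function name `psnr` are illustrative assumptions.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], restored: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    if not reference or len(reference) != len(restored):
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Each pixel off by 10 grey levels: MSE = 100, PSNR = 10*log10(255^2/100), about 28.1 dB.
print(psnr([100, 100], [110, 90]))
```

Because PSNR only averages per-pixel errors, the abstract measures the usefulness of rain removal with a downstream detector (YOLOv2) instead.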



	Author	Jiaolong Xu; Liang Xiao; Antonio Lopez
	Title	Self-supervised Domain Adaptation for Computer Vision Tasks			Type	Journal Article
	Year	2019	Publication	IEEE Access	Abbreviated Journal	ACCESS
	Volume	7	Issue		Pages	156694–156706
	Keywords
	Abstract	Recent progress in self-supervised visual representation learning has led to remarkable success on many challenging computer vision benchmarks. However, whether these techniques can be used for domain adaptation has not been explored. In this work, we propose a generic method for self-supervised domain adaptation, using object recognition and semantic segmentation of urban scenes as use cases. Focusing on simple pretext/auxiliary tasks (e.g., image rotation prediction), we assess different learning strategies to improve domain adaptation effectiveness by self-supervision. Additionally, we propose two complementary strategies to further boost the domain adaptation accuracy on semantic segmentation within our method, consisting of prediction layer alignment and batch normalization calibration. The experimental results show adaptation levels comparable to those of most studied domain adaptation methods, thus establishing self-supervision as a new alternative for domain adaptation. The code is available at https://github.com/Jiaolong/self-supervised-da.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	ADAS; 600.118			Approved	no
	Call Number	Admin @ si @ XXL2019			Serial	3302
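Rotation prediction, the pretext task named in the abstract, needs no human labels: each image yields four copies rotated by 0, 90, 180 and 270 degrees, and the rotation index is the training target. Because the labels come for free, the task can be trained on unlabeled target-domain images as well as on source images. A minimal sketch of the data side only, with images as lists of rows; the function names are hypothetical and the network that predicts the label is omitted.

```python
from typing import List, Tuple

Image = List[List[int]]


def rot90(img: Image) -> Image:
    """Rotate an image (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def rotation_pretext_samples(img: Image) -> List[Tuple[Image, int]]:
    """Return (rotated image, label) pairs; label k means k * 90 degrees clockwise."""
    samples = []
    current = [list(row) for row in img]
    for k in range(4):
        samples.append((current, k))
        current = rot90(current)
    return samples


for rotated, label in rotation_pretext_samples([[1, 2], [3, 4]]):
    print(label, rotated)
```

A four-way classification head trained on these pairs provides the self-supervised signal; how it is combined with the main task's loss is a design choice the abstract leaves to the paper.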



	Author	Zhijie Fang; Antonio Lopez
	Title	Intention Recognition of Pedestrians and Cyclists by 2D Pose Estimation			Type	Journal Article
	Year	2019	Publication	IEEE Transactions on Intelligent Transportation Systems	Abbreviated Journal	TITS
	Volume	21	Issue	11	Pages	4773–4783
	Keywords
	Abstract	Anticipating the intentions of vulnerable road users (VRUs) such as pedestrians and cyclists is critical for performing safe and comfortable driving maneuvers. This is the case for human driving and, thus, should be taken into account by systems providing any level of driving assistance, from advanced driver assistance systems (ADAS) to fully autonomous vehicles (AVs). In this paper, we show how the latest advances in monocular vision-based human pose estimation, i.e., those relying on deep Convolutional Neural Networks (CNNs), make it possible to recognize the intentions of such VRUs. In the case of cyclists, we assume that they follow traffic rules and indicate future maneuvers with arm signals. In the case of pedestrians, no such indications can be assumed. Instead, we hypothesize that the walking pattern of a pedestrian allows us to determine whether he/she intends to cross the road in the path of the ego-vehicle, so that the ego-vehicle must maneuver accordingly (e.g., slowing down or stopping). We show how the same methodology can be used for recognizing the intentions of both pedestrians and cyclists. For pedestrians, we perform experiments on the JAAD dataset. For cyclists, we did not find an analogous dataset, so we created our own by acquiring and annotating videos, which we share with the research community. Overall, the proposed pipeline provides new state-of-the-art results on the intention recognition of VRUs.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	ADAS; 600.118			Approved	no
	Call Number	Admin @ si @ FaL2019			Serial	3305
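The abstract does not detail how pose sequences become classifier inputs. One plausible preprocessing step, sketched here purely as an assumption, normalizes each frame's 2D keypoints for position and scale (so that the walking pattern, not the pedestrian's location or distance in the image, drives the decision) and stacks the last few frames into one feature vector for a crossing/not-crossing classifier. The keypoint indices, window length, and function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Keypoint = Tuple[float, float]


def normalize_pose(keypoints: Sequence[Keypoint], center: int,
                   scale_pair: Tuple[int, int]) -> List[Keypoint]:
    """Translate so keypoints[center] is the origin and divide by the length of
    the segment given by scale_pair (e.g. neck to mid-hip; indices are hypothetical)."""
    cx, cy = keypoints[center]
    (ax, ay), (bx, by) = keypoints[scale_pair[0]], keypoints[scale_pair[1]]
    scale = math.hypot(ax - bx, ay - by) or 1.0  # avoid division by zero
    return [((x - cx) / scale, (y - cy) / scale) for x, y in keypoints]


def window_features(sequence: Sequence[Sequence[Keypoint]], window: int,
                    center: int, scale_pair: Tuple[int, int]) -> List[float]:
    """Concatenate the normalized poses of the last `window` frames."""
    if len(sequence) < window:
        raise ValueError("sequence shorter than window")
    features: List[float] = []
    for keypoints in sequence[-window:]:
        for x, y in normalize_pose(keypoints, center, scale_pair):
            features.extend((x, y))
    return features


# Two keypoints per frame (neck, mid-hip), three frames, window of two frames.
frames = [[(10.0, 10.0), (10.0, 20.0)]] * 3
print(window_features(frames, window=2, center=0, scale_pair=(0, 1)))
```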



	Author	Felipe Codevilla
	Title	On Building End-to-End Driving Models Through Imitation Learning			Type	Book Whole
	Year	2019	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Autonomous vehicles are now considered an assured asset of the future. Virtually all the relevant car makers are now in a race to produce fully autonomous vehicles. These car makers usually make use of modular pipelines for designing autonomous vehicles. This strategy decomposes the problem into a variety of tasks such as object detection and recognition, semantic and instance segmentation, depth estimation, SLAM and place recognition, as well as planning and control. Each module requires a separate set of expert algorithms, which are costly, especially in terms of human labor and data labeling. An alternative that has recently attracted considerable interest is end-to-end driving. In the end-to-end driving paradigm, perception and control are learned simultaneously using a deep network. These sensorimotor models are typically obtained by imitation learning from human demonstrations. The main advantage is that this approach can directly learn from large fleets of human-driven vehicles without requiring a fixed ontology and extensive amounts of labeling. However, scaling end-to-end driving methods to behaviors more complex than simple lane keeping or lead-vehicle following remains an open problem. In this thesis, in order to achieve more complex behaviors, we address some issues that arise when creating end-to-end driving systems through imitation learning. The first of them is the need for an environment for algorithm evaluation and for the collection of driving demonstrations. To this end, we participated in the creation of the CARLA simulator, an open-source platform built from the ground up for autonomous driving validation and prototyping. Since the end-to-end approach is purely reactive, there is also the need to provide an interface with a global planning system. With this in mind, we propose conditional imitation learning, which conditions the produced actions on a high-level command.
Evaluation is also a concern and is commonly performed by comparing the end-to-end network output to some pre-collected driving dataset. We show that this is surprisingly weakly correlated with actual driving performance, and we propose better strategies for acquiring data and for comparison. Finally, we confirm well-known generalization issues (due to dataset bias and overfitting), identify new ones (due to dynamic objects and the lack of a causal model), and observe training instability: problems that require further research before end-to-end driving through imitation can scale to real-world driving.
	Address	May 2019
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Antonio Lopez
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	ADAS; 600.118			Approved	no
	Call Number	Admin @ si @ Cod2019			Serial	3387
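Conditional imitation learning, as described above, conditions the produced action on a high-level command from a global planner. One design associated with this idea shares a perception trunk across commands and keeps a separate action head (branch) per command, using only the branch that matches the current command; during training, the imitation loss would likewise be applied only to that branch. A minimal sketch, where the command names, the toy trunk and heads, and the (steer, throttle, brake) action tuple are illustrative assumptions:

```python
from typing import Callable, Dict, Sequence, Tuple

Action = Tuple[float, float, float]  # (steer, throttle, brake)


class BranchedPolicy:
    """Shared trunk, one action head per high-level navigation command."""

    def __init__(self, trunk: Callable[[Sequence[float]], float],
                 heads: Dict[str, Callable[[float], Action]]):
        self.trunk = trunk
        self.heads = heads

    def act(self, observation: Sequence[float], command: str) -> Action:
        if command not in self.heads:
            raise KeyError(f"unknown command: {command}")
        features = self.trunk(observation)      # perception, shared by all branches
        return self.heads[command](features)    # control, selected by the command


policy = BranchedPolicy(
    trunk=lambda obs: sum(obs) / len(obs),      # stand-in for a deep network
    heads={
        "follow_lane": lambda f: (0.0, f, 0.0),
        "turn_left": lambda f: (-0.5, f, 0.0),
        "turn_right": lambda f: (0.5, f, 0.0),
        "go_straight": lambda f: (0.0, f, 0.0),
    },
)
print(policy.act([0.5, 0.5], "turn_left"))  # (-0.5, 0.5, 0.0)
```

Selecting a branch, rather than feeding the command as an extra input, makes the command's effect on the output explicit; the thesis itself should be consulted for which variant it evaluates.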



	Author	Zhijie Fang
	Title	Behavior understanding of vulnerable road users by 2D pose estimation			Type	Book Whole
	Year	2019	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Anticipating the intentions of vulnerable road users (VRUs) such as pedestrians and cyclists can be critical for performing safe and comfortable driving maneuvers. This is the case for human driving and, therefore, should be taken into account by systems providing any level of driving assistance, i.e., from advanced driver assistance systems (ADAS) to fully autonomous vehicles (AVs). In this PhD work, we show how the latest advances in monocular vision-based human pose estimation, i.e., those relying on deep Convolutional Neural Networks (CNNs), make it possible to recognize the intentions of such VRUs. In the case of cyclists, we assume that they follow the established traffic codes to indicate future left/right turns and stop maneuvers with arm signals. In the case of pedestrians, no indications can be assumed a priori. Instead, we hypothesize that the walking pattern of a pedestrian can allow us to determine whether he/she intends to cross the road in the path of the ego-vehicle, so that the ego-vehicle must maneuver accordingly (e.g., slowing down or stopping). In this PhD work, we show how the same methodology can be used for recognizing the intentions of both pedestrians and cyclists. For pedestrians, we perform experiments on the publicly available Daimler and JAAD datasets. For cyclists, we did not find an analogous dataset; therefore, we created our own by acquiring and annotating corresponding video sequences, which we aim to share with the research community. Overall, the proposed pipeline provides new state-of-the-art results on the intention recognition of VRUs.
	Address	May 2019
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Antonio Lopez;David Vazquez
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-948531-6-6	Medium
	Area		Expedition		Conference
	Notes	ADAS; 600.118			Approved	no
	Call Number	Admin @ si @ Fan2019			Serial	3388



	Author	Akhil Gurram; Ahmet Faruk Tuna; Fengyi Shen; Onay Urfalioglu; Antonio Lopez
	Title	Monocular Depth Estimation through Virtual-world Supervision and Real-world SfM Self-Supervision			Type	Journal Article
	Year	2021	Publication	IEEE Transactions on Intelligent Transportation Systems	Abbreviated Journal	TITS
	Volume	23	Issue	8	Pages	12738-12751
	Keywords
	Abstract	Depth information is essential for on-board perception in autonomous driving and driver assistance. Monocular depth estimation (MDE) is very appealing since it places appearance and depth in direct pixelwise correspondence without further calibration. The best MDE models are based on Convolutional Neural Networks (CNNs) trained in a supervised manner, i.e., assuming pixelwise ground truth (GT). Usually, this GT is acquired at training time through a calibrated multi-modal suite of sensors. However, using only a monocular system at training time as well is cheaper and more scalable. This is possible by relying on structure-from-motion (SfM) principles to generate self-supervision. Nevertheless, problems such as camouflaged objects, visibility changes, static-camera intervals, textureless areas, and scale ambiguity diminish the usefulness of such self-supervision. In this paper, we perform monocular depth estimation by virtual-world supervision (MonoDEVS) and real-world SfM self-supervision. We compensate for the limitations of SfM self-supervision by leveraging virtual-world images with accurate semantic and depth supervision, and by addressing the virtual-to-real domain gap. Our MonoDEVSNet outperforms previous MDE CNNs trained on monocular and even stereo sequences.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	ADAS; 600.118			Approved	no
	Call Number	Admin @ si @ GTS2021			Serial	3598



	Author	Gabriel Villalonga
	Title	Leveraging Synthetic Data to Create Autonomous Driving Perception Systems			Type	Book Whole
	Year	2021	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Manually annotating images to develop vision models has been a major bottleneck since computer vision and machine learning started to walk together. This has become more evident since computer vision came to rely on data-hungry deep learning techniques. When addressing on-board perception for autonomous driving, the curse of data annotation is exacerbated by the use of additional sensors such as LiDAR. Therefore, any approach aiming at reducing such time-consuming and costly work is of high interest for addressing autonomous driving and, in fact, for any application requiring some sort of artificial perception. In the last decade, it has been shown that leveraging synthetic data is a paradigm worth pursuing in order to minimize manual data annotation. The reason is that the automatic process of generating synthetic data can also produce different types of associated annotations (e.g., object bounding boxes for synthetic images and LiDAR pointclouds, pixel/point-wise semantic information, etc.). Directly using synthetic data for training deep perception models may not be the definitive solution in all circumstances, since a synth-to-real domain shift can appear. In this context, this work focuses on leveraging synthetic data to alleviate manual annotation for three perception tasks related to driving assistance and autonomous driving. In all cases, we assume the use of deep convolutional neural networks (CNNs) to develop our perception models. The first task addresses traffic sign recognition (TSR), a kind of multi-class classification problem. We assume that the number of sign classes to be recognized must be suddenly increased without having annotated samples to perform the corresponding TSR CNN re-training. We show that, by leveraging synthetic samples of such new classes and transforming them with a generative adversarial network (GAN) trained on the known classes (i.e., without using samples from the new classes), it is possible to re-train the TSR CNN to properly classify all the signs for a ∼1/4 ratio of new/known sign classes.
The second task addresses on-board 2D object detection, focusing on vehicles and pedestrians. In this case, we assume that we receive a set of images without the annotations required to train an object detector, i.e., without object bounding boxes. Therefore, our goal is to self-annotate these images so that they can later be used to train the desired object detector. To reach this goal, we leverage synthetic data and propose a semi-supervised learning approach based on the co-training idea. In fact, we use a GAN to reduce the synth-to-real domain shift before applying co-training. Our quantitative results show that co-training and GAN-based image-to-image translation complement each other to the point of allowing the training of object detectors without manual annotation, while still almost reaching the upper-bound performance of detectors trained with human annotations. While the previous tasks focus on vision-based perception, the third task focuses on LiDAR pointclouds. Our initial goal was to develop a 3D object detector trained on synthetic LiDAR-style pointclouds. While for images we may expect a synth/real-to-real domain shift due to differences in appearance (e.g., when source and target images come from different camera sensors), we did not expect this for LiDAR pointclouds, since these active sensors factor out appearance and provide sampled shapes. However, in practice, we have seen that domain shift can occur even among real-world LiDAR pointclouds. Factors such as the sampling parameters of the LiDARs, the sensor suite configuration on board the ego-vehicle, and the human annotation of 3D bounding boxes do induce a domain shift. We show this through comprehensive experiments with different publicly available datasets and 3D detectors.
This redirected our goal towards the design of a GAN for pointcloud-to-pointcloud translation, a relatively unexplored topic. Finally, it is worth mentioning that all the synthetic datasets used for these three tasks have been designed and generated in the context of this PhD work and will be publicly released. Overall, we think this PhD work presents several steps forward to encourage leveraging synthetic data for developing deep perception models in the field of driving assistance and autonomous driving.
	Address	February 2021
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Antonio Lopez;German Ros
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-122714-2-3	Medium
	Area		Expedition		Conference
	Notes	ADAS; 600.118			Approved	no
	Call Number	Admin @ si @ Vil2021			Serial	3599



	Author	Yi Xiao; Felipe Codevilla; Christopher Pal; Antonio Lopez
	Title	Action-Based Representation Learning for Autonomous Driving			Type	Conference Article
	Year	2020	Publication	Conference on Robot Learning	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Human drivers produce a vast amount of data which could, in principle, be used to improve autonomous driving systems. Unfortunately, seemingly straightforward approaches for creating end-to-end driving models that map sensor data directly into driving actions are problematic in terms of interpretability, and typically have significant difficulty dealing with spurious correlations. Alternatively, we propose to use this kind of action-based driving data for learning representations. Our experiments show that an affordance-based driving model pre-trained with this approach can leverage a relatively small amount of weakly annotated imagery and outperform pure end-to-end driving models, while being more interpretable. Further, we demonstrate how this strategy outperforms previous methods based on learning inverse dynamics models as well as other methods based on heavy human supervision (ImageNet).
	Address	virtual; November 2020
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	CORL
	Notes	ADAS; 600.118			Approved	no
	Call Number	Admin @ si @ XCP2020			Serial	3487



	Author	Gabriel Villalonga; Antonio Lopez
	Title	Co-Training for On-Board Deep Object Detection			Type	Journal Article
	Year	2020	Publication	IEEE Access	Abbreviated Journal	ACCESS
	Volume		Issue		Pages	194441–194456
	Keywords
	Abstract	Providing ground truth supervision to train visual models has been a bottleneck over the years, exacerbated by domain shifts which degrade the performance of such models. This was the case when visual tasks relied on handcrafted features and shallow machine learning and, despite the unprecedented performance gains of deep learning, the problem remains open within that paradigm due to its data-hungry nature. The best-performing deep vision-based object detectors are trained in a supervised manner by relying on human-labeled bounding boxes which localize class instances (i.e., objects) within the training images. Thus, object detection is one such task for which human labeling is a major bottleneck. In this article, we assess co-training as a semi-supervised learning method for self-labeling objects in unlabeled images, thus reducing the human-labeling effort for developing deep object detectors. Our study pays special attention to a scenario involving domain shift; in particular, when we have automatically generated virtual-world images with object bounding boxes and real-world images which are unlabeled. Moreover, we are particularly interested in using co-training for deep object detection in the context of driver assistance systems and/or self-driving vehicles. Thus, using well-established datasets and protocols for object detection in these application contexts, we show that co-training is a paradigm worth pursuing for alleviating object labeling, working both alone and together with task-agnostic domain adaptation.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	ADAS; 600.118			Approved	no
	Call Number	Admin @ si @ ViL2020			Serial	3488
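Co-training, assessed above for self-labeling objects, alternates between two models that see the data differently: each round, each model pseudo-labels the unlabeled samples it is most confident about, and those samples are added to the other model's training set. The generic loop below is a minimal sketch of that idea under stated assumptions, not the article's detection pipeline: samples are plain hashable values, `train_a`/`train_b` are hypothetical callables that fit a model returning `(label, confidence)`, and the two views are implicit in how each trainer builds its model.

```python
from typing import Callable, Hashable, List, Sequence, Tuple

Labeled = List[Tuple[Hashable, int]]
Model = Callable[[Hashable], Tuple[int, float]]   # sample -> (label, confidence)
Trainer = Callable[[Labeled], Model]


def co_training(labeled_a: Labeled, labeled_b: Labeled, unlabeled: Sequence[Hashable],
                train_a: Trainer, train_b: Trainer,
                rounds: int = 2, per_round: int = 1) -> Tuple[Labeled, Labeled]:
    set_a, set_b = list(labeled_a), list(labeled_b)
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        model_a, model_b = train_a(set_a), train_b(set_b)
        # Each model picks its most confident pseudo-labels...
        top_a = sorted(pool, key=lambda x: model_a(x)[1], reverse=True)[:per_round]
        top_b = sorted(pool, key=lambda x: model_b(x)[1], reverse=True)[:per_round]
        # ...and hands them to the *other* model's training set.
        set_b.extend((x, model_a(x)[0]) for x in top_a)
        set_a.extend((x, model_b(x)[0]) for x in top_b)
        used = set(top_a) | set(top_b)
        pool = [x for x in pool if x not in used]
    return set_a, set_b


def train_threshold(pairs: Labeled) -> Model:
    """Toy trainer: threshold at the mean of the labeled values."""
    t = sum(x for x, _ in pairs) / len(pairs)
    return lambda x: (int(x > t), abs(x - t))


seed = [(0.0, 0), (10.0, 1)]
a, b = co_training(seed, seed, [4.0, 9.0, 5.5], train_threshold, train_threshold)
print(a)
```

With identical trainers, as in this toy, both sets grow identically; real co-training relies on two views (or two models) that make different, complementary mistakes.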