%0 Thesis
%T Leveraging Synthetic Data to Create Autonomous Driving Perception Systems
%A Gabriel Villalonga
%E Antonio Lopez
%E German Ros
%D 2021
%I Ediciones Graficas Rey
%@ 978-84-122714-2-3
%F Gabriel Villalonga2021
%O ADAS; 600.118
%O exported from refbase (http://refbase.cvc.uab.es/show.php?record=3599), last updated on Thu, 16 Dec 2021 13:02:11 +0100
%X Manually annotating images to develop vision models has been a major bottleneck since computer vision and machine learning started to walk together. This has become even more evident since computer vision came to rest on the shoulders of data-hungry deep learning techniques. When addressing on-board perception for autonomous driving, the curse of data annotation is exacerbated by the use of additional sensors such as LiDAR. Therefore, any approach aiming at reducing such time-consuming and costly work is of high interest for autonomous driving and, in fact, for any application requiring some sort of artificial perception. In the last decade, it has been shown that leveraging synthetic data is a paradigm worth pursuing in order to minimize manual data annotation. The reason is that the automatic process of generating synthetic data can also produce different types of associated annotations (e.g. object bounding boxes for synthetic images and LiDAR pointclouds, pixel/point-wise semantic information, etc.). However, directly using synthetic data to train deep perception models may not be the definitive solution in all circumstances, since a synth-to-real domain shift can appear. In this context, this work focuses on leveraging synthetic data to alleviate manual annotation for three perception tasks related to driving assistance and autonomous driving. In all cases, we assume the use of deep convolutional neural networks (CNNs) to develop our perception models. The first task addresses traffic sign recognition (TSR), a kind of multi-class classification problem. We assume that the number of sign classes to be recognized must be suddenly increased without annotated samples being available to perform the corresponding TSR CNN re-training. We show that, by leveraging synthetic samples of such new classes and transforming them with a generative adversarial network (GAN) trained on the known classes (i.e. without using samples from the new classes), it is possible to re-train the TSR CNN to properly classify all the signs for a ∼1/4 ratio of new to known sign classes. The second task addresses on-board 2D object detection, focusing on vehicles and pedestrians. In this case, we assume that we receive a set of images without the annotations required to train an object detector, i.e. without object bounding boxes. Therefore, our goal is to self-annotate these images so that they can later be used to train the desired object detector. In order to reach this goal, we leverage synthetic data and propose a semi-supervised learning approach based on the co-training idea. In fact, we use a GAN to reduce the synth-to-real domain shift before applying co-training. Our quantitative results show that co-training and GAN-based image-to-image translation complement each other, allowing the training of object detectors without manual annotation while almost reaching the upper-bound performance of detectors trained on human annotations. While in the previous tasks we focus on vision-based perception, the third task addresses LiDAR pointclouds. Our initial goal was to develop a 3D object detector trained on synthetic LiDAR-style pointclouds. While for images we may expect a synth/real-to-real domain shift due to differences in appearance (e.g. when source and target images come from different camera sensors), we did not expect this for LiDAR pointclouds, since these active sensors factor out appearance and provide sampled shapes. However, in practice, we have seen that domain shift can arise even among real-world LiDAR pointclouds. Factors such as the sampling parameters of the LiDARs, the sensor suite configuration on board the ego-vehicle, and the human annotation of 3D bounding boxes do induce a domain shift. We show this through comprehensive experiments with different publicly available datasets and 3D detectors. This redirected our goal towards the design of a GAN for pointcloud-to-pointcloud translation, a relatively unexplored topic. Finally, it is worth mentioning that all the synthetic datasets used for these three tasks have been designed and generated in the context of this PhD work and will be publicly released. Overall, we think this PhD presents several steps forward to encourage leveraging synthetic data for developing deep perception models in the field of driving assistance and autonomous driving.
%9 theses
%9 Ph.D. thesis