Home | << 1 2 3 4 5 6 7 8 9 >> |
Records | |||||
---|---|---|---|---|---|
Author | Meysam Madadi; Hugo Bertiche; Sergio Escalera | ||||
Title | SMPLR: Deep learning based SMPL reverse for 3D human pose and shape recovery | Type | Journal Article | ||
Year | 2020 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 106 | Issue | Pages | 107472 | |
Keywords | Deep learning; 3D Human pose; Body shape; SMPL; Denoising autoencoder; Volumetric stack hourglass | ||||
Abstract | In this paper we propose to embed SMPL within a deep-based model to accurately estimate 3D pose and shape from a still RGB image. We use CNN-based 3D joint predictions as an intermediate representation to regress SMPL pose and shape parameters. Later, 3D joints are reconstructed again in the SMPL output. This module can be seen as an autoencoder where the encoder is a deep neural network and the decoder is SMPL model. We refer to this as SMPL reverse (SMPLR). By implementing SMPLR as an encoder-decoder we avoid the need of complex constraints on pose and shape. Furthermore, given that in-the-wild datasets usually lack accurate 3D annotations, it is desirable to lift 2D joints to 3D without pairing 3D annotations with RGB images. Therefore, we also propose a denoising autoencoder (DAE) module between CNN and SMPLR, able to lift 2D joints to 3D and partially recover from structured error. We evaluate our method on SURREAL and Human3.6M datasets, showing improvement over SMPL-based state-of-the-art alternatives by about 4 and 12 mm, respectively. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | HuPBA; no proj | Approved | no | ||
Call Number | Admin @ si @ MBE2020 | Serial | 3439 | ||
Permanent link to this record | |||||
Author | Angel Morera; Angel Sanchez; A. Belen Moreno; Angel Sappa; Jose F. Velez | ||||
Title | SSD vs. YOLO for Detection of Outdoor Urban Advertising Panels under Multiple Variabilities | Type | Journal Article | ||
Year | 2020 | Publication | Sensors | Abbreviated Journal | SENS |
Volume | 20 | Issue | 16 | Pages | 4587 |
Keywords | |||||
Abstract | This work compares Single Shot MultiBox Detector (SSD) and You Only Look Once (YOLO) deep neural networks for the outdoor advertisement panel detection problem by handling multiple and combined variabilities in the scenes. Publicity panel detection in images offers important advantages both in the real world as well as in the virtual one. For example, applications like Google Street View can be used for Internet publicity and when detecting these ads panels in images, it could be possible to replace the publicity appearing inside the panels by another from a funding company. In our experiments, both SSD and YOLO detectors have produced acceptable results under variable sizes of panels, illumination conditions, viewing perspectives, partial occlusion of panels, complex background and multiple panels in scenes. Due to the difficulty of finding annotated images for the considered problem, we created our own dataset for conducting the experiments. The major strength of the SSD model was the almost elimination of False Positive (FP) cases, situation that is preferable when the publicity contained inside the panel is analyzed after detecting them. On the other side, YOLO produced better panel localization results detecting a higher number of True Positive (TP) panels with a higher accuracy. Finally, a comparison of the two analyzed object detection models with different types of semantic segmentation networks and using the same evaluation metrics is also included. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | MSIAU; 600.130; 601.349; 600.122 | Approved | no | ||
Call Number | Admin @ si @ MSM2020 | Serial | 3452 | ||
Permanent link to this record | |||||
Author | Carlos Martin-Isla; Maryam Asadi-Aghbolaghi; Polyxeni Gkontra; Victor M. Campello; Sergio Escalera; Karim Lekadir | ||||
Title | Stacked BCDU-net with semantic CMR synthesis: application to Myocardial Pathology Segmentation challenge | Type | Conference Article | ||
Year | 2020 | Publication | MYOPS challenge and workshop | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | |||||
Address | Virtual; October 2020 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | MICCAIW | ||
Notes | HUPBA | Approved | no | ||
Call Number | Admin @ si @ MAG2020 | Serial | 3518 | ||
Permanent link to this record | |||||
Author | Thomas B. Moeslund; Sergio Escalera; Gholamreza Anbarjafari; Kamal Nasrollahi; Jun Wan | ||||
Title | Statistical Machine Learning for Human Behaviour Analysis | Type | Journal Article | ||
Year | 2020 | Publication | Entropy | Abbreviated Journal | ENTROPY |
Volume | 25 | Issue | 5 | Pages | 530 |
Keywords | action recognition; emotion recognition; privacy-aware | ||||
Abstract | |||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | HuPBA; no proj | Approved | no | ||
Call Number | Admin @ si @ MEA2020 | Serial | 3441 | ||
Permanent link to this record | |||||
Author | Klara Janousckova; Jiri Matas; Lluis Gomez; Dimosthenis Karatzas | ||||
Title | Text Recognition – Real World Data and Where to Find Them | Type | Conference Article | ||
Year | 2020 | Publication | 25th International Conference on Pattern Recognition | Abbreviated Journal | |
Volume | Issue | Pages | 4489-4496 | ||
Keywords | |||||
Abstract | We present a method for exploiting weakly annotated images to improve text extraction pipelines. The approach uses an arbitrary end-to-end text recognition system to obtain text region proposals and their, possibly erroneous, transcriptions. The method includes matching of imprecise transcriptions to weak annotations and an edit distance guided neighbourhood search. It produces nearly error-free, localised instances of scene text, which we treat as “pseudo ground truth” (PGT). The method is applied to two weakly-annotated datasets. Training with the extracted PGT consistently improves the accuracy of a state of the art recognition model, by 3.7% on average, across different benchmark datasets (image domains) and 24.5% on one of the weakly annotated datasets 1 1 Acknowledgements. The authors were supported by Czech Technical University student grant SGS20/171/0HK3/3TJ13, the MEYS VVV project CZ.02.1.01/0.010.0J16 019/0000765 Research Center for Informatics, the Spanish Research project TIN2017-89779-P and the CERCA Programme / Generalitat de Catalunya. | ||||
Address | Virtual; January 2021 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICPR | ||
Notes | DAG; 600.121; 600.129 | Approved | no | ||
Call Number | Admin @ si @ JMG2020 | Serial | 3557 | ||
Permanent link to this record | |||||
Author | Sergio Escalera; Ralf Herbrich | ||||
Title | The NeurIPS’18 Competition: From Machine Learning to Intelligent Conversations | Type | Book Whole | ||
Year | 2020 | Publication | The Springer Series on Challenges in Machine Learning | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | This volume presents the results of the Neural Information Processing Systems Competition track at the 2018 NeurIPS conference. The competition follows the same format as the 2017 competition track for NIPS. Out of 21 submitted proposals, eight competition proposals were selected, spanning the area of Robotics, Health, Computer Vision, Natural Language Processing, Systems and Physics. Competitions have become an integral part of advancing state-of-the-art in artificial intelligence (AI). They exhibit one important difference to benchmarks: Competitions test a system end-to-end rather than evaluating only a single component; they assess the practicability of an algorithmic solution in addition to assessing feasibility. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | Sergio Escalera; Ralf Hebrick | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 2520-1328 | ISBN | 978-3-030-29134-1 | Medium | |
Area | Expedition | Conference | |||
Notes | HuPBA; no menciona | Approved | no | ||
Call Number | Admin @ si @ HeE2020 | Serial | 3328 | ||
Permanent link to this record | |||||
Author | Rafael E. Rivadeneira; Angel Sappa; Boris X. Vintimilla; Lin Guo; Jiankun Hou; Armin Mehri; Parichehr Behjati Ardakani; Heena Patel; Vishal Chudasama; Kalpesh Prajapati; Kishor P. Upla; Raghavendra Ramachandra; Kiran Raja; Christoph Busch; Feras Almasri; Olivier Debeir; Sabari Nathan; Priya Kansal; Nolan Gutierrez; Bardia Mojra; William J. Beksi | ||||
Title | Thermal Image Super-Resolution Challenge – PBVS 2020 | Type | Conference Article | ||
Year | 2020 | Publication | 16h IEEE Workshop on Perception Beyond the Visible Spectrum | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | This paper summarizes the top contributions to the first challenge on thermal image super-resolution (TISR), which was organized as part of the Perception Beyond the Visible Spectrum (PBVS) 2020 workshop. In this challenge, a novel thermal image dataset is considered together with state-of-the-art approaches evaluated under a common framework. The dataset used in the challenge consists of 1021 thermal images, obtained from three distinct thermal cameras at different resolutions (low-resolution, mid-resolution, and high-resolution), resulting in a total of 3063 thermal images. From each resolution, 951 images are used for training and 50 for testing while the 20 remaining images are used for two proposed evaluations. The first evaluation consists of downsampling the low-resolution, mid-resolution, and high-resolution thermal images by x2, x3 and x4 respectively, and comparing their super-resolution results with the corresponding ground truth images. The second evaluation is comprised of obtaining the x2 super-resolution from a given mid-resolution thermal image and comparing it with the corresponding semi-registered high-resolution thermal image. Out of 51 registered participants, 6 teams reached the final validation phase. | ||||
Address | Virtual CVPR | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | CVPRW | ||
Notes | MSIAU; ISE; 600.119; 600.122 | Approved | no | ||
Call Number | Admin @ si @ RSV2020 | Serial | 3431 | ||
Permanent link to this record | |||||
Author | Rafael E. Rivadeneira; Angel Sappa; Boris X. Vintimilla | ||||
Title | Thermal Image Super-resolution: A Novel Architecture and Dataset | Type | Conference Article | ||
Year | 2020 | Publication | 15th International Conference on Computer Vision Theory and Applications | Abbreviated Journal | |
Volume | Issue | Pages | 111-119 | ||
Keywords | |||||
Abstract | This paper proposes a novel CycleGAN architecture for thermal image super-resolution, together with a large dataset consisting of thermal images at different resolutions. The dataset has been acquired using three thermal cameras at different resolutions, which acquire images from the same scenario at the same time. The thermal cameras are mounted in rig trying to minimize the baseline distance to make easier the registration problem.
The proposed architecture is based on ResNet6 as a Generator and PatchGAN as Discriminator. The novelty on the proposed unsupervised super-resolution training (CycleGAN) is possible due to the existence of aforementioned thermal images—images of the same scenario with different resolutions. The proposed approach is evaluated in the dataset and compared with classical bicubic interpolation. The dataset and the network are available. |
||||
Address | Valletta; Malta; February 2020 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | VISAPP | ||
Notes | MSIAU; 600.130; 600.122 | Approved | no | ||
Call Number | Admin @ si @ RSV2020 | Serial | 3432 | ||
Permanent link to this record | |||||
Author | Estefania Talavera; Carolin Wuerich; Nicolai Petkov; Petia Radeva | ||||
Title | Topic modelling for routine discovery from egocentric photo-streams | Type | Journal Article | ||
Year | 2020 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 104 | Issue | Pages | 107330 | |
Keywords | Routine; Egocentric vision; Lifestyle; Behaviour analysis; Topic modelling | ||||
Abstract | Developing tools to understand and visualize lifestyle is of high interest when addressing the improvement of habits and well-being of people. Routine, defined as the usual things that a person does daily, helps describe the individuals’ lifestyle. With this paper, we are the first ones to address the development of novel tools for automatic discovery of routine days of an individual from his/her egocentric images. In the proposed model, sequences of images are firstly characterized by semantic labels detected by pre-trained CNNs. Then, these features are organized in temporal-semantic documents to later be embedded into a topic models space. Finally, Dynamic-Time-Warping and Spectral-Clustering methods are used for final day routine/non-routine discrimination. Moreover, we introduce a new EgoRoutine-dataset, a collection of 104 egocentric days with more than 100.000 images recorded by 7 users. Results show that routine can be discovered and behavioural patterns can be observed. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | MILAB; no proj | Approved | no | ||
Call Number | Admin @ si @ TWP2020 | Serial | 3435 | ||
Permanent link to this record | |||||
Author | Debora Gil; Oriol Ramos Terrades; Raquel Perez | ||||
Title | Topological Radiomics (TOPiomics): Early Detection of Genetic Abnormalities in Cancer Treatment Evolution | Type | Conference Article | ||
Year | 2020 | Publication | Women in Geometry and Topology | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | |||||
Address | Barcelona; September 2019 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | IAM; DAG; 600.139; 600.145; 600.121 | Approved | no | ||
Call Number | Admin @ si @ GRP2020 | Serial | 3473 | ||
Permanent link to this record | |||||
Author | Zhengying Liu; Zhen Xu; Sergio Escalera; Isabelle Guyon; Julio C. S. Jacques Junior; Meysam Madadi; Adrien Pavao; Sebastien Treguer; Wei-Wei Tu | ||||
Title | Towards automated computer vision: analysis of the AutoCV challenges 2019 | Type | Journal Article | ||
Year | 2020 | Publication | Pattern Recognition Letters | Abbreviated Journal | PRL |
Volume | 135 | Issue | Pages | 196-203 | |
Keywords | Computer vision; AutoML; Deep learning | ||||
Abstract | We present the results of recent challenges in Automated Computer Vision (AutoCV, renamed here for clarity AutoCV1 and AutoCV2, 2019), which are part of a series of challenge on Automated Deep Learning (AutoDL). These two competitions aim at searching for fully automated solutions for classification tasks in computer vision, with an emphasis on any-time performance. The first competition was limited to image classification while the second one included both images and videos. Our design imposed to the participants to submit their code on a challenge platform for blind testing on five datasets, both for training and testing, without any human intervention whatsoever. Winning solutions adopted deep learning techniques based on already published architectures, such as AutoAugment, MobileNet and ResNet, to reach state-of-the-art performance in the time budget of the challenge (only 20 minutes of GPU time). The novel contributions include strategies to deliver good preliminary results at any time during the learning process, such that a method can be stopped early and still deliver good performance. This feature is key for the adoption of such techniques by data analysts desiring to obtain rapidly preliminary results on large datasets and to speed up the development process. The soundness of our design was verified in several aspects: (1) Little overfitting of the on-line leaderboard providing feedback on 5 development datasets was observed, compared to the final blind testing on the 5 (separate) final test datasets, suggesting that winning solutions might generalize to other computer vision classification tasks; (2) Error bars on the winners’ performance allow us to say with confident that they performed significantly better than the baseline solutions we provided; (3) The ranking of participants according to the any-time metric we designed, namely the Area under the Learning Curve, was different from that of the fixed-time metric, i.e. AUC at the end of the fixed time budget. We released all winning solutions under open-source licenses. At the end of the AutoDL challenge series, all data of the challenge will be made publicly available, thus providing a collection of uniformly formatted datasets, which can serve to conduct further research, particularly on meta-learning. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | HuPBA; no proj | Approved | no | ||
Call Number | Admin @ si @ LXE2020 | Serial | 3427 | ||
Permanent link to this record | |||||
Author | Zhengying Liu; Zhen Xu; Shangeth Rajaa; Meysam Madadi; Julio C. S. Jacques Junior; Sergio Escalera; Adrien Pavao; Sebastien Treguer; Wei-Wei Tu; Isabelle Guyon | ||||
Title | Towards Automated Deep Learning: Analysis of the AutoDL challenge series 2019 | Type | Conference Article | ||
Year | 2020 | Publication | Proceedings of Machine Learning Research | Abbreviated Journal | |
Volume | 123 | Issue | Pages | 242-252 | |
Keywords | |||||
Abstract | We present the design and results of recent competitions in Automated Deep Learning (AutoDL). In the AutoDL challenge series 2019, we organized 5 machine learning challenges: AutoCV, AutoCV2, AutoNLP, AutoSpeech and AutoDL. The first 4 challenges concern each a specific application domain, such as computer vision, natural language processing and speech recognition. At the time of March 2020, the last challenge AutoDL is still on-going and we only present its design. Some highlights of this work include: (1) a benchmark suite of baseline AutoML solutions, with emphasis on domains for which Deep Learning methods have had prior success (image, video, text, speech, etc); (2) a novel any-time learning framework, which opens doors for further theoretical consideration; (3) a repository of around 100 datasets (from all above domains) over half of which are released as public datasets to enable research on meta-learning; (4) analyses revealing that winning solutions generalize to new unseen datasets, validating progress towards universal AutoML solution; (5) open-sourcing of the challenge platform, the starting kit, the dataset formatting toolkit, and all winning solutions (All information available at {autodl.chalearn.org}). | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | NEURIPS | ||
Notes | HUPBA | Approved | no | ||
Call Number | Admin @ si @ LXR2020 | Serial | 3500 | ||
Permanent link to this record | |||||
Author | Asma Bensalah; Jialuo Chen; Alicia Fornes; Cristina Carmona_Duarte; Josep Llados; Miguel A. Ferrer | ||||
Title | Towards Stroke Patients' Upper-limb Automatic Motor Assessment Using Smartwatches. | Type | Conference Article | ||
Year | 2020 | Publication | International Workshop on Artificial Intelligence for Healthcare Applications | Abbreviated Journal | |
Volume | 12661 | Issue | Pages | 476-489 | |
Keywords | |||||
Abstract | Assessing the physical condition in rehabilitation scenarios is a challenging problem, since it involves Human Activity Recognition (HAR) and kinematic analysis methods. In addition, the difficulties increase in unconstrained rehabilitation scenarios, which are much closer to the real use cases. In particular, our aim is to design an upper-limb assessment pipeline for stroke patients using smartwatches. We focus on the HAR task, as it is the first part of the assessing pipeline. Our main target is to automatically detect and recognize four key movements inspired by the Fugl-Meyer assessment scale, which are performed in both constrained and unconstrained scenarios. In addition to the application protocol and dataset, we propose two detection and classification baseline methods. We believe that the proposed framework, dataset and baseline results will serve to foster this research field. | ||||
Address | Virtual; January 2021 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICPRW | ||
Notes | DAG; 600.121; 600.140; | Approved | no | ||
Call Number | Admin @ si @ BCF2020 | Serial | 3508 | ||
Permanent link to this record | |||||
Author | Jorge Charco; Angel Sappa; Boris X. Vintimilla; Henry Velesaca | ||||
Title | Transfer Learning from Synthetic Data in the Camera Pose Estimation Problem | Type | Conference Article | ||
Year | 2020 | Publication | 15th International Conference on Computer Vision Theory and Applications | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | This paper presents a novel Siamese network architecture, as a variant of Resnet-50, to estimate the relative camera pose on multi-view environments. In order to improve the performance of the proposed model a transfer learning strategy, based on synthetic images obtained from a virtual-world, is considered. The transfer learning consists of first training the network using pairs of images from the virtual-world scenario
considering different conditions (i.e., weather, illumination, objects, buildings, etc.); then, the learned weight of the network are transferred to the real case, where images from real-world scenarios are considered. Experimental results and comparisons with the state of the art show both, improvements on the relative pose estimation accuracy using the proposed model, as well as further improvements when the transfer learning strategy (synthetic-world data transfer learning real-world data) is considered to tackle the limitation on the training due to the reduced number of pairs of real-images on most of the public data sets. |
||||
Address | Valletta; Malta; February 2020 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | VISAPP | ||
Notes | MSIAU; 600.130; 601.349; 600.122 | Approved | no | ||
Call Number | Admin @ si @ CSV2020 | Serial | 3433 | ||
Permanent link to this record | |||||
Author | Yaxing Wang | ||||
Title | Transferring and Learning Representations for Image Generation and Translation | Type | Book Whole | ||
Year | 2020 | Publication | PhD Thesis, Universitat Autonoma de Barcelona-CVC | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | Image generation is arguably one of the most attractive, compelling, and challenging tasks in computer vision. Among the methods which perform image generation, generative adversarial networks (GANs) play a key role. The most common image generation models based on GANs can be divided into two main approaches. The first one, called simply image generation takes random noise as an input and synthesizes an image which follows the same distribution as the images in the training set. The second class, which is called image-to-image translation, aims to map an image from a source domain to one that is indistinguishable from those in the target domain. Image-to-image translation methods can further be divided into paired and unpaired image-to-image translation based on whether they require paired data or not. In this thesis, we aim to address some challenges of both image generation and image-to-image generation.GANs highly rely upon having access to vast quantities of data, and fail to generate realistic images from random noise when applied to domains with few images. To address this problem, we aim to transfer knowledge from a model trained on a large dataset (source domain) to the one learned on limited data (target domain). We find that both GANs andconditional GANs can benefit from models trained on large datasets. Our experiments show that transferring the discriminator is more important than the generator. Using both the generator and discriminator results in the best performance. We found, however, that this method suffers from overfitting, since we update all parameters to adapt to the target data. We propose a novel architecture, which is tailored to address knowledge transfer to very small target domains. Our approach effectively exploreswhich part of the latent space is more related to the target domain. Additionally, the proposed method is able to transfer knowledge from multiple pretrained GANs. Although image-to-image translation has achieved outstanding performance, it still facesseveral problems. First, for translation between complex domains (such as translations between different modalities) image-to-image translation methods require paired data. We show that when only some of the pairwise translations have been seen (i.e. during training), we can infer the remaining unseen translations (where training pairs are not available). We propose a new approach where we align multiple encoders and decoders in such a way that the desired translation can be obtained by simply cascadingthe source encoder and the target decoder, even when they have not interacted during the training stage (i.e. unseen). Second, we address the issue of bias in image-to-image translation. Biased datasets unavoidably contain undesired changes, which are dueto the fact that the target dataset has a particular underlying visual distribution. We use carefully designed semantic constraints to reduce the effects of the bias. The semantic constraint aims to enforce the preservation of desired image properties. Finally, current approaches fail to generate diverse outputs or perform scalable image transfer in a single model. To alleviate this problem, we propose a scalable and diverse image-to-image translation. We employ random noise to control the diversity. The scalabitlity is determined by conditioning the domain label.computer vision, deep learning, imitation learning, adversarial generative networks, image generation, image-to-image translation. | ||||
Address | January 2020 | ||||
Corporate Author | Thesis | Ph.D. thesis | |||
Publisher | Ediciones Graficas Rey | Place of Publication | Editor | Joost Van de Weijer;Abel Gonzalez;Luis Herranz | |
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | 978-84-121011-5-7 | Medium | ||
Area | Expedition | Conference | |||
Notes | LAMP; 600.141; 600.120 | Approved | no | ||
Call Number | Admin @ si @ Wan2020 | Serial | 3397 | ||
Permanent link to this record |