toggle visibility Search & Display Options

Select All    Deselect All
 |   | 
Details
   print
  Records Links
Author Ana Garcia Rodriguez; Jorge Bernal; F. Javier Sanchez; Henry Cordova; Rodrigo Garces Duran; Cristina Rodriguez de Miguel; Gloria Fernandez Esparrach edit  url
doi  openurl
  Title Polyp fingerprint: automatic recognition of colorectal polyps’ unique features Type Journal Article
  Year 2020 Publication Surgical Endoscopy and other Interventional Techniques Abbreviated Journal SEND  
  Volume (up) 34 Issue 4 Pages 1887-1889  
  Keywords  
  Abstract BACKGROUND:
Content-based image retrieval (CBIR) is an application of machine learning used to retrieve images by similarity on the basis of features. Our objective was to develop a CBIR system that could identify images containing the same polyp ('polyp fingerprint').

METHODS:
A machine learning technique called Bag of Words was used to describe each endoscopic image containing a polyp in a unique way. The system was tested with 243 white light images belonging to 99 different polyps (for each polyp there were at least two images representing it in two different temporal moments). Images were acquired in routine colonoscopies at Hospital Clínic using high-definition Olympus endoscopes. The method provided for each image the closest match within the dataset.

RESULTS:
The system matched another image of the same polyp in 221/243 cases (91%). No differences were observed in the number of correct matches according to Paris classification (protruded: 90.7% vs. non-protruded: 91.3%) and size (< 10 mm: 91.6% vs. > 10 mm: 90%).

CONCLUSIONS:
A CBIR system can match accurately two images containing the same polyp, which could be a helpful aid for polyp image recognition.

KEYWORDS:
Artificial intelligence; Colorectal polyps; Content-based image retrieval
 
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes MV; no menciona Approved no  
  Call Number Admin @ si @ Serial 3403  
Permanent link to this record
 

 
Author Hassan Ahmed Sial; Ramon Baldrich; Maria Vanrell edit   pdf
url  openurl
  Title Deep intrinsic decomposition trained on surreal scenes yet with realistic light effects Type Journal Article
  Year 2020 Publication Journal of the Optical Society of America A Abbreviated Journal JOSA A  
  Volume (up) 37 Issue 1 Pages 1-15  
  Keywords  
  Abstract Estimation of intrinsic images still remains a challenging task due to weaknesses of ground-truth datasets, which either are too small or present non-realistic issues. On the other hand, end-to-end deep learning architectures start to achieve interesting results that we believe could be improved if important physical hints were not ignored. In this work, we present a twofold framework: (a) a flexible generation of images overcoming some classical dataset problems such as larger size jointly with coherent lighting appearance; and (b) a flexible architecture tying physical properties through intrinsic losses. Our proposal is versatile, presents low computation time, and achieves state-of-the-art results.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes CIC; 600.140; 600.12; 600.118 Approved no  
  Call Number Admin @ si @ SBV2019 Serial 3311  
Permanent link to this record
 

 
Author Cristina Sanchez Montes; Jorge Bernal; Ana Garcia Rodriguez; Henry Cordova; Gloria Fernandez Esparrach edit  url
openurl 
  Title Revisión de métodos computacionales de detección y clasificación de pólipos en imagen de colonoscopia Type Journal Article
  Year 2020 Publication Gastroenterología y Hepatología Abbreviated Journal GH  
  Volume (up) 43 Issue 4 Pages 222-232  
  Keywords  
  Abstract Computer-aided diagnosis (CAD) is a tool with great potential to help endoscopists in the tasks of detecting and histologically classifying colorectal polyps. In recent years, different technologies have been described and their potential utility has been increasingly evidenced, which has generated great expectations among scientific societies. However, most of these works are retrospective and use images of different quality and characteristics which are analysed off line. This review aims to familiarise gastroenterologists with computational methods and the particularities of endoscopic imaging, which have an impact on image processing analysis. Finally, the publicly available image databases, needed to compare and confirm the results obtained with different methods, are presented.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes MV; Approved no  
  Call Number Admin @ si @ SBG2020 Serial 3404  
Permanent link to this record
 

 
Author Beata Megyesi; Bernhard Esslinger; Alicia Fornes; Nils Kopal; Benedek Lang; George Lasry; Karl de Leeuw; Eva Pettersson; Arno Wacker; Michelle Waldispuhl edit  url
openurl 
  Title Decryption of historical manuscripts: the DECRYPT project Type Journal Article
  Year 2020 Publication Cryptologia Abbreviated Journal CRYPT  
  Volume (up) 44 Issue 6 Pages 545-559  
  Keywords automatic decryption; cipher collection; historical cryptology; image transcription  
  Abstract Many historians and linguists are working individually and in an uncoordinated fashion on the identification and decryption of historical ciphers. This is a time-consuming process as they often work without access to automatic methods and processes that can accelerate the decipherment. At the same time, computer scientists and cryptologists are developing algorithms to decrypt various cipher types without having access to a large number of original ciphertexts. In this paper, we describe the DECRYPT project aiming at the creation of resources and tools for historical cryptology by bringing the expertise of various disciplines together for collecting data, exchanging methods for faster progress to transcribe, decrypt and contextualize historical encrypted manuscripts. We present our goals and work-in progress of a general approach for analyzing historical encrypted manuscripts using standardized methods and a new set of state-of-the-art tools. We release the data and tools as open-source hoping that all mentioned disciplines would benefit and contribute to the research infrastructure of historical cryptology.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes DAG; 600.140; 600.121 Approved no  
  Call Number Admin @ si @ MEF2020 Serial 3347  
Permanent link to this record
 

 
Author Razieh Rastgoo; Kourosh Kiani; Sergio Escalera edit  url
doi  openurl
  Title Video-based Isolated Hand Sign Language Recognition Using a Deep Cascaded Model Type Journal Article
  Year 2020 Publication Multimedia Tools and Applications Abbreviated Journal MTAP  
  Volume (up) 79 Issue Pages 22965–22987  
  Keywords  
  Abstract In this paper, we propose an efficient cascaded model for sign language recognition taking benefit from spatio-temporal hand-based information using deep learning approaches, especially Single Shot Detector (SSD), Convolutional Neural Network (CNN), and Long Short Term Memory (LSTM), from videos. Our simple yet efficient and accurate model includes two main parts: hand detection and sign recognition. Three types of spatial features, including hand features, Extra Spatial Hand Relation (ESHR) features, and Hand Pose (HP) features, have been fused in the model to feed to LSTM for temporal features extraction. We train SSD model for hand detection using some videos collected from five online sign dictionaries. Our model is evaluated on our proposed dataset (Rastgoo et al., Expert Syst Appl 150: 113336, 2020), including 10’000 sign videos for 100 Persian sign using 10 contributors in 10 different backgrounds, and isoGD dataset. Using the 5-fold cross-validation method, our model outperforms state-of-the-art alternatives in sign language recognition  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes HuPBA; no menciona Approved no  
  Call Number Admin @ si @ RKE2020b Serial 3442  
Permanent link to this record
 

 
Author Rahma Kalboussi; Aymen Azaza; Joost Van de Weijer; Mehrez Abdellaoui; Ali Douik edit  url
openurl 
  Title Object proposals for salient object segmentation in videos Type Journal Article
  Year 2020 Publication Multimedia Tools and Applications Abbreviated Journal MTAP  
  Volume (up) 79 Issue 13 Pages 8677-8693  
  Keywords  
  Abstract Salient object segmentation in videos is generally broken up in a video segmentation part and a saliency assignment part. Recently, object proposals, which are used to segment the image, have had significant impact on many computer vision applications, including image segmentation, object detection, and recently saliency detection in still images. However, their usage has not yet been evaluated for salient object segmentation in videos. Therefore, in this paper, we investigate the application of object proposals to salient object segmentation in videos. In addition, we propose a new motion feature derived from the optical flow structure tensor for video saliency detection. Experiments on two standard benchmark datasets for video saliency show that the proposed motion feature improves saliency estimation results, and that object proposals are an efficient method for salient object segmentation. Results on the challenging SegTrack v2 and Fukuchi benchmark data sets show that we significantly outperform the state-of-the-art.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes LAMP; 600.120 Approved no  
  Call Number KAW2020 Serial 3504  
Permanent link to this record
 

 
Author Razieh Rastgoo; Kourosh Kiani; Sergio Escalera edit  url
openurl 
  Title Hand pose aware multimodal isolated sign language recognition Type Journal Article
  Year 2020 Publication Multimedia Tools and Applications Abbreviated Journal MTAP  
  Volume (up) 80 Issue Pages 127–163  
  Keywords  
  Abstract Isolated hand sign language recognition from video is a challenging research area in computer vision. Some of the most important challenges in this area include dealing with hand occlusion, fast hand movement, illumination changes, or background complexity. While most of the state-of-the-art results in the field have been achieved using deep learning-based models, the previous challenges are not completely solved. In this paper, we propose a hand pose aware model for isolated hand sign language recognition using deep learning approaches from two input modalities, RGB and depth videos. Four spatial feature types: pixel-level, flow, deep hand, and hand pose features, fused from both visual modalities, are input to LSTM for temporal sign recognition. While we use Optical Flow (OF) for flow information in RGB video inputs, Scene Flow (SF) is used for depth video inputs. By including hand pose features, we show a consistent performance improvement of the sign language recognition model. To the best of our knowledge, this is the first time that this discriminant spatiotemporal features, benefiting from the hand pose estimation features and multi-modal inputs, are fused for isolated hand sign language recognition. We perform a step-by-step analysis of the impact in terms of recognition performance of the hand pose features, different combinations of the spatial features, and different recurrent models, especially LSTM and GRU. Results on four public datasets confirm that the proposed model outperforms the current state-of-the-art models on Montalbano II, MSR Daily Activity 3D, and CAD-60 datasets with a relative accuracy improvement of 1.64%, 6.5%, and 7.6%. Furthermore, our model obtains a competitive results on isoGD dataset with only 0.22% margin lower than the current state-of-the-art model.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes HUPBA; no menciona Approved no  
  Call Number Admin @ si @ RKE2020 Serial 3524  
Permanent link to this record
 

 
Author Idoia Ruiz; Bogdan Raducanu; Rakesh Mehta; Jaume Amores edit   pdf
url  openurl
  Title Optimizing speed/accuracy trade-off for person re-identification via knowledge distillation Type Journal Article
  Year 2020 Publication Engineering Applications of Artificial Intelligence Abbreviated Journal EAAI  
  Volume (up) 87 Issue Pages 103309  
  Keywords Person re-identification; Network distillation; Image retrieval; Model compression; Surveillance  
  Abstract Finding a person across a camera network plays an important role in video surveillance. For a real-world person re-identification application, in order to guarantee an optimal time response, it is crucial to find the balance between accuracy and speed. We analyse this trade-off, comparing a classical method, that comprises hand-crafted feature description and metric learning, in particular, LOMO and XQDA, to deep learning based techniques, using image classification networks, ResNet and MobileNets. Additionally, we propose and analyse network distillation as a learning strategy to reduce the computational cost of the deep learning approach at test time. We evaluate both methods on the Market-1501 and DukeMTMC-reID large-scale datasets, showing that distillation helps reducing the computational cost at inference time while even increasing the accuracy performance.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes LAMP; 600.109; 600.120 Approved no  
  Call Number Admin @ si @ RRM2020 Serial 3401  
Permanent link to this record
 

 
Author Estefania Talavera; Carolin Wuerich; Nicolai Petkov; Petia Radeva edit  url
doi  openurl
  Title Topic modelling for routine discovery from egocentric photo-streams Type Journal Article
  Year 2020 Publication Pattern Recognition Abbreviated Journal PR  
  Volume (up) 104 Issue Pages 107330  
  Keywords Routine; Egocentric vision; Lifestyle; Behaviour analysis; Topic modelling  
  Abstract Developing tools to understand and visualize lifestyle is of high interest when addressing the improvement of habits and well-being of people. Routine, defined as the usual things that a person does daily, helps describe the individuals’ lifestyle. With this paper, we are the first ones to address the development of novel tools for automatic discovery of routine days of an individual from his/her egocentric images. In the proposed model, sequences of images are firstly characterized by semantic labels detected by pre-trained CNNs. Then, these features are organized in temporal-semantic documents to later be embedded into a topic models space. Finally, Dynamic-Time-Warping and Spectral-Clustering methods are used for final day routine/non-routine discrimination. Moreover, we introduce a new EgoRoutine-dataset, a collection of 104 egocentric days with more than 100.000 images recorded by 7 users. Results show that routine can be discovered and behavioural patterns can be observed.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes MILAB; no proj Approved no  
  Call Number Admin @ si @ TWP2020 Serial 3435  
Permanent link to this record
 

 
Author Meysam Madadi; Hugo Bertiche; Sergio Escalera edit   pdf
url  openurl
  Title SMPLR: Deep learning based SMPL reverse for 3D human pose and shape recovery Type Journal Article
  Year 2020 Publication Pattern Recognition Abbreviated Journal PR  
  Volume (up) 106 Issue Pages 107472  
  Keywords Deep learning; 3D Human pose; Body shape; SMPL; Denoising autoencoder; Volumetric stack hourglass  
  Abstract In this paper we propose to embed SMPL within a deep-based model to accurately estimate 3D pose and shape from a still RGB image. We use CNN-based 3D joint predictions as an intermediate representation to regress SMPL pose and shape parameters. Later, 3D joints are reconstructed again in the SMPL output. This module can be seen as an autoencoder where the encoder is a deep neural network and the decoder is SMPL model. We refer to this as SMPL reverse (SMPLR). By implementing SMPLR as an encoder-decoder we avoid the need of complex constraints on pose and shape. Furthermore, given that in-the-wild datasets usually lack accurate 3D annotations, it is desirable to lift 2D joints to 3D without pairing 3D annotations with RGB images. Therefore, we also propose a denoising autoencoder (DAE) module between CNN and SMPLR, able to lift 2D joints to 3D and partially recover from structured error. We evaluate our method on SURREAL and Human3.6M datasets, showing improvement over SMPL-based state-of-the-art alternatives by about 4 and 12 mm, respectively.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes HuPBA; no proj Approved no  
  Call Number Admin @ si @ MBE2020 Serial 3439  
Permanent link to this record
 

 
Author Wenlong Deng; Yongli Mou; Takahiro Kashiwa; Sergio Escalera; Kohei Nagai; Kotaro Nakayama; Yutaka Matsuo; Helmut Prendinger edit  url
openurl 
  Title Vision based Pixel-level Bridge Structural Damage Detection Using a Link ASPP Network Type Journal Article
  Year 2020 Publication Automation in Construction Abbreviated Journal AC  
  Volume (up) 110 Issue Pages 102973  
  Keywords Semantic image segmentation; Deep learning  
  Abstract Structural Health Monitoring (SHM) has greatly benefited from computer vision. Recently, deep learning approaches are widely used to accurately estimate the state of deterioration of infrastructure. In this work, we focus on the problem of bridge surface structural damage detection, such as delamination and rebar exposure. It is well known that the quality of a deep learning model is highly dependent on the quality of the training dataset. Bridge damage detection, our application domain, has the following main challenges: (i) labeling the damages requires knowledgeable civil engineering professionals, which makes it difficult to collect a large annotated dataset; (ii) the damage area could be very small, whereas the background area is large, which creates an unbalanced training environment; (iii) due to the difficulty to exactly determine the extension of the damage, there is often a variation among different labelers who perform pixel-wise labeling. In this paper, we propose a novel model for bridge structural damage detection to address the first two challenges. This paper follows the idea of an atrous spatial pyramid pooling (ASPP) module that is designed as a novel network for bridge damage detection. Further, we introduce the weight balanced Intersection over Union (IoU) loss function to achieve accurate segmentation on a highly unbalanced small dataset. The experimental results show that (i) the IoU loss function improves the overall performance of damage detection, as compared to cross entropy loss or focal loss, and (ii) the proposed model has a better ability to detect a minority class than other light segmentation networks.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes HuPBA; no proj Approved no  
  Call Number Admin @ si @ DMK2020 Serial 3314  
Permanent link to this record
 

 
Author Zhengying Liu; Zhen Xu; Shangeth Rajaa; Meysam Madadi; Julio C. S. Jacques Junior; Sergio Escalera; Adrien Pavao; Sebastien Treguer; Wei-Wei Tu; Isabelle Guyon edit   pdf
url  openurl
  Title Towards Automated Deep Learning: Analysis of the AutoDL challenge series 2019 Type Conference Article
  Year 2020 Publication Proceedings of Machine Learning Research Abbreviated Journal  
  Volume (up) 123 Issue Pages 242-252  
  Keywords  
  Abstract We present the design and results of recent competitions in Automated Deep Learning (AutoDL). In the AutoDL challenge series 2019, we organized 5 machine learning challenges: AutoCV, AutoCV2, AutoNLP, AutoSpeech and AutoDL. The first 4 challenges concern each a specific application domain, such as computer vision, natural language processing and speech recognition. At the time of March 2020, the last challenge AutoDL is still on-going and we only present its design. Some highlights of this work include: (1) a benchmark suite of baseline AutoML solutions, with emphasis on domains for which Deep Learning methods have had prior success (image, video, text, speech, etc); (2) a novel any-time learning framework, which opens doors for further theoretical consideration; (3) a repository of around 100 datasets (from all above domains) over half of which are released as public datasets to enable research on meta-learning; (4) analyses revealing that winning solutions generalize to new unseen datasets, validating progress towards universal AutoML solution; (5) open-sourcing of the challenge platform, the starting kit, the dataset formatting toolkit, and all winning solutions (All information available at {autodl.chalearn.org}).  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference NEURIPS  
  Notes HUPBA Approved no  
  Call Number Admin @ si @ LXR2020 Serial 3500  
Permanent link to this record
 

 
Author Cesar de Souza; Adrien Gaidon; Yohann Cabon; Naila Murray; Antonio Lopez edit   pdf
doi  openurl
  Title Generating Human Action Videos by Coupling 3D Game Engines and Probabilistic Graphical Models Type Journal Article
  Year 2020 Publication International Journal of Computer Vision Abbreviated Journal IJCV  
  Volume (up) 128 Issue Pages 1505–1536  
  Keywords Procedural generation; Human action recognition; Synthetic data; Physics  
  Abstract Deep video action recognition models have been highly successful in recent years but require large quantities of manually-annotated data, which are expensive and laborious to obtain. In this work, we investigate the generation of synthetic training data for video action recognition, as synthetic data have been successfully used to supervise models for a variety of other computer vision tasks. We propose an interpretable parametric generative model of human action videos that relies on procedural generation, physics models and other components of modern game engines. With this model we generate a diverse, realistic, and physically plausible dataset of human action videos, called PHAV for “Procedural Human Action Videos”. PHAV contains a total of 39,982 videos, with more than 1000 examples for each of 35 action categories. Our video generation approach is not limited to existing motion capture sequences: 14 of these 35 categories are procedurally-defined synthetic actions. In addition, each video is represented with 6 different data modalities, including RGB, optical flow and pixel-level semantic labels. These modalities are generated almost simultaneously using the Multiple Render Targets feature of modern GPUs. In order to leverage PHAV, we introduce a deep multi-task (i.e. that considers action classes from multiple datasets) representation learning architecture that is able to simultaneously learn from synthetic and real video datasets, even when their action categories differ. Our experiments on the UCF-101 and HMDB-51 benchmarks suggest that combining our large set of synthetic videos with small real-world datasets can boost recognition performance. Our approach also significantly outperforms video representations produced by fine-tuning state-of-the-art unsupervised generative models of videos.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes ADAS; 600.124; 600.118 Approved no  
  Call Number Admin @ si @ SGC2019 Serial 3303  
Permanent link to this record
 

 
Author Yunan Li; Jun Wan; Qiguang Miao; Sergio Escalera; Huijuan Fang; Huizhou Chen; Xiangda Qi; Guodong Guo edit  url
openurl 
  Title CR-Net: A Deep Classification-Regression Network for Multimodal Apparent Personality Analysis Type Journal Article
  Year 2020 Publication International Journal of Computer Vision Abbreviated Journal IJCV  
  Volume (up) 128 Issue Pages 2763–2780  
  Keywords  
  Abstract First impressions strongly influence social interactions, having a high impact in the personal and professional life. In this paper, we present a deep Classification-Regression Network (CR-Net) for analyzing the Big Five personality problem and further assisting on job interview recommendation in a first impressions setup. The setup is based on the ChaLearn First Impressions dataset, including multimodal data with video, audio, and text converted from the corresponding audio data, where each person is talking in front of a camera. In order to give a comprehensive prediction, we analyze the videos from both the entire scene (including the person’s motions and background) and the face of the person. Our CR-Net first performs personality trait classification and applies a regression later, which can obtain accurate predictions for both personality traits and interview recommendation. Furthermore, we present a new loss function called Bell Loss to address inaccurate predictions caused by the regression-to-the-mean problem. Extensive experiments on the First Impressions dataset show the effectiveness of our proposed network, outperforming the state-of-the-art.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes HuPBA; no menciona Approved no  
  Call Number Admin @ si @ LWM2020 Serial 3413  
Permanent link to this record
 

 
Author Yaxing Wang; Luis Herranz; Joost Van de Weijer edit   pdf
url  doi
openurl 
  Title Mix and match networks: multi-domain alignment for unpaired image-to-image translation Type Journal Article
  Year 2020 Publication International Journal of Computer Vision Abbreviated Journal IJCV  
  Volume (up) 128 Issue Pages 2849–2872  
  Keywords  
  Abstract This paper addresses the problem of inferring unseen cross-modal image-to-image translations between multiple modalities. We assume that only some of the pairwise translations have been seen (i.e. trained) and infer the remaining unseen translations (where training pairs are not available). We propose mix and match networks, an approach where multiple encoders and decoders are aligned in such a way that the desired translation can be obtained by simply cascading the source encoder and the target decoder, even when they have not interacted during the training stage (i.e. unseen). The main challenge lies in the alignment of the latent representations at the bottlenecks of encoder-decoder pairs. We propose an architecture with several tools to encourage alignment, including autoencoders and robust side information and latent consistency losses. We show the benefits of our approach in terms of effectiveness and scalability compared with other pairwise image-to-image translation approaches. We also propose zero-pair cross-modal image translation, a challenging setting where the objective is inferring semantic segmentation from depth (and vice-versa) without explicit segmentation-depth pairs, and only from two (disjoint) segmentation-RGB and depth-RGB training sets. We observe that a certain part of the shared information between unseen modalities might not be reachable, so we further propose a variant that leverages pseudo-pairs which allows us to exploit this shared information between the unseen modalities  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes LAMP; 600.109; 600.106; 600.141; 600.120 Approved no  
  Call Number Admin @ si @ WHW2020 Serial 3424  
Permanent link to this record
Select All    Deselect All
 |   | 
Details
   print

Save Citations:
Export Records: