Records |
Author |
Y. Patel; Lluis Gomez; Marçal Rusiñol; Dimosthenis Karatzas; C.V. Jawahar |
Title |
Self-Supervised Visual Representations for Cross-Modal Retrieval |
Type |
Conference Article |
Year |
2019 |
Publication |
ACM International Conference on Multimedia Retrieval |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
182–186 |
Keywords |
|
Abstract |
Cross-modal retrieval methods have improved significantly in recent years with the use of deep neural networks and large-scale annotated datasets such as ImageNet and Places. However, collecting and annotating such datasets requires a tremendous amount of human effort and, besides, their annotations are limited to discrete sets of popular visual classes that may not be representative of the richer semantics found in large-scale cross-modal retrieval datasets. In this paper, we present a self-supervised cross-modal retrieval framework that leverages as training data the correlations between images and text on the entire set of Wikipedia articles. Our method consists of training a CNN to predict: (1) the semantic context of the article in which an image is most likely to appear as an illustration, and (2) the semantic context of its caption. Our experiments demonstrate that the proposed method is not only capable of learning discriminative visual representations for solving vision tasks like classification, but that the learned representations are better for cross-modal retrieval when compared to supervised pre-training of the network on the ImageNet dataset. |
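A minimal sketch (not the authors' released code) of the training signal the abstract describes: a CNN regresses the topic distribution of the article/caption context, here via a KL-divergence loss against targets assumed to come from a pre-fitted text topic model; NUM_TOPICS and all variable names are illustrative.

    # Hedged sketch: CNN predicts the topic distribution of the article
    # that illustrates each image; topic targets are assumed to come from
    # a pre-fitted text topic model (e.g. LDA over Wikipedia articles).
    import torch
    import torch.nn as nn
    import torchvision.models as models

    NUM_TOPICS = 40  # illustrative choice

    backbone = models.resnet50(weights=None)
    backbone.fc = nn.Linear(backbone.fc.in_features, NUM_TOPICS)

    def topic_loss(logits, target_topic_probs):
        # KL divergence between predicted and target topic distributions.
        log_pred = nn.functional.log_softmax(logits, dim=1)
        return nn.functional.kl_div(log_pred, target_topic_probs, reduction="batchmean")

    optimizer = torch.optim.SGD(backbone.parameters(), lr=0.01, momentum=0.9)

    def train_step(images, target_topic_probs):
        optimizer.zero_grad()
        loss = topic_loss(backbone(images), target_topic_probs)
        loss.backward()
        optimizer.step()
        return loss.item()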
Address |
Ottawa; Canada; June 2019 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
ICMR |
Notes |
DAG; 600.121; 600.129 |
Approved |
no |
Call Number |
Admin @ si @ PGR2019 |
Serial |
3288 |
Permanent link to this record |
|
|
|
Author |
Ali Furkan Biten; Lluis Gomez; Marçal Rusiñol; Dimosthenis Karatzas |
Title |
Good News, Everyone! Context driven entity-aware captioning for news images |
Type |
Conference Article |
Year |
2019 |
Publication |
32nd IEEE Conference on Computer Vision and Pattern Recognition |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
12458-12467 |
Keywords |
|
Abstract |
Current image captioning systems perform at a merely descriptive level, essentially enumerating the objects in the scene and their relations. Humans, on the contrary, interpret images by integrating several sources of prior knowledge of the world. In this work, we aim to take a step closer to producing captions that offer a plausible interpretation of the scene, by integrating such contextual information into the captioning pipeline. For this we focus on the captioning of images used to illustrate news articles. We propose a novel captioning method that is able to leverage contextual information provided by the text of news articles associated with an image. Our model is able to selectively draw information from the article guided by visual cues, and to dynamically extend the output dictionary to out-of-vocabulary named entities that appear in the context source. Furthermore, we introduce “GoodNews”, the largest news image captioning dataset in the literature, and demonstrate state-of-the-art results. |
Address |
Long Beach; California; USA; June 2019 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
CVPR |
Notes |
DAG; 600.129; 600.135; 601.338; 600.121 |
Approved |
no |
Call Number |
Admin @ si @ BGR2019 |
Serial |
3289 |
Permanent link to this record |
|
|
|
Author |
Axel Barroso-Laguna; Edgar Riba; Daniel Ponsa; Krystian Mikolajczyk |
Title |
Key.Net: Keypoint Detection by Handcrafted and Learned CNN Filters |
Type |
Conference Article |
Year |
2019 |
Publication |
18th IEEE International Conference on Computer Vision |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
5835-5843 |
Keywords |
|
Abstract |
We introduce a novel approach to the keypoint detection task that combines handcrafted and learned CNN filters within a shallow multi-scale architecture. Handcrafted filters provide anchor structures for learned filters, which localize, score and rank repeatable features. A scale-space representation is used within the network to extract keypoints at different levels. We design a loss function to detect robust features that exist across a range of scales and to maximize the repeatability score. Our Key.Net model is trained on data synthetically created from ImageNet and evaluated on the HPatches benchmark. Results show that our approach outperforms state-of-the-art detectors in terms of repeatability, matching performance and complexity. |
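An illustrative sketch of the "handcrafted + learned" idea only, not the released Key.Net model: fixed first-order derivative responses act as anchor structures that a small learned convolutional head turns into a per-pixel keypoint score map; the layer sizes are assumptions.

    # Illustrative sketch (not the released Key.Net): handcrafted gradient
    # responses feed a small learned conv head that outputs a keypoint score map.
    import torch
    import torch.nn as nn
    import kornia

    class HandcraftedPlusLearned(nn.Module):
        def __init__(self, hidden=8):
            super().__init__()
            self.head = nn.Sequential(
                nn.Conv2d(3, hidden, 3, padding=1), nn.ReLU(),
                nn.Conv2d(hidden, 1, 3, padding=1),
            )

        def forward(self, gray):                              # gray: (B, 1, H, W)
            grads = kornia.filters.spatial_gradient(gray)     # (B, 1, 2, H, W)
            gx, gy = grads[:, :, 0], grads[:, :, 1]
            feats = torch.cat([gx * gx, gy * gy, gx * gy], dim=1)  # anchor structures
            return self.head(feats)                           # keypoint score map

    scores = HandcraftedPlusLearned()(torch.rand(1, 1, 128, 128))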
Address |
Seoul; Korea; October 2019 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
ICCV |
Notes |
MSIAU; 600.122 |
Approved |
no |
Call Number |
Admin @ si @ BRP2019 |
Serial |
3290 |
Permanent link to this record |
|
|
|
Author |
Edgar Riba; D. Mishkin; Daniel Ponsa; E. Rublee; G. Bradski |
Title |
Kornia: an Open Source Differentiable Computer Vision Library for PyTorch |
Type |
Conference Article |
Year |
2020 |
Publication |
IEEE Winter Conference on Applications of Computer Vision |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
|
Keywords |
|
Abstract |
|
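Kornia exposes classical computer vision operators as differentiable PyTorch functions, so gradients can flow through them; a minimal usage sketch (illustrative only, exact API details may differ across library versions):

    # Minimal Kornia usage sketch: classical operators are differentiable,
    # so gradients flow back to the input and any upstream parameters.
    import torch
    import kornia

    img = torch.rand(1, 3, 64, 64, requires_grad=True)
    blurred = kornia.filters.gaussian_blur2d(img, kernel_size=(5, 5), sigma=(1.5, 1.5))
    gray = kornia.color.rgb_to_grayscale(blurred)
    gray.mean().backward()          # gradient flows through the Kornia ops
    print(img.grad.shape)           # torch.Size([1, 3, 64, 64])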
Address |
Aspen; Colorado; USA; March 2020 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
WACV |
Notes |
MSIAU; 600.122; 600.130 |
Approved |
no |
Call Number |
Admin @ si @ RMP2020 |
Serial |
3291 |
Permanent link to this record |
|
|
|
Author |
Javad Zolfaghari Bengar; Abel Gonzalez-Garcia; Gabriel Villalonga; Bogdan Raducanu; Hamed H. Aghdam; Mikhail Mozerov; Antonio Lopez; Joost Van de Weijer |
Title |
Temporal Coherence for Active Learning in Videos |
Type |
Conference Article |
Year |
2019 |
Publication |
IEEE International Conference on Computer Vision Workshops |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
914-923 |
Keywords |
|
Abstract |
Autonomous driving systems require huge amounts of data to train. Manual annotation of this data is time-consuming and prohibitively expensive since it involves human resources. Therefore, active learning emerged as an alternative to ease this effort and to make data annotation more manageable. In this paper, we introduce a novel active learning approach for object detection in videos by exploiting temporal coherence. Our active learning criterion is based on the estimated number of errors in terms of false positives and false negatives. The detections obtained by the object detector are used to define the nodes of a graph and tracked forward and backward to temporally link the nodes. Minimizing an energy function defined on this graphical model provides estimates of both false positives and false negatives. Additionally, we introduce a synthetic video dataset, called SYNTHIA-AL, specially designed to evaluate active learning for video object detection in road scenes. Finally, we show that our approach outperforms active learning baselines tested on two datasets. |
Address |
Seoul; Korea; October 2019 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
ICCVW |
Notes |
LAMP; ADAS; 600.124; 602.200; 600.118; 600.120; 600.141 |
Approved |
no |
Call Number |
Admin @ si @ ZGV2019 |
Serial |
3294 |
Permanent link to this record |
|
|
|
Author |
Stefan Lonn; Petia Radeva; Mariella Dimiccoli |
Title |
Smartphone picture organization: A hierarchical approach |
Type |
Journal Article |
Year |
2019 |
Publication |
Computer Vision and Image Understanding |
Abbreviated Journal |
CVIU |
Volume |
187 |
Issue |
|
Pages |
102789 |
Keywords |
|
Abstract |
We live in a society where the large majority of the population has a camera-equipped smartphone. In addition, hard drives and cloud storage are getting cheaper and cheaper, leading to a tremendous growth in stored personal photos. Unlike photo collections captured by a digital camera, which typically are pre-processed by the user who organizes them into event-related folders, smartphone pictures are automatically stored in the cloud. As a consequence, photo collections captured by a smartphone are highly unstructured and, because smartphones are ubiquitous, they present a larger variability compared to pictures captured by a digital camera. To address the need for organizing large smartphone photo collections automatically, we propose here a new methodology for hierarchical photo organization into topics and topic-related categories. Our approach successfully estimates latent topics in the pictures by applying probabilistic Latent Semantic Analysis, and automatically assigns a name to each topic by relying on a lexical database. Topic-related categories are then estimated by using a set of topic-specific Convolutional Neural Networks. To validate our approach, we assemble and make public a large dataset of more than 8,000 smartphone pictures from 40 persons. Experimental results demonstrate major user satisfaction with respect to state-of-the-art solutions in terms of organization. |
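A hedged sketch of the two-level organization the abstract describes: latent topics estimated from per-image count features, then a topic-specific classifier for each topic. scikit-learn has no pLSA implementation, so LatentDirichletAllocation is used here as a stand-in, and all data and names are illustrative.

    # Hedged sketch: estimate latent topics from per-image tag/visual-word counts,
    # then route each image to a topic-specific classifier. sklearn has no pLSA,
    # so LatentDirichletAllocation stands in for it here.
    import numpy as np
    from sklearn.decomposition import LatentDirichletAllocation

    rng = np.random.default_rng(0)
    counts = rng.integers(0, 5, size=(200, 300))   # (images, vocabulary of tags)

    plsa_like = LatentDirichletAllocation(n_components=10, random_state=0)
    topic_probs = plsa_like.fit_transform(counts)  # (images, topics)
    topic_of_image = topic_probs.argmax(axis=1)

    # The second level (topic-related categories) would then train one classifier
    # per topic, e.g. a small CNN restricted to the images assigned to that topic.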
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
MILAB; no proj |
Approved |
no |
Call Number |
Admin @ si @ LRD2019 |
Serial |
3297 |
Permanent link to this record |
|
|
|
Author |
Eduardo Aguilar; Marc Bolaños; Petia Radeva |
Title |
Regularized uncertainty-based multi-task learning model for food analysis |
Type |
Journal Article |
Year |
2019 |
Publication |
Journal of Visual Communication and Image Representation |
Abbreviated Journal |
JVCIR |
Volume |
60 |
Issue |
|
Pages |
360-370 |
Keywords |
Multi-task models; Uncertainty modeling; Convolutional neural networks; Food image analysis; Food recognition; Food group recognition; Ingredients recognition; Cuisine recognition |
Abstract |
Food plays an important role in several aspects of our daily life. Several computer vision approaches have been proposed for tackling food analysis problems, but very little effort has been made in developing methodologies that could take advantage of the existing correlation between tasks. In this paper, we propose a new multi-task model that is able to simultaneously predict different food-related tasks, e.g. dish, cuisine and food categories. Here, we extend the homoscedastic uncertainty modeling to allow single-label and multi-label classification and propose a regularization term, which jointly weighs the tasks as well as their correlations. Furthermore, we propose a new Multi-Attribute Food dataset and a new metric, Multi-Task Accuracy. We prove that using both our uncertainty-based loss and the class regularization term, we are able to improve the coherence of outputs between different tasks. Moreover, we outperform the use of task-specific models on classical measures like accuracy or . |
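A sketch of plain homoscedastic-uncertainty task weighting (Kendall et al. style), which the paper extends to single- and multi-label food tasks; the paper's additional correlation/regularization term is not reproduced, and the task losses shown are placeholders.

    # Sketch of homoscedastic-uncertainty multi-task weighting: one learnable
    # log-variance per task; loss = sum_i exp(-s_i) * L_i + s_i.
    import torch
    import torch.nn as nn

    class UncertaintyWeightedLoss(nn.Module):
        def __init__(self, num_tasks):
            super().__init__()
            # s_i = log(sigma_i^2), one learnable scalar per task
            self.log_vars = nn.Parameter(torch.zeros(num_tasks))

        def forward(self, task_losses):              # iterable of scalar losses
            total = 0.0
            for i, loss in enumerate(task_losses):
                precision = torch.exp(-self.log_vars[i])
                total = total + precision * loss + self.log_vars[i]
            return total

    weighting = UncertaintyWeightedLoss(num_tasks=3)   # e.g. dish, cuisine, food group
    combined = weighting([torch.tensor(1.2), torch.tensor(0.7), torch.tensor(0.3)])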
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
MILAB; no proj |
Approved |
no |
Call Number |
Admin @ si @ ABR2019 |
Serial |
3298 |
Permanent link to this record |
|
|
|
Author |
Corina Krauter; Ursula Reiter; Albrecht Schmidt; Marc Masana; Rudolf Stollberger; Michael Fuchsjager; Gert Reiter |
Title |
Objective extraction of the temporal evolution of the mitral valve vortex ring from 4D flow MRI |
Type |
Conference Article |
Year |
2019 |
Publication |
27th Annual Meeting & Exhibition of the International Society for Magnetic Resonance in Medicine |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
|
Keywords |
|
Abstract |
The mitral valve vortex ring is a promising flow structure for analysis of diastolic function; however, methods for objective extraction of its formation to dissolution are lacking. We present a novel algorithm for objective extraction of the temporal evolution of the mitral valve vortex ring from magnetic resonance 4D flow data and validate the method against visual analysis. The algorithm successfully extracted mitral valve vortex rings during both early- and late-diastolic filling and agreed substantially with visual assessment. Early-diastolic mitral valve vortex ring properties differed between healthy subjects and patients with ischemic heart disease. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
ISMRM |
Notes |
LAMP; 600.120 |
Approved |
no |
Call Number |
Admin @ si @ KRS2019 |
Serial |
3300 |
Permanent link to this record |
|
|
|
Author |
Aitor Alvarez-Gila; Adrian Galdran; Estibaliz Garrote; Joost Van de Weijer |
Title |
Self-supervised blur detection from synthetically blurred scenes |
Type |
Journal Article |
Year |
2019 |
Publication |
Image and Vision Computing |
Abbreviated Journal |
IMAVIS |
Volume |
92 |
Issue |
|
Pages |
103804 |
Keywords |
|
Abstract |
Blur detection aims at segmenting the blurred areas of a given image. Recent deep learning-based methods approach this problem by learning an end-to-end mapping between the blurred input and a binary mask representing the localization of its blurred areas. Nevertheless, the effectiveness of such deep models is limited due to the scarcity of datasets annotated in terms of blur segmentation, as blur annotation is labor intensive. In this work, we bypass the need for such annotated datasets for end-to-end learning, and instead rely on object proposals and a model for blur generation in order to produce a dataset of synthetically blurred images. This allows us to perform self-supervised learning over the generated image and ground truth blur mask pairs using CNNs, defining a framework that can be employed in purely self-supervised, weakly supervised or semi-supervised configurations. Interestingly, experimental results of such setups over the largest blur segmentation datasets available show that this approach achieves state-of-the-art results in blur segmentation, even without ever observing any real blurred image. |
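A hedged sketch of the data-generation step the abstract describes: blur a region of a sharp image and keep that region's mask as the ground-truth blur segmentation. The paper selects regions from object proposals; a random box stands in for one here, and kernel size and sigma are illustrative.

    # Hedged sketch: synthesize a (blurred image, blur mask) training pair from a
    # sharp image. A random box replaces the object proposal used in the paper.
    import numpy as np
    import cv2

    def synth_blur_pair(image, ksize=15, sigma=5.0, rng=np.random.default_rng()):
        h, w = image.shape[:2]
        x0, y0 = rng.integers(0, w // 2), rng.integers(0, h // 2)
        x1, y1 = rng.integers(w // 2, w), rng.integers(h // 2, h)
        mask = np.zeros((h, w), dtype=np.uint8)
        mask[y0:y1, x0:x1] = 1
        blurred = cv2.GaussianBlur(image, (ksize, ksize), sigma)
        out = np.where(mask[..., None] == 1, blurred, image)
        return out, mask            # training pair: blurred image, blur mask

    img = (np.random.rand(240, 320, 3) * 255).astype(np.uint8)
    blurred_img, blur_mask = synth_blur_pair(img)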
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
LAMP; 600.109; 600.120 |
Approved |
no |
Call Number |
Admin @ si @ AGG2019 |
Serial |
3301 |
Permanent link to this record |
|
|
|
Author |
Jiaolong Xu; Liang Xiao; Antonio Lopez |
Title |
Self-supervised Domain Adaptation for Computer Vision Tasks |
Type |
Journal Article |
Year |
2019 |
Publication |
IEEE Access |
Abbreviated Journal |
ACCESS |
Volume |
7 |
Issue |
|
Pages |
156694 - 156706 |
Keywords |
|
Abstract |
Recent progress of self-supervised visual representation learning has achieved remarkable success on many challenging computer vision benchmarks. However, whether these techniques can be used for domain adaptation has not been explored. In this work, we propose a generic method for self-supervised domain adaptation, using object recognition and semantic segmentation of urban scenes as use cases. Focusing on simple pretext/auxiliary tasks (e.g. image rotation prediction), we assess different learning strategies to improve domain adaptation effectiveness by self-supervision. Additionally, we propose two complementary strategies to further boost the domain adaptation accuracy on semantic segmentation within our method, consisting of prediction layer alignment and batch normalization calibration. The experimental results show adaptation levels comparable to most studied domain adaptation methods, thus, bringing self-supervision as a new alternative for reaching domain adaptation. The code is available at https://github.com/Jiaolong/self-supervised-da. |
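A sketch of the image-rotation pretext task the abstract gives as its running example: each image is rotated by one of four angles and the network predicts which rotation was applied. This illustrates the pretext task only, not the paper's full adaptation method; the backbone choice is an assumption.

    # Sketch of the rotation-prediction pretext task: rotate by 0/90/180/270
    # degrees and train a 4-way classifier to predict the applied rotation.
    import torch
    import torch.nn as nn
    import torchvision.models as models

    def rotate_batch(images):                       # images: (B, C, H, W)
        labels = torch.randint(0, 4, (images.size(0),))
        rotated = torch.stack([torch.rot90(img, int(k), dims=(1, 2))
                               for img, k in zip(images, labels)])
        return rotated, labels

    net = models.resnet18(weights=None)
    net.fc = nn.Linear(net.fc.in_features, 4)       # 4 rotation classes
    criterion = nn.CrossEntropyLoss()

    images = torch.rand(8, 3, 224, 224)
    rotated, labels = rotate_batch(images)
    loss = criterion(net(rotated), labels)
    loss.backward()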
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
ADAS; 600.118 |
Approved |
no |
Call Number |
Admin @ si @ XXL2019 |
Serial |
3302 |
Permanent link to this record |
|
|
|
Author |
Cesar de Souza; Adrien Gaidon; Yohann Cabon; Naila Murray; Antonio Lopez |
Title |
Generating Human Action Videos by Coupling 3D Game Engines and Probabilistic Graphical Models |
Type |
Journal Article |
Year |
2020 |
Publication |
International Journal of Computer Vision |
Abbreviated Journal |
IJCV |
Volume |
128 |
Issue |
|
Pages |
1505–1536 |
Keywords |
Procedural generation; Human action recognition; Synthetic data; Physics |
Abstract |
Deep video action recognition models have been highly successful in recent years but require large quantities of manually-annotated data, which are expensive and laborious to obtain. In this work, we investigate the generation of synthetic training data for video action recognition, as synthetic data have been successfully used to supervise models for a variety of other computer vision tasks. We propose an interpretable parametric generative model of human action videos that relies on procedural generation, physics models and other components of modern game engines. With this model we generate a diverse, realistic, and physically plausible dataset of human action videos, called PHAV for “Procedural Human Action Videos”. PHAV contains a total of 39,982 videos, with more than 1000 examples for each of 35 action categories. Our video generation approach is not limited to existing motion capture sequences: 14 of these 35 categories are procedurally-defined synthetic actions. In addition, each video is represented with 6 different data modalities, including RGB, optical flow and pixel-level semantic labels. These modalities are generated almost simultaneously using the Multiple Render Targets feature of modern GPUs. In order to leverage PHAV, we introduce a deep multi-task (i.e. that considers action classes from multiple datasets) representation learning architecture that is able to simultaneously learn from synthetic and real video datasets, even when their action categories differ. Our experiments on the UCF-101 and HMDB-51 benchmarks suggest that combining our large set of synthetic videos with small real-world datasets can boost recognition performance. Our approach also significantly outperforms video representations produced by fine-tuning state-of-the-art unsupervised generative models of videos. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
ADAS; 600.124; 600.118 |
Approved |
no |
Call Number |
Admin @ si @ SGC2019 |
Serial |
3303 |
Permanent link to this record |
|
|
|
Author |
Daniel Hernandez; Lukas Schneider; P. Cebrian; A. Espinosa; David Vazquez; Antonio Lopez; Uwe Franke; Marc Pollefeys; Juan Carlos Moure |
Title |
Slanted Stixels: A way to represent steep streets |
Type |
Journal Article |
Year |
2019 |
Publication |
International Journal of Computer Vision |
Abbreviated Journal |
IJCV |
Volume |
127 |
Issue |
|
Pages |
1643–1658 |
Keywords |
|
Abstract |
This work presents and evaluates a novel compact scene representation based on Stixels that infers geometric and semantic information. Our approach overcomes the previous rather restrictive geometric assumptions for Stixels by introducing a novel depth model to account for non-flat roads and slanted objects. Both semantic and depth cues are used jointly to infer the scene representation in a sound global energy minimization formulation. Furthermore, a novel approximation scheme is introduced in order to significantly reduce the computational complexity of the Stixel algorithm, and then achieve real-time computation capabilities. The idea is to first perform an over-segmentation of the image, discarding the unlikely Stixel cuts, and apply the algorithm only on the remaining Stixel cuts. This work presents a novel over-segmentation strategy based on a fully convolutional network, which outperforms an approach based on using local extrema of the disparity map. We evaluate the proposed methods in terms of semantic and geometric accuracy as well as run-time on four publicly available benchmark datasets. Our approach maintains accuracy on flat road scene datasets while improving substantially on a novel non-flat road dataset. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
ADAS; 600.118; 600.124 |
Approved |
no |
Call Number |
Admin @ si @ HSC2019 |
Serial |
3304 |
Permanent link to this record |
|
|
|
Author |
Zhijie Fang; Antonio Lopez |
Title |
Intention Recognition of Pedestrians and Cyclists by 2D Pose Estimation |
Type |
Journal Article |
Year |
2019 |
Publication |
IEEE Transactions on Intelligent Transportation Systems |
Abbreviated Journal |
TITS |
Volume |
21 |
Issue |
11 |
Pages |
4773 - 4783 |
Keywords |
|
Abstract |
Anticipating the intentions of vulnerable road users (VRUs) such as pedestrians and cyclists is critical for performing safe and comfortable driving maneuvers. This is the case for human driving and, thus, should be taken into account by systems providing any level of driving assistance, from advanced driver assistance systems (ADAS) to fully autonomous vehicles (AVs). In this paper, we show how the latest advances in monocular vision-based human pose estimation, i.e. those relying on deep Convolutional Neural Networks (CNNs), enable us to recognize the intentions of such VRUs. In the case of cyclists, we assume that they follow traffic rules to indicate future maneuvers with arm signals. In the case of pedestrians, no indications can be assumed. Instead, we hypothesize that the walking pattern of a pedestrian allows us to determine if he/she has the intention of crossing the road in the path of the ego-vehicle, so that the ego-vehicle must maneuver accordingly (e.g. slowing down or stopping). In this paper, we show how the same methodology can be used for recognizing pedestrians' and cyclists' intentions. For pedestrians, we perform experiments on the JAAD dataset. For cyclists, we did not find an analogous dataset; thus, we created our own by acquiring and annotating videos which we share with the research community. Overall, the proposed pipeline provides new state-of-the-art results on the intention recognition of VRUs. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
ADAS; 600.118 |
Approved |
no |
Call Number |
Admin @ si @ FaL2019 |
Serial |
3305 |
Permanent link to this record |
|
|
|
Author |
Akhil Gurram; Onay Urfalioglu; Ibrahim Halfaoui; Fahd Bouzaraa; Antonio Lopez |
Title |
Semantic Monocular Depth Estimation Based on Artificial Intelligence |
Type |
Journal Article |
Year |
2020 |
Publication |
IEEE Intelligent Transportation Systems Magazine |
Abbreviated Journal |
ITSM |
Volume |
13 |
Issue |
4 |
Pages |
99-103 |
Keywords |
|
Abstract |
Depth estimation provides essential information to perform autonomous driving and driver assistance. A promising line of work consists of introducing additional semantic information about the traffic scene when training CNNs for depth estimation. In practice, this means that the depth data used for CNN training is complemented with images having pixel-wise semantic labels where the same raw training data is associated with both types of ground truth, i.e., depth and semantic labels. The main contribution of this paper is to show that this hard constraint can be circumvented, i.e., that we can train CNNs for depth estimation by leveraging the depth and semantic information coming from heterogeneous datasets. In order to illustrate the benefits of our approach, we combine KITTI depth and Cityscapes semantic segmentation datasets, outperforming state-of-the-art results on monocular depth estimation. |
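A hedged sketch of the training idea the abstract describes: a shared encoder with a depth head and a semantic head, fed by alternating mini-batches from a depth-labeled set (e.g. KITTI) and a semantics-labeled set (e.g. Cityscapes), backpropagating only the loss whose ground truth is available. The tiny network and loss choices are illustrative, not the authors' architecture.

    # Hedged sketch: shared encoder + two heads, trained with heterogeneous
    # supervision by applying only the loss available for each batch.
    import torch
    import torch.nn as nn

    class SharedEncoderTwoHeads(nn.Module):
        def __init__(self, num_classes=19):
            super().__init__()
            self.encoder = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
            self.depth_head = nn.Conv2d(32, 1, 1)
            self.sem_head = nn.Conv2d(32, num_classes, 1)

        def forward(self, x):
            f = self.encoder(x)
            return self.depth_head(f), self.sem_head(f)

    model = SharedEncoderTwoHeads()
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    l1, ce = nn.L1Loss(), nn.CrossEntropyLoss()

    def step(batch, kind):
        opt.zero_grad()
        depth_pred, sem_pred = model(batch["image"])
        loss = l1(depth_pred, batch["depth"]) if kind == "depth" \
            else ce(sem_pred, batch["labels"])
        loss.backward()
        opt.step()

    # Training alternates: step(depth_batch, "depth"); step(semantic_batch, "semantic")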
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
ADAS; 600.124; 600.118 |
Approved |
no |
Call Number |
Admin @ si @ GUH2019 |
Serial |
3306 |
Permanent link to this record |
|
|
|
Author |
Debora Gil; Antonio Esteban Lansaque; Agnes Borras; Carles Sanchez |
Title |
Enhancing virtual bronchoscopy with intra-operative data using a multi-objective GAN |
Type |
Journal Article |
Year |
2019 |
Publication |
International Journal of Computer Assisted Radiology and Surgery |
Abbreviated Journal |
IJCAR |
Volume |
7 |
Issue |
1 |
Pages |
|
Keywords |
|
Abstract |
This manuscript has been withdrawn by bioRxiv due to upload of an incorrect version of the manuscript by the authors. Therefore, this manuscript should not be cited as reference for this project. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
IAM; 600.139; 600.145 |
Approved |
no |
Call Number |
Admin @ si @ GEB2019 |
Serial |
3307 |
Permanent link to this record |