|   | 
Details
   web
Records
Author Fei Yang; Luis Herranz; Joost Van de Weijer; Jose Antonio Iglesias; Antonio Lopez; Mikhail Mozerov
Title Variable Rate Deep Image Compression with Modulated Autoencoder Type Journal Article
Year 2020 Publication IEEE Signal Processing Letters Abbreviated Journal SPL
Volume 27 Issue Pages 331-335
Keywords
Abstract Variable rate is a requirement for flexible and adaptable image and video compression. However, deep image compression methods (DIC) are optimized for a single fixed rate-distortion (R-D) tradeoff. While this can be addressed by training multiple models for different tradeoffs, the memory requirements increase proportionally to the number of models. Scaling the bottleneck representation of a shared autoencoder can provide variable rate compression with a single shared autoencoder. However, the R-D performance using this simple mechanism degrades in low bitrates, and also shrinks the effective range of bitrates. To address these limitations, we formulate the problem of variable R-D optimization for DIC, and propose modulated autoencoders (MAEs), where the representations of a shared autoencoder are adapted to the specific R-D tradeoff via a modulation network. Jointly training this modulated autoencoder and the modulation network provides an effective way to navigate the R-D operational curve. Our experiments show that the proposed method can achieve almost the same R-D performance of independent models with significantly fewer parameters.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes LAMP; ADAS; 600.141; 600.120; 600.118 Approved no
Call Number Admin @ si @ YHW2020 Serial (down) 3346
Permanent link to this record
 

 
Author Fei Yang; Yongmei Cheng; Joost Van de Weijer; Mikhail Mozerov
Title Improved Discrete Optical Flow Estimation With Triple Image Matching Cost Type Journal Article
Year 2020 Publication IEEE Access Abbreviated Journal ACCESS
Volume 8 Issue Pages 17093 - 17102
Keywords
Abstract Approaches that use more than two consecutive video frames in the optical flow estimation have a long research history. However, almost all such methods utilize extra information for a pre-processing flow prediction or for a post-processing flow correction and filtering. In contrast, this paper differs from previously developed techniques. We propose a new algorithm for the likelihood function calculation (alternatively the matching cost volume) that is used in the maximum a posteriori estimation. We exploit the fact that in general, optical flow is locally constant in the sense of time and the likelihood function depends on both the previous and the future frame. Implementation of our idea increases the robustness of optical flow estimation. As a result, our method outperforms 9% over the DCFlow technique, which we use as prototype for our CNN based computation architecture, on the most challenging MPI-Sintel dataset for the non-occluded mask metric. Furthermore, our approach considerably increases the accuracy of the flow estimation for the matching cost processing, consequently outperforming the original DCFlow algorithm results up to 50% in occluded regions and up to 9% in non-occluded regions on the MPI-Sintel dataset. The experimental section shows that the proposed method achieves state-of-the-arts results especially on the MPI-Sintel dataset.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes LAMP; 600.120 Approved no
Call Number Admin @ si @ YCW2020 Serial (down) 3345
Permanent link to this record
 

 
Author Andres Mafla; Sounak Dey; Ali Furkan Biten; Lluis Gomez; Dimosthenis Karatzas
Title Fine-grained Image Classification and Retrieval by Combining Visual and Locally Pooled Textual Features Type Conference Article
Year 2020 Publication IEEE Winter Conference on Applications of Computer Vision Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Text contained in an image carries high-level semantics that can be exploited to achieve richer image understanding. In particular, the mere presence of text provides strong guiding content that should be employed to tackle a diversity of computer vision tasks such as image retrieval, fine-grained classification, and visual question answering. In this paper, we address the problem of fine-grained classification and image retrieval by leveraging textual information along with visual cues to comprehend the existing intrinsic relation between the two modalities. The novelty of the proposed model consists of the usage of a PHOC descriptor to construct a bag of textual words along with a Fisher Vector Encoding that captures the morphology of text. This approach provides a stronger multimodal representation for this task and as our experiments demonstrate, it achieves state-of-the-art results on two different tasks, fine-grained classification and image retrieval.
Address Aspen; Colorado; USA; March 2020
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference WACV
Notes DAG; 600.121; 600.129 Approved no
Call Number Admin @ si @ MDB2020 Serial (down) 3334
Permanent link to this record
 

 
Author Sergio Escalera; Ralf Herbrich
Title The NeurIPS’18 Competition: From Machine Learning to Intelligent Conversations Type Book Whole
Year 2020 Publication The Springer Series on Challenges in Machine Learning Abbreviated Journal
Volume Issue Pages
Keywords
Abstract This volume presents the results of the Neural Information Processing Systems Competition track at the 2018 NeurIPS conference. The competition follows the same format as the 2017 competition track for NIPS. Out of 21 submitted proposals, eight competition proposals were selected, spanning the area of Robotics, Health, Computer Vision, Natural Language Processing, Systems and Physics. Competitions have become an integral part of advancing state-of-the-art in artificial intelligence (AI). They exhibit one important difference to benchmarks: Competitions test a system end-to-end rather than evaluating only a single component; they assess the practicability of an algorithmic solution in addition to assessing feasibility.
Address
Corporate Author Thesis
Publisher Place of Publication Editor Sergio Escalera; Ralf Hebrick
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 2520-1328 ISBN 978-3-030-29134-1 Medium
Area Expedition Conference
Notes HuPBA; no menciona Approved no
Call Number Admin @ si @ HeE2020 Serial (down) 3328
Permanent link to this record
 

 
Author Wenlong Deng; Yongli Mou; Takahiro Kashiwa; Sergio Escalera; Kohei Nagai; Kotaro Nakayama; Yutaka Matsuo; Helmut Prendinger
Title Vision based Pixel-level Bridge Structural Damage Detection Using a Link ASPP Network Type Journal Article
Year 2020 Publication Automation in Construction Abbreviated Journal AC
Volume 110 Issue Pages 102973
Keywords Semantic image segmentation; Deep learning
Abstract Structural Health Monitoring (SHM) has greatly benefited from computer vision. Recently, deep learning approaches are widely used to accurately estimate the state of deterioration of infrastructure. In this work, we focus on the problem of bridge surface structural damage detection, such as delamination and rebar exposure. It is well known that the quality of a deep learning model is highly dependent on the quality of the training dataset. Bridge damage detection, our application domain, has the following main challenges: (i) labeling the damages requires knowledgeable civil engineering professionals, which makes it difficult to collect a large annotated dataset; (ii) the damage area could be very small, whereas the background area is large, which creates an unbalanced training environment; (iii) due to the difficulty to exactly determine the extension of the damage, there is often a variation among different labelers who perform pixel-wise labeling. In this paper, we propose a novel model for bridge structural damage detection to address the first two challenges. This paper follows the idea of an atrous spatial pyramid pooling (ASPP) module that is designed as a novel network for bridge damage detection. Further, we introduce the weight balanced Intersection over Union (IoU) loss function to achieve accurate segmentation on a highly unbalanced small dataset. The experimental results show that (i) the IoU loss function improves the overall performance of damage detection, as compared to cross entropy loss or focal loss, and (ii) the proposed model has a better ability to detect a minority class than other light segmentation networks.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HuPBA; no proj Approved no
Call Number Admin @ si @ DMK2020 Serial (down) 3314
Permanent link to this record
 

 
Author Hassan Ahmed Sial; Ramon Baldrich; Maria Vanrell
Title Deep intrinsic decomposition trained on surreal scenes yet with realistic light effects Type Journal Article
Year 2020 Publication Journal of the Optical Society of America A Abbreviated Journal JOSA A
Volume 37 Issue 1 Pages 1-15
Keywords
Abstract Estimation of intrinsic images still remains a challenging task due to weaknesses of ground-truth datasets, which either are too small or present non-realistic issues. On the other hand, end-to-end deep learning architectures start to achieve interesting results that we believe could be improved if important physical hints were not ignored. In this work, we present a twofold framework: (a) a flexible generation of images overcoming some classical dataset problems such as larger size jointly with coherent lighting appearance; and (b) a flexible architecture tying physical properties through intrinsic losses. Our proposal is versatile, presents low computation time, and achieves state-of-the-art results.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes CIC; 600.140; 600.12; 600.118 Approved no
Call Number Admin @ si @ SBV2019 Serial (down) 3311
Permanent link to this record
 

 
Author Ivet Rafegas; Maria Vanrell; Luis A Alexandre; G. Arias
Title Understanding trained CNNs by indexing neuron selectivity Type Journal Article
Year 2020 Publication Pattern Recognition Letters Abbreviated Journal PRL
Volume 136 Issue Pages 318-325
Keywords
Abstract The impressive performance of Convolutional Neural Networks (CNNs) when solving different vision problems is shadowed by their black-box nature and our consequent lack of understanding of the representations they build and how these representations are organized. To help understanding these issues, we propose to describe the activity of individual neurons by their Neuron Feature visualization and quantify their inherent selectivity with two specific properties. We explore selectivity indexes for: an image feature (color); and an image label (class membership). Our contribution is a framework to seek or classify neurons by indexing on these selectivity properties. It helps to find color selective neurons, such as a red-mushroom neuron in layer Conv4 or class selective neurons such as dog-face neurons in layer Conv5 in VGG-M, and establishes a methodology to derive other selectivity properties. Indexing on neuron selectivity can statistically draw how features and classes are represented through layers in a moment when the size of trained nets is growing and automatic tools to index neurons can be helpful.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes CIC; 600.087; 600.140; 600.118 Approved no
Call Number Admin @ si @ RVL2019 Serial (down) 3310
Permanent link to this record
 

 
Author Akhil Gurram; Onay Urfalioglu; Ibrahim Halfaoui; Fahd Bouzaraa; Antonio Lopez
Title Semantic Monocular Depth Estimation Based on Artificial Intelligence Type Journal Article
Year 2020 Publication IEEE Intelligent Transportation Systems Magazine Abbreviated Journal ITSM
Volume 13 Issue 4 Pages 99-103
Keywords
Abstract Depth estimation provides essential information to perform autonomous driving and driver assistance. A promising line of work consists of introducing additional semantic information about the traffic scene when training CNNs for depth estimation. In practice, this means that the depth data used for CNN training is complemented with images having pixel-wise semantic labels where the same raw training data is associated with both types of ground truth, i.e., depth and semantic labels. The main contribution of this paper is to show that this hard constraint can be circumvented, i.e., that we can train CNNs for depth estimation by leveraging the depth and semantic information coming from heterogeneous datasets. In order to illustrate the benefits of our approach, we combine KITTI depth and Cityscapes semantic segmentation datasets, outperforming state-of-the-art results on monocular depth estimation.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ADAS; 600.124; 600.118 Approved no
Call Number Admin @ si @ GUH2019 Serial (down) 3306
Permanent link to this record
 

 
Author Cesar de Souza; Adrien Gaidon; Yohann Cabon; Naila Murray; Antonio Lopez
Title Generating Human Action Videos by Coupling 3D Game Engines and Probabilistic Graphical Models Type Journal Article
Year 2020 Publication International Journal of Computer Vision Abbreviated Journal IJCV
Volume 128 Issue Pages 1505–1536
Keywords Procedural generation; Human action recognition; Synthetic data; Physics
Abstract Deep video action recognition models have been highly successful in recent years but require large quantities of manually-annotated data, which are expensive and laborious to obtain. In this work, we investigate the generation of synthetic training data for video action recognition, as synthetic data have been successfully used to supervise models for a variety of other computer vision tasks. We propose an interpretable parametric generative model of human action videos that relies on procedural generation, physics models and other components of modern game engines. With this model we generate a diverse, realistic, and physically plausible dataset of human action videos, called PHAV for “Procedural Human Action Videos”. PHAV contains a total of 39,982 videos, with more than 1000 examples for each of 35 action categories. Our video generation approach is not limited to existing motion capture sequences: 14 of these 35 categories are procedurally-defined synthetic actions. In addition, each video is represented with 6 different data modalities, including RGB, optical flow and pixel-level semantic labels. These modalities are generated almost simultaneously using the Multiple Render Targets feature of modern GPUs. In order to leverage PHAV, we introduce a deep multi-task (i.e. that considers action classes from multiple datasets) representation learning architecture that is able to simultaneously learn from synthetic and real video datasets, even when their action categories differ. Our experiments on the UCF-101 and HMDB-51 benchmarks suggest that combining our large set of synthetic videos with small real-world datasets can boost recognition performance. Our approach also significantly outperforms video representations produced by fine-tuning state-of-the-art unsupervised generative models of videos.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ADAS; 600.124; 600.118 Approved no
Call Number Admin @ si @ SGC2019 Serial (down) 3303
Permanent link to this record
 

 
Author Edgar Riba; D. Mishkin; Daniel Ponsa; E. Rublee; G. Bradski
Title Kornia: an Open Source Differentiable Computer Vision Library for PyTorch Type Conference Article
Year 2020 Publication IEEE Winter Conference on Applications of Computer Vision Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address Aspen; Colorado; USA; March 2020
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference WACV
Notes MSIAU; 600.122; 600.130 Approved no
Call Number Admin @ si @ RMP2020 Serial (down) 3291
Permanent link to this record
 

 
Author Raul Gomez; Jaume Gibert; Lluis Gomez; Dimosthenis Karatzas
Title Exploring Hate Speech Detection in Multimodal Publications Type Conference Article
Year 2020 Publication IEEE Winter Conference on Applications of Computer Vision Abbreviated Journal
Volume Issue Pages
Keywords
Abstract In this work we target the problem of hate speech detection in multimodal publications formed by a text and an image. We gather and annotate a large scale dataset from Twitter, MMHS150K, and propose different models that jointly analyze textual and visual information for hate speech detection, comparing them with unimodal detection. We provide quantitative and qualitative results and analyze the challenges of the proposed task. We find that, even though images are useful for the hate speech detection task, current multimodal models cannot outperform models analyzing only text. We discuss why and open the field and the dataset for further research.
Address Aspen; March 2020
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference WACV
Notes DAG; 600.121; 600.129 Approved no
Call Number Admin @ si @ GGG2020a Serial (down) 3280
Permanent link to this record