Author Mohamed Ilyes Lakhal; Hakan Çevikalp; Sergio Escalera; Ferda Ofli
  Title Recurrent Neural Networks for Remote Sensing Image Classification Type Journal Article
  Year 2018 Publication IET Computer Vision Abbreviated Journal IETCV  
  Volume 12 Issue 7 Pages 1040 - 1045  
  Keywords  
  Abstract Automatically classifying an image has been a central problem in computer vision for decades. A plethora of models has been proposed, from handcrafted feature solutions to more sophisticated approaches such as deep learning. The authors address the problem of remote sensing image classification, which is an important problem to many real world applications. They introduce a novel deep recurrent architecture that incorporates high-level feature descriptors to tackle this challenging problem. Their solution is based on the general encoder–decoder framework. To the best of the authors’ knowledge, this is the first study to use a recurrent network structure on this task. The experimental results show that the proposed framework outperforms the previous works in the three datasets widely used in the literature. They have achieved a state-of-the-art accuracy rate of 97.29% on the UC Merced dataset.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes HUPBA; no proj Approved no  
  Call Number Admin @ si @ LÇE2018 Serial 3119  

 
Author Mohamed Ilyes Lakhal; Hakan Cevikalp; Sergio Escalera
  Title CRN: End-to-end Convolutional Recurrent Network Structure Applied to Vehicle Classification Type Conference Article
  Year 2018 Publication 13th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications Abbreviated Journal  
  Volume 5 Issue Pages 137-144  
  Keywords Vehicle Classification; Deep Learning; End-to-end Learning  
  Abstract Vehicle type classification is considered to be a central part of Intelligent Traffic Systems. In recent years, deep learning methods have emerged as the state of the art in many computer vision tasks. In this paper, we present a novel yet simple deep learning framework for the vehicle type classification problem. We propose an end-to-end trainable system that combines a convolutional neural network for feature extraction and a recurrent neural network as a classifier. The recurrent network structure is used to handle various types of feature inputs, and at the same time allows producing a single class prediction or a set of class predictions. In order to assess the effectiveness of our solution, we have conducted a set of experiments on two public datasets, obtaining state-of-the-art results. In addition, we also report results on the newly released MIO-TCD dataset.  
  Address Funchal; Madeira; Portugal; January 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference VISAPP  
  Notes HUPBA Approved no  
  Call Number Admin @ si @ LCE2018a Serial 3094  

 
Author Mohammad A. Haque; Ruben B. Bautista; Kamal Nasrollahi; Sergio Escalera; Christian B. Laursen; Ramin Irani; Ole K. Andersen; Erika G. Spaich; Kaustubh Kulkarni; Thomas B. Moeslund; Marco Bellantonio; Golamreza Anbarjafari; Fatemeh Noroozi
  Title Deep Multimodal Pain Recognition: A Database and Comparison of Spatio-Temporal Visual Modalities, Faces and Gestures Type Conference Article
  Year 2018 Publication 13th IEEE Conference on Automatic Face and Gesture Recognition Abbreviated Journal  
  Volume Issue Pages 250 - 257  
  Keywords  
  Abstract Pain is a symptom of many disorders associated with actual or potential tissue damage in the human body. Managing pain is not only a duty but also highly cost prone. The most primitive state of pain management is the assessment of pain. Traditionally it was accomplished by self-report or visual inspection by experts. However, automatic pain assessment systems from facial videos are also rapidly evolving due to the need of managing pain in a robust and cost-effective way. Among different challenges of automatic pain assessment from facial video data two issues are increasingly prevalent: first, exploiting both spatial and temporal information of the face to assess pain level, and second, incorporating multiple visual modalities to capture complementary face information related to pain. Most works in the literature focus on merely exploiting spatial information on chromatic (RGB) video data on shallow learning scenarios. However, employing deep learning techniques for spatio-temporal analysis considering Depth (D) and Thermal (T) along with RGB has high potential in this area. In this paper, we present the first state-of-the-art publicly available database, 'Multimodal Intensity Pain (MIntPAIN)' database, for RGBDT pain level recognition in sequences. We provide first baseline results, including recognition of 5 pain levels, by analyzing independent visual modalities and their fusion with CNN and LSTM models. From the experimental evaluation we observe that fusion of modalities helps to enhance recognition performance of pain levels in comparison to isolated ones. In particular, the combination of RGB, D, and T in an early fusion fashion achieved the best recognition rate.  
  Address Xian; China; May 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference FG  
  Notes HUPBA; no proj Approved no  
  Call Number Admin @ si @ HBN2018 Serial 3117  

 
Author Mohammad N. S. Jahromi; Morten Bojesen Bonderup; Maryam Asadi-Aghbolaghi; Egils Avots; Kamal Nasrollahi; Sergio Escalera; Shohreh Kasaei; Thomas B. Moeslund; Gholamreza Anbarjafari
  Title Automatic Access Control Based on Face and Hand Biometrics in a Non-cooperative Context Type Conference Article
  Year 2018 Publication IEEE Winter Applications of Computer Vision Workshops Abbreviated Journal  
  Volume Issue Pages 28-36  
  Keywords IEEE Winter Applications of Computer Vision Workshops  
  Abstract Automatic access control systems (ACS) based on human biometrics or physical tokens are widely employed in public and private areas. Yet these systems, in their conventional forms, are restricted to active interaction from the users. In scenarios where users are not cooperating with the system, these systems are challenged. Failure in cooperation with the biometric systems might be intentional or because the users are incapable of handling the interaction procedure with the biometric system or simply forget to cooperate with it, due to, for example, an illness such as dementia. This work introduces a challenging bimodal database, including face and hand information of the users when they approach a door to open it by its handle in a non-cooperative context. We have defined two protocols (an easy one and a challenging one) on how to use the database. We have reported results on many baseline methods, including deep learning techniques as well as conventional methods on the database. The obtained results show the merit of the proposed database and the challenging nature of access control with non-cooperative users.  
  Address Lake Tahoe; USA; March 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference WACVW  
  Notes HUPBA; 602.133 Approved no  
  Call Number Admin @ si @ JBA2018 Serial 3121  

 
Author Mohammed Al Rawi; Dimosthenis Karatzas
  Title On the Labeling Correctness in Computer Vision Datasets Type Conference Article
  Year 2018 Publication Proceedings of the Workshop on Interactive Adaptive Learning, co-located with European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Image datasets have been heavily used to build computer vision systems. These datasets are either manually or automatically labeled, which is a problem as both labeling methods are prone to errors. To investigate this problem, we use a majority voting ensemble that combines the results from several Convolutional Neural Networks (CNNs). Majority voting ensembles not only enhance the overall performance, but can also be used to estimate the confidence level of each sample. We also examined Softmax as another form to estimate posterior probability. We have designed various experiments with a range of different ensembles built from one or different, or temporal/snapshot CNNs, which have been trained multiple times stochastically. We analyzed CIFAR10, CIFAR100, EMNIST, and SVHN datasets and we found quite a few incorrect labels, both in the training and testing sets. We also present detailed confidence analysis on these datasets and we found that the ensemble is better than the Softmax when used to estimate the per-sample confidence. This work thus proposes an approach that can be used to scrutinize and verify the labeling of computer vision datasets, which can later be applied to weakly/semi-supervised learning. We propose a measure, based on the Odds-Ratio, to quantify how many of these incorrectly classified labels are actually incorrectly labeled and how many of these are confusing. The proposed methods are easily scalable to larger datasets, like ImageNet, LSUN and SUN, as each CNN instance is trained for 60 epochs; or even faster, by implementing a temporal (snapshot) ensemble.
 
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ECML-PKDDW  
  Notes DAG; 600.121; 600.129 Approved no  
  Call Number Admin @ si @ RaK2018 Serial 3144  
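
The majority-voting idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `majority_vote` and `flag_suspect_labels`, the `min_agreement` threshold, and the toy labels are all assumptions made for illustration, and the Odds-Ratio measure mentioned in the abstract is not reproduced here. The sketch only shows how the fraction of agreeing ensemble members could serve as a per-sample confidence, and how a confident disagreement with the dataset label could mark a sample for manual review.

```python
from collections import Counter


def majority_vote(predictions):
    """Return (winning_label, agreement) for one sample.

    `predictions` holds one predicted label per ensemble member.
    `agreement` is the fraction of members that voted for the winner,
    used here as a simple per-sample confidence estimate.
    """
    counts = Counter(predictions)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(predictions)


def flag_suspect_labels(all_predictions, given_labels, min_agreement=0.8):
    """Return indices of samples whose dataset label looks suspicious.

    A sample is flagged when the ensemble agrees strongly
    (agreement >= min_agreement, a hypothetical threshold) on a label
    that differs from the label stored in the dataset.
    """
    suspects = []
    for idx, (preds, given) in enumerate(zip(all_predictions, given_labels)):
        label, agreement = majority_vote(preds)
        if label != given and agreement >= min_agreement:
            suspects.append(idx)
    return suspects


# Toy example: three samples, five ensemble members each.
ensemble_outputs = [
    ["cat", "cat", "cat", "cat", "cat"],
    ["dog", "dog", "dog", "dog", "cat"],
    ["cat", "dog", "cat", "dog", "bird"],
]
dataset_labels = ["cat", "cat", "cat"]
suspect_indices = flag_suspect_labels(ensemble_outputs, dataset_labels)
```

In this toy run only the second sample is flagged: four of five members vote "dog" against the stored label "cat". The third sample has low agreement, so it would be treated as confusing rather than mislabeled under this assumed rule.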

 
Author Muhammad Anwer Rao; Fahad Shahbaz Khan; Joost Van de Weijer; Matthieu Molinier; Jorma Laaksonen
  Title Binary patterns encoded convolutional neural networks for texture recognition and remote sensing scene classification Type Journal Article
  Year 2018 Publication ISPRS Journal of Photogrammetry and Remote Sensing Abbreviated Journal ISPRS J  
  Volume 138 Issue Pages 74-85  
  Keywords Remote sensing; Deep learning; Scene classification; Local Binary Patterns; Texture analysis  
  Abstract Designing discriminative powerful texture features robust to realistic imaging conditions is a challenging computer vision problem with many applications, including material recognition and analysis of satellite or aerial imagery. In the past, most texture description approaches were based on dense orderless statistical distribution of local features. However, most recent approaches to texture recognition and remote sensing scene classification are based on Convolutional Neural Networks (CNNs). The de facto practice when learning these CNN models is to use RGB patches as input with training performed on large amounts of labeled data (ImageNet). In this paper, we show that Local Binary Patterns (LBP) encoded CNN models, codenamed TEX-Nets, trained using mapped coded images with explicit LBP based texture information provide complementary information to the standard RGB deep models. Additionally, two deep architectures, namely early and late fusion, are investigated to combine the texture and color information. To the best of our knowledge, we are the first to investigate Binary Patterns encoded CNNs and different deep network fusion architectures for texture recognition and remote sensing scene classification. We perform comprehensive experiments on four texture recognition datasets and four remote sensing scene classification benchmarks: UC-Merced with 21 scene categories, WHU-RS19 with 19 scene classes, RSSCN7 with 7 categories and the recently introduced large scale aerial image dataset (AID) with 30 aerial scene types. We demonstrate that TEX-Nets provide complementary information to standard RGB deep model of the same network architecture. Our late fusion TEX-Net architecture always improves the overall performance compared to the standard RGB network on both recognition problems. Furthermore, our final combination leads to consistent improvement over the state-of-the-art for remote sensing scene  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes LAMP; 600.109; 600.106; 600.120 Approved no  
  Call Number Admin @ si @ RKW2018 Serial 3158  

 
Author Oscar Argudo; Marc Comino; Antonio Chica; Carlos Andujar; Felipe Lumbreras
  Title Segmentation of aerial images for plausible detail synthesis Type Journal Article
  Year 2018 Publication Computers & Graphics Abbreviated Journal CG  
  Volume 71 Issue Pages 23-34  
  Keywords Terrain editing; Detail synthesis; Vegetation synthesis; Terrain rendering; Image segmentation  
  Abstract The visual enrichment of digital terrain models with plausible synthetic detail requires the segmentation of aerial images into a suitable collection of categories. In this paper we present a complete pipeline for segmenting high-resolution aerial images into a user-defined set of categories distinguishing e.g. terrain, sand, snow, water, and different types of vegetation. This segmentation-for-synthesis problem implies that per-pixel categories must be established according to the algorithms chosen for rendering the synthetic detail. This precludes the definition of a universal set of labels and hinders the construction of large training sets. Since artists might choose to add new categories on the fly, the whole pipeline must be robust against unbalanced datasets, and fast on both training and inference. Under these constraints, we analyze the contribution of common per-pixel descriptors, and compare the performance of state-of-the-art supervised learning algorithms. We report the findings of two user studies. The first one was conducted to analyze human accuracy when manually labeling aerial images. The second user study compares detailed terrains built using different segmentation strategies, including official land cover maps. These studies demonstrate that our approach can be used to turn digital elevation models into fully-featured, detailed terrains with minimal authoring efforts.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN 0097-8493 ISBN Medium  
  Area Expedition Conference  
  Notes MSIAU; 600.086; 600.118 Approved no  
  Call Number Admin @ si @ ACC2018 Serial 3147  

 
Author Ozan Caglayan; Adrien Bardet; Fethi Bougares; Loic Barrault; Kai Wang; Marc Masana; Luis Herranz; Joost Van de Weijer
  Title LIUM-CVC Submissions for WMT18 Multimodal Translation Task Type Conference Article
  Year 2018 Publication 3rd Conference on Machine Translation Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract This paper describes the multimodal Neural Machine Translation systems developed by LIUM and CVC for the WMT18 Shared Task on Multimodal Translation. This year we propose several modifications to our previous multimodal attention architecture in order to better integrate convolutional features and refine them using encoder-side information. Our final constrained submissions ranked first for English→French and second for English→German language pairs among the constrained submissions according to the automatic evaluation metric METEOR.
 
  Address Brussels; Belgium; October 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference WMT  
  Notes LAMP; 600.106; 600.120 Approved no  
  Call Number Admin @ si @ CBB2018 Serial 3240  

 
Author Patricia Suarez; Angel Sappa; Boris X. Vintimilla
  Title Cross-spectral image dehaze through a dense stacked conditional GAN based approach Type Conference Article
  Year 2018 Publication 14th IEEE International Conference on Signal Image Technology & Internet Based System Abbreviated Journal  
  Volume Issue Pages  
  Keywords Infrared imaging; Dense; Stacked CGAN; Crossspectral; Convolutional networks  
  Abstract This paper proposes a novel approach to remove haze from RGB images using near infrared images, based on a dense stacked conditional Generative Adversarial Network (CGAN). The architecture of the deep network implemented receives, besides the images with haze, their corresponding images in the near infrared spectrum, which serve to accelerate the learning process of the details of the characteristics of the images. The model uses a triplet layer that allows the independent learning of each channel of the visible spectrum image to remove the haze on each color channel separately. A multiple loss function scheme is proposed, which ensures balanced learning between the colors and the structure of the images. Experimental results have shown that the proposed method effectively removes the haze from the images. Additionally, the proposed approach is compared with a state-of-the-art approach, showing better results.
 
  Address Las Palmas de Gran Canaria; November 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-1-5386-9385-8 Medium  
  Area Expedition Conference SITIS  
  Notes MSIAU; 600.086; 600.130; 600.122 Approved no  
  Call Number Admin @ si @ SSV2018a Serial 3193  

 
Author Patricia Suarez; Angel Sappa; Boris X. Vintimilla
  Title Vegetation Index Estimation from Monospectral Images Type Conference Article
  Year 2018 Publication 15th International Conference on Images Analysis and Recognition Abbreviated Journal  
  Volume 10882 Issue Pages 353-362  
  Keywords  
  Abstract This paper proposes a novel approach to estimate the Normalized Difference Vegetation Index (NDVI) from just the red channel of an RGB image. The NDVI index is defined as the ratio of the difference of the red and infrared radiances over their sum. In other words, information from the red channel of an RGB image and the corresponding infrared spectral band are required for its computation. In the current work the NDVI index is estimated just from the red channel by training a Conditional Generative Adversarial Network (CGAN). The architecture proposed for the generative network consists of a single level structure, which combines at the final layer results from convolutional operations together with the given red channel with Gaussian noise to enhance details, resulting in a sharp NDVI image. Then, the discriminative model estimates the probability that the NDVI generated index came from the training dataset, rather than the index automatically generated. Experimental results with a large set of real images are provided showing that a Conditional GAN single level model represents an acceptable approach to estimate the NDVI index.
 
  Address Povoa de Varzim; Portugal; June 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICIAR  
  Notes MSIAU; 600.086; 600.130; 600.122 Approved no  
  Call Number Admin @ si @ SSV2018c Serial 3196  
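
The NDVI definition quoted in the abstract above (difference of the infrared and red radiances over their sum) can be written out as a short worked sketch. It assumes the common sign convention NDVI = (NIR - Red) / (NIR + Red), which the abstract does not spell out, and it computes the index from both bands directly; it does not represent the paper's CGAN-based estimation from the red channel alone. The function name `ndvi` and the sample values are hypothetical.

```python
def ndvi(nir, red):
    """Per-pixel NDVI for two equally sized sequences of band values.

    Assumes the convention NDVI = (NIR - Red) / (NIR + Red).
    Pixels where both bands are zero are assigned 0.0 to avoid
    division by zero (a choice made for this sketch).
    """
    if len(nir) != len(red):
        raise ValueError("bands must have the same number of pixels")
    values = []
    for n, r in zip(nir, red):
        total = n + r
        values.append(0.0 if total == 0 else (n - r) / total)
    return values


# Toy pixels: strong NIR reflectance, equal bands, and an empty pixel.
nir_band = [0.75, 0.5, 0.0]
red_band = [0.25, 0.5, 0.0]
index = ndvi(nir_band, red_band)
```

For non-negative radiances the result lies in [-1, 1]; values near 1 correspond to pixels where near infrared reflectance dominates the red reflectance.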

 
Author Patricia Suarez; Angel Sappa; Boris X. Vintimilla; Riad I. Hammoud
  Title Near InfraRed Imagery Colorization Type Conference Article
  Year 2018 Publication 25th International Conference on Image Processing Abbreviated Journal  
  Volume Issue Pages 2237 - 2241  
  Keywords Convolutional Neural Networks (CNN); Generative Adversarial Network (GAN); Infrared Imagery colorization  
  Abstract This paper proposes a stacked conditional Generative Adversarial Network-based method for Near InfraRed (NIR) imagery colorization. We propose a variant architecture of Generative Adversarial Network (GAN) that uses multiple loss functions over a conditional probabilistic generative model. We show that this new architecture/loss-function yields better generalization and representation of the generated colored IR images. The proposed approach is evaluated on a large test dataset and compared to recent state of the art methods using standard metrics.
 
  Address Athens; Greece; October 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICIP  
  Notes MSIAU; 600.086; 600.130; 600.122 Approved no  
  Call Number Admin @ si @ SSV2018b Serial 3195  

 
Author Patricia Suarez; Angel Sappa; Boris X. Vintimilla; Riad I. Hammoud
  Title Deep Learning based Single Image Dehazing Type Conference Article
  Year 2018 Publication 31st IEEE Conference on Computer Vision and Pattern Recognition Workshop Abbreviated Journal  
  Volume Issue Pages 1250 - 12507  
  Keywords Gallium nitride; Atmospheric modeling; Generators; Generative adversarial networks; Convergence; Image color analysis  
  Abstract This paper proposes a novel approach to remove haze degradations in RGB images using a stacked conditional Generative Adversarial Network (GAN). It employs a triplet of GANs to remove the haze on each color channel independently. A multiple loss functions scheme, applied over a conditional probabilistic model, is proposed. The proposed GAN architecture learns to remove the haze, using as conditioning input the hazy images from which the clear images will be obtained. Such formulation ensures a fast model training convergence and a homogeneous model generalization. Experiments showed that the proposed method generates high-quality clear images.
 
  Address Salt Lake City; USA; June 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference CVPRW  
  Notes MSIAU; 600.086; 600.130; 600.122 Approved no  
  Call Number Admin @ si @ SSV2018d Serial 3197  

 
Author Patrick Brandao; O. Zisimopoulos; E. Mazomenos; G. Ciutib; Jorge Bernal; M. Visentini-Scarzanell; A. Menciassi; P. Dario; A. Koulaouzidis; A. Arezzo; D.J. Hawkes; D. Stoyanov
  Title Towards a computed-aided diagnosis system in colonoscopy: Automatic polyp segmentation using convolution neural networks Type Journal
  Year 2018 Publication Journal of Medical Robotics Research Abbreviated Journal JMRR  
  Volume 3 Issue 2 Pages  
  Keywords convolutional neural networks; colonoscopy; computer aided diagnosis  
  Abstract Early diagnosis is essential for the successful treatment of bowel cancers including colorectal cancer (CRC) and capsule endoscopic imaging with robotic actuation can be a valuable diagnostic tool when combined with automated image analysis. We present a deep learning rooted detection and segmentation framework for recognizing lesions in colonoscopy and capsule endoscopy images. We restructure established convolution architectures, such as VGG and ResNets, by converting them into fully-connected convolution networks (FCNs), fine-tune them and study their capabilities for polyp segmentation and detection. We additionally use Shape-from-Shading (SfS) to recover depth and provide a richer representation of the tissue's structure in colonoscopy images. Depth is incorporated into our network models as an additional input channel to the RGB information and we demonstrate that the resulting network yields improved performance. Our networks are tested on publicly available datasets and the most accurate segmentation model achieved a mean segmentation IU of 47.78% and 56.95% on the ETIS-Larib and CVC-Colon datasets, respectively. For polyp detection, the top performing models we propose surpass the current state of the art with detection recalls superior to 90% for all datasets tested. To our knowledge, we present the first work to use FCNs for polyp segmentation in addition to proposing a novel combination of SfS and RGB that boosts performance.
 
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes MV; no menciona Approved no  
  Call Number BZM2018 Serial 2976  

 
Author Pau Riba; Andreas Fischer; Josep Llados; Alicia Fornes
  Title Learning Graph Distances with Message Passing Neural Networks Type Conference Article
  Year 2018 Publication 24th International Conference on Pattern Recognition Abbreviated Journal  
  Volume Issue Pages 2239-2244  
  Keywords ★Best Paper Award★  
  Abstract Graph representations have been widely used in pattern recognition thanks to their powerful representation formalism and rich theoretical background. A number of error-tolerant graph matching algorithms such as graph edit distance have been proposed for computing a distance between two labelled graphs. However, they typically suffer from a high computational complexity, which makes it difficult to apply these matching algorithms in a real scenario. In this paper, we propose an efficient graph distance based on the emerging field of geometric deep learning. Our method employs a message passing neural network to capture the graph structure and learns a metric with a siamese network approach. The performance of the proposed graph distance is validated in two application cases, graph classification and graph retrieval of handwritten words, and shows a promising performance when compared with (approximate) graph edit distance benchmarks.
 
  Address Beijing; China; August 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICPR  
  Notes DAG; 600.097; 603.057; 601.302; 600.121 Approved no  
  Call Number Admin @ si @ RFL2018 Serial 3168  

 
Author Pau Rodriguez; Josep M. Gonfaus; Guillem Cucurull; Xavier Roca; Jordi Gonzalez
  Title Attend and Rectify: A Gated Attention Mechanism for Fine-Grained Recovery Type Conference Article
  Year 2018 Publication 15th European Conference on Computer Vision Abbreviated Journal  
  Volume 11212 Issue Pages 357-372  
  Keywords Deep Learning; Convolutional Neural Networks; Attention  
  Abstract We propose a novel attention mechanism to enhance Convolutional Neural Networks for fine-grained recognition. It learns to attend to lower-level feature activations without requiring part annotations and uses these activations to update and rectify the output likelihood distribution. In contrast to other approaches, the proposed mechanism is modular, architecture-independent and efficient both in terms of parameters and computation required. Experiments show that networks augmented with our approach systematically improve their classification accuracy and become more robust to clutter. As a result, Wide Residual Networks augmented with our proposal surpass the state-of-the-art classification accuracies in CIFAR-10, the Adience gender recognition task, Stanford dogs, and UEC Food-100.  
  Address Munich; September 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ECCV  
  Notes ISE; 600.098; 602.121; 600.119 Approved no  
  Call Number Admin @ si @ RGC2018 Serial 3139  