Author Katerine Diaz; Jesus Martinez del Rincon; Aura Hernandez-Sabate; Debora Gil
  Title Continuous head pose estimation using manifold subspace embedding and multivariate regression Type Journal Article
  Year 2018 Publication IEEE Access Abbreviated Journal ACCESS  
  Volume 6 Issue Pages 18325 - 18334  
  Keywords Head Pose estimation; HOG features; Generalized Discriminative Common Vectors; B-splines; Multiple linear regression  
  Abstract In this paper, a continuous head pose estimation system is proposed to estimate yaw and pitch head angles from raw facial images. Our approach is based on manifold learning-based methods, due to their promising generalization properties shown for face modelling from images. The method combines histograms of oriented gradients, generalized discriminative common vectors and continuous local regression to achieve successful performance. Our proposal was tested on multiple standard face datasets, as well as in a realistic scenario. Results show a considerable performance improvement and higher consistency of our model in comparison with other state-of-the-art methods, with angular errors varying between 9 and 17 degrees.
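A minimal Python sketch of the pipeline shape this abstract describes: HOG features, a learned subspace embedding, and multivariate regression to (yaw, pitch). PCA stands in for the paper's generalized discriminative common vectors and plain linear regression for its continuous local regression, so this is an illustration of the pattern, not the authors' method.

# Sketch only: PCA and LinearRegression are stand-ins for GDCV and
# continuous local regression; data is random, purely for shape-checking.
import numpy as np
from skimage.feature import hog
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

def hog_features(images):
    """images: iterable of 2-D grayscale arrays -> (n, d) feature matrix."""
    return np.stack([hog(im, pixels_per_cell=(8, 8)) for im in images])

# Toy data: random 64x64 "faces" with random (yaw, pitch) targets in degrees.
rng = np.random.default_rng(0)
X = hog_features(rng.random((100, 64, 64)))
y = rng.uniform(-90, 90, size=(100, 2))   # columns: yaw, pitch

model = make_pipeline(PCA(n_components=32), LinearRegression())
model.fit(X, y)                    # multivariate: predicts both angles at once
print(model.predict(X[:3]))        # -> (3, 2) array of (yaw, pitch) estimates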
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN 2169-3536 ISBN Medium  
  Area Expedition Conference  
  Notes ADAS; 600.118 Approved no  
  Call Number Admin @ si @ DMH2018b Serial 3091  
 

 
Author Lei Kang; Juan Ignacio Toledo; Pau Riba; Mauricio Villegas; Alicia Fornes; Marçal Rusiñol
  Title Convolve, Attend and Spell: An Attention-based Sequence-to-Sequence Model for Handwritten Word Recognition Type Conference Article
  Year 2018 Publication 40th German Conference on Pattern Recognition Abbreviated Journal  
  Volume Issue Pages 459-472  
  Keywords  
  Abstract This paper proposes Convolve, Attend and Spell, an attention-based sequence-to-sequence model for handwritten word recognition. The proposed architecture has three main parts: an encoder, consisting of a CNN and a bi-directional GRU; an attention mechanism devoted to focusing on the pertinent features; and a decoder formed by a one-directional GRU, able to spell the corresponding word character by character. Compared with the recent state-of-the-art, our model achieves competitive results on the IAM dataset without needing any pre-processing step, predefined lexicon or language model. Code and additional results are available at https://github.com/omni-us/research-seq2seq-HTR.
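A condensed PyTorch sketch of the encoder-attention-decoder shape named in the title: a CNN plus bi-directional GRU encoder, content-based attention, and a one-directional GRU decoder that emits one character per step. Layer sizes, the vocabulary size and the toy input are assumptions, not the authors' exact configuration.

import torch
import torch.nn as nn

class ConvolveAttendSpell(nn.Module):
    def __init__(self, vocab_size, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(                      # "Convolve"
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.enc = nn.GRU(64 * 16, hidden, bidirectional=True, batch_first=True)
        self.attn = nn.Linear(2 * hidden + hidden, 1)  # content-based scoring
        self.dec = nn.GRUCell(2 * hidden + vocab_size, hidden)
        self.out = nn.Linear(hidden, vocab_size)
        self.vocab_size = vocab_size

    def forward(self, img, max_len=10):
        b = img.size(0)
        f = self.cnn(img)                              # (b, 64, 16, W/4)
        f = f.permute(0, 3, 1, 2).flatten(2)           # (b, W/4, 64*16)
        enc, _ = self.enc(f)                           # (b, T, 2*hidden)
        h = enc.new_zeros(b, self.dec.hidden_size)
        y = enc.new_zeros(b, self.vocab_size)          # "start" token (zeros)
        logits = []
        for _ in range(max_len):                       # "Attend", then "Spell"
            score = self.attn(torch.cat(
                [enc, h.unsqueeze(1).expand(-1, enc.size(1), -1)], dim=-1))
            ctx = (torch.softmax(score, 1) * enc).sum(1)   # (b, 2*hidden)
            h = self.dec(torch.cat([ctx, y], dim=-1), h)
            step = self.out(h)                         # one character per step
            y = torch.softmax(step, dim=-1)
            logits.append(step)
        return torch.stack(logits, 1)                  # (b, max_len, vocab)

model = ConvolveAttendSpell(vocab_size=80)
print(model(torch.randn(2, 1, 64, 256)).shape)         # torch.Size([2, 10, 80])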
  Address Stuttgart; Germany; October 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference GCPR  
  Notes DAG; 600.097; 603.057; 302.065; 601.302; 600.084; 600.121; 600.129 Approved no  
  Call Number Admin @ si @ KTR2018 Serial 3167  
 

 
Author Mohamed Ilyes Lakhal; Hakan Cevikalp; Sergio Escalera
  Title CRN: End-to-end Convolutional Recurrent Network Structure Applied to Vehicle Classification Type Conference Article
  Year 2018 Publication 13th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications Abbreviated Journal  
  Volume 5 Issue Pages 137-144  
  Keywords Vehicle Classification; Deep Learning; End-to-end Learning  
  Abstract Vehicle type classification is considered to be a central part of Intelligent Traffic Systems. In recent years, deep learning methods have emerged as the state of the art in many computer vision tasks. In this paper, we present a novel yet simple deep learning framework for the vehicle type classification problem. We propose an end-to-end trainable system that combines a convolutional neural network for feature extraction with a recurrent neural network as a classifier. The recurrent network structure is used to handle various types of feature inputs, and at the same time allows producing a single class prediction or a set of class predictions. In order to assess the effectiveness of our solution, we have conducted a set of experiments on two public datasets, obtaining state-of-the-art results. In addition, we also report results on the newly released MIO-TCD dataset.
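A minimal PyTorch sketch of the CNN-plus-RNN pattern the abstract describes: convolutional features are read as a sequence by a GRU whose final hidden state feeds the classifier. Layer sizes and the class count are illustrative assumptions (11 echoes the MIO-TCD classification categories).

import torch
import torch.nn as nn

class CRNSketch(nn.Module):
    def __init__(self, num_classes=11):               # assumed class count
        super().__init__()
        self.features = nn.Sequential(                 # CNN feature extractor
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.rnn = nn.GRU(input_size=64, hidden_size=128, batch_first=True)
        self.fc = nn.Linear(128, num_classes)

    def forward(self, x):                              # x: (b, 3, H, W)
        f = self.features(x)                           # (b, 64, H/4, W/4)
        seq = f.flatten(2).transpose(1, 2)             # (b, H/4*W/4, 64)
        _, h = self.rnn(seq)                           # RNN reads the features
        return self.fc(h[-1])                          # (b, num_classes)

model = CRNSketch()
print(model(torch.randn(4, 3, 64, 64)).shape)          # torch.Size([4, 11])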
  Address Funchal; Madeira; Portugal; January 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference VISAPP  
  Notes HUPBA Approved no  
  Call Number Admin @ si @ LCE2018a Serial 3094  
 

 
Author Bojana Gajic; Ramon Baldrich
  Title Cross-domain fashion image retrieval Type Conference Article
  Year 2018 Publication CVPR 2018 Workshop on Women in Computer Vision (WiCV 2018, 4th Edition) Abbreviated Journal  
  Volume Issue Pages 19500-19502  
  Keywords  
  Abstract Cross-domain image retrieval is a challenging task that implies matching images from one domain to their pairs from another domain. In this paper we focus on fashion image retrieval, which involves matching an image of a fashion item taken by users to images of the same item taken under controlled conditions, usually by a professional photographer. When facing this problem, the products seen at training and test time differ, and we use a triplet loss to train the network. We stress the importance of properly training a simple architecture, as well as adapting general models to the specific task.
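A minimal PyTorch sketch of triplet-loss training as outlined above: an anchor (user photo), a positive (studio photo of the same item) and a negative (a different item) pass through a shared embedding network. The toy backbone and margin are assumptions, not the paper's architecture.

import torch
import torch.nn as nn

embed = nn.Sequential(                 # shared embedding network (toy backbone)
    nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 64))
loss_fn = nn.TripletMarginLoss(margin=0.2)
opt = torch.optim.Adam(embed.parameters(), lr=1e-4)

anchor, positive, negative = (torch.randn(8, 3, 128, 128) for _ in range(3))
loss = loss_fn(embed(anchor), embed(positive), embed(negative))
loss.backward()
opt.step()          # pulls matching pairs together in the embedding space,
print(float(loss))  # pushes non-matching items apart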
  Address Salt Lake City, USA; 22 June 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference CVPRW  
  Notes CIC; 600.087 Approved no  
  Call Number Admin @ si @ Serial 3709  
 

 
Author Hugo Prol; Vincent Dumoulin; Luis Herranz
  Title Cross-Modulation Networks for Few-Shot Learning Type Miscellaneous
  Year 2018 Publication Arxiv Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract A family of recent successful approaches to few-shot learning relies on learning an embedding space in which predictions are made by computing similarities between examples. This corresponds to combining information between support and query examples at a very late stage of the prediction pipeline. Inspired by this observation, we hypothesize that there may be benefits to combining the information at various levels of abstraction along the pipeline. We present an architecture called Cross-Modulation Networks, which allows support and query examples to interact throughout the feature extraction process via a feature-wise modulation mechanism. We adapt the Matching Networks architecture to take advantage of these interactions and show encouraging initial results on miniImageNet in the 5-way, 1-shot setting, where we close the gap with the state of the art.
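A minimal sketch of the feature-wise modulation idea the abstract describes: at a given layer, statistics pooled from the support branch produce per-channel scale and shift parameters applied to the query branch (and symmetrically the other way). Shapes and the pooling choice are assumptions.

import torch
import torch.nn as nn

class CrossModulation(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # predicts per-channel (gamma, beta) from the other branch's features
        self.film = nn.Linear(channels, 2 * channels)

    def forward(self, query, support):                 # both: (b, c, h, w)
        stats = support.mean(dim=(2, 3))               # pooled support summary
        gamma, beta = self.film(stats).chunk(2, dim=1) # (b, c) each
        return query * (1 + gamma[..., None, None]) + beta[..., None, None]

mod = CrossModulation(channels=64)
query, support = torch.randn(2, 64, 8, 8), torch.randn(2, 64, 8, 8)
print(mod(query, support).shape)                       # torch.Size([2, 64, 8, 8])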
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes LAMP; 600.120 Approved no  
  Call Number Admin @ si @ PDH2018 Serial 3248  
 

 
Author Patricia Suarez; Angel Sappa; Boris X. Vintimilla
  Title Cross-spectral image dehaze through a dense stacked conditional GAN based approach Type Conference Article
  Year 2018 Publication 14th IEEE International Conference on Signal Image Technology & Internet Based System Abbreviated Journal  
  Volume Issue Pages  
  Keywords Infrared imaging; Dense; Stacked CGAN; Cross-spectral; Convolutional networks
  Abstract This paper proposes a novel approach to remove haze from RGB images using near-infrared images, based on a dense stacked conditional Generative Adversarial Network (CGAN). Besides the hazy image, the implemented deep network receives its corresponding image in the near-infrared spectrum, which serves to accelerate learning of the images' details. The model uses a triplet layer that allows each channel of the visible-spectrum image to be learned independently, removing the haze on each color channel separately. A multiple loss function scheme is proposed, which ensures balanced learning between the colors and the structure of the images. Experimental results show that the proposed method effectively removes the haze from the images. Additionally, the proposed approach is compared with a state-of-the-art approach, showing better results.
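A minimal PyTorch sketch of two ideas from this abstract: the network is conditioned on the near-infrared image (concatenated with a hazy channel), and each RGB channel is dehazed by its own branch (the "triplet layer"). The adversarial part of the CGAN and the exact loss balance are omitted; everything here is an illustrative assumption.

import torch
import torch.nn as nn

def channel_generator():
    # per-channel branch: input = (hazy channel, NIR), output = clear channel
    return nn.Sequential(
        nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid())

gens = nn.ModuleList(channel_generator() for _ in range(3))  # R, G, B branches

def dehaze(hazy_rgb, nir):                      # (b,3,h,w), (b,1,h,w)
    outs = [g(torch.cat([hazy_rgb[:, i:i+1], nir], dim=1))
            for i, g in enumerate(gens)]        # each channel learned separately
    return torch.cat(outs, dim=1)               # (b,3,h,w)

hazy = torch.randn(2, 3, 64, 64)
nir, clear = torch.randn(2, 1, 64, 64), torch.rand(2, 3, 64, 64)
pred = dehaze(hazy, nir)
# stand-in for the multi-loss scheme: L1 for color, a gradient term for structure
l1 = nn.functional.l1_loss(pred, clear)
grad = nn.functional.l1_loss(pred.diff(dim=-1), clear.diff(dim=-1))
print(float(l1 + 0.5 * grad))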
  Address Las Palmas de Gran Canaria; November 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-1-5386-9385-8 Medium  
  Area Expedition Conference SITIS  
  Notes MSIAU; 600.086; 600.130; 600.122 Approved no  
  Call Number Admin @ si @ SSV2018a Serial 3193  
 

 
Author Md. Mostafa Kamal Sarker; Mohammed Jabreel; Hatem A. Rashwan; Syeda Furruka Banu; Petia Radeva; Domenec Puig
  Title CuisineNet: Food Attributes Classification using Multi-scale Convolution Network Type Conference Article
  Year 2018 Publication 21st International Conference of the Catalan Association for Artificial Intelligence Abbreviated Journal  
  Volume Issue Pages 365-372  
  Keywords  
  Abstract The diversity of food and its attributes reflects the culinary habits of people from different countries. This paper addresses the problem of identifying the food culture of people around the world, along with its flavor, by classifying two main food attributes: cuisine and flavor. A deep learning model based on multi-scale convolutional networks is proposed for extracting more accurate features from the input images. Aggregation of multi-scale convolution layers with different kernel sizes is also used to weight the features resulting from different scales. In addition, a joint loss function based on Negative Log Likelihood (NLL) is used to fit the model probability to the multi-labeled classes of the multi-modal classification task. Furthermore, this work provides a new dataset for food attributes, called Yummly48K, extracted from the popular food website Yummly. Our model is assessed on the constructed Yummly48K dataset. The experimental results show that our proposed method yields 65% and 62% average F1 score on the validation and test sets, outperforming the state-of-the-art models.
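A minimal PyTorch sketch of the two ingredients the abstract names: parallel convolutions with different kernel sizes whose outputs are aggregated with learned weights, and a joint negative-log-likelihood loss over the two attribute heads (cuisine and flavor). Sizes and class counts are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleBlock(nn.Module):
    def __init__(self, cin, cout):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(cin, cout, k, padding=k // 2) for k in (1, 3, 5))
        self.weights = nn.Parameter(torch.ones(3))     # learned scale weighting

    def forward(self, x):
        w = torch.softmax(self.weights, dim=0)
        return sum(wi * b(x) for wi, b in zip(w, self.branches))

backbone = nn.Sequential(MultiScaleBlock(3, 32), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
cuisine_head, flavor_head = nn.Linear(32, 15), nn.Linear(32, 5)

x = torch.randn(4, 3, 128, 128)
cuisine_y, flavor_y = torch.randint(0, 15, (4,)), torch.randint(0, 5, (4,))
feat = backbone(x)
# joint NLL: sum the per-task losses so one model fits both attributes
loss = (F.nll_loss(F.log_softmax(cuisine_head(feat), -1), cuisine_y) +
        F.nll_loss(F.log_softmax(flavor_head(feat), -1), flavor_y))
print(float(loss))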
  Address Roses; Catalonia; October 2018
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference CCIA  
  Notes MILAB; no menciona Approved no  
  Call Number Admin @ si @ SJR2018 Serial 3113  
 

 
Author Md. Mostafa Kamal Sarker; Mohammed Jabreel; Hatem A. Rashwan; Syeda Furruka Banu; Antonio Moreno; Petia Radeva; Domenec Puig
  Title CuisineNet: Food Attributes Classification using Multi-scale Convolution Network Type Miscellaneous
  Year 2018 Publication Arxiv Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract The diversity of food and its attributes reflects the culinary habits of people from different countries. This paper addresses the problem of identifying the food culture of people around the world, along with its flavor, by classifying two main food attributes: cuisine and flavor. A deep learning model based on multi-scale convolutional networks is proposed for extracting more accurate features from the input images. Aggregation of multi-scale convolution layers with different kernel sizes is also used to weight the features resulting from different scales. In addition, a joint loss function based on Negative Log Likelihood (NLL) is used to fit the model probability to the multi-labeled classes of the multi-modal classification task. Furthermore, this work provides a new dataset for food attributes, called Yummly48K, extracted from the popular food website Yummly. Our model is assessed on the constructed Yummly48K dataset. The experimental results show that our proposed method yields 65% and 62% average F1 score on the validation and test sets, outperforming the state-of-the-art models.
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes MILAB; no proj Approved no  
  Call Number Admin @ si @ KJR2018 Serial 3235  
 

 
Author Lluis Gomez; Marçal Rusiñol; Dimosthenis Karatzas
  Title Cutting Sayre's Knot: Reading Scene Text without Segmentation. Application to Utility Meters Type Conference Article
  Year 2018 Publication 13th IAPR International Workshop on Document Analysis Systems Abbreviated Journal  
  Volume Issue Pages 97-102  
  Keywords Robust Reading; End-to-end Systems; CNN; Utility Meters  
  Abstract In this paper we present a segmentation-free system for reading text in natural scenes. A CNN architecture is trained in an end-to-end manner and is able to directly output readings without any explicit text localization step. In order to validate our proposal, we focus on the specific case of reading utility meters. We present our results on a large dataset of images acquired by different users and devices, so text appears in any location, with different sizes, fonts and lengths, and the images present several distortions such as dirt, illumination highlights or blur.
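A minimal sketch of one way to read text without any localization step, in the spirit of this abstract: a CNN maps the whole image straight to a fixed number of digit classifiers (here 5 digits, 0-9 plus a blank class). The digit count, blank handling and sizes are assumptions, not the paper's exact output parameterization.

import torch
import torch.nn as nn

N_DIGITS, N_CLASSES = 5, 11                       # 0-9 plus "blank"

model = nn.Sequential(
    nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, N_DIGITS * N_CLASSES))          # all digits predicted at once

img = torch.randn(1, 1, 96, 192)                  # whole meter crop, no text boxes
logits = model(img).view(1, N_DIGITS, N_CLASSES)
reading = logits.argmax(-1).squeeze(0).tolist()   # class 10 would mean "blank"
print(reading)                                    # e.g. [3, 10, 7, 0, 9] (random weights)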
  Address Vienna; Austria; April 2018
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference DAS  
  Notes DAG; 600.084; 600.121; 600.129 Approved no  
  Call Number Admin @ si @ GRK2018 Serial 3102  
 

 
Author Antonio Lopez; David Vazquez; Gabriel Villalonga
  Title Data for Training Models, Domain Adaptation Type Book Chapter
  Year 2018 Publication Intelligent Vehicles. Enabling Technologies and Future Developments Abbreviated Journal  
  Volume Issue Pages 395–436  
  Keywords Driving simulator; hardware; software; interface; traffic simulation; macroscopic simulation; microscopic simulation; virtual data; training data  
  Abstract Simulation can enable several developments in the field of intelligent vehicles. This chapter is divided into three main subsections. The first one deals with driving simulators. The continuous improvement of hardware performance is a well-known fact that is allowing the development of more complex driving simulators. The immersion in the simulation scene is increased by high fidelity feedback to the driver. In the second subsection, traffic simulation is explained as well as how it can be used for intelligent transport systems. Finally, it is rather clear that sensor-based perception and action must be based on data-driven algorithms. Simulation could provide data to train and test algorithms that are afterwards implemented in vehicles. These tools are explained in the third subsection.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes ADAS; 600.118 Approved no  
  Call Number Admin @ si @ LVV2018 Serial 3047  
 

 
Author Guillem Cucurull; Pau Rodriguez; Vacit Oguz Yazici; Josep M. Gonfaus; Xavier Roca; Jordi Gonzalez
  Title Deep Inference of Personality Traits by Integrating Image and Word Use in Social Networks Type Miscellaneous
  Year 2018 Publication Arxiv Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract arXiv:1802.06757
Social media, as a major platform for communication and information exchange, is a rich repository of the opinions and sentiments of 2.3 billion users about a vast spectrum of topics. To sense the whys of certain social user’s demands and cultural-driven interests, however, the knowledge embedded in the 1.8 billion pictures which are uploaded daily in public profiles has just started to be exploited since this process has been typically been text-based. Following this trend on visual-based social analysis, we present a novel methodology based on Deep Learning to build a combined image-and-text based personality trait model, trained with images posted together with words found highly correlated to specific personality traits. So the key contribution here is to explore whether OCEAN personality trait modeling can be addressed based on images, here called MindPics, appearing with certain tags with psychological insights. We found that there is a correlation between those posted images and their accompanying texts, which can be successfully modeled using deep neural networks for personality estimation. The experimental results are consistent with previous cyber-psychology results based on texts or images.
In addition, classification results on some traits show that some patterns emerge in the set of images corresponding to a specific text, in essence to those representing an abstract concept. These results open new avenues of research for further refining the proposed personality model under the supervision of psychology experts.
 
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes ISE; 600.098; 600.119 Approved no  
  Call Number Admin @ si @ CRY2018 Serial 3550  
 

 
Author Jorge Charco; Boris X. Vintimilla; Angel Sappa
  Title Deep learning based camera pose estimation in multi-view environment Type Conference Article
  Year 2018 Publication 14th IEEE International Conference on Signal Image Technology & Internet Based System Abbreviated Journal  
  Volume Issue Pages  
  Keywords Deep learning; Camera pose estimation; Multiview environment; Siamese architecture  
  Abstract This paper proposes a deep learning network architecture for relative camera pose estimation in a multi-view environment. The proposed network is a variant of the AlexNet architecture, used as a regressor to predict the relative translation and rotation as output. The proposed approach is trained from scratch on a large dataset, taking as input a pair of images from the same scene. This new architecture is compared with a previous approach using standard metrics, obtaining better results on relative camera pose estimation.
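A minimal PyTorch sketch of the Siamese regression setup this abstract describes: two images from the same scene pass through shared convolutional weights, and a regressor predicts relative translation and rotation (here 3 + 4 values, assuming a quaternion rotation). The toy backbone stands in for the paper's AlexNet variant.

import torch
import torch.nn as nn

class RelPoseNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(             # shared (Siamese) weights
            nn.Conv2d(3, 32, 5, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten())
        self.head = nn.Linear(2 * 32 * 16, 7)      # 3 translation + 4 quaternion

    def forward(self, img_a, img_b):
        f = torch.cat([self.backbone(img_a), self.backbone(img_b)], dim=1)
        t, q = self.head(f).split([3, 4], dim=1)
        return t, q / q.norm(dim=1, keepdim=True)  # unit-normalized rotation

net = RelPoseNet()
t, q = net(torch.randn(2, 3, 224, 224), torch.randn(2, 3, 224, 224))
print(t.shape, q.shape)                            # (2, 3) and (2, 4)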
  Address Las Palmas de Gran Canaria; November 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference SITIS  
  Notes MSIAU; 600.086; 600.130; 600.122 Approved no  
  Call Number Admin @ si @ CVS2018 Serial 3194  
 

 
Author Patricia Suarez; Angel Sappa; Boris X. Vintimilla; Riad I. Hammoud
  Title Deep Learning based Single Image Dehazing Type Conference Article
  Year 2018 Publication 31st IEEE Conference on Computer Vision and Pattern Recognition Workhsop Abbreviated Journal  
  Volume Issue Pages 1250 - 12507  
  Keywords Gallium nitride; Atmospheric modeling; Generators; Generative adversarial networks; Convergence; Image color analysis  
  Abstract This paper proposes a novel approach to remove haze degradations in RGB images using a stacked conditional Generative Adversarial Network (GAN). It employs a triplet of GANs to remove the haze on each color channel independently. A multiple loss function scheme, applied over a conditional probabilistic model, is proposed. The proposed GAN architecture learns to remove the haze using, as conditional input, the hazy images from which the clear images will be obtained. Such a formulation ensures fast model training convergence and homogeneous model generalization. Experiments showed that the proposed method generates high-quality clear images.
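A minimal sketch of the conditional-GAN objective this abstract outlines: the generator maps a hazy image to a clear one, and the discriminator judges (hazy, clear) pairs, so the hazy image itself is the condition. One shared generator stands in for the per-channel triplet, and the multiple-loss scheme is reduced to adversarial plus L1; all sizes are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

G = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid())
D = nn.Sequential(nn.Conv2d(6, 32, 3, stride=2), nn.ReLU(),   # sees the pair
                  nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 1))

hazy, clear = torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)
fake = G(hazy)

# discriminator: real (hazy, clear) pairs vs. fake (hazy, G(hazy)) pairs
d_real = D(torch.cat([hazy, clear], 1))
d_fake = D(torch.cat([hazy, fake.detach()], 1))
d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) +
          F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))

# generator: fool D, plus an L1 term tying the output to the clear target
g_adv = F.binary_cross_entropy_with_logits(D(torch.cat([hazy, fake], 1)),
                                           torch.ones_like(d_real))
g_loss = g_adv + 100.0 * F.l1_loss(fake, clear)
print(float(d_loss), float(g_loss))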
  Address Salt Lake City; USA; June 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference CVPRW  
  Notes MSIAU; 600.086; 600.130; 600.122 Approved no  
  Call Number Admin @ si @ SSV2018d Serial 3197  
 

 
Author Laura Lopez-Fuentes; Alessandro Farasin; Harald Skinnemoen; Paolo Garza
  Title Deep Learning models for passability detection of flooded roads Type Conference Article
  Year 2018 Publication MediaEval 2018 Multimedia Benchmark Workshop Abbreviated Journal  
  Volume 2283 Issue Pages  
  Keywords  
  Abstract In this paper we study and compare several approaches to detect floods, and evidence of the passability of roads by conventional means, on Twitter. We focus on tweets containing both visual information (a picture shared by the user) and metadata, a combination of text and related extra information intrinsic to the Twitter API. This work has been done in the context of the MediaEval 2018 Multimedia Satellite Task.
  Address Sophia Antipolis; France; October 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference MediaEval  
  Notes LAMP; 600.084; 600.109; 600.120 Approved no  
  Call Number Admin @ si @ LFS2018 Serial 3224  
 

 
Author Mohammad A. Haque; Ruben B. Bautista; Kamal Nasrollahi; Sergio Escalera; Christian B. Laursen; Ramin Irani; Ole K. Andersen; Erika G. Spaich; Kaustubh Kulkarni; Thomas B. Moeslund; Marco Bellantonio; Gholamreza Anbarjafari; Fatemeh Noroozi
  Title Deep Multimodal Pain Recognition: A Database and Comparison of Spatio-Temporal Visual Modalities, Faces and Gestures Type Conference Article
  Year 2018 Publication 13th IEEE Conference on Automatic Face and Gesture Recognition Abbreviated Journal  
  Volume Issue Pages 250 - 257  
  Keywords  
  Abstract Pain is a symptom of many disorders associated with actual or potential tissue damage in the human body. Managing pain is not only a duty but also highly cost-prone. The most primitive stage of pain management is the assessment of pain. Traditionally it was accomplished by self-report or visual inspection by experts. However, automatic pain assessment systems from facial videos are also rapidly evolving due to the need to manage pain in a robust and cost-effective way. Among the different challenges of automatic pain assessment from facial video data, two issues are increasingly prevalent: first, exploiting both spatial and temporal information of the face to assess pain level, and second, incorporating multiple visual modalities to capture complementary face information related to pain. Most works in the literature focus on merely exploiting spatial information from chromatic (RGB) video data in shallow learning scenarios. However, employing deep learning techniques for spatio-temporal analysis considering Depth (D) and Thermal (T) along with RGB has high potential in this area. In this paper, we present the first state-of-the-art publicly available database, 'Multimodal Intensity Pain (MIntPAIN)', for RGBDT pain level recognition in sequences. We provide first baseline results, including recognition of 5 pain levels, by analyzing independent visual modalities and their fusion with CNN and LSTM models. From the experimental evaluation we observe that fusion of modalities helps to enhance the recognition performance of pain levels in comparison to isolated ones. In particular, the combination of RGB, D, and T in an early fusion fashion achieved the best recognition rate.
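A minimal PyTorch sketch of the baseline pattern the abstract reports: RGB, depth and thermal frames fused early (channel concatenation), a CNN encoding each frame, and an LSTM aggregating the sequence into one of five pain levels. Sizes and sequence length are illustrative assumptions.

import torch
import torch.nn as nn

class PainNet(nn.Module):
    def __init__(self, levels=5):
        super().__init__()
        self.cnn = nn.Sequential(                     # 3 RGB + 1 D + 1 T = 5 ch
            nn.Conv2d(5, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.lstm = nn.LSTM(32, 64, batch_first=True)
        self.fc = nn.Linear(64, levels)

    def forward(self, clip):                          # (b, t, 5, h, w)
        b, t = clip.shape[:2]
        feats = self.cnn(clip.flatten(0, 1)).view(b, t, -1)  # per-frame features
        _, (h, _) = self.lstm(feats)                  # temporal aggregation
        return self.fc(h[-1])                         # (b, levels)

net = PainNet()
print(net(torch.randn(2, 16, 5, 64, 64)).shape)       # torch.Size([2, 5])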
  Address Xi'an; China; May 2018
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference FG  
  Notes HUPBA; no proj Approved no  
  Call Number Admin @ si @ HBN2018 Serial 3117  