Records
Author Cesar de Souza; Adrien Gaidon; Eleonora Vig; Antonio Lopez
Title Sympathy for the Details: Dense Trajectories and Hybrid Classification Architectures for Action Recognition Type Conference Article
Year 2016 Publication 14th European Conference on Computer Vision Abbreviated Journal
Volume Issue Pages 697-716
Keywords
Abstract Action recognition in videos is a challenging task due to the complexity of the spatio-temporal patterns to model and the difficulty of acquiring and learning from large quantities of video data. Deep learning, although a breakthrough for image classification and showing promise for videos, has still not clearly superseded action recognition methods using hand-crafted features, even when training on massive datasets. In this paper, we introduce hybrid video classification architectures based on carefully designed unsupervised representations of hand-crafted spatio-temporal features classified by supervised deep networks. As we show in our experiments on five popular benchmarks for action recognition, our hybrid model combines the best of both worlds: it is data efficient (trained on 150 to 10,000 short clips) and yet improves significantly on the state of the art, including recent deep models trained on millions of manually labelled images and videos.
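For readers who want a concrete picture of the hybrid idea, the following is a minimal, purely illustrative sketch (not the authors' implementation): hand-crafted descriptors are encoded by an unsupervised step (PCA whitening here, as a generic stand-in for the carefully designed representations mentioned above) and then classified by a small supervised neural network. All dimensions and hyper-parameters are assumptions.

    # Illustrative hybrid pipeline: unsupervised encoding of hand-crafted
    # descriptors followed by a supervised neural classifier (toy data).
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 426))   # stand-in for per-clip trajectory descriptors
    y = rng.integers(0, 5, size=200)  # stand-in for five action labels

    hybrid = make_pipeline(
        PCA(n_components=64, whiten=True),            # unsupervised encoding step
        MLPClassifier(hidden_layer_sizes=(128, 128),  # supervised network on top
                      max_iter=300),
    )
    hybrid.fit(X, y)
    print(hybrid.score(X, y))  # training accuracy of the toy pipeline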
Address Amsterdam; The Netherlands; October 2016
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ECCV
Notes ADAS; 600.076; 600.085 Approved no
Call Number Admin @ si @ SGV2016 Serial 2824
 

 
Author Hugo Jair Escalante; Victor Ponce; Jun Wan; Michael A. Riegler; Baiyu Chen; Albert Clapes; Sergio Escalera; Isabelle Guyon; Xavier Baro; Pal Halvorsen; Henning Muller; Martha Larson
Title ChaLearn Joint Contest on Multimedia Challenges Beyond Visual Analysis: An Overview Type Conference Article
Year 2016 Publication 23rd International Conference on Pattern Recognition Abbreviated Journal
Volume Issue Pages
Keywords
Abstract This paper provides an overview of the Joint Contest on Multimedia Challenges Beyond Visual Analysis. We organized an academic competition that focused on four problems that require effective processing of multimodal information in order to be solved. Two tracks were devoted to gesture spotting and recognition from RGB-D video, two fundamental problems for human-computer interaction. Another track was devoted to a second round of the first impressions challenge, whose goal was to develop methods to recognize personality traits from short video clips. For this second round we adopted a novel collaborative-competitive (i.e., coopetition) setting. The fourth track was dedicated to the problem of video recommendation for improving user experience. The challenge was open for about 45 days and received outstanding participation: almost 200 participants registered for the contest, and 20 teams sent predictions in the final stage. The main goals of the challenge were fulfilled: the state of the art was advanced considerably in the four tracks, with novel solutions to the proposed problems (mostly relying on deep learning). However, further research is still required. The data of the four tracks will be made available so that researchers can keep making progress on these problems.
Address Cancun; Mexico; December 2016
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICPR
Notes HuPBA; 602.143;MV Approved no
Call Number Admin @ si @ EPW2016 Serial 2827
 

 
Author Baiyu Chen; Sergio Escalera; Isabelle Guyon; Victor Ponce; N. Shah; Marc Oliu
Title Overcoming Calibration Problems in Pattern Labeling with Pairwise Ratings: Application to Personality Traits Type Conference Article
Year 2016 Publication 14th European Conference on Computer Vision Workshops Abbreviated Journal
Volume Issue Pages
Keywords Calibration of labels; Label bias; Ordinal labeling; Variance Models; Bradley-Terry-Luce model; Continuous labels; Regression; Personality traits; Crowd-sourced labels
Abstract We address the problem of calibration of workers whose task is to label patterns with continuous variables, which arises for instance in labeling images or videos of humans with continuous traits. Worker bias is particularly difficult to evaluate and correct when many workers contribute just a few labels, a situation arising typically when labeling is crowd-sourced. In the scenario of labeling short videos of people facing a camera with personality traits, we evaluate the feasibility of the pairwise ranking method to alleviate bias problems. Workers are exposed to pairs of videos at a time and must order them by preference. The variable levels are reconstructed by fitting a Bradley-Terry-Luce model with maximum likelihood. This method may, at first sight, seem prohibitively expensive because for N videos, p = N(N-1)/2 pairs must potentially be processed by workers rather than N videos. However, by performing extensive simulations, we determine an empirical law for the scaling of the number of pairs needed as a function of the number of videos in order to achieve a given accuracy of score reconstruction, and show that the pairwise method is affordable. We apply the method to the labeling of a large-scale dataset of 10,000 videos used in the ChaLearn Apparent Personality Trait challenge.
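As a rough illustration of the reconstruction step (not the authors' code), Bradley-Terry-Luce scores can be fitted to pairwise preferences by maximum likelihood with plain gradient ascent; the data below are synthetic and the learning rate and iteration count are arbitrary choices.

    # Toy maximum-likelihood fit of a Bradley-Terry-Luce model from pairwise
    # preferences, using far fewer than the N*(N-1)/2 possible pairs.
    import numpy as np

    rng = np.random.default_rng(0)
    N = 50
    true_scores = rng.normal(size=N)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Simulate crowd judgements on a random subset of pairs.
    pairs = [(i, j) for i in range(N) for j in range(i + 1, N)]
    rng.shuffle(pairs)
    pairs = pairs[: 5 * N]
    wins = [(i, j) if rng.random() < sigmoid(true_scores[i] - true_scores[j]) else (j, i)
            for i, j in pairs]

    # Gradient ascent on the BTL log-likelihood; s[i] is the latent score of video i.
    s = np.zeros(N)
    for _ in range(500):
        grad = np.zeros(N)
        for winner, loser in wins:
            p = sigmoid(s[winner] - s[loser])
            grad[winner] += 1.0 - p
            grad[loser] -= 1.0 - p
        s += 0.05 * grad

    print(np.corrcoef(s, true_scores)[0, 1])  # agreement with the ground-truth scores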
Address Amsterdam; The Netherlands; October 2016
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ECCVW
Notes HuPBA;MILAB; Approved no
Call Number Admin @ si @ CEG2016 Serial 2829
 

 
Author Petia Radeva
Title Can Deep Learning and Egocentric Vision for Visual Lifelogging Help Us Eat Better? Type Conference Article
Year 2016 Publication 19th International Conference of the Catalan Association for Artificial Intelligence Abbreviated Journal
Volume 4 Issue Pages
Keywords
Abstract
Address Barcelona; October 2016
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference CCIA
Notes MILAB Approved no
Call Number Admin @ si @ Rad2016 Serial 2832
 

 
Author Alvaro Peris; Marc Bolaños; Petia Radeva; Francisco Casacuberta
Title Video Description Using Bidirectional Recurrent Neural Networks Type Conference Article
Year 2016 Publication 25th International Conference on Artificial Neural Networks Abbreviated Journal
Volume 2 Issue Pages 3-11
Keywords Video description; Neural Machine Translation; Bidirectional Recurrent Neural Networks; LSTM; Convolutional Neural Networks
Abstract Although traditionally used in the machine translation field, the encoder-decoder framework has recently been applied to the generation of video and image descriptions. The combination of Convolutional and Recurrent Neural Networks in these models has proven to outperform the previous state of the art, obtaining more accurate video descriptions. In this work we propose to push this model further by introducing two contributions into the encoding stage: first, producing richer image representations by combining object and location information from Convolutional Neural Networks and, second, introducing Bidirectional Recurrent Neural Networks for capturing both forward and backward temporal relationships in the input frames.
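A minimal PyTorch sketch of the encoding stage described above, illustrative only: object and location CNN features are concatenated per frame and passed through a bidirectional LSTM. Feature dimensions are assumptions, and the description decoder is omitted.

    # Bidirectional LSTM encoder over per-frame CNN features (object + location).
    import torch
    import torch.nn as nn

    frames, obj_dim, loc_dim = 26, 1024, 1024    # assumed dimensions
    obj_feats = torch.randn(1, frames, obj_dim)  # e.g. from an object CNN
    loc_feats = torch.randn(1, frames, loc_dim)  # e.g. from a scene/location CNN
    x = torch.cat([obj_feats, loc_feats], dim=-1)

    encoder = nn.LSTM(input_size=obj_dim + loc_dim, hidden_size=512,
                      batch_first=True, bidirectional=True)
    outputs, _ = encoder(x)   # (1, frames, 2*512): forward and backward states
    print(outputs.shape)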
Address Barcelona; September 2016
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICANN
Notes MILAB; Approved no
Call Number Admin @ si @ PBR2016 Serial 2833
 

 
Author Marc Bolaños; Petia Radeva
Title Simultaneous Food Localization and Recognition Type Conference Article
Year 2016 Publication 23rd International Conference on Pattern Recognition Abbreviated Journal
Volume Issue Pages
Keywords
Abstract CoRR abs/1604.07953
The development of automatic nutrition diaries, which would allow us to keep objective track of everything we eat, could enable a whole new world of possibilities for people concerned about their nutrition patterns. With this purpose, in this paper we propose the first method for simultaneous food localization and recognition. Our method is based on two main steps: first, producing a food activation map on the input image (i.e., a heat map of probabilities) for generating bounding box proposals and, second, recognizing each of the food types or food-related objects present in each bounding box. We demonstrate that our proposal, compared to the most similar problem nowadays – object localization – is able to obtain high precision and reasonable recall levels with only a few bounding boxes. Furthermore, we show that it is applicable to both conventional and egocentric images.
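A hedged sketch of the first step only (bounding-box proposals from a probability heat map), using simple thresholding and connected components as a generic stand-in; how the activation map itself is produced, and the recognition of each box, are not shown.

    # From a food activation map to bounding-box proposals.
    import numpy as np
    from scipy import ndimage

    rng = np.random.default_rng(0)
    activation = rng.random((224, 224))    # stand-in for a food activation map
    mask = activation > 0.9                # keep high-probability pixels
    labels, n = ndimage.label(mask)        # connected components
    boxes = ndimage.find_objects(labels)   # one pair of slices per component

    for sl in boxes[:5]:
        ys, xs = sl
        print("proposal:", (xs.start, ys.start, xs.stop, ys.stop))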
Address Cancun; Mexico; December 2016
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICPR
Notes MILAB; no proj Approved no
Call Number Admin @ si @ BoR2016 Serial 2834
 

 
Author Maedeh Aghaei; Mariella Dimiccoli; Petia Radeva
Title With Whom Do I Interact? Detecting Social Interactions in Egocentric Photo-streams Type Conference Article
Year 2016 Publication 23rd International Conference on Pattern Recognition Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Given a user wearing a low frame rate wearable camera during a day, this work aims to automatically detect the moments when the user gets engaged in a social interaction, solely by reviewing the photos automatically captured by the worn camera. The proposed method, inspired by the sociological concept of F-formation, exploits the distance and orientation of the individuals appearing in the scene – with respect to the user – from a bird's-eye view perspective. As a result, the interaction pattern over the sequence can be understood as a two-dimensional time series that corresponds to the temporal evolution of the distance and orientation features over time. A Long Short-Term Memory-based Recurrent Neural Network is then trained to classify each time series. Experimental evaluation over a dataset of 30,000 images has shown promising results of the proposed method for social interaction detection in egocentric photo-streams.
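A toy sketch of the classification step described above: a two-dimensional time series of (distance, orientation) values per photo is summarised by an LSTM and classified as interaction vs. no interaction. Shapes, hidden sizes and the data are assumptions for illustration.

    # LSTM classifier over a (distance, orientation) time series.
    import torch
    import torch.nn as nn

    seq_len = 20                          # photos in one egocentric sequence
    series = torch.randn(1, seq_len, 2)   # per-photo (distance, orientation)

    lstm = nn.LSTM(input_size=2, hidden_size=32, batch_first=True)
    head = nn.Linear(32, 2)               # interaction vs. no interaction

    _, (h_n, _) = lstm(series)            # final hidden state summarises the series
    logits = head(h_n[-1])                # (1, 2) class scores
    print(logits.softmax(dim=-1))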
Address Cancun; Mexico; December 2016
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICPR
Notes MILAB Approved no
Call Number Admin @ si @ ADR2016d Serial 2835
 

 
Author Pedro Herruzo; Marc Bolaños; Petia Radeva
Title Can a CNN Recognize Catalan Diet? Type Book Chapter
Year 2016 Publication AIP Conference Proceedings Abbreviated Journal
Volume 1773 Issue Pages
Keywords
Abstract CoRR abs/1607.08811
Nowadays, we can find several diseases related to the unhealthy diet habits of the population, such as diabetes, obesity, anemia, bulimia and anorexia. In many cases, these diseases are related to the food consumption of people. The Mediterranean diet is scientifically known as a healthy diet that helps to prevent many metabolic diseases. In particular, our work focuses on the recognition of Mediterranean food and dishes. The development of this methodology would allow analysing the daily habits of users with wearable cameras, within the topic of lifelogging. By using automatic mechanisms we could build an objective tool for the analysis of the patient's behavior, allowing specialists to discover unhealthy food patterns and understand the user's lifestyle.
With the aim to automatically recognize a complete diet, we introduce a challenging multi-labeled dataset related to the Mediterranean diet called FoodCAT. The first type of label provided consists of 115 food classes with an average of 400 images per dish, and the second one consists of 12 food categories with an average of 3800 pictures per class. This dataset will serve as a basis for the development of automatic diet recognition. In this context, deep learning and, more specifically, Convolutional Neural Networks (CNNs) are currently the state-of-the-art methods for automatic food recognition. In our work, we compare several architectures for image classification, with the purpose of diet recognition. Applying the best model for recognising food categories, we achieve a top-1 accuracy of 72.29% and a top-5 accuracy of 97.07%. In a complete diet recognition of dishes from the Mediterranean diet, enlarged with the Food-101 dataset for international dish recognition, we achieve a top-1 accuracy of 68.07% and a top-5 accuracy of 89.53%, for a total of 115+101 food classes.
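The top-1 and top-5 figures quoted above are the standard ranking metrics; a short sketch of how they are computed from a matrix of class scores (synthetic scores, not tied to the paper's models):

    # Top-1 / top-5 accuracy from per-image class scores.
    import numpy as np

    rng = np.random.default_rng(0)
    scores = rng.random((1000, 115))          # one row of class scores per image
    labels = rng.integers(0, 115, size=1000)  # ground-truth class per image

    top5 = np.argsort(-scores, axis=1)[:, :5]  # indices of the 5 highest scores
    top1_acc = np.mean(top5[:, 0] == labels)
    top5_acc = np.mean([labels[i] in top5[i] for i in range(len(labels))])
    print(top1_acc, top5_acc)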
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MILAB Approved no
Call Number Admin @ si @ HBR2016 Serial 2837
 

 
Author Fatemeh Noroozi; Marina Marjanovic; Angelina Njegus; Sergio Escalera; Gholamreza Anbarjafari
Title Fusion of Classifier Predictions for Audio-Visual Emotion Recognition Type Conference Article
Year 2016 Publication 23rd International Conference on Pattern Recognition Workshops Abbreviated Journal
Volume Issue Pages
Keywords
Abstract In this paper, a novel multimodal emotion recognition system is presented, based on the analysis of audio and visual cues. MFCC-based features are extracted from the audio channel and facial landmark geometric relations are computed from visual data. Both sets of features are learnt separately using state-of-the-art classifiers. In addition, we summarise each emotion video into a reduced set of key-frames, which are learnt in order to visually discriminate emotions by means of a Convolutional Neural Network. Finally, confidence outputs of all classifiers from all modalities are used to define a new feature space to be learnt for final emotion prediction, in a late fusion/stacking fashion. The conducted experiments on the eNTERFACE'05 database show significant performance improvements of our proposed system in comparison to state-of-the-art approaches.
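A minimal stacking sketch of the late-fusion idea, with synthetic features as stand-ins for the MFCC and landmark descriptors: each modality's classifier outputs class probabilities, and a meta-classifier is trained on the concatenated confidences. In practice the meta-level would be trained on cross-validated predictions to avoid overfitting.

    # Late fusion by stacking confidence outputs of per-modality classifiers.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n, classes = 300, 6                   # e.g. six basic emotions
    audio = rng.normal(size=(n, 13))      # stand-in for MFCC-based features
    visual = rng.normal(size=(n, 40))     # stand-in for landmark geometry
    y = rng.integers(0, classes, size=n)

    clf_a = LogisticRegression(max_iter=1000).fit(audio, y)
    clf_v = LogisticRegression(max_iter=1000).fit(visual, y)

    # Confidence outputs of all classifiers define a new feature space.
    stacked = np.hstack([clf_a.predict_proba(audio), clf_v.predict_proba(visual)])
    meta = LogisticRegression(max_iter=1000).fit(stacked, y)
    print(meta.score(stacked, y))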
Address Cancun; Mexico; December 2016
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICPRW
Notes HuPBA;MILAB; Approved no
Call Number Admin @ si @ NMN2016 Serial 2839
 

 
Author Iiris Lusi; Sergio Escalera; Gholamreza Anbarjafari
Title SASE: RGB-Depth Database for Human Head Pose Estimation Type Conference Article
Year 2016 Publication 14th European Conference on Computer Vision Workshops Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address Amsterdam; The Netherlands; October 2016
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ECCVW
Notes HuPBA;MILAB; Approved no
Call Number Admin @ si @ LEA2016a Serial 2840
 

 
Author Pejman Rasti; Tonis Uiboupin; Sergio Escalera; Gholamreza Anbarjafari
Title Convolutional Neural Network Super Resolution for Face Recognition in Surveillance Monitoring Type Conference Article
Year 2016 Publication 9th Conference on Articulated Motion and Deformable Objects Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address Palma de Mallorca; Spain; July 2016
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference AMDO
Notes HuPBA;MILAB Approved no
Call Number Admin @ si @ RUE2016 Serial 2846
 

 
Author Dennis H. Lundtoft; Kamal Nasrollahi; Thomas B. Moeslund; Sergio Escalera
Title Spatiotemporal Facial Super-Pixels for Pain Detection Type Conference Article
Year 2016 Publication 9th Conference on Articulated Motion and Deformable Objects Abbreviated Journal
Volume Issue Pages
Keywords Facial images; Super-pixels; Spatiotemporal filters; Pain detection
Abstract Best student paper award.
Pain detection using facial images is of critical importance in many health applications. Since pain is a spatiotemporal process, recent works on this topic employ facial spatiotemporal features to detect pain. These systems extract such features from the entire area of the face. In this paper, we show that by employing super-pixels we can divide the face into three regions, in such a way that only one of these regions (about one third of the face) contributes to the pain estimation and the other two regions can be discarded. The experimental results on the UNBC-McMaster database show that the proposed system using this single region outperforms state-of-the-art systems in detecting no-pain scenarios, while it reaches comparable results in detecting weak and severe pain scenarios.
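For orientation, a generic super-pixel segmentation with SLIC (scikit-image) as a stand-in for the facial super-pixel step described above; the spatiotemporal extension and the selection of the informative region are not reproduced here.

    # SLIC super-pixels on a (random) stand-in for a cropped face image.
    import numpy as np
    from skimage.segmentation import slic

    face = np.random.rand(128, 128, 3)   # stand-in for a cropped face image
    segments = slic(face, n_segments=100, compactness=10.0, start_label=1)
    print(segments.max(), "super-pixels")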
Address Palma de Mallorca; Spain; July 2016
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference AMDO
Notes HUPBA;MILAB Approved no
Call Number Admin @ si @ LNM2016 Serial 2847
 

 
Author Mark Philip Philipsen; Anders Jorgensen; Thomas B. Moeslund; Sergio Escalera
Title RGB-D Segmentation of Poultry Entrails Type Conference Article
Year 2016 Publication 9th Conference on Articulated Motion and Deformable Objects Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Best commercial paper award.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference AMDO
Notes HuPBA;MILAB Approved no
Call Number Admin @ si @ PJM2016 Serial 2848
 

 
Author Sergio Escalera; Mercedes Torres-Torres; Brais Martinez; Xavier Baro; Hugo Jair Escalante; Isabelle Guyon; Georgios Tzimiropoulos; Ciprian Corneanu; Marc Oliu Simón; Mohammad Ali Bagheri; Michel Valstar
Title ChaLearn Looking at People and Faces of the World: Face Analysis Workshop and Challenge 2016 Type Conference Article
Year 2016 Publication 29th IEEE Conference on Computer Vision and Pattern Recognition Workshops Abbreviated Journal
Volume Issue Pages
Keywords
Abstract We present the 2016 ChaLearn Looking at People and Faces of the World Challenge and Workshop, which ran three competitions on the common theme of face analysis from still images. The first one, Looking at People, addressed age estimation, while the second and third competitions, Faces of the World, addressed accessory classification and smile and gender classification, respectively. We present two crowd-sourcing methodologies used to collect manual annotations. A custom-built application was used to collect and label data about the apparent age of people (as opposed to the real age). For the Faces of the World data, the citizen-science Zooniverse platform was used. This paper summarizes the three challenges and the data used, as well as the results achieved by the participants of the competitions. Details of the ChaLearn LAP FotW competitions can be found at http://gesture.chalearn.org.
Address Las Vegas; USA; June 2016
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference CVPRW
Notes HuPBA;MV; Approved no
Call Number ETM2016 Serial 2849
 

 
Author Antonio Esteban Lansaque; Carles Sanchez; Agnes Borras; Marta Diez-Ferrer; Antoni Rosell; Debora Gil
Title Stable Airway Center Tracking for Bronchoscopic Navigation Type Conference Article
Year 2016 Publication 28th Conference of the International Society for Medical Innovation and Technology Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Bronchoscopists use X-ray fluoroscopy to guide bronchoscopes to the lesion to be biopsied without making any incisions. Reducing exposure to X-rays is important for both patients and doctors, but alternatives like electromagnetic navigation require specific equipment and increase the cost of the clinical procedure. We propose a guiding system based on the extraction of airway centers from intra-operative videos. Such anatomical landmarks could be matched to the airway centerline extracted from a pre-planned CT to indicate the best path to the lesion. We present an extraction of lumen centers from intra-operative videos based on tracking of maximal stable regions of energy maps.
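A rough illustration of detecting maximally stable regions on a grey-level map with OpenCV's MSER detector, as a stand-in for the lumen-center extraction step; the construction of the energy map and the temporal tracking of the detected centers are not shown.

    # Maximally stable regions on a stand-in energy map; region means as centers.
    import cv2
    import numpy as np

    energy = (np.random.rand(256, 256) * 255).astype(np.uint8)
    mser = cv2.MSER_create()
    regions, _ = mser.detectRegions(energy)
    centers = [r.mean(axis=0) for r in regions]   # candidate lumen centers
    print(len(centers), "stable regions")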
Address Delft; Rotterdam; Leiden; The Netherlands; October 2016
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference SMIT
Notes IAM; Approved no
Call Number Admin @ si @ LSB2016a Serial 2856