Home | << 1 2 3 4 5 6 7 8 9 10 >> |
Records | |||||
---|---|---|---|---|---|
Author | Jose A. Garcia; David Masip; Valerio Sbragaglia; Jacopo Aguzzi | ||||
Title | Automated Identification and Tracking of Nephrops norvegicus (L.) Using Infrared and Monochromatic Blue Light | Type | Conference Article | ||
Year | 2016 | Publication | 19th International Conference of the Catalan Association for Artificial Intelligence | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | computer vision; video analysis; object recognition; tracking; behaviour; social; decapod; Nephrops norvegicus | ||||
Abstract | Automated video and image analysis can be a very efficient tool to analyze
animal behavior based on sociality, especially in hard access environments for researchers. The understanding of this social behavior can play a key role in the sustainable design of capture policies of many species. This paper proposes the use of computer vision algorithms to identify and track a specific specie, the Norway lobster, Nephrops norvegicus, a burrowing decapod with relevant commercial value which is captured by trawling. These animals can only be captured when are engaged in seabed excursions, which are strongly related with their social behavior. This emergent behavior is modulated by the day-night cycle, but their social interactions remain unknown to the scientific community. The paper introduces an identification scheme made of four distinguishable black and white tags (geometric shapes). The project has recorded 15-day experiments in laboratory pools, under monochromatic blue light (472 nm.) and darkness conditions (recorded using Infra Red light). Using this massive image set, we propose a comparative of state-ofthe-art computer vision algorithms to distinguish and track the different animals’ movements. We evaluate the robustness to the high noise presence in the infrared video signals and free out-of-plane rotations due to animal movement. The experiments show promising accuracies under a cross-validation protocol, being adaptable to the automation and analysis of large scale data. In a second contribution, we created an extensive dataset of shapes (46027 different shapes) from four daily experimental video recordings, which will be available to the community. |
||||
Address | Barcelona; Spain; October 2016 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | CCIA | ||
Notes | OR;MV; | Approved | no | ||
Call Number | Admin @ si @ GMS2016 | Serial | 2816 | ||
Permanent link to this record | |||||
Author | Jose A. Garcia; David Masip; Valerio Sbragaglia; Jacopo Aguzzi | ||||
Title | Using ORB, BoW and SVM to identificate and track tagged Norway lobster Nephrops Norvegicus (L.) | Type | Conference Article | ||
Year | 2016 | Publication | 3rd International Conference on Maritime Technology and Engineering | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | Sustainable capture policies of many species strongly depend on the understanding of their social behaviour. Nevertheless, the analysis of emergent behaviour in marine species poses several challenges. Usually animals are captured and observed in tanks, and their behaviour is inferred from their dynamics and interactions. Therefore, researchers must deal with thousands of hours of video data. Without loss of generality, this paper proposes a computer
vision approach to identify and track specific species, the Norway lobster, Nephrops norvegicus. We propose an identification scheme were animals are marked using black and white tags with a geometric shape in the center (holed triangle, filled triangle, holed circle and filled circle). Using a massive labelled dataset; we extract local features based on the ORB descriptor. These features are a posteriori clustered, and we construct a Bag of Visual Words feature vector per animal. This approximation yields us invariance to rotation and translation. A SVM classifier achieves generalization results above 99%. In a second contribution, we will make the code and training data publically available. |
||||
Address | Lisboa; Portugal; July 2016 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | MARTECH | ||
Notes | OR;MV; | Approved | no | ||
Call Number | Admin @ si @ GMS2016b | Serial | 2817 | ||
Permanent link to this record | |||||
Author | Vassileios Balntas; Edgar Riba; Daniel Ponsa; Krystian Mikolajczyk | ||||
Title | Learning local feature descriptors with triplets and shallow convolutional neural networks | Type | Conference Article | ||
Year | 2016 | Publication | 27th British Machine Vision Conference | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | It has recently been demonstrated that local feature descriptors based on convolutional neural networks (CNN) can significantly improve the matching performance. Previous work on learning such descriptors has focused on exploiting pairs of positive and negative patches to learn discriminative CNN representations. In this work, we propose to utilize triplets of training samples, together with in-triplet mining of hard negatives.
We show that our method achieves state of the art results, without the computational overhead typically associated with mining of negatives and with lower complexity of the network architecture. We compare our approach to recently introduced convolutional local feature descriptors, and demonstrate the advantages of the proposed methods in terms of performance and speed. We also examine different loss functions associated with triplets. |
||||
Address | York; UK; September 2016 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | BMVC | ||
Notes | ADAS; 600.086 | Approved | no | ||
Call Number | Admin @ si @ BRP2016 | Serial | 2818 | ||
Permanent link to this record | |||||
Author | Maria Salamo; Inmaculada Rodriguez; Maite Lopez; Anna Puig; Simone Balocco; Mariona Taule | ||||
Title | Recurso docente para la atención de la diversidad en el aula mediante la predicción de notas | Type | Journal | ||
Year | 2016 | Publication | ReVision | Abbreviated Journal | |
Volume | 9 | Issue | 1 | Pages | |
Keywords | Aprendizaje automatico; Sistema de prediccion de notas; Herramienta docente | ||||
Abstract | Desde la implantación del Espacio Europeo de Educación Superior (EEES) en los diferentes grados, se ha puesto de manifiesto la necesidad de utilizar diversos mecanismos que permitan tratar la diversidad en el aula, evaluando automáticamente y proporcionando una retroalimentación rápida tanto al alumnado como al profesorado sobre la evolución de los alumnos en una asignatura. En este artículo se presenta la evaluación de la exactitud en las predicciones de GRADEFORESEER, un recurso docente para la predicción de notas basado en técnicas de aprendizaje automático que permite evaluar la evolución del alumnado y estimar su nota final al terminar el curso. Este recurso se ha complementado con una interfaz de usuario para el profesorado que puede ser usada en diferentes plataformas software (sistemas operativos) y en cualquier asignatura de un grado en la que se utilice evaluación continuada. Además de la descripción del recurso, este artículo presenta los resultados obtenidos al aplicar el sistema de predicción en cuatro asignaturas de disciplinas distintas: Programación I (PI), Diseño de Software (DSW) del grado de Ingeniería Informática, Tecnologías de la Información y la Comunicación (TIC) del grado de Lingüística y la asignatura Fundamentos de Tecnología (FDT) del grado de Información y Documentación, todas ellas impartidas en la Universidad de Barcelona.
La capacidad predictiva se ha evaluado de forma binaria (aprueba o no) y según un criterio de rango (suspenso, aprobado, notable o sobresaliente), obteniendo mejores predicciones en los resultados evaluados de forma binaria. |
||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | MILAB; | Approved | no | ||
Call Number | Admin @ si @ SRL2016 | Serial | 2820 | ||
Permanent link to this record | |||||
Author | Simone Balocco; Maria Zuluaga; Guillaume Zahnd; Su-Lin Lee; Stefanie Demirci | ||||
Title | Computing and Visualization for Intravascular Imaging and Computer Assisted Stenting | Type | Book Whole | ||
Year | 2016 | Publication | Computing and Visualization for Intravascular Imaging and Computer-Assisted Stenting | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | |||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Elsevier | Place of Publication | Editor | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | 9780128110188 | Medium | ||
Area | Expedition | Conference | |||
Notes | MILAB | Approved | no | ||
Call Number | Admin @ si @ BZZ2016 | Serial | 2821 | ||
Permanent link to this record | |||||
Author | Jose Marone; Simone Balocco; Marc Bolaños; Jose Massa; Petia Radeva | ||||
Title | Learning the Lumen Border using a Convolutional Neural Networks classifier | Type | Conference Article | ||
Year | 2016 | Publication | 19th International Conference on Medical Image Computing and Computer Assisted Intervention Workshop | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | IntraVascular UltraSound (IVUS) is a technique allowing the diagnosis of coronary plaque. An accurate (semi-)automatic assessment of the luminal contours could speed up the diagnosis. In most of the approaches, the information on the vessel shape is obtained combining a supervised learning step with a local refinement algorithm. In this paper, we explore for the first time, the use of a Convolutional Neural Networks (CNN) architecture that on one hand is able to extract the optimal image features and at the same time can serve as a supervised classifier to detect the lumen border in IVUS images. The main limitation of CNN, relies on the fact that this technique requires a large amount of training data due to the huge amount of parameters that it has. To
solve this issue, we introduce a patch classification approach to generate an extended training-set from a few annotated images. An accuracy of 93% and F-score of 71% was obtained with this technique, even when it was applied to challenging frames containig calcified plaques, stents and catheter shadows. |
||||
Address | Athens; Greece; October 2016 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | MICCAIW | ||
Notes | MILAB; | Approved | no | ||
Call Number | Admin @ si @ MBB2016 | Serial | 2822 | ||
Permanent link to this record | |||||
Author | Dena Bazazian; Raul Gomez; Anguelos Nicolaou; Lluis Gomez; Dimosthenis Karatzas; Andrew Bagdanov | ||||
Title | Improving Text Proposals for Scene Images with Fully Convolutional Networks | Type | Conference Article | ||
Year | 2016 | Publication | 23rd International Conference on Pattern Recognition Workshops | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | Text Proposals have emerged as a class-dependent version of object proposals – efficient approaches to reduce the search space of possible text object locations in an image. Combined with strong word classifiers, text proposals currently yield top state of the art results in end-to-end scene text
recognition. In this paper we propose an improvement over the original Text Proposals algorithm of [1], combining it with Fully Convolutional Networks to improve the ranking of proposals. Results on the ICDAR RRC and the COCO-text datasets show superior performance over current state-of-the-art. |
||||
Address | Cancun; Mexico; December 2016 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICPRW | ||
Notes | DAG; LAMP; 600.084 | Approved | no | ||
Call Number | Admin @ si @ BGN2016 | Serial | 2823 | ||
Permanent link to this record | |||||
Author | Cesar de Souza; Adrien Gaidon; Eleonora Vig; Antonio Lopez | ||||
Title | Sympathy for the Details: Dense Trajectories and Hybrid Classification Architectures for Action Recognition | Type | Conference Article | ||
Year | 2016 | Publication | 14th European Conference on Computer Vision | Abbreviated Journal | |
Volume | Issue | Pages | 697-716 | ||
Keywords | |||||
Abstract | Action recognition in videos is a challenging task due to the complexity of the spatio-temporal patterns to model and the difficulty to acquire and learn on large quantities of video data. Deep learning, although a breakthrough for image classification and showing promise for videos, has still not clearly superseded action recognition methods using hand-crafted features, even when training on massive datasets. In this paper, we introduce hybrid video classification architectures based on carefully designed unsupervised representations of hand-crafted spatio-temporal features classified by supervised deep networks. As we show in our experiments on five popular benchmarks for action recognition, our hybrid model combines the best of both worlds: it is data efficient (trained on 150 to 10000 short clips) and yet improves significantly on the state of the art, including recent deep models trained on millions of manually labelled images and videos. | ||||
Address | Amsterdam; The Netherlands; October 2016 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | LNCS | ||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ECCV | ||
Notes | ADAS; 600.076; 600.085 | Approved | no | ||
Call Number | Admin @ si @ SGV2016 | Serial | 2824 | ||
Permanent link to this record | |||||
Author | Maria Elena Meza-de-Luna; Juan Ramon Terven Salinas; Bogdan Raducanu; Joaquin Salas | ||||
Title | Assessing the Influence of Mirroring on the Perception of Professional Competence using Wearable Technology | Type | Journal Article | ||
Year | 2016 | Publication | IEEE Transactions on Affective Computing | Abbreviated Journal | TAC |
Volume | 9 | Issue | 2 | Pages | 161-175 |
Keywords | Mirroring; Nodding; Competence; Perception; Wearable Technology | ||||
Abstract | Nonverbal communication is an intrinsic part in daily face-to-face meetings. A frequently observed behavior during social interactions is mirroring, in which one person tends to mimic the attitude of the counterpart. This paper shows that a computer vision system could be used to predict the perception of competence in dyadic interactions through the automatic detection of mirroring
events. To prove our hypothesis, we developed: (1) A social assistant for mirroring detection, using a wearable device which includes a video camera and (2) an automatic classifier for the perception of competence, using the number of nodding gestures and mirroring events as predictors. For our study, we used a mixed-method approach in an experimental design where 48 participants acting as customers interacted with a confederated psychologist. We found that the number of nods or mirroring events has a significant influence on the perception of competence. Our results suggest that: (1) Customer mirroring is a better predictor than psychologist mirroring; (2) the number of psychologist’s nods is a better predictor than the number of customer’s nods; (3) except for the psychologist mirroring, the computer vision algorithm we used worked about equally well whether it was acquiring images from wearable smartglasses or fixed cameras. |
||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | LAMP; 600.072; | Approved | no | ||
Call Number | Admin @ si @ MTR2016 | Serial | 2826 | ||
Permanent link to this record | |||||
Author | Hugo Jair Escalante; Victor Ponce; Jun Wan; Michael A. Riegler; Baiyu Chen; Albert Clapes; Sergio Escalera; Isabelle Guyon; Xavier Baro; Pal Halvorsen; Henning Muller; Martha Larson | ||||
Title | ChaLearn Joint Contest on Multimedia Challenges Beyond Visual Analysis: An Overview | Type | Conference Article | ||
Year | 2016 | Publication | 23rd International Conference on Pattern Recognition | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | This paper provides an overview of the Joint Contest on Multimedia Challenges Beyond Visual Analysis. We organized an academic competition that focused on four problems that require effective processing of multimodal information in order to be solved. Two tracks were devoted to gesture spotting and recognition from RGB-D video, two fundamental problems for human computer interaction. Another track was devoted to a second round of the first impressions challenge of which the goal was to develop methods to recognize personality traits from
short video clips. For this second round we adopted a novel collaborative-competitive (i.e., coopetition) setting. The fourth track was dedicated to the problem of video recommendation for improving user experience. The challenge was open for about 45 days, and received outstanding participation: almost 200 participants registered to the contest, and 20 teams sent predictions in the final stage. The main goals of the challenge were fulfilled: the state of the art was advanced considerably in the four tracks, with novel solutions to the proposed problems (mostly relying on deep learning). However, further research is still required. The data of the four tracks will be available to allow researchers to keep making progress in the four tracks. |
||||
Address | Cancun; Mexico; December 2016 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICPR | ||
Notes | HuPBA; 602.143;MV | Approved | no | ||
Call Number | Admin @ si @ EPW2016 | Serial | 2827 | ||
Permanent link to this record | |||||
Author | Baiyu Chen; Sergio Escalera; Isabelle Guyon; Victor Ponce; N. Shah; Marc Oliu | ||||
Title | Overcoming Calibration Problems in Pattern Labeling with Pairwise Ratings: Application to Personality Traits | Type | Conference Article | ||
Year | 2016 | Publication | 14th European Conference on Computer Vision Workshops | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | Calibration of labels; Label bias; Ordinal labeling; Variance Models; Bradley-Terry-Luce model; Continuous labels; Regression; Personality traits; Crowd-sourced labels | ||||
Abstract | We address the problem of calibration of workers whose task is to label patterns with continuous variables, which arises for instance in labeling images of videos of humans with continuous traits. Worker bias is particularly dicult to evaluate and correct when many workers contribute just a few labels, a situation arising typically when labeling is crowd-sourced. In the scenario of labeling short videos of people facing a camera with personality traits, we evaluate the feasibility of the pairwise ranking method to alleviate bias problems. Workers are exposed to pairs of videos at a time and must order by preference. The variable levels are reconstructed by fitting a Bradley-Terry-Luce model with maximum likelihood. This method may at first sight, seem prohibitively expensive because for N videos, p = N (N-1)/2 pairs must be potentially processed by workers rather that N videos. However, by performing extensive simulations, we determine an empirical law for the scaling of the number of pairs needed as a function of the number of videos in order to achieve a given accuracy of score reconstruction and show that the pairwise method is a ordable. We apply the method to the labeling of a large scale dataset of 10,000 videos used in the ChaLearn Apparent Personality Trait challenge. | ||||
Address | Amsterdam; The Netherlands; October 2016 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ECCVW | ||
Notes | HuPBA;MILAB; | Approved | no | ||
Call Number | Admin @ si @ CEG2016 | Serial | 2829 | ||
Permanent link to this record | |||||
Author | Sumit K. Banchhor; Tadashi Araki; Narendra D. Londhe; Nobutaka Ikeda; Petia Radeva; Ayman El-Baz; Luca Saba; Andrew Nicolaides; Shoaib Shafique; John R. Laird; Jasjit S. Suri | ||||
Title | Five multiresolution-based calcium volume measurement techniques from coronary IVUS videos: A comparative approach | Type | Journal Article | ||
Year | 2016 | Publication | Computer Methods and Programs in Biomedicine | Abbreviated Journal | CMPB |
Volume | 134 | Issue | Pages | 237-258 | |
Keywords | |||||
Abstract | BACKGROUND AND OBJECTIVE:
Fast intravascular ultrasound (IVUS) video processing is required for calcium volume computation during the planning phase of percutaneous coronary interventional (PCI) procedures. Nonlinear multiresolution techniques are generally applied to improve the processing time by down-sampling the video frames. METHODS: This paper presents four different segmentation methods for calcium volume measurement, namely Threshold-based, Fuzzy c-Means (FCM), K-means, and Hidden Markov Random Field (HMRF) embedded with five different kinds of multiresolution techniques (bilinear, bicubic, wavelet, Lanczos, and Gaussian pyramid). This leads to 20 different kinds of combinations. IVUS image data sets consisting of 38,760 IVUS frames taken from 19 patients were collected using 40 MHz IVUS catheter (Atlantis® SR Pro, Boston Scientific®, pullback speed of 0.5 mm/sec.). The performance of these 20 systems is compared with and without multiresolution using the following metrics: (a) computational time; (b) calcium volume; (c) image quality degradation ratio; and (d) quality assessment ratio. RESULTS: Among the four segmentation methods embedded with five kinds of multiresolution techniques, FCM segmentation combined with wavelet-based multiresolution gave the best performance. FCM and wavelet experienced the highest percentage mean improvement in computational time of 77.15% and 74.07%, respectively. Wavelet interpolation experiences the highest mean precision-of-merit (PoM) of 94.06 ± 3.64% and 81.34 ± 16.29% as compared to other multiresolution techniques for volume level and frame level respectively. Wavelet multiresolution technique also experiences the highest Jaccard Index and Dice Similarity of 0.7 and 0.8, respectively. Multiresolution is a nonlinear operation which introduces bias and thus degrades the image. The proposed system also provides a bias correction approach to enrich the system, giving a better mean calcium volume similarity for all the multiresolution-based segmentation methods. After including the bias correction, bicubic interpolation gives the largest increase in mean calcium volume similarity of 4.13% compared to the rest of the multiresolution techniques. The system is automated and can be adapted in clinical settings. CONCLUSIONS: We demonstrated the time improvement in calcium volume computation without compromising the quality of IVUS image. Among the 20 different combinations of multiresolution with calcium volume segmentation methods, the FCM embedded with wavelet-based multiresolution gave the best performance. |
||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | MILAB; | Approved | no | ||
Call Number | Admin @ si @ BAL2016 | Serial | 2830 | ||
Permanent link to this record | |||||
Author | Petia Radeva | ||||
Title | Can Deep Learning and Egocentric Vision for Visual Lifelogging Help Us Eat Better? | Type | Conference Article | ||
Year | 2016 | Publication | 19th International Conference of the Catalan Association for Artificial Intelligence | Abbreviated Journal | |
Volume | 4 | Issue | Pages | ||
Keywords | |||||
Abstract | |||||
Address | Barcelona; October 2016 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | CCIA | ||
Notes | MILAB | Approved | no | ||
Call Number | Admin @ si @ Rad2016 | Serial | 2832 | ||
Permanent link to this record | |||||
Author | Alvaro Peris; Marc Bolaños; Petia Radeva; Francisco Casacuberta | ||||
Title | Video Description Using Bidirectional Recurrent Neural Networks | Type | Conference Article | ||
Year | 2016 | Publication | 25th International Conference on Artificial Neural Networks | Abbreviated Journal | |
Volume | 2 | Issue | Pages | 3-11 | |
Keywords | Video description; Neural Machine Translation; Birectional Recurrent Neural Networks; LSTM; Convolutional Neural Networks | ||||
Abstract | Although traditionally used in the machine translation field, the encoder-decoder framework has been recently applied for the generation of video and image descriptions. The combination of Convolutional and Recurrent Neural Networks in these models has proven to outperform the previous state of the art, obtaining more accurate video descriptions. In this work we propose pushing further this model by introducing two contributions into the encoding stage. First, producing richer image representations by combining object and location information from Convolutional Neural Networks and second, introducing Bidirectional Recurrent Neural Networks for capturing both forward and backward temporal relationships in the input frames. | ||||
Address | Barcelona; September 2016 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICANN | ||
Notes | MILAB; | Approved | no | ||
Call Number | Admin @ si @ PBR2016 | Serial | 2833 | ||
Permanent link to this record | |||||
Author | Marc Bolaños; Petia Radeva | ||||
Title | Simultaneous Food Localization and Recognition | Type | Conference Article | ||
Year | 2016 | Publication | 23rd International Conference on Pattern Recognition | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | CoRR abs/1604.07953
The development of automatic nutrition diaries, which would allow to keep track objectively of everything we eat, could enable a whole new world of possibilities for people concerned about their nutrition patterns. With this purpose, in this paper we propose the first method for simultaneous food localization and recognition. Our method is based on two main steps, which consist in, first, produce a food activation map on the input image (i.e. heat map of probabilities) for generating bounding boxes proposals and, second, recognize each of the food types or food-related objects present in each bounding box. We demonstrate that our proposal, compared to the most similar problem nowadays – object localization, is able to obtain high precision and reasonable recall levels with only a few bounding boxes. Furthermore, we show that it is applicable to both conventional and egocentric images. |
||||
Address | Cancun; Mexico; December 2016 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICPR | ||
Notes | MILAB; no proj | Approved | no | ||
Call Number | Admin @ si @ BoR2016 | Serial | 2834 | ||
Permanent link to this record |