|   | 
Details
   web
Records
Author Carlos David Martinez Hinarejos; Josep Llados; Alicia Fornes; Francisco Casacuberta; Lluis de Las Heras; Joan Mas; Moises Pastor; Oriol Ramos Terrades; Joan Andreu Sanchez; Enrique Vidal; Fernando Vilariño
Title Context, multimodality, and user collaboration in handwritten text processing: the CoMUN-HaT project Type Conference Article
Year 2016 Publication 3rd IberSPEECH Abbreviated Journal
Volume Issue Pages
Keywords (down)
Abstract Processing of handwritten documents is a task that is of wide interest for many
purposes, such as those related to preserve cultural heritage. Handwritten text recognition techniques have been successfully applied during the last decade to obtain transcriptions of handwritten documents, and keyword spotting techniques have been applied for searching specific terms in image collections of handwritten documents. However, results on transcription and indexing are far from perfect. In this framework, the use of new data sources arises as a new paradigm that will allow for a better transcription and indexing of handwritten documents. Three main different data sources could be considered: context of the document (style, writer, historical time, topics,. . . ), multimodal data (representations of the document in a different modality, such as the speech signal of the dictation of the text), and user feedback (corrections, amendments,. . . ). The CoMUN-HaT project aims at the integration of these different data sources into the transcription and indexing task for handwritten documents: the use of context derived from the analysis of the documents, how multimodality can aid the recognition process to obtain more accurate transcriptions (including transcription in a modern version of the language), and integration into a userin-the-loop assisted text transcription framework. This will be reflected in the construction of a transcription and indexing platform that can be used by both professional and nonprofessional users, contributing to crowd-sourcing activities to preserve cultural heritage and to obtain an accessible version of the involved corpus.
Address Lisboa; Portugal; November 2016
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference IberSPEECH
Notes DAG; MV; 600.097;SIAI Approved no
Call Number Admin @ si @MLF2016 Serial 2813
Permanent link to this record
 

 
Author Maedeh Aghaei; Mariella Dimiccoli; Petia Radeva
Title Multi-face tracking by extended bag-of-tracklets in egocentric photo-streams Type Journal Article
Year 2016 Publication Computer Vision and Image Understanding Abbreviated Journal CVIU
Volume 149 Issue Pages 146-156
Keywords (down)
Abstract Wearable cameras offer a hands-free way to record egocentric images of daily experiences, where social events are of special interest. The first step towards detection of social events is to track the appearance of multiple persons involved in them. In this paper, we propose a novel method to find correspondences of multiple faces in low temporal resolution egocentric videos acquired through a wearable camera. This kind of photo-stream imposes additional challenges to the multi-tracking problem with respect to conventional videos. Due to the free motion of the camera and to its low temporal resolution, abrupt changes in the field of view, in illumination condition and in the target location are highly frequent. To overcome such difficulties, we propose a multi-face tracking method that generates a set of tracklets through finding correspondences along the whole sequence for each detected face and takes advantage of the tracklets redundancy to deal with unreliable ones. Similar tracklets are grouped into the so called extended bag-of-tracklets (eBoT), which is aimed to correspond to a specific person. Finally, a prototype tracklet is extracted for each eBoT, where the occurred occlusions are estimated by relying on a new measure of confidence. We validated our approach over an extensive dataset of egocentric photo-streams and compared it to state of the art methods, demonstrating its effectiveness and robustness.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MILAB; Approved no
Call Number Admin @ si @ ADR2016b Serial 2742
Permanent link to this record
 

 
Author Onur Ferhat; Fernando Vilariño
Title Low Cost Eye Tracking: The Current Panorama Type Journal Article
Year 2016 Publication Computational Intelligence and Neuroscience Abbreviated Journal CIN
Volume Issue Pages Article ID 8680541
Keywords (down)
Abstract Despite the availability of accurate, commercial gaze tracker devices working with infrared (IR) technology, visible light gaze tracking constitutes an interesting alternative by allowing scalability and removing hardware requirements. Over the last years, this field has seen examples of research showing performance comparable to the IR alternatives. In this work, we survey the previous work on remote, visible light gaze trackers and analyze the explored techniques from various perspectives such as calibration strategies, head pose invariance, and gaze estimation techniques. We also provide information on related aspects of research such as public datasets to test against, open source projects to build upon, and gaze tracking services to directly use in applications. With all this information, we aim to provide the contemporary and future researchers with a map detailing previously explored ideas and the required tools.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MV; 605.103; 600.047; 600.097;SIAI Approved no
Call Number Admin @ si @ FeV2016 Serial 2744
Permanent link to this record
 

 
Author C. Alejandro Parraga; Arash Akbarinia
Title NICE: A Computational Solution to Close the Gap from Colour Perception to Colour Categorization Type Journal Article
Year 2016 Publication PLoS One Abbreviated Journal Plos
Volume 11 Issue 3 Pages e0149538
Keywords (down)
Abstract The segmentation of visible electromagnetic radiation into chromatic categories by the human visual system has been extensively studied from a perceptual point of view, resulting in several colour appearance models. However, there is currently a void when it comes to relate these results to the physiological mechanisms that are known to shape the pre-cortical and cortical visual pathway. This work intends to begin to fill this void by proposing a new physiologically plausible model of colour categorization based on Neural Isoresponsive Colour Ellipsoids (NICE) in the cone-contrast space defined by the main directions of the visual signals entering the visual cortex. The model was adjusted to fit psychophysical measures that concentrate on the categorical boundaries and are consistent with the ellipsoidal isoresponse surfaces of visual cortical neurons. By revealing the shape of such categorical colour regions, our measures allow for a more precise and parsimonious description, connecting well-known early visual processing mechanisms to the less understood phenomenon of colour categorization. To test the feasibility of our method we applied it to exemplary images and a popular ground-truth chart obtaining labelling results that are better than those of current state-of-the-art algorithms.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes NEUROBIT; 600.068 Approved no
Call Number Admin @ si @ PaA2016a Serial 2747
Permanent link to this record
 

 
Author Joan Mas; Alicia Fornes; Josep Llados
Title An Interactive Transcription System of Census Records using Word-Spotting based Information Transfer Type Conference Article
Year 2016 Publication 12th IAPR Workshop on Document Analysis Systems Abbreviated Journal
Volume Issue Pages 54-59
Keywords (down)
Abstract This paper presents a system to assist in the transcription of historical handwritten census records in a crowdsourcing platform. Census records have a tabular structured layout. They consist in a sequence of rows with information of homes ordered by street address. For each household snippet in the page, the list of family members is reported. The censuses are recorded in intervals of a few years and the information of individuals in each household is quite stable from a point in time to the next one. This redundancy is used to assist the transcriber, so the redundant information is transferred from the census already transcribed to the next one. Household records are aligned from one year to the next one using the knowledge of the ordering by street address. Given an already transcribed census, a query by string word spotting is applied. Thus, names from the census in time t are used as queries in the corresponding home record in time t+1. Since the search is constrained, the obtained precision-recall values are very high, with an important reduction in the transcription time. The proposed system has been tested in a real citizen-science experience where non expert users transcribe the census data of their home town.
Address Santorini; Greece; April 2016
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference DAS
Notes DAG; 603.053; 602.006; 600.061; 600.077; 600.097 Approved no
Call Number Admin @ si @ MFL2016 Serial 2751
Permanent link to this record
 

 
Author Juan Ignacio Toledo; Alicia Fornes; Jordi Cucurull; Josep Llados
Title Election Tally Sheets Processing System Type Conference Article
Year 2016 Publication 12th IAPR Workshop on Document Analysis Systems Abbreviated Journal
Volume Issue Pages 364-368
Keywords (down)
Abstract In paper based elections, manual tallies at polling station level produce myriads of documents. These documents share a common form-like structure and a reduced vocabulary worldwide. On the other hand, each tally sheet is filled by a different writer and on different countries, different scripts are used. We present a complete document analysis system for electoral tally sheet processing combining state of the art techniques with a new handwriting recognition subprocess based on unsupervised feature discovery with Variational Autoencoders and sequence classification with BLSTM neural networks. The whole system is designed to be script independent and allows a fast and reliable results consolidation process with reduced operational cost.
Address Santorini; Greece; April 2016
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference DAS
Notes DAG; 602.006; 600.061; 601.225; 600.077; 600.097 Approved no
Call Number TFC2016 Serial 2752
Permanent link to this record
 

 
Author Anders Hast; Alicia Fornes
Title A Segmentation-free Handwritten Word Spotting Approach by Relaxed Feature Matching Type Conference Article
Year 2016 Publication 12th IAPR Workshop on Document Analysis Systems Abbreviated Journal
Volume Issue Pages 150-155
Keywords (down)
Abstract The automatic recognition of historical handwritten documents is still considered challenging task. For this reason, word spotting emerges as a good alternative for making the information contained in these documents available to the user. Word spotting is defined as the task of retrieving all instances of the query word in a document collection, becoming a useful tool for information retrieval. In this paper we propose a segmentation-free word spotting approach able to deal with large document collections. Our method is inspired on feature matching algorithms that have been applied to image matching and retrieval. Since handwritten words have different shape, there is no exact transformation to be obtained. However, the sufficient degree of relaxation is achieved by using a Fourier based descriptor and an alternative approach to RANSAC called PUMA. The proposed approach is evaluated on historical marriage records, achieving promising results.
Address Santorini; Greece; April 2016
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference DAS
Notes DAG; 602.006; 600.061; 600.077; 600.097 Approved no
Call Number HaF2016 Serial 2753
Permanent link to this record
 

 
Author Dimosthenis Karatzas; V. Poulain d'Andecy; Marçal Rusiñol
Title Human-Document Interaction – a new frontier for document image analysis Type Conference Article
Year 2016 Publication 12th IAPR Workshop on Document Analysis Systems Abbreviated Journal
Volume Issue Pages 369-374
Keywords (down)
Abstract All indications show that paper documents will not cede in favour of their digital counterparts, but will instead be used increasingly in conjunction with digital information. An open challenge is how to seamlessly link the physical with the digital – how to continue taking advantage of the important affordances of paper, without missing out on digital functionality. This paper
presents the authors’ experience with developing systems for Human-Document Interaction based on augmented document interfaces and examines new challenges and opportunities arising for the document image analysis field in this area. The system presented combines state of the art camera-based document
image analysis techniques with a range of complementary tech-nologies to offer fluid Human-Document Interaction. Both fixed and nomadic setups are discussed that have gone through user testing in real-life environments, and use cases are presented that span the spectrum from business to educational application
Address Santorini; Greece; April 2016
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference DAS
Notes DAG; 600.084; 600.077 Approved no
Call Number KPR2016 Serial 2756
Permanent link to this record
 

 
Author Marc Masana; Joost Van de Weijer; Andrew Bagdanov
Title On-the-fly Network pruning for object detection Type Conference Article
Year 2016 Publication International conference on learning representations Abbreviated Journal
Volume Issue Pages
Keywords (down)
Abstract Object detection with deep neural networks is often performed by passing a few
thousand candidate bounding boxes through a deep neural network for each image.
These bounding boxes are highly correlated since they originate from the same
image. In this paper we investigate how to exploit feature occurrence at the image scale to prune the neural network which is subsequently applied to all bounding boxes. We show that removing units which have near-zero activation in the image allows us to significantly reduce the number of parameters in the network. Results on the PASCAL 2007 Object Detection Challenge demonstrate that up to 40% of units in some fully-connected layers can be entirely eliminated with little change in the detection result.
Address Puerto Rico; May 2016
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICLR
Notes LAMP; 600.068; 600.106; 600.079 Approved no
Call Number Admin @ si @MWB2016 Serial 2758
Permanent link to this record
 

 
Author Marc Oliu; Ciprian Corneanu; Laszlo A. Jeni; Jeffrey F. Cohn; Takeo Kanade; Sergio Escalera
Title Continuous Supervised Descent Method for Facial Landmark Localisation Type Conference Article
Year 2016 Publication 13th Asian Conference on Computer Vision Abbreviated Journal
Volume 10112 Issue Pages 121-135
Keywords (down)
Abstract Recent methods for facial landmark location perform well on close-to-frontal faces but have problems in generalising to large head rotations. In order to address this issue we propose a second order linear regression method that is both compact and robust against strong rotations. We provide a closed form solution, making the method fast to train. We test the method’s performance on two challenging datasets. The first has been intensely used by the community. The second has been specially generated from a well known 3D face dataset. It is considerably more challenging, including a high diversity of rotations and more samples than any other existing public dataset. The proposed method is compared against state-of-the-art approaches, including RCPR, CGPRT, LBF, CFSS, and GSDM. Results upon both datasets show that the proposed method offers state-of-the-art performance on near frontal view data, improves state-of-the-art methods on more challenging head rotation problems and keeps a compact model size.
Address Taipei; Taiwan; November 2016
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ACCV
Notes HuPBA;MILAB; Approved no
Call Number Admin @ si @ OCJ2016 Serial 2838
Permanent link to this record
 

 
Author Ozan Caglayan; Walid Aransa; Yaxing Wang; Marc Masana; Mercedes Garcıa-Martinez; Fethi Bougares; Loic Barrault; Joost Van de Weijer
Title Does Multimodality Help Human and Machine for Translation and Image Captioning? Type Conference Article
Year 2016 Publication 1st conference on machine translation Abbreviated Journal
Volume Issue Pages
Keywords (down)
Abstract This paper presents the systems developed by LIUM and CVC for the WMT16 Multimodal Machine Translation challenge. We explored various comparative methods, namely phrase-based systems and attentional recurrent neural networks models trained using monomodal or multimodal data. We also performed a human evaluation in order to estimate theusefulness of multimodal data for human machine translation and image description generation. Our systems obtained the best results for both tasks according to the automatic evaluation metrics BLEU and METEOR.
Address Berlin; Germany; August 2016
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference WMT
Notes LAMP; 600.106 ; 600.068 Approved no
Call Number Admin @ si @ CAW2016 Serial 2761
Permanent link to this record
 

 
Author Muhammad Anwer Rao; Fahad Shahbaz Khan; Joost Van de Weijer; Jorma Laaksonen
Title Combining Holistic and Part-based Deep Representations for Computational Painting Categorization Type Conference Article
Year 2016 Publication 6th International Conference on Multimedia Retrieval Abbreviated Journal
Volume Issue Pages
Keywords (down)
Abstract Automatic analysis of visual art, such as paintings, is a challenging inter-disciplinary research problem. Conventional approaches only rely on global scene characteristics by encoding holistic information for computational painting categorization.We argue that such approaches are sub-optimal and that discriminative common visual structures provide complementary information for painting classification. We present an approach that encodes both the global scene layout and discriminative latent common structures for computational painting categorization. The region of interests are automatically extracted, without any manual part labeling, by training class-specific deformable part-based models. Both holistic and region-of-interests are then described using multi-scale dense convolutional features. These features are pooled separately using Fisher vector encoding and concatenated afterwards in a single image representation. Experiments are performed on a challenging dataset with 91 different painters and 13 diverse painting styles. Our approach outperforms the standard method, which only employs the global scene characteristics. Furthermore, our method achieves state-of-the-art results outperforming a recent multi-scale deep features based approach [11] by 6.4% and 3.8% respectively on artist and style classification.
Address New York; USA; June 2016
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICMR
Notes LAMP; 600.068; 600.079;ADAS Approved no
Call Number Admin @ si @ RKW2016 Serial 2763
Permanent link to this record
 

 
Author Mohammad Ali Bagheri; Qigang Gao; Sergio Escalera
Title Action Recognition by Pairwise Proximity Function Support Vector Machines with Dynamic Time Warping Kernels Type Conference Article
Year 2016 Publication 29th Canadian Conference on Artificial Intelligence Abbreviated Journal
Volume 9673 Issue Pages 3-14
Keywords (down)
Abstract In the context of human action recognition using skeleton data, the 3D trajectories of joint points may be considered as multi-dimensional time series. The traditional recognition technique in the literature is based on time series dis(similarity) measures (such as Dynamic Time Warping). For these general dis(similarity) measures, k-nearest neighbor algorithms are a natural choice. However, k-NN classifiers are known to be sensitive to noise and outliers. In this paper, a new class of Support Vector Machine that is applicable to trajectory classification, such as action recognition, is developed by incorporating an efficient time-series distances measure into the kernel function. More specifically, the derivative of Dynamic Time Warping (DTW) distance measure is employed as the SVM kernel. In addition, the pairwise proximity learning strategy is utilized in order to make use of non-positive semi-definite (PSD) kernels in the SVM formulation. The recognition results of the proposed technique on two action recognition datasets demonstrates the ourperformance of our methodology compared to the state-of-the-art methods. Remarkably, we obtained 89 % accuracy on the well-known MSRAction3D dataset using only 3D trajectories of body joints obtained by Kinect
Address Victoria; Canada; May 2016
Corporate Author Thesis
Publisher Springer International Publishing Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference AI
Notes HuPBA;MILAB; Approved no
Call Number Admin @ si @ BGE2016b Serial 2770
Permanent link to this record
 

 
Author Jun Wan; Yibing Zhao; Shuai Zhou; Isabelle Guyon; Sergio Escalera
Title ChaLearn Looking at People RGB-D Isolated and Continuous Datasets for Gesture Recognition Type Conference Article
Year 2016 Publication 29th IEEE Conference on Computer Vision and Pattern Recognition Worshops Abbreviated Journal
Volume Issue Pages
Keywords (down)
Abstract In this paper, we present two large video multi-modal datasets for RGB and RGB-D gesture recognition: the ChaLearn LAP RGB-D Isolated Gesture Dataset (IsoGD)and the Continuous Gesture Dataset (ConGD). Both datasets are derived from the ChaLearn Gesture Dataset
(CGD) that has a total of more than 50000 gestures for the “one-shot-learning” competition. To increase the potential of the old dataset, we designed new well curated datasets composed of 249 gesture labels, and including 47933 gestures manually labeled the begin and end frames in sequences.Using these datasets we will open two competitions
on the CodaLab platform so that researchers can test and compare their methods for “user independent” gesture recognition. The first challenge is designed for gesture spotting
and recognition in continuous sequences of gestures while the second one is designed for gesture classification from segmented data. The baseline method based on the bag of visual words model is also presented.
Address Las Vegas; USA; July 2016
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference CVPRW
Notes HuPBA;MILAB; Approved no
Call Number Admin @ si @ WZZ2016 Serial 2771
Permanent link to this record
 

 
Author Florin Popescu; Stephane Ayache; Sergio Escalera; Xavier Baro; Cecile Capponi; Patrick Panciatici; Isabelle Guyon
Title From geospatial observations of ocean currents to causal predictors of spatio-economic activity using computer vision and machine learning Type Conference Article
Year 2016 Publication European Geosciences Union General Assembly Abbreviated Journal
Volume 18 Issue Pages
Keywords (down)
Abstract The big data transformation currently revolutionizing science and industry forges novel possibilities in multimodal analysis scarcely imaginable only a decade ago. One of the important economic and industrial problems that stand to benefit from the recent expansion of data availability and computational prowess is the prediction of electricity demand and renewable energy generation. Both are correlates of human activity: spatiotemporal energy consumption patterns in society are a factor of both demand (weather dependent) and supply, which determine cost – a relation expected to strengthen along with increasing renewable energy dependence. One of the main drivers of European weather patterns is the activity of the Atlantic Ocean and in particular its dominant Northern Hemisphere current: the Gulf Stream. We choose this particular current as a test case in part due to larger amount of relevant data and scientific literature available for refinement of analysis techniques.
This data richness is due not only to its economic importance but also to its size being clearly visible in radar and infrared satellite imagery, which makes it easier to detect using Computer Vision (CV). The power of CV techniques makes basic analysis thus developed scalable to other smaller and less known, but still influential, currents, which are not just curves on a map, but complex, evolving, moving branching trees in 3D projected onto a 2D image.
We investigate means of extracting, from several image modalities (including recently available Copernicus radar and earlier Infrared satellites), a parameterized presentation of the state of the Gulf Stream and its environment that is useful as feature space representation in a machine learning context, in this case with the EC’s H2020-sponsored ‘See.4C’ project, in the context of which data scientists may find novel predictors of spatiotemporal energy flow. Although automated extractors of Gulf Stream position exist, they differ in methodology and result. We shall attempt to extract more complex feature representation including branching points, eddies and parameterized changes in transport and velocity. Other related predictive features will be similarly developed, such as inference of deep water flux long the current path and wider spatial scale features such as Hough transform, surface turbulence indicators and temperature gradient indexes along with multi-time scale analysis of ocean height and temperature dynamics. The geospatial imaging and ML community may therefore benefit from a baseline of open-source techniques useful and expandable to other related prediction and/or scientific analysis tasks.
Address Vienna; Austria; April 2016
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference EGU
Notes HuPBA;MV; Approved no
Call Number Admin @ si @ PAE2016 Serial 2772
Permanent link to this record