Records
Author M. Cruz; Cristhian A. Aguilera-Carrasco; Boris X. Vintimilla; Ricardo Toledo; Angel Sappa
Title Cross-spectral image registration and fusion: an evaluation study Type Conference Article
Year 2015 Publication 2nd International Conference on Machine Vision and Machine Learning Abbreviated Journal
Volume Issue Pages
Keywords multispectral imaging; image registration; data fusion; infrared and visible spectra
Abstract This paper presents a preliminary study on the registration and fusion of cross-spectral imaging. The objective is to evaluate the validity of widely used computer vision approaches when they are applied at different spectral bands. In particular, we are interested in merging images from the infrared (both long wave infrared: LWIR and near infrared: NIR) and visible spectrum (VS). Experimental results with different data sets are presented.
Address Barcelona; July 2015
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference MVML
Notes ADAS; 600.076 Approved no
Call Number Admin @ si @ CAV2015 Serial 2629
Permanent link to this record
 

 
Author J. Chazalon; Marçal Rusiñol; Jean-Marc Ogier; Josep Llados
Title A Semi-Automatic Groundtruthing Tool for Mobile-Captured Document Segmentation Type Conference Article
Year 2015 Publication 13th International Conference on Document Analysis and Recognition ICDAR2015 Abbreviated Journal
Volume Issue Pages 621-625
Keywords
Abstract This paper presents a novel way to generate groundtruth data for the evaluation of mobile document capture systems, focusing on the first stage of the image processing pipeline involved: document object detection and segmentation in low-quality preview frames. We introduce and describe a simple, robust and fast technique based on color markers which enables a semi-automated annotation of page corners. We also detail a technique for marker removal. Methods and tools presented in the paper were successfully used to annotate, in a few hours, 24889 frames in 150 video files for the smartDOC competition at ICDAR 2015.
Address Nancy; France; August 2015
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICDAR
Notes DAG; 600.084; 600.061; 601.223; 600.077 Approved no
Call Number Admin @ si @ CRO2015b Serial 2685
Permanent link to this record
 

 
Author Dennis G. Romero; Anselmo Frizera; Angel Sappa; Boris X. Vintimilla; Teodiano F. Bastos
Title A predictive model for human activity recognition by observing actions and context Type Conference Article
Year 2015 Publication Advanced Concepts for Intelligent Vision Systems, Proceedings of 16th International Conference, ACIVS 2015 Abbreviated Journal
Volume 9386 Issue Pages 323-333
Keywords
Abstract This paper presents a novel model to estimate human activities, where a human activity is defined by a set of human actions. The proposed approach is based on the usage of Recurrent Neural Networks (RNN) and Bayesian inference through the continuous monitoring of human actions and their surrounding environment. In the current work human activities are inferred considering not only visual analysis but also additional resources; external sources of information, such as context information, are incorporated to contribute to the activity estimation. The novelty of the proposed approach lies in the way the information is encoded, so that it can be later associated according to a predefined semantic structure. Hence, a pattern representing a given activity can be defined by a set of actions, plus contextual information or other kinds of information that could be relevant to describe the activity. Experimental results with real data are provided showing the validity of the proposed approach.
Address Catania; Italy; October 2015
Corporate Author Thesis
Publisher Springer International Publishing Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN 0302-9743 ISBN 978-3-319-25902-4 Medium
Area Expedition Conference ACIVS
Notes ADAS; 600.076 Approved no
Call Number Admin @ si @ RFS2015 Serial 2661
Permanent link to this record
 

 
Author Meysam Madadi; Sergio Escalera; Jordi Gonzalez; Xavier Roca; Felipe Lumbreras
Title Multi-part body segmentation based on depth maps for soft biometry analysis Type Journal Article
Year 2015 Publication Pattern Recognition Letters Abbreviated Journal PRL
Volume 56 Issue Pages 14-21
Keywords 3D shape context; 3D point cloud alignment; Depth maps; Human body segmentation; Soft biometry analysis
Abstract This paper presents a novel method for extracting biometric measures using depth sensors. Given multi-part labeled training data, a new subject is aligned to the best model of the dataset, and soft biometrics such as lengths or circumference sizes of limbs and body are computed. The process is performed by training relevant pose clusters, defining a representative model, and fitting a 3D shape context descriptor within an iterative matching procedure. We show robust measures by applying orthogonal plates to the body hull. We test our approach on a novel full-body RGB-Depth data set, showing accurate estimation of soft biometrics and better segmentation accuracy in comparison with a random forest approach, without requiring large training data.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HuPBA; ISE; ADAS; 600.076;600.049; 600.063; 600.054; 302.018;MILAB Approved no
Call Number Admin @ si @ MEG2015 Serial 2588
Permanent link to this record
 

 
Author Pau Riba; Josep Llados; Alicia Fornes
Title Handwritten Word Spotting by Inexact Matching of Grapheme Graphs Type Conference Article
Year 2015 Publication 13th International Conference on Document Analysis and Recognition ICDAR2015 Abbreviated Journal
Volume Issue Pages 781 - 785
Keywords
Abstract This paper presents a graph-based word spotting method for handwritten documents. Contrary to most word spotting techniques, which use statistical representations, we propose a structural representation that is robust to the inherent deformations of handwriting. Attributed graphs are constructed using a part-based approach. Graphemes extracted from shape convexities are used as stable units of handwriting, and are associated with graph nodes. Then, spatial relations between them determine graph edges. Spotting is defined in terms of an error-tolerant graph matching using a bipartite graph matching algorithm. To make the method usable in large datasets, a graph indexing approach that makes use of binary embeddings is used as preprocessing. Historical documents are used as the experimental framework. The approach is comparable to statistical ones in terms of time and memory requirements, especially when dealing with large document collections.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICDAR
Notes DAG; 600.077; 600.061; 602.006 Approved no
Call Number Admin @ si @ RLF2015b Serial 2642
Permanent link to this record
 

 
Author Mohammad Rouhani; Angel Sappa; E. Boyer
Title Implicit B-Spline Surface Reconstruction Type Journal Article
Year 2015 Publication IEEE Transactions on Image Processing Abbreviated Journal TIP
Volume 24 Issue 1 Pages 22 - 32
Keywords
Abstract This paper presents a fast and flexible curve and surface reconstruction technique based on implicit B-splines. This representation does not require any parameterization and it is locally supported. This fact has been exploited in this paper to propose a reconstruction technique through solving a sparse system of equations. This method is further accelerated by reducing the dimension to the active control lattice. Moreover, surface smoothness and user interaction are allowed for controlling the surface. Finally, a novel weighting technique has been introduced in order to blend small patches and smooth them in the overlapping regions. The whole framework is very fast and efficient and can handle large point clouds with very low computational cost. The experimental results show the flexibility and accuracy of the proposed algorithm to describe objects with complex topologies. Comparisons with other fitting methods highlight the superiority of the proposed approach in the presence of noise and missing data.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 1057-7149 ISBN Medium
Area Expedition Conference
Notes ADAS; 600.076 Approved no
Call Number Admin @ si @ RSB2015 Serial 2541
Permanent link to this record
 

 
Author Victor Ponce; Hugo Jair Escalante; Sergio Escalera; Xavier Baro
Title Gesture and Action Recognition by Evolved Dynamic Subgestures Type Conference Article
Year 2015 Publication 26th British Machine Vision Conference Abbreviated Journal
Volume Issue Pages 129.1-129.13
Keywords
Abstract This paper introduces a framework for gesture and action recognition based on the evolution of temporal gesture primitives, or subgestures. Our work is inspired by the principle of producing genetic variations within a population of gesture subsequences, with the goal of obtaining a set of gesture units that enhance the generalization capability of standard gesture recognition approaches. In our context, gesture primitives are evolved over time using dynamic programming and generative models in order to recognize complex actions. In a few generations, the proposed subgesture-based representation of actions and gestures outperforms the state-of-the-art results on the MSRDaily3D and MSRAction3D datasets.
Address Swansea; UK; September 2015
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference BMVC
Notes HuPBA;MV Approved no
Call Number Admin @ si @ PEE2015 Serial 2657
Permanent link to this record
 

 
Author Suman Ghosh; Lluis Gomez; Dimosthenis Karatzas; Ernest Valveny
Title Efficient indexing for Query By String text retrieval Type Conference Article
Year 2015 Publication 6th IAPR International Workshop on Camera Based Document Analysis and Recognition CBDAR2015 Abbreviated Journal
Volume Issue Pages 1236 - 1240
Keywords
Abstract This paper deals with Query By String word spotting in scene images. A hierarchical text segmentation algorithm based on text-specific selective search is used to find text regions. These regions are indexed by the character n-grams present in the text region. An attribute representation based on the Pyramidal Histogram of Characters (PHOC) is used to compare text regions with the query text. For generation of the index, a similar attribute space based on a Pyramidal Histogram of character n-grams is used. These attribute models are learned using linear SVMs over the Fisher Vector [1] representation of the images along with the PHOC labels of the corresponding strings.
Address Nancy; France; August 2015
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference CBDAR
Notes DAG; 600.077 Approved no
Call Number Admin @ si @ GGK2015 Serial 2693
Permanent link to this record
 

 
Author Antonio Hernandez
Title From pixels to gestures: learning visual representations for human analysis in color and depth data sequences Type Book Whole
Year 2015 Publication PhD Thesis, Universitat de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract The visual analysis of humans from images is an important topic of interest due to its relevance to many computer vision applications like pedestrian detection, monitoring and surveillance, human-computer interaction, e-health or content-based image retrieval, among others.

In this dissertation we are interested in learning different visual representations of the human body that are helpful for the visual analysis of humans in images and video sequences. To that end, we analyze both RGB and depth image modalities and address the problem from three different research lines, at different levels of abstraction; from pixels to gestures: human segmentation, human pose estimation and gesture recognition.

First, we show how binary segmentation (object vs. background) of the human body in image sequences is helpful to remove all the background clutter present in the scene. The presented method, based on Graph cuts optimization, enforces spatio-temporal consistency of the produced segmentation masks among consecutive frames. Secondly, we present a framework for multi-label segmentation for obtaining much more detailed segmentation masks: instead of just obtaining a binary representation separating the human body from the background, finer segmentation masks can be obtained separating the different body parts.

At a higher level of abstraction, we aim for a simpler yet descriptive representation of the human body. Human pose estimation methods usually rely on skeletal models of the human body, formed by segments (or rectangles) that represent the body limbs, appropriately connected following the kinematic constraints of the human body. In practice, such skeletal models must fulfill some constraints in order to allow for efficient inference, while actually limiting the expressiveness of the model. In order to cope with this, we introduce a top-down approach for predicting the position of the body parts in the model, using a mid-level part representation based on Poselets.

Finally, we propose a framework for gesture recognition based on the bag of visual words framework. We leverage the benefits of RGB and depth image modalities by combining modality-specific visual vocabularies in a late fusion fashion. A new rotation-variant depth descriptor is presented, yielding better results than other state-of-the-art descriptors. Moreover, spatio-temporal pyramids are used to encode rough spatial and temporal structure. In addition, we present a probabilistic reformulation of Dynamic Time Warping for gesture segmentation in video sequences. A Gaussian-based probabilistic model of a gesture is learnt, implicitly encoding possible deformations in both spatial and time domains.
Address January 2015
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Sergio Escalera;Stan Sclaroff
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-940902-0-2 Medium
Area Expedition Conference
Notes HuPBA;MILAB Approved no
Call Number Admin @ si @ Her2015 Serial 2576
Permanent link to this record
 

 
Author Mikhail Mozerov; Joost Van de Weijer
Title Global Color Sparseness and a Local Statistics Prior for Fast Bilateral Filtering Type Journal Article
Year 2015 Publication IEEE Transactions on Image Processing Abbreviated Journal TIP
Volume 24 Issue 12 Pages 5842-5853
Keywords
Abstract The property of smoothing while preserving edges makes the bilateral filter a very popular image processing tool. However, its non-linear nature results in a computationally costly operation. Various works propose fast approximations to the bilateral filter. However, the majority does not generalize to vector input as is the case with color images. We propose a fast approximation to the bilateral filter for color images. The filter is based on two ideas. First, the number of colors, which occur in a single natural image, is limited. We exploit this color sparseness to rewrite the initial non-linear bilateral filter as a number of linear filter operations. Second, we impose a statistical prior to the image values that are locally present within the filter window. We show that this statistical prior leads to a closed-form solution of the bilateral filter. Finally, we combine both ideas into a single fast and accurate bilateral filter for color images. Experimental results show that our bilateral filter based on the local prior yields an extremely fast bilateral filter approximation, but with limited accuracy, which has potential application in real-time video filtering. Our bilateral filter, which combines color sparseness and local statistics, yields a fast and accurate bilateral filter approximation and obtains the state-of-the-art results.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 1057-7149 ISBN Medium
Area Expedition Conference
Notes LAMP; 600.079;ISE Approved no
Call Number Admin @ si @ MoW2015b Serial 2689
Permanent link to this record
 

 
Author Mohammad Ali Bagheri; Qigang Gao; Sergio Escalera; Albert Clapes; Kamal Nasrollahi; Michael Holte; Thomas B. Moeslund
Title Keep it Accurate and Diverse: Enhancing Action Recognition Performance by Ensemble Learning Type Conference Article
Year 2015 Publication IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) Abbreviated Journal
Volume Issue Pages 22-29
Keywords
Abstract The performance of different action recognition techniques has recently been studied by several computer vision researchers. However, the potential improvement in classification through classifier fusion by ensemble-based methods has remained unattended. In this work, we evaluate the performance of an ensemble of action learning techniques, each performing the recognition task from a different perspective. The underlying idea is that instead of aiming for a very sophisticated and powerful representation/learning technique, we can learn action categories using a set of relatively simple and diverse classifiers, each trained with a different feature set. In addition, combining the outputs of several learners can reduce the risk of an unfortunate selection of a learner on an unseen action recognition scenario. This leads to a more robust and generally applicable framework. In order to improve the recognition performance, a powerful combination strategy is utilized based on the Dempster-Shafer theory, which can effectively make use of the diversity of base learners trained on different sources of information. The recognition results of the individual classifiers are compared with those obtained from fusing the classifiers' output, showing enhanced performance of the proposed methodology.
Address Boston; USA; June 2015
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference CVPRW
Notes HuPBA;MILAB Approved no
Call Number Admin @ si @ BGE2015 Serial 2655
Permanent link to this record
 

 
Author Alvaro Cepero; Albert Clapes; Sergio Escalera
Title Automatic non-verbal communication skills analysis: a quantitative evaluation Type Journal Article
Year 2015 Publication AI Communications Abbreviated Journal AIC
Volume 28 Issue 1 Pages 87-101
Keywords Social signal processing; human behavior analysis; multi-modal data description; multi-modal data fusion; non-verbal communication analysis; e-Learning
Abstract Oral communication competence is ranked among the most relevant skills for one's professional and personal life. Because of the importance of communication in our activities of daily living, it is crucial to study methods to evaluate and provide the necessary feedback that can be used in order to improve these communication capabilities and, therefore, learn how to express ourselves better. In this work, we propose a system capable of evaluating quantitatively the quality of oral presentations in an automatic fashion. The system is based on a multi-modal RGB, depth, and audio data description and a fusion approach in order to recognize behavioral cues and train classifiers able to eventually predict communication quality levels. The performance of the proposed system is tested on a novel dataset containing real Bachelor thesis defenses, presentations from 8th-semester Bachelor courses, and Master course presentations at Universitat de Barcelona. Using as groundtruth the marks assigned by actual instructors, our system achieves high performance categorizing and ranking presentations by their quality, and also making real-valued mark predictions.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 0921-7126 ISBN Medium
Area Expedition Conference
Notes HuPBA;MILAB Approved no
Call Number Admin @ si @ CCE2015 Serial 2549
Permanent link to this record
 

 
Author Alejandro Gonzalez Alzate; Gabriel Villalonga; German Ros; David Vazquez; Antonio Lopez
Title 3D-Guided Multiscale Sliding Window for Pedestrian Detection Type Conference Article
Year 2015 Publication Pattern Recognition and Image Analysis, Proceedings of 7th Iberian Conference, IbPRIA 2015 Abbreviated Journal
Volume 9117 Issue Pages 560-568
Keywords Pedestrian Detection
Abstract The most relevant modules of a pedestrian detector are the candidate generation and the candidate classification. The former aims at presenting image windows to the latter so that they are classified as containing a pedestrian or not. Much attention has been paid to the classification module, while candidate generation has mainly relied on a (multiscale) sliding window pyramid. However, candidate generation is critical for achieving real-time performance. In this paper we assume a context of autonomous driving based on stereo vision. Accordingly, we evaluate the effect of taking into account the 3D information (derived from the stereo) in order to prune the hundreds of thousands of windows per image generated by the classical pyramidal sliding window. For our study we use a multimodal (RGB, disparity) and multi-descriptor (HOG, LBP, HOG+LBP) holistic ensemble based on linear SVM. Evaluation on data from the challenging KITTI benchmark suite shows the effectiveness of using 3D information to dramatically reduce the number of candidate windows, even improving the overall pedestrian detection accuracy.
Address Santiago de Compostela; Spain; June 2015
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area ACDC Expedition Conference IbPRIA
Notes ADAS; 600.076; 600.057; 600.054 Approved no
Call Number ADAS @ adas @ GVR2015 Serial 2585
Permanent link to this record
 

 
Author Wenjuan Gong; Y. Huang; Jordi Gonzalez; Liang Wang
Title An Effective Solution to Double Counting Problem in Human Pose Estimation Type Miscellaneous
Year 2015 Publication Arxiv Abbreviated Journal
Volume Issue Pages
Keywords Pose estimation; double counting problem; mixture of parts model
Abstract The mixture of parts model has been successfully applied to solve the 2D human pose estimation problem either as an explicitly trained body part model or as latent variables for pedestrian detection. Even in the era of massive applications of deep learning techniques, the mixture of parts model is still effective in solving certain problems, especially in the case with limited numbers of training samples. In this paper, we consider using the mixture of parts model for pose estimation, wherein a tree structure is utilized for representing relations between connected body parts. This strategy facilitates training and inference of the model but suffers from double counting problems, where one detected body part is counted twice due to the lack of constraints among unconnected body parts. To solve this problem, we propose a generalized solution in which various part attributes are captured by multiple features so as to avoid the double counting problem. Qualitative and quantitative experimental results on a publicly available dataset demonstrate the effectiveness of our proposed method.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ISE; 600.078 Approved no
Call Number Admin @ si @ GHG2015 Serial 2590
Permanent link to this record
 

 
Author J. Poujol; Cristhian A. Aguilera-Carrasco; E. Danos; Boris X. Vintimilla; Ricardo Toledo; Angel Sappa
Title Visible-Thermal Fusion based Monocular Visual Odometry Type Conference Article
Year 2015 Publication 2nd Iberian Robotics Conference ROBOT2015 Abbreviated Journal
Volume 417 Issue Pages 517-528
Keywords Monocular Visual Odometry; LWIR-RGB cross-spectral Imaging; Image Fusion
Abstract The manuscript evaluates the performance of a monocular visual odometry approach when images from different spectra are considered, both independently and fused. The objective behind this evaluation is to analyze if classical approaches can be improved when the given images, which are from different spectra, are fused and represented in new domains. The images in these new domains should have some of the following properties: i) more robust to noisy data; ii) less sensitive to changes (e.g., lighting); iii) richer in descriptive information, among others. In particular, in the current work two different image fusion strategies are considered. Firstly, images from the visible and thermal spectrum are fused using a Discrete Wavelet Transform (DWT) approach. Secondly, a monochrome threshold strategy is considered. The obtained representations are evaluated under a visual odometry framework, highlighting their advantages and disadvantages, using different urban and semi-urban scenarios. Comparisons with both monocular-visible spectrum and monocular-infrared spectrum are also provided, showing the validity of the proposed approach.
Address Lisboa; Portugal; November 2015
Corporate Author Thesis
Publisher Springer International Publishing Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 2194-5357 ISBN 978-3-319-27145-3 Medium
Area Expedition Conference ROBOT
Notes ADAS; 600.076; 600.086 Approved no
Call Number Admin @ si @ PAD2015 Serial 2663
Permanent link to this record