|
Eloi Puertas, Miguel Angel Bautista, Daniel Sanchez, Sergio Escalera, & Oriol Pujol. (2014). Learning to Segment Humans by Stacking their Body Parts,. In ECCV Workshop on ChaLearn Looking at People (Vol. 8925, pp. 685–697). LNCS.
Abstract: Human segmentation in still images is a complex task due to the wide range of body poses and drastic changes in environmental conditions. Usually, human body segmentation is treated in a two-stage fashion. First, a human body part detection step is performed, and then, human part detections are used as prior knowledge to be optimized by segmentation strategies. In this paper, we present a two-stage scheme based on Multi-Scale Stacked Sequential Learning (MSSL). We define an extended feature set by stacking a multi-scale decomposition of body
part likelihood maps. These likelihood maps are obtained in a first stage
by means of a ECOC ensemble of soft body part detectors. In a second stage, contextual relations of part predictions are learnt by a binary classifier, obtaining an accurate body confidence map. The obtained confidence map is fed to a graph cut optimization procedure to obtain the final segmentation. Results show improved segmentation when MSSL is included in the human segmentation pipeline.
Keywords: Human body segmentation; Stacked Sequential Learning
|
|
|
Marc Bolaños, Maite Garolera, & Petia Radeva. (2014). Video Segmentation of Life-Logging Videos. In 8th Conference on Articulated Motion and Deformable Objects (Vol. 8563, pp. 1–9).
|
|
|
Francesco Brughi, Debora Gil, Llorenç Badiella, Eva Jove Casabella, & Oriol Ramos Terrades. (2014). Exploring the impact of inter-query variability on the performance of retrieval systems. In 11th International Conference on Image Analysis and Recognition (Vol. 8814, 413–420). LNCS. Springer International Publishing.
Abstract: This paper introduces a framework for evaluating the performance of information retrieval systems. Current evaluation metrics provide an average score that does not consider performance variability across the query set. In this manner, conclusions lack of any statistical significance, yielding poor inference to cases outside the query set and possibly unfair comparisons. We propose to apply statistical methods in order to obtain a more informative measure for problems in which different query classes can be identified. In this context, we assess the performance variability on two levels: overall variability across the whole query set and specific query class-related variability. To this end, we estimate confidence bands for precision-recall curves, and we apply ANOVA in order to assess the significance of the performance across different query classes.
|
|
|
P. Wang, V. Eglin, C. Garcia, C. Largeron, Josep Llados, & Alicia Fornes. (2014). Représentation par graphe de mots manuscrits dans les images pour la recherche par similarité. In Colloque International Francophone sur l'Écrit et le Document (pp. 233–248).
Abstract: Effective information retrieval on handwritten document images has always been
a challenging task. In this paper, we propose a novel handwritten word spotting approach based on graph representation. The presented model comprises both topological and morphological signatures of handwriting. Skeleton-based graphs with the Shape Context labeled vertexes are established for connected components. Each word image is represented as a sequence of graphs. In order to be robust to the handwriting variations, an exhaustive merging process based on DTW alignment results introduced in the similarity measure between word images. With respect to the computation complexity, an approximate graph edit distance approach using bipartite matching is employed for graph matching. The experiments on the George Washington dataset and the marriage records from the Barcelona Cathedral dataset demonstrate that the proposed approach outperforms the state-of-the-art structural methods.
Keywords: word spotting; graph-based representation; shape context description; graph edit distance; DTW; block merging; query by example
|
|
|
Michal Drozdzal, Jordi Vitria, Santiago Segui, Carolina Malagelada, Fernando Azpiroz, & Petia Radeva. (2014). Intestinal event segmentation for endoluminal video analysis. In 21st IEEE International Conference on Image Processing (pp. 3592–3596).
|
|
|
Bogdan Raducanu, Alireza Bosaghzadeh, & Fadi Dornaika. (2014). Facial Expression Recognition based on Multi-view Observations with Application to Social Robotics. In 1st Workshop on Computer Vision for Affective Computing (pp. 1–8).
Abstract: Human-robot interaction is a hot topic nowadays in the social robotics community. One crucial aspect is represented by the affective communication which comes encoded through the facial expressions. In this paper, we propose a novel approach for facial expression recognition, which exploits an efficient and adaptive graph-based label propagation (semi-supervised mode) in a multi-observation framework. The facial features are extracted using an appearance-based 3D face tracker, view- and texture independent. Our method has been extensively tested on the CMU dataset, and has been conveniently compared with other methods for graph construction. With the proposed approach, we developed an application for an AIBO robot, in which it mirrors the recognized facial
expression.
|
|
|
Maedeh Aghaei, & Petia Radeva. (2014). Bag-of-Tracklets for Person Tracking in Life-Logging Data. In 17th International Conference of the Catalan Association for Artificial Intelligence (Vol. 269, pp. 35–44).
Abstract: By increasing popularity of wearable cameras, life-logging data analysis is becoming more and more important and useful to derive significant events out of this substantial collection of images. In this study, we introduce a new tracking method applied to visual life-logging, called bag-of-tracklets, which is based on detecting, localizing and tracking of people. Given the low spatial and temporal resolution of the image data, our model generates and groups tracklets in a unsupervised framework and extracts image sequences of person appearance according to a similarity score of the bag-of-tracklets. The model output is a meaningful sequence of events expressing human appearance and tracking them in life-logging data. The achieved results prove the robustness of our model in terms of efficiency and accuracy despite the low spatial and temporal resolution of the data.
|
|
|
Joan Arnedo-Moreno, D. Bañeres, Xavier Baro, S. Caballe, S. Guerrero, L. Porta, et al. (2014). Va-ID: A trust-based virtual assessment system. In 6th International Conference on Intelligent Networking and Collaborative Systems (pp. 328–335).
Abstract: Even though online education is a very important pillar of lifelong education, institutions are still reluctant to wager for a fully online educational model. At the end, they keep relying on on-site assessment systems, mainly because fully virtual alternatives do not have the deserved social recognition or credibility. Thus, the design of virtual assessment systems that are able to provide effective proof of student authenticity and authorship and the integrity of the activities in a scalable and cost efficient manner would be very helpful. This paper presents ValID, a virtual assessment approach based on a continuous trust level evaluation between students and the institution. The current trust level serves as the main mechanism to dynamically decide which kind of controls a given student should be subjected to, across different courses in a degree. The main goal is providing a fair trade-off between security, scalability and cost, while maintaining the perceived quality of the educational model.
|
|
|
B. Zhou, Agata Lapedriza, J. Xiao, A. Torralba, & A. Oliva. (2014). Learning Deep Features for Scene Recognition using Places Database. In 28th Annual Conference on Neural Information Processing Systems (pp. 487–495).
|
|
|
Agata Lapedriza, David Masip, & David Sanchez. (2014). Emotions Classification using Facial Action Units Recognition. In 17th International Conference of the Catalan Association for Artificial Intelligence (Vol. 269, pp. 55–64).
Abstract: In this work we build a system for automatic emotion classification from image sequences. We analyze subtle changes in facial expressions by detecting a subset of 12 representative facial action units (AUs). Then, we classify emotions based on the output of these AUs classifiers, i.e. the presence/absence of AUs. We base the AUs classification upon a set of spatio-temporal geometric and appearance features for facial representation, fusing them within the emotion classifier. A decision tree is trained for emotion classifying, making the resulting model easy to interpret by capturing the combination of AUs activation that lead to a particular emotion. For Cohn-Kanade database, the proposed system classifies 7 emotions with a mean accuracy of near 90%, attaining a similar recognition accuracy in comparison with non-interpretable models that are not based in AUs detection.
|
|
|
R. Clariso, David Masip, & A. Rius. (2014). Student projects empowering mobile learning in higher education. RUSC - Revista de Universidad y Sociedad del Conocimiento, 192–207.
|
|
|
L. Rothacker, Marçal Rusiñol, Josep Llados, & G.A. Fink. (2014). A Two-stage Approach to Segmentation-Free Query-by-example Word Spotting. Manuscript Cultures, 47–58.
Abstract: With the ongoing progress in digitization, huge document collections and archives have become available to a broad audience. Scanned document images can be transmitted electronically and studied simultaneously throughout the world. While this is very beneficial, it is often impossible to perform automated searches on these document collections. Optical character recognition usually fails when it comes to handwritten or historic documents. In order to address the need for exploring document collections rapidly, researchers are working on word spotting. In query-by-example word spotting scenarios, the user selects an exemplary occurrence of the query word in a document image. The word spotting system then retrieves all regions in the collection that are visually similar to the given example of the query word. The best matching regions are presented to the user and no actual transcription is required.
An important property of a word spotting system is the computational speed with which queries can be executed. In our previous work, we presented a relatively slow but high-precision method. In the present work, we will extend this baseline system to an integrated two-stage approach. In a coarse-grained first stage, we will filter document images efficiently in order to identify regions that are likely to contain the query word. In the fine-grained second stage, these regions will be analyzed with our previously presented high-precision method. Finally, we will report recognition results and query times for the well-known George Washington
benchmark in our evaluation. We achieve state-of-the-art recognition results while the query times can be reduced to 50% in comparison with our baseline.
|
|
|
Albert Gordo, Florent Perronnin, Yunchao Gong, & Svetlana Lazebnik. (2014). Asymmetric Distances for Binary Embeddings. TPAMI - IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(1), 33–47.
Abstract: In large-scale query-by-example retrieval, embedding image signatures in a binary space offers two benefits: data compression and search efficiency. While most embedding algorithms binarize both query and database signatures, it has been noted that this is not strictly a requirement. Indeed, asymmetric schemes which binarize the database signatures but not the query still enjoy the same two benefits but may provide superior accuracy. In this work, we propose two general asymmetric distances which are applicable to a wide variety of embedding techniques including Locality Sensitive Hashing (LSH), Locality Sensitive Binary Codes (LSBC), Spectral Hashing (SH), PCA Embedding (PCAE), PCA Embedding with random rotations (PCAE-RR), and PCA Embedding with iterative quantization (PCAE-ITQ). We experiment on four public benchmarks containing up to 1M images and show that the proposed asymmetric distances consistently lead to large improvements over the symmetric Hamming distance for all binary embedding techniques.
|
|
|
Jiaolong Xu, Sebastian Ramos, David Vazquez, & Antonio Lopez. (2014). Domain Adaptation of Deformable Part-Based Models. TPAMI - IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(12), 2367–2380.
Abstract: The accuracy of object classifiers can significantly drop when the training data (source domain) and the application scenario (target domain) have inherent differences. Therefore, adapting the classifiers to the scenario in which they must operate is of paramount importance. We present novel domain adaptation (DA) methods for object detection. As proof of concept, we focus on adapting the state-of-the-art deformable part-based model (DPM) for pedestrian detection. We introduce an adaptive structural SVM (A-SSVM) that adapts a pre-learned classifier between different domains. By taking into account the inherent structure in feature space (e.g., the parts in a DPM), we propose a structure-aware A-SSVM (SA-SSVM). Neither A-SSVM nor SA-SSVM needs to revisit the source-domain training data to perform the adaptation. Rather, a low number of target-domain training examples (e.g., pedestrians) are used. To address the scenario where there are no target-domain annotated samples, we propose a self-adaptive DPM based on a self-paced learning (SPL) strategy and a Gaussian Process Regression (GPR). Two types of adaptation tasks are assessed: from both synthetic pedestrians and general persons (PASCAL VOC) to pedestrians imaged from an on-board camera. Results show that our proposals avoid accuracy drops as high as 15 points when comparing adapted and non-adapted detectors.
Keywords: Domain Adaptation; Pedestrian Detection
|
|
|
Santiago Segui, Michal Drozdzal, Ekaterina Zaytseva, Fernando Azpiroz, Petia Radeva, & Jordi Vitria. (2014). Detection of wrinkle frames in endoluminal videos using betweenness centrality measures for images. TITB - IEEE Transactions on Information Technology in Biomedicine, 18(6), 1831–1838.
Abstract: Intestinal contractions are one of the most important events to diagnose motility pathologies of the small intestine. When visualized by wireless capsule endoscopy (WCE), the sequence of frames that represents a contraction is characterized by a clear wrinkle structure in the central frames that corresponds to the folding of the intestinal wall. In this paper we present a new method to robustly detect wrinkle frames in full WCE videos by using a new mid-level image descriptor that is based on a centrality measure proposed for graphs. We present an extended validation, carried out in a very large database, that shows that the proposed method achieves state of the art performance for this task.
Keywords: Wireless Capsule Endoscopy; Small Bowel Motility Dysfunction; Contraction Detection; Structured Prediction; Betweenness Centrality
|
|