|
Michal Drozdzal, Jordi Vitria, Santiago Segui, Carolina Malagelada, Fernando Azpiroz, & Petia Radeva. (2014). Intestinal event segmentation for endoluminal video analysis. In 21st IEEE International Conference on Image Processing (pp. 3592–3596).
|
|
|
Clement Guerin, Christophe Rigaud, Karell Bertet, Jean-Christophe Burie, Arnaud Revel, & Jean-Marc Ogier. (2014). Réduction de l’espace de recherche pour les personnages de bandes dessinées. In 19th National Congress Reconnaissance de Formes et l'Intelligence Artificielle.
Abstract: Comics are an important cultural heritage in many countries, and their mass digitization makes it possible to search the content of the images. To date, mainly page structure and textual content have been studied; little work addresses the graphical content. We propose to rely on elements that have already been studied, such as the positions of panels and speech balloons, to reduce the search space and localize characters according to the balloon tails. The evaluation of our contributions on the eBDtheque dataset shows a balloon tail detection rate of 81.2%, character localization of up to 85%, and a search space reduction of more than 50%.
Keywords: contextual search; document analysis; comics characters
|
|
|
Joan Arnedo-Moreno, D. Bañeres, Xavier Baro, S. Caballe, S. Guerrero, L. Porta, et al. (2014). ValID: A trust-based virtual assessment system. In 6th International Conference on Intelligent Networking and Collaborative Systems (pp. 328–335).
Abstract: Even though online education is a very important pillar of lifelong education, institutions are still reluctant to commit to a fully online educational model. In the end, they keep relying on on-site assessment systems, mainly because fully virtual alternatives do not have the deserved social recognition or credibility. Thus, the design of virtual assessment systems that are able to provide effective proof of student authenticity and authorship and the integrity of the activities in a scalable and cost-efficient manner would be very helpful. This paper presents ValID, a virtual assessment approach based on a continuous trust level evaluation between students and the institution. The current trust level serves as the main mechanism to dynamically decide which kind of controls a given student should be subjected to, across different courses in a degree. The main goal is to provide a fair trade-off between security, scalability and cost, while maintaining the perceived quality of the educational model.
|
|
|
E. Bondi, L. Seidenari, Andrew Bagdanov, & Alberto del Bimbo. (2014). Real-time people counting from depth imagery of crowded environments. In 11th IEEE International Conference on Advanced Video and Signal based Surveillance (pp. 337–342).
Abstract: In this paper we describe a system for automatic people counting in crowded environments. The approach we propose is a counting-by-detection method based on depth imagery. It is designed to be deployed as an autonomous appliance for crowd analysis in video surveillance application scenarios. Our system performs foreground/background segmentation on depth image streams in order to coarsely segment persons, then depth information is used to localize head candidates which are then tracked in time on an automatically estimated ground plane. The system runs in real-time, at a frame-rate of about 20 fps. We collected a dataset of RGB-D sequences representing three typical and challenging surveillance scenarios, including crowds, queuing and groups. An extensive comparative evaluation is given between our system and more complex, Latent SVM-based head localization for person counting applications.
|
|
|
Bogdan Raducanu, Alireza Bosaghzadeh, & Fadi Dornaika. (2014). Facial Expression Recognition based on Multi-view Observations with Application to Social Robotics. In 1st Workshop on Computer Vision for Affective Computing (pp. 1–8).
Abstract: Human-robot interaction is a hot topic nowadays in the social robotics community. One crucial aspect is represented by the affective communication which comes encoded through the facial expressions. In this paper, we propose a novel approach for facial expression recognition, which exploits an efficient and adaptive graph-based label propagation (semi-supervised mode) in a multi-observation framework. The facial features are extracted using an appearance-based 3D face tracker, view- and texture-independent. Our method has been extensively tested on the CMU dataset, and has been conveniently compared with other methods for graph construction. With the proposed approach, we developed an application for an AIBO robot, in which it mirrors the recognized facial expression.
|
|
|
Mohammad Ali Bagheri, Qigang Gao, & Sergio Escalera. (2014). Generic Subclass Ensemble: A Novel Approach to Ensemble Classification. In 22nd International Conference on Pattern Recognition (pp. 1254–1259).
Abstract: Multiple classifier systems, also known as classifier ensembles, have received great attention in recent years because of their improved classification accuracy in different applications. In this paper, we propose a new general approach to ensemble classification, named generic subclass ensemble, in which each base classifier is trained with data belonging to a subset of classes, and thus discriminates among a subset of target categories. The ensemble classifiers are then fused using a combination rule. The proposed approach differs from existing methods that manipulate the target attribute, since in our approach individual classification problems are not restricted to two-class problems. We perform a series of experiments to evaluate the efficiency of the generic subclass approach on a set of benchmark datasets. Experimental results with multilayer perceptrons show that the proposed approach presents a viable alternative to the most commonly used ensemble classification approaches.
|
|
|
Fahad Shahbaz Khan, Joost Van de Weijer, Andrew Bagdanov, & Michael Felsberg. (2014). Scale Coding Bag-of-Words for Action Recognition. In 22nd International Conference on Pattern Recognition (pp. 1514–1519).
Abstract: Recognizing human actions in still images is a challenging problem in computer vision due to significant amount of scale, illumination and pose variation. Given the bounding box of a person both at training and test time, the task is to classify the action associated with each bounding box in an image. Most state-of-the-art methods use the bag-of-words paradigm for action recognition. The bag-of-words framework employing a dense multi-scale grid sampling strategy is the de facto standard for feature detection. This results in a scale invariant image representation where all the features at multiple-scales are binned in a single histogram. We argue that such a scale invariant strategy is sub-optimal since it ignores the multi-scale information available with each bounding box of a person. This paper investigates alternative approaches to scale coding for action recognition in still images. We encode multi-scale information explicitly in three different histograms for small, medium and large scale visual-words. Our first approach exploits multi-scale information with respect to the image size. In our second approach, we encode multi-scale information relative to the size of the bounding box of a person instance. In each approach, the multi-scale histograms are then concatenated into a single representation for action classification. We validate our approaches on the Willow dataset which contains seven action categories: interacting with computer, photography, playing music, riding bike, riding horse, running and walking. Our results clearly suggest that the proposed scale coding approaches outperform the conventional scale invariant technique. Moreover, we show that our approach obtains promising results compared to more complex state-of-the-art methods.
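Note: the idea of keeping separate small, medium and large scale histograms and concatenating them can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the feature layout, the relative-scale thresholds, the per-band L1 normalisation and the function name `scale_coded_histogram` are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

# Hypothetical feature layout: (visual_word_index, patch_scale_in_pixels).
Feature = Tuple[int, float]


def scale_coded_histogram(
    features: Sequence[Feature],
    vocab_size: int,
    box_size: float,
    thresholds: Tuple[float, float] = (0.1, 0.3),  # assumed band limits
) -> List[float]:
    """Build small/medium/large visual-word histograms and concatenate them.

    Patch scale is measured relative to the bounding-box size, so the same
    absolute patch can fall into different bands for different persons.
    """
    low, high = thresholds
    hists = [[0.0] * vocab_size for _ in range(3)]
    for word, scale in features:
        rel = scale / box_size
        if rel < low:
            band = 0  # small
        elif rel < high:
            band = 1  # medium
        else:
            band = 2  # large
        hists[band][word] += 1.0

    # L1-normalise each band separately (an assumption); empty bands stay zero.
    out: List[float] = []
    for h in hists:
        total = sum(h)
        out.extend(v / total if total else 0.0 for v in h)
    return out
```

Passing the image size instead of the bounding-box size as `box_size` would correspond to coding scale with respect to the image, the other variant the abstract mentions.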
|
|
|
Lluis Gomez, & Dimosthenis Karatzas. (2014). MSER-based Real-Time Text Detection and Tracking. In 22nd International Conference on Pattern Recognition (pp. 3110–3115).
Abstract: We present a hybrid algorithm for detection and tracking of text in natural scenes that goes beyond the full-detection approaches in terms of time performance optimization. A state-of-the-art scene text detection module based on Maximally Stable Extremal Regions (MSER) is used to detect text asynchronously, while on a separate thread detected text objects are tracked by MSER propagation. The cooperation of these two modules yields real time video processing at high frame rates even on low-resource devices.
|
|
|
Jiaolong Xu, Sebastian Ramos, David Vazquez, & Antonio Lopez. (2014). Cost-sensitive Structured SVM for Multi-category Domain Adaptation. In 22nd International Conference on Pattern Recognition (pp. 3886–3891). IEEE.
Abstract: Domain adaptation addresses the problem of accuracy drop that a classifier may suffer when the training data (source domain) and the testing data (target domain) are drawn from different distributions. In this work, we focus on domain adaptation for structured SVM (SSVM). We propose a cost-sensitive domain adaptation method for SSVM, namely COSS-SSVM. In particular, during the re-training of an adapted classifier based on target and source data, the idea that we explore consists in introducing a non-zero cost even for correctly classified source domain samples. Eventually, we aim to learn a more target-oriented classifier by not rewarding (zero loss) properly classified source-domain training samples. We assess the effectiveness of COSS-SSVM on multi-category object recognition.
Keywords: Domain Adaptation; Pedestrian Detection
|
|
|
Mohammad Ali Bagheri, Gang Hu, Qigang Gao, & Sergio Escalera. (2014). A Framework of Multi-Classifier Fusion for Human Action Recognition. In 22nd International Conference on Pattern Recognition (pp. 1260–1265).
Abstract: The performance of different action-recognition methods using skeleton joint locations has recently been studied by several computer vision researchers. However, the potential improvement in classification through classifier fusion by ensemble-based methods has remained unexplored. In this work, we evaluate the performance of an ensemble of five action learning techniques, each performing the recognition task from a different perspective. The underlying rationale of the fusion approach is that different learners employ varying structures of input descriptors/features to be trained. These varying structures cannot be attached and used by a single learner. In addition, combining the outputs of several learners can reduce the risk of an unfortunate selection of a poorly performing learner. This leads to a more robust and generally applicable framework. Also, we propose two simple, yet effective, action description techniques. In order to improve the recognition performance, a powerful combination strategy is utilized based on the Dempster-Shafer theory, which can effectively make use of the diversity of base learners trained on different sources of information. The recognition results of the individual classifiers are compared with those obtained from fusing the classifiers' output, showing improved performance of the proposed methodology.
|
|
|
Hongxing Gao, Marçal Rusiñol, Dimosthenis Karatzas, & Josep Llados. (2014). Embedding Document Structure to Bag-of-Words through Pair-wise Stable Key-regions. In 22nd International Conference on Pattern Recognition (pp. 2903–2908).
Abstract: Since the document structure carries valuable discriminative information, plenty of efforts have been made for extracting and understanding document structure among which layout analysis approaches are the most commonly used. In this paper, Distance Transform based MSER (DTMSER) is employed to efficiently extract the document structure as a dendrogram of key-regions which roughly correspond to structural elements such as characters, words and paragraphs. Inspired by the Bag of Words (BoW) framework, we propose an efficient method for structural document matching by representing the document image as a histogram of key-region pairs encoding structural relationships. Applied to the scenario of document image retrieval, experimental results demonstrate a remarkable improvement when comparing the proposed method with typical BoW and pyramidal BoW methods.
|
|
|
P. Wang, V. Eglin, C. Garcia, C. Largeron, Josep Llados, & Alicia Fornes. (2014). A Coarse-to-Fine Word Spotting Approach for Historical Handwritten Documents Based on Graph Embedding and Graph Edit Distance. In 22nd International Conference on Pattern Recognition (pp. 3074–3079).
Abstract: Effective information retrieval on handwritten document images, especially historical ones, has always been a challenging task. In this paper, we propose a coarse-to-fine handwritten word spotting approach based on graph representation. The presented model comprises both the topological and morphological signatures of the handwriting. Skeleton-based graphs with Shape Context labelled vertices are established for connected components. Each word image is represented as a sequence of graphs. Aiming at developing a practical and efficient word spotting approach for large-scale historical handwritten documents, a fast and coarse comparison is first applied to prune the regions that are not similar to the query based on the graph embedding methodology. Afterwards, the query and regions of interest are compared by graph edit distance based on the Dynamic Time Warping alignment. The proposed approach is evaluated on a public dataset containing 50 pages of historical marriage license records. The results show that the proposed approach achieves a compromise between efficiency and accuracy.
Keywords: word spotting; coarse-to-fine mechanism; graph-based representation; graph embedding; graph edit distance
|
|
|
Claudio Baecchi, Francesco Turchini, Lorenzo Seidenari, Andrew Bagdanov, & Alberto del Bimbo. (2014). Fisher vectors over random density forest for object recognition. In 22nd International Conference on Pattern Recognition (pp. 4328–4333).
|
|
|
Federico Bartoli, Giuseppe Lisanti, Svebor Karaman, Andrew Bagdanov, & Alberto del Bimbo. (2014). Unsupervised scene adaptation for faster multi-scale pedestrian detection. In 22nd International Conference on Pattern Recognition (pp. 3534–3539).
|
|
|
Enric Marti, Antoni Gurgui, Debora Gil, Aura Hernandez-Sabate, Jaume Rocarias, & Ferran Poveda. (2014). ABP on line: Seguimiento, estregas y evaluación en aprendizaje basado en proyectos.
|
|