|
Antonio Clavelli, Dimosthenis Karatzas, Josep Llados, Mario Ferraro, & Giuseppe Boccignone. (2013). Towards Modelling an Attention-Based Text Localization Process. In 6th Iberian Conference on Pattern Recognition and Image Analysis (Vol. 7887, pp. 296–303). LNCS. Springer Berlin Heidelberg.
Abstract: This note introduces a visual attention model of text localization in real-world scenes. The core of the model, built upon the proto-object concept, is discussed. It is shown how such a dynamic mid-level representation of the scene can be derived in the framework of an action-perception loop engaging salience, text information value computation, and eye guidance mechanisms. Preliminary results that compare model-generated scanpaths with those eye-tracked from human subjects are presented.
Keywords: text localization; visual attention; eye guidance
|
|
|
Nuria Cirera, Alicia Fornes, Volkmar Frinken, & Josep Llados. (2013). Hybrid grammar language model for handwritten historical documents recognition. In 6th Iberian Conference on Pattern Recognition and Image Analysis (Vol. 7887, pp. 117–124). LNCS. Springer Berlin Heidelberg.
Abstract: In this paper we present a hybrid language model for the recognition of handwritten historical documents with a structured syntactical layout. Using a hidden Markov model-based recognition framework, a word-based grammar with a closed dictionary is enhanced by a character sequence recognition method. This allows out-of-dictionary words to be recognized in controlled parts of the recognition, while keeping a closed vocabulary restriction for other parts. While the current status is work in progress, we can report an improvement in terms of character error rate.
|
|
|
Jorge Bernal, F. Javier Sanchez, & Fernando Vilariño. (2012). Towards Automatic Polyp Detection with a Polyp Appearance Model. PR - Pattern Recognition, 45(9), 3166–3182.
Abstract: This work aims at automatic polyp detection by using a model of polyp appearance in the context of the analysis of colonoscopy videos. Our method consists of three stages: region segmentation, region description and region classification. The performance of our region segmentation method guarantees that if a polyp is present in the image, it will be exclusively and totally contained in a single region. The output of the algorithm also defines which regions can be considered as non-informative. We define as our region descriptor the novel Sector Accumulation-Depth of Valleys Accumulation (SA-DOVA), which provides a necessary but not sufficient condition for polyp presence. Finally, we classify our segmented regions according to the maximal values of the SA-DOVA descriptor. Our preliminary classification results are promising, especially when classifying those parts of the image that do not contain a polyp.
Keywords: Colonoscopy; Polyp Detection; Region Segmentation; SA-DOVA descriptor
|
|
|
D. Sanchez, J.C. Ortega, & Miguel Angel Bautista. (2013). Human Body Segmentation with Multi-limb Error-Correcting Output Codes Detection and Graph Cuts Optimization. In 6th Iberian Conference on Pattern Recognition and Image Analysis (Vol. 7887, pp. 50–58). LNCS. Springer Berlin Heidelberg.
Abstract: Human body segmentation is a hard task because of the high variability in appearance produced by changes in the point of view, lighting conditions, and number of articulations of the human body. In this paper, we propose a two-stage approach for the segmentation of the human body. In a first step, a set of human limbs is described, normalized to be rotation invariant, and trained using a cascade of classifiers split in a tree structure. Once the tree structure is trained, it is included in a ternary Error-Correcting Output Codes (ECOC) framework. This first classification step is applied in a sliding-window fashion on a new test image, defining a body-like probability map, which is used as an initialization of a GMM color modelling and binary Graph Cuts optimization procedure. The proposed methodology is tested on a novel limb-labelled data set. Results show performance improvements of the novel approach in comparison to a classical cascade of classifiers and human detector-based Graph Cuts segmentation approaches.
Keywords: Human Body Segmentation; Error-Correcting Output Codes; Cascade of Classifiers; Graph Cuts
|
|
|
Alejandro Gonzalez Alzate, Sebastian Ramos, David Vazquez, Antonio Lopez, & Jaume Amores. (2015). Spatiotemporal Stacked Sequential Learning for Pedestrian Detection. In Pattern Recognition and Image Analysis, Proceedings of 7th Iberian Conference, ibPRIA 2015 (pp. 3–12).
Abstract: Pedestrian classifiers decide which image windows contain a pedestrian. In practice, such classifiers provide a relatively high response at neighboring windows overlapping a pedestrian, while the responses around potential false positives are expected to be lower. An analogous reasoning applies to image sequences. If there is a pedestrian located within a frame, the same pedestrian is expected to appear close to the same location in neighboring frames. Therefore, such a location is likely to receive high classification scores during several frames, while false positives are expected to be more spurious. In this paper we propose to exploit such correlations for improving the accuracy of base pedestrian classifiers. In particular, we propose to use two-stage classifiers which rely not only on the image descriptors required by the base classifiers but also on the response of such base classifiers in a given spatiotemporal neighborhood. More specifically, we train pedestrian classifiers using a stacked sequential learning (SSL) paradigm. We use a new pedestrian dataset we have acquired from a car to evaluate our proposal at different frame rates. We also test on a well-known dataset: Caltech. The obtained results show that our SSL proposal boosts detection accuracy significantly with a minimal impact on the computational cost. Interestingly, SSL improves accuracy the most in the most dangerous situations, i.e. when a pedestrian is close to the camera.
Keywords: SSL; Pedestrian Detection
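Illustrative note (not from the paper): the second-stage input described in the abstract above can be sketched as the original descriptor extended with base-classifier scores from a neighborhood. The sketch below is a simplification under stated assumptions: it uses a temporal neighborhood only (one window tracked across frames), clamps indices at the sequence borders, and the names `stack_features` and `radius` are hypothetical.

```python
def stack_features(descriptors, base_scores, radius=1):
    """Build second-stage inputs: each descriptor plus neighboring base scores.

    descriptors: list of per-frame feature lists (hypothetical layout).
    base_scores: list of base-classifier scores, one per frame.
    radius: number of neighboring frames taken on each side (assumption).
    """
    n = len(base_scores)
    extended = []
    for t in range(n):
        context = []
        for dt in range(-radius, radius + 1):
            # Clamp at the sequence borders (an assumption of this sketch).
            idx = min(max(t + dt, 0), n - 1)
            context.append(base_scores[idx])
        extended.append(list(descriptors[t]) + context)
    return extended


# Three frames with a one-dimensional descriptor each.
frames = [[0.1], [0.2], [0.3]]
scores = [0.9, 0.8, 0.1]
stacked = stack_features(frames, scores, radius=1)
```

A second-stage classifier would then be trained on `stacked` instead of `frames`, so that a score supported by its neighbors counts for more than an isolated one.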
|
|
|
Alejandro Gonzalez Alzate, Gabriel Villalonga, German Ros, David Vazquez, & Antonio Lopez. (2015). 3D-Guided Multiscale Sliding Window for Pedestrian Detection. In Pattern Recognition and Image Analysis, Proceedings of 7th Iberian Conference, ibPRIA 2015 (Vol. 9117, pp. 560–568).
Abstract: The most relevant modules of a pedestrian detector are the candidate generation and the candidate classification. The former aims at presenting image windows to the latter so that they are classified as containing a pedestrian or not. Much attention has been paid to the classification module, while candidate generation has mainly relied on a (multiscale) sliding window pyramid. However, candidate generation is critical for achieving real-time performance. In this paper we assume a context of autonomous driving based on stereo vision. Accordingly, we evaluate the effect of taking into account the 3D information (derived from the stereo) in order to prune the hundreds of thousands of windows per image generated by a classical pyramidal sliding window. For our study we use a multimodal (RGB, disparity) and multi-descriptor (HOG, LBP, HOG+LBP) holistic ensemble based on linear SVM. Evaluation on data from the challenging KITTI benchmark suite shows the effectiveness of using 3D information to dramatically reduce the number of candidate windows, even improving the overall pedestrian detection accuracy.
Keywords: Pedestrian Detection
|
|
|
Marc Bolaños, Maite Garolera, & Petia Radeva. (2015). Object Discovery using CNN Features in Egocentric Videos. In Pattern Recognition and Image Analysis, Proceedings of 7th Iberian Conference, ibPRIA 2015 (Vol. 9117, pp. 67–74). LNCS.
Abstract: Lifelogging devices based on photo/video are spreading faster every day. This growth can bring great benefits for developing methods that extract meaningful information about the user wearing the device and his/her environment. In this paper, we propose a semi-supervised strategy for easily discovering objects relevant to the person wearing a first-person camera. Given the egocentric video sequence acquired by the camera, the method uses both the appearance extracted by means of a deep convolutional neural network and an object refill methodology that allows objects to be discovered even when they appear only a few times in the collection of images. We validate our method on a sequence of 1000 egocentric daily images and obtain results with an F-measure of 0.5, which is 0.17 better than the state-of-the-art approach.
Keywords: Object discovery; Egocentric videos; Lifelogging; CNN
|
|
|
Estefania Talavera, Mariella Dimiccoli, Marc Bolaños, Maedeh Aghaei, & Petia Radeva. (2015). R-clustering for egocentric video segmentation. In Pattern Recognition and Image Analysis, Proceedings of 7th Iberian Conference, ibPRIA 2015 (Vol. 9117, pp. 327–336). LNCS. Springer International Publishing.
Abstract: In this paper, we present a new method for egocentric video temporal segmentation based on integrating a statistical mean change detector and agglomerative clustering (AC) within an energy-minimization framework. Given the tendency of most AC methods to oversegment video sequences when clustering their frames, we combine the clustering with a concept drift detection technique (ADWIN) that has rigorous performance guarantees. ADWIN serves as a statistical upper bound for the clustering-based video segmentation. We integrate both techniques in an energy-minimization framework that serves to disambiguate the decisions of both techniques and to complete the segmentation taking into account the temporal continuity of video frame descriptors. We present experiments over egocentric sets of more than 13,000 images acquired with different wearable cameras, showing that our method outperforms state-of-the-art clustering methods.
Keywords: Temporal video segmentation; Egocentric videos; Clustering
|
|
|
Onur Ferhat, Arcadi Llanza, & Fernando Vilariño. (2015). A Feature-Based Gaze Estimation Algorithm for Natural Light Scenarios. In Pattern Recognition and Image Analysis, Proceedings of 7th Iberian Conference, ibPRIA 2015 (Vol. 9117, pp. 569–576). LNCS. Springer International Publishing.
Abstract: We present an eye tracking system that works with regular webcams. We base our work on the open-source CVC Eye Tracker [7] and propose a number of improvements and a novel gaze estimation method. The new method uses features extracted from iris segmentation and does not fall into the traditional categorization of appearance-based/model-based methods. Our experiments show that our approach reduces the gaze estimation errors by 34% in the horizontal direction and by 12% in the vertical direction compared to the baseline system.
Keywords: Eye tracking; Gaze estimation; Natural light; Webcam
|
|
|
Suman Ghosh, & Ernest Valveny. (2015). A Sliding Window Framework for Word Spotting Based on Word Attributes. In Pattern Recognition and Image Analysis, Proceedings of 7th Iberian Conference, ibPRIA 2015 (Vol. 9117, pp. 652–661). LNCS. Springer International Publishing.
Abstract: In this paper we propose a segmentation-free approach to word spotting. Word images are first encoded into feature vectors using Fisher Vectors. Then, these feature vectors are used together with pyramidal histogram of characters (PHOC) labels to learn SVM-based attribute models. Documents are represented by these PHOC-based word attributes. To efficiently compute the word attributes over a sliding window, we propose to use an integral image representation of the document using a simplified version of the attribute model. Finally, we re-rank the top word candidates using the more discriminative full version of the word attributes. We show state-of-the-art results for segmentation-free query-by-example word spotting in single-writer and multi-writer standard datasets.
Keywords: Word spotting; Sliding window; Word attributes
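Illustrative note (not from the paper): the integral image idea mentioned in the abstract above lets the sum of per-pixel or per-cell values inside any rectangular window be read in constant time. The sketch below is a generic summed-area table over a 2D grid of scalar values; how the paper maps its simplified attribute model onto such a grid is not stated in the abstract, and the names `integral_image` and `window_sum` are hypothetical.

```python
def integral_image(grid):
    """Return a (rows + 1) x (cols + 1) summed-area table for a 2D list."""
    rows, cols = len(grid), len(grid[0])
    sat = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0.0
        for c in range(cols):
            row_sum += grid[r][c]
            # Sum of everything above and to the left, inclusive.
            sat[r + 1][c + 1] = sat[r][c + 1] + row_sum
    return sat


def window_sum(sat, top, left, height, width):
    """Sum of the grid values inside a window, using four table lookups."""
    bottom, right = top + height, left + width
    return sat[bottom][right] - sat[top][right] - sat[bottom][left] + sat[top][left]


# Toy 3 x 3 grid of per-cell scores.
grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
sat = integral_image(grid)
total = window_sum(sat, 0, 0, 3, 3)         # whole grid
lower_right = window_sum(sat, 1, 1, 2, 2)   # cells 5, 6, 8, 9
```

Once the table is built, every sliding-window position costs the same four lookups regardless of window size, which is what makes dense scanning of a document page affordable.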
|
|
|
Veronica Romero, Alicia Fornes, Enrique Vidal, & Joan Andreu Sanchez. (2017). Information Extraction in Handwritten Marriage Licenses Books Using the MGGI Methodology. In L.A. Alexandre, J.Salvador Sanchez, & Joao M. F. Rodriguez (Eds.), 8th Iberian Conference on Pattern Recognition and Image Analysis (Vol. 10255, pp. 287–294). LNCS.
Abstract: Historical records of daily activities provide intriguing insights into the life of our ancestors, useful for demographic and genealogical research. For example, marriage license books have been used for centuries by ecclesiastical and secular institutions to register marriages. These books follow a simple structure of the text in the records with an evolving vocabulary, mainly composed of proper names that change over time. This distinct vocabulary makes automatic transcription and semantic information extraction difficult tasks. In previous works we studied the use of category-based language models and how a Grammatical Inference technique known as MGGI could improve the accuracy of these tasks. In this work we analyze the main causes of the semantic errors observed in previous results and apply a better implementation of the MGGI technique to solve these problems. Using the resulting language model, transcription and information extraction experiments have been carried out, and the results support our proposed approach.
Keywords: Handwritten Text Recognition; Information extraction; Language modeling; MGGI; Categories-based language model
|
|
|
Marc Bolaños, Alvaro Peris, Francisco Casacuberta, & Petia Radeva. (2017). VIBIKNet: Visual Bidirectional Kernelized Network for Visual Question Answering. In 8th Iberian Conference on Pattern Recognition and Image Analysis.
Abstract: In this paper, we address the problem of visual question answering by proposing a novel model, called VIBIKNet. Our model is based on integrating Kernelized Convolutional Neural Networks and Long Short-Term Memory units to generate an answer given a question about an image. We prove that VIBIKNet is an optimal trade-off between accuracy and computational load, in terms of memory and time consumption. We validate our method on the VQA challenge dataset and compare it to the top performing methods in order to illustrate its performance and speed.
Keywords: Visual Question Answering; Convolutional Neural Networks; Long short-term memory networks
|
|
|
Hana Jarraya, Oriol Ramos Terrades, & Josep Llados. (2017). Graph Embedding through Probabilistic Graphical Model applied to Symbolic Graphs. In 8th Iberian Conference on Pattern Recognition and Image Analysis.
Abstract: We propose a new Graph Embedding (GEM) method that takes advantage of structural pattern representation. It models an Attributed Graph (AG) as a Probabilistic Graphical Model (PGM). Then, it learns the parameters of this PGM, represented by a vector. This vector is a signature of the AG in a lower dimensional vectorial space. We apply Structured Support Vector Machines (SSVM) to perform the classification task. As a first attempt, results on the GREC dataset are encouraging enough to go further in this direction.
Keywords: Attributed Graph; Probabilistic Graphical Model; Graph Embedding; Structured Support Vector Machines
|
|
|
Eduardo Aguilar, & Petia Radeva. (2019). Food Recognition by Integrating Local and Flat Classifiers. In 9th Iberian Conference on Pattern Recognition and Image Analysis (Vol. 11867, pp. 65–74). LNCS.
Abstract: The recognition of food images is an interesting research topic, in which its applicability in the creation of nutritional diaries stands out, with the aim of improving the quality of life of people with a chronic disease (e.g. diabetes, heart disease) or prone to acquire it (e.g. people who are overweight or obese). For a food recognition system to be useful in real applications, it is necessary to recognize a huge number of different foods. We argue that for very large scale classification, a traditional flat classifier is not enough to acquire an acceptable result. To address this, we propose a method that performs prediction with local classifiers, based on a class hierarchy, or with a flat classifier. We decide which approach to use depending on the analysis of both the Epistemic Uncertainty obtained for the image in the children classifiers and the prediction of the parent classifier. When our criterion is met, the final prediction is obtained with the respective local classifier; otherwise, with the flat classifier. From the results, we can see that the proposed method improves the classification performance compared to the use of a single flat classifier.
|
|
|
Parichehr Behjati Ardakani, Diego Velazquez, Josep M. Gonfaus, Pau Rodriguez, Xavier Roca, & Jordi Gonzalez. (2019). Catastrophic interference in Disguised Face Recognition. In 9th Iberian Conference on Pattern Recognition and Image Analysis (Vol. 11868, pp. 64–75). LNCS.
Abstract: The natural tendency of artificial neural networks to completely and abruptly forget previously known information when learning new information is commonly known. We explore this behaviour in the context of Face Verification on the recently proposed Disguised Faces in the Wild dataset (DFW). We empirically evaluate several commonly used DCNN architectures on Face Recognition and distill some insights about the effect of sequential learning on distinct identities from different datasets, showing that the catastrophic forgetness phenomenon is present even in feature embeddings fine-tuned on different tasks from the original domain.
Keywords: Neural network forgetness; Face recognition; Disguised Faces
|
|