|
Fahad Shahbaz Khan, Joost Van de Weijer, Andrew Bagdanov, & Michael Felsberg. (2014). Scale Coding Bag-of-Words for Action Recognition. In 22nd International Conference on Pattern Recognition (pp. 1514–1519).
Abstract: Recognizing human actions in still images is a challenging problem in computer vision due to significant amount of scale, illumination and pose variation. Given the bounding box of a person both at training and test time, the task is to classify the action associated with each bounding box in an image.
Most state-of-the-art methods use the bag-of-words paradigm for action recognition. The bag-of-words framework employing a dense multi-scale grid sampling strategy is the de facto standard for feature detection. This results in a scale invariant image representation where all the features at multiple-scales are binned in a single histogram. We argue that such a scale invariant
strategy is sub-optimal since it ignores the multi-scale information
available with each bounding box of a person.
This paper investigates alternative approaches to scale coding for action recognition in still images. We encode multi-scale information explicitly in three different histograms for small, medium and large scale visual-words. Our first approach exploits multi-scale information with respect to the image size. In our second approach, we encode multi-scale information relative to the size of the bounding box of a person instance. In each approach, the multi-scale histograms are then concatenated into a single representation for action classification. We validate our approaches on the Willow dataset which contains seven action categories: interacting with computer, photography, playing music,
riding bike, riding horse, running and walking. Our results clearly suggest that the proposed scale coding approaches outperform the conventional scale invariant technique. Moreover, we show that our approach obtains promising results compared to more complex state-of-the-art methods.
|
|
|
Alejandro Gonzalez Alzate, Sebastian Ramos, David Vazquez, Antonio Lopez, & Jaume Amores. (2015). Spatiotemporal Stacked Sequential Learning for Pedestrian Detection. In Pattern Recognition and Image Analysis, Proceedings of 7th Iberian Conference , ibPRIA 2015 (pp. 3–12).
Abstract: Pedestrian classifiers decide which image windows contain a pedestrian. In practice, such classifiers provide a relatively high response at neighbor windows overlapping a pedestrian, while the responses around potential false positives are expected to be lower. An analogous reasoning applies for image sequences. If there is a pedestrian located within a frame, the same pedestrian is expected to appear close to the same location in neighbor frames. Therefore, such a location has chances of receiving high classification scores during several frames, while false positives are expected to be more spurious. In this paper we propose to exploit such correlations for improving the accuracy of base pedestrian classifiers. In particular, we propose to use two-stage classifiers which not only rely on the image descriptors required by the base classifiers but also on the response of such base classifiers in a given spatiotemporal neighborhood. More specifically, we train pedestrian classifiers using a stacked sequential learning (SSL) paradigm. We use a new pedestrian dataset we have acquired from a car to evaluate our proposal at different frame rates. We also test on a well known dataset: Caltech. The obtained results show that our SSL proposal boosts detection accuracy significantly with a minimal impact on the computational cost. Interestingly, SSL improves more the accuracy at the most dangerous situations, i.e. when a pedestrian is close to the camera.
Keywords: SSL; Pedestrian Detection
|
|
|
Jiaolong Xu, Sebastian Ramos, David Vazquez, & Antonio Lopez. (2014). Incremental Domain Adaptation of Deformable Part-based Models. In Tony Andrew and Pridmore M. and F. Valstar (Ed.), 25th British Machine Vision Conference. BMVA Press.
Abstract: Nowadays, classifiers play a core role in many computer vision tasks. The underlying assumption for learning classifiers is that the training set and the deployment environment (testing) follow the same probability distribution regarding the features used by the classifiers. However, in practice, there are different reasons that can break this constancy assumption. Accordingly, reusing existing classifiers by adapting them from the previous training environment (source domain) to the new testing one (target domain)
is an approach with increasing acceptance in the computer vision community. In this paper we focus on the domain adaptation of deformable part-based models (DPMs) for object detection. In particular, we focus on a relatively unexplored scenario, i.e. incremental domain adaptation for object detection assuming weak-labeling. Therefore, our algorithm is ready to improve existing source-oriented DPM-based detectors as soon as a little amount of labeled target-domain training data is available, and keeps improving as more of such data arrives in a continuous fashion. For achieving this, we follow a multiple
instance learning (MIL) paradigm that operates in an incremental per-image basis. As proof of concept, we address the challenging scenario of adapting a DPM-based pedestrian detector trained with synthetic pedestrians to operate in real-world scenarios. The obtained results show that our incremental adaptive models obtain equally good accuracy results as the batch learned models, while being more flexible for handling continuously arriving target-domain data.
Keywords: Pedestrian Detection; Part-based models; Domain Adaptation
|
|
|
Enric Marti, Antoni Gurgui, Debora Gil, Aura Hernandez-Sabate, Jaume Rocarias, & Ferran Poveda. (2014). ABP on line: Seguimiento, estregas y evaluación en aprendizaje basado en proyectos.
|
|
|
Carles Sanchez, Oriol Ramos Terrades, Patricia Marquez, Enric Marti, Jaume Rocarias, & Debora Gil. (2014). Evaluación automática de prácticas en Moodle para el aprendizaje autónomo en Ingenierías.
|
|
|
David Fernandez, Pau Riba, Alicia Fornes, & Josep Llados. (2014). On the Influence of Key Point Encoding for Handwritten Word Spotting. In 14th International Conference on Frontiers in Handwriting Recognition (pp. 476–481).
Abstract: In this paper we evaluate the influence of the selection of key points and the associated features in the performance of word spotting processes. In general, features can be extracted from a number of characteristic points like corners, contours, skeletons, maxima, minima, crossings, etc. A number of descriptors exist in the literature using different interest point detectors. But the intrinsic variability of handwriting vary strongly on the performance if the interest points are not stable enough. In this paper, we analyze the performance of different descriptors for local interest points. As benchmarking dataset we have used the Barcelona Marriage Database that contains handwritten records of marriages over five centuries.
Keywords: Local descriptors; Interest points; Handwritten documents; Word spotting; Historical document analysis
|
|
|
David Fernandez, Jon Almazan, Nuria Cirera, Alicia Fornes, & Josep Llados. (2014). BH2M: the Barcelona Historical Handwritten Marriages database. In 22nd International Conference on Pattern Recognition (pp. 256–261).
Abstract: This paper presents an image database of historical handwritten marriages records stored in the archives of Barcelona cathedral, and the corresponding meta-data addressed to evaluate the performance of document analysis algorithms. The contribution of this paper is twofold. First, it presents a complete ground truth which covers the whole pipeline of handwriting
recognition research, from layout analysis to recognition and understanding. Second, it is the first dataset in the emerging area of genealogical document analysis, where documents are manuscripts pseudo-structured with specific lexicons and the interest is beyond pure transcriptions but context dependent.
|
|
|
David Fernandez, R.Manmatha, Josep Llados, & Alicia Fornes. (2014). Sequential Word Spotting in Historical Handwritten Documents. In 11th IAPR International Workshop on Document Analysis and Systems (pp. 101–105).
Abstract: In this work we present a handwritten word spotting approach that takes advantage of the a priori known order of appearance of the query words. Given an ordered sequence of query word instances, the proposed approach performs a
sequence alignment with the words in the target collection. Although the alignment is quite sparse, i.e. the number of words in the database is higher than the query set, the improvement in the overall performance is sensitively higher than isolated word spotting. As application dataset, we use a collection of handwritten marriage licenses taking advantage of the ordered
index pages of family names.
|
|
|
Pau Riba, Jon Almazan, Alicia Fornes, David Fernandez, Ernest Valveny, & Josep Llados. (2014). e-Crowds: a mobile platform for browsing and searching in historical demographyrelated manuscripts. In 14th International Conference on Frontiers in Handwriting Recognition (pp. 228–233).
Abstract: This paper presents a prototype system running on portable devices for browsing and word searching through historical handwritten document collections. The platform adapts the paradigm of eBook reading, where the narrative is not necessarily sequential, but centered on the user actions. The novelty is to replace digitally born books by digitized historical manuscripts of marriage licenses, so document analysis tasks are required in the browser. With an active reading paradigm, the user can cast queries of people names, so he/she can implicitly follow genealogical links. In addition, the system allows combined searches: the user can refine a search by adding more words to search. As a second contribution, the retrieval functionality involves as a core technology a word spotting module with an unified approach, which allows combined query searches, and also two input modalities: query-by-example, and query-by-string.
|
|
|
Monica Piñol. (2014). Reinforcement Learning of Visual Descriptors for Object Recognition (Ricardo Toledo, & Angel Sappa, Eds.). Ph.D. thesis, Ediciones Graficas Rey, .
Abstract: The human visual system is able to recognize the object in an image even if the object is partially occluded, from various points of view, in different colors, or with independence of the distance to the object. To do this, the eye obtains an image and extracts features that are sent to the brain, and then, in the brain the object is recognized. In computer vision, the object recognition branch tries to learns from the human visual system behaviour to achieve its goal. Hence, an algorithm is used to identify representative features of the scene (detection), then another algorithm is used to describe these points (descriptor) and finally the extracted information is used for classifying the object in the scene. The selection of this set of algorithms is a very complicated task and thus, a very active research field. In this thesis we are focused on the selection/learning of the best descriptor for a given image. In the state of the art there are several descriptors but we do not know how to choose the best descriptor because depends on scenes that we will use (dataset) and the algorithm chosen to do the classification. We propose a framework based on reinforcement learning and bag of features to choose the best descriptor according to the given image. The system can analyse the behaviour of different learning algorithms and descriptor sets. Furthermore the proposed framework for improving the classification/recognition ratio can be used with minor changes in other computer vision fields, such as video retrieval.
|
|
|
Anjan Dutta. (2014). Inexact Subgraph Matching Applied to Symbol Spotting in Graphical Documents (Josep Llados, & Umapada Pal, Eds.). Ph.D. thesis, Ediciones Graficas Rey, .
Abstract: There is a resurgence in the use of structural approaches in the usual object recognition and retrieval problem. Graph theory, in particular, graph matching plays a relevant role in that. Specifically, the detection of an object (or a part of that) in an image in terms of structural features can be formulated as a subgraph matching. Subgraph matching is a challenging task. Specially due to the presence of outliers most of the graph matching algorithms do not perform well in subgraph matching scenario. Also exact subgraph isomorphism has proven to be an NP-complete problem. So naturally, in graph matching community, there are lot of efforts addressing the problem of subgraph matching within suboptimal bound. Most of them work with approximate algorithms that try to get an inexact solution in estimated way. In addition, usual recognition must cope with distortion. Inexact graph matching consists in finding the best isomorphism under a similarity measure. Theoretically this thesis proposes algorithms for solving subgraph matching in an approximate and inexact way.
We consider the symbol spotting problem on graphical documents or line drawings from application point of view. This is a well known problem in the graphics recognition community. It can be further applied for indexing and classification of documents based on their contents. The structural nature of this kind of documents easily motivates one for giving a graph based representation. So the symbol spotting problem on graphical documents can be considered as a subgraph matching problem. The main challenges in this application domain is the noise and distortions that might come during the usage, digitalization and raster to vector conversion of those documents. Apart from that computer vision nowadays is not any more confined within a limited number of images. So dealing a huge number of images with graph based method is a further challenge.
In this thesis, on one hand, we have worked on efficient and robust graph representation to cope with the noise and distortions coming from documents. On the other hand, we have worked on different graph based methods and framework to solve the subgraph matching problem in a better approximated way, which can also deal with considerable number of images. Firstly, we propose a symbol spotting method by hashing serialized subgraphs. Graph serialization allows to create factorized substructures such as graph paths, which can be organized in hash tables depending on the structural similarities of the serialized subgraphs. The involvement of hashing techniques helps to reduce the search space substantially and speeds up the spotting procedure. Secondly, we introduce contextual similarities based on the walk based propagation on tensor product graph. These contextual similarities involve higher order information and more reliable than pairwise similarities. We use these higher order similarities to formulate subgraph matching as a node and edge selection problem in the tensor product graph. Thirdly, we propose near convex grouping to form near convex region adjacency graph which eliminates the limitations of traditional region adjacency graph representation for graphic recognition. Fourthly, we propose a hierarchical graph representation by simplifying/correcting the structural errors to create a hierarchical graph of the base graph. Later these hierarchical graph structures are matched with some graph matching methods. Apart from that, in this thesis we have provided an overall experimental comparison of all the methods and some of the state-of-the-art methods. Furthermore, some dataset models have also been proposed.
|
|
|
Adriana Romero, Petia Radeva, & Carlo Gatta. (2014). No more meta-parameter tuning in unsupervised sparse feature learning.
Abstract: CoRR abs/1402.5766
We propose a meta-parameter free, off-the-shelf, simple and fast unsupervised feature learning algorithm, which exploits a new way of optimizing for sparsity. Experiments on STL-10 show that the method presents state-of-the-art performance and provides discriminative features that generalize well.
|
|
|
Christophe Rigaud, Dimosthenis Karatzas, Jean-Christophe Burie, & Jean-Marc Ogier. (2014). Color descriptor for content-based drawing retrieval. In 11th IAPR International Workshop on Document Analysis and Systems (pp. 267–271).
Abstract: Human detection in computer vision field is an active field of research. Extending this to human-like drawings such as the main characters in comic book stories is not trivial. Comics analysis is a very recent field of research at the intersection of graphics, texts, objects and people recognition. The detection of the main comic characters is an essential step towards a fully automatic comic book understanding. This paper presents a color-based approach for comics character retrieval using content-based drawing retrieval and color palette.
|
|
|
Clement Guerin, Christophe Rigaud, Karell Bertet, Jean-Christophe Burie, Arnaud Revel, & Jean-Marc Ogier. (2014). Réduction de l’espace de recherche pour les personnages de bandes dessinées. In 19th National Congress Reconnaissance de Formes et l'Intelligence Artificielle.
Abstract: Les bandes dessinées représentent un patrimoine culturel important dans de nombreux pays et leur numérisation massive offre la possibilité d'effectuer des recherches dans le contenu des images. À ce jour, ce sont principalement les structures des pages et leurs contenus textuels qui ont été étudiés, peu de travaux portent sur le contenu graphique. Nous proposons de nous appuyer sur des éléments déjà étudiés tels que la position des cases et des bulles, pour réduire l'espace de recherche et localiser les personnages en fonction de la queue des bulles. L'évaluation de nos différentes contributions à partir de la base eBDtheque montre un taux de détection des queues de bulle de 81.2%, de localisation des personnages allant jusqu'à 85% et un gain d'espace de recherche de plus de 50%.
Keywords: contextual search; document analysis; comics characters
|
|
|
Christophe Rigaud, & Clement Guerin. (2014). Localisation contextuelle des personnages de bandes dessinées. In Colloque International Francophone sur l'Écrit et le Document.
Abstract: Les auteurs proposent une méthode de localisation des personnages dans des cases de bandes dessinées en s'appuyant sur les caractéristiques des bulles de dialogue. L'évaluation montre un taux de localisation des personnages allant jusqu'à 65%.
|
|