|
Volkmar Frinken, Francisco Zamora, Salvador España, Maria Jose Castro, Andreas Fischer, & Horst Bunke. (2012). Long-Short Term Memory Neural Networks Language Modeling for Handwriting Recognition. In 21st International Conference on Pattern Recognition (pp. 701–704).
Abstract: Unconstrained handwritten text recognition systems maximize the combination of two separate probability scores. The first one is the observation probability that indicates how well the returned word sequence matches the input image. The second score is the probability that reflects how likely a word sequence is according to a language model. Current state-of-the-art recognition systems use statistical language models in the form of bigram word probabilities. This paper proposes to model the target language by means of a recurrent neural network with long-short term memory cells. Because the network is recurrent, the considered context is not limited to a fixed size, especially as the memory cells are designed to deal with long-term dependencies. In a set of experiments conducted on the IAM off-line database, we show the superiority of the proposed language model over statistical n-gram models.
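Illustrative note (not from the paper): the score combination described in the abstract can be sketched as picking the word sequence that maximizes the observation log-probability plus a weighted language-model log-probability. The names `best_hypothesis`, `toy_lm_logprob`, `lm_weight`, and all numbers below are assumptions made for illustration. The toy bigram table stands in for any language model and is not the LSTM network the paper proposes.

```python
import math

# Hypothetical toy bigram probabilities; a real system would estimate these
# from a corpus or replace them with a neural language model.
TOY_BIGRAMS = {
    ("<s>", "the"): 0.5,
    ("the", "cat"): 0.2,
    ("the", "cut"): 0.01,
}


def toy_lm_logprob(words, floor=1e-4):
    """Log-probability of a word sequence under the toy bigram table.

    Unseen bigrams get the small probability `floor` (an assumption).
    """
    prev = "<s>"
    total = 0.0
    for word in words:
        total += math.log(TOY_BIGRAMS.get((prev, word), floor))
        prev = word
    return total


def best_hypothesis(candidates, lm_logprob, lm_weight=1.0):
    """Return the word sequence with the highest combined score.

    `candidates` is a list of (words, observation_logprob) pairs.
    The combined score is observation_logprob + lm_weight * lm_logprob(words).
    """
    def combined(item):
        words, obs_logprob = item
        return obs_logprob + lm_weight * lm_logprob(words)

    return max(candidates, key=combined)[0]


# Two competing readings of the same image, with made-up observation scores.
candidates = [
    (("the", "cut"), math.log(0.6)),  # slightly better visual match
    (("the", "cat"), math.log(0.4)),  # more plausible word sequence
]
```

With these toy numbers, `best_hypothesis(candidates, toy_lm_logprob)` prefers the linguistically more plausible reading, while setting `lm_weight=0.0` falls back to the best visual match alone.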
|
|
|
Marçal Rusiñol, Dimosthenis Karatzas, Andrew Bagdanov, & Josep Llados. (2012). Multipage Document Retrieval by Textual and Visual Representations. In 21st International Conference on Pattern Recognition (pp. 521–524).
Abstract: In this paper we present a multipage administrative document image retrieval system based on textual and visual representations of document pages. Individual pages are represented by textual or visual information using a bag-of-words framework. Different fusion strategies are evaluated which allow the system to perform multipage document retrieval on the basis of a single page retrieval system. Results are reported on a large dataset of document images sampled from a banking workflow.
|
|
|
Marçal Rusiñol, & Josep Llados. (2012). The Role of the Users in Handwritten Word Spotting Applications: Query Fusion and Relevance Feedback. In 13th International Conference on Frontiers in Handwriting Recognition (pp. 55–60).
Abstract: In this paper we present the importance of including the user in the loop in a handwritten word spotting framework. Several off-the-shelf query fusion and relevance feedback strategies have been tested in the handwritten word spotting context. The increase in terms of precision when the user is included in the loop is assessed using two datasets of historical handwritten documents and a baseline word spotting approach based on a bag-of-visual-words model.
|
|
|
Volkmar Frinken, Markus Baumgartner, Andreas Fischer, & Horst Bunke. (2012). Semi-Supervised Learning for Cursive Handwriting Recognition using Keyword Spotting. In 13th International Conference on Frontiers in Handwriting Recognition (pp. 49–54).
Abstract: State-of-the-art handwriting recognition systems are learning-based systems that require large sets of training data. The creation of training data, and consequently the creation of a well-performing recognition system, therefore requires a substantial amount of human work. This can be reduced with semi-supervised learning, which uses unlabeled text lines for training as well. Current approaches estimate the correct transcription of the unlabeled data via handwriting recognition, which is not only computationally very demanding but also requires a good model of the target language. In this paper, we propose a different approach that makes use of keyword spotting, which is significantly faster and does not need any language model. In a set of experiments we demonstrate its superiority over existing approaches.
|
|
|
Emanuel Indermühle, Volkmar Frinken, & Horst Bunke. (2012). Mode Detection in Online Handwritten Documents using BLSTM Neural Networks. In 13th International Conference on Frontiers in Handwriting Recognition (pp. 302–307).
Abstract: Mode detection in online handwritten documents refers to the process of distinguishing different types of contents, such as text, formulas, diagrams, or tables, from one another. In this paper a new approach to mode detection is proposed that uses bidirectional long-short term memory (BLSTM) neural networks. The BLSTM neural network is a novel type of recurrent neural network that has been successfully applied in speech and handwriting recognition. In this paper we show that it has the potential to significantly outperform traditional methods for mode detection, which are usually based on stroke classification. As a further advantage over previous approaches, the proposed system is trainable and does not rely on user-defined heuristics. Moreover, it can be easily adapted to new or additional types of modes by just providing the system with new training data.
|
|
|
Volkmar Frinken, Alicia Fornes, Josep Llados, & Jean-Marc Ogier. (2012). Bidirectional Language Model for Handwriting Recognition. In Structural, Syntactic, and Statistical Pattern Recognition, Joint IAPR International Workshop (Vol. 7626, pp. 611–619). LNCS. Springer Berlin Heidelberg.
Abstract: In order to improve the results of automatically recognized handwritten text, information about the language is commonly included in the recognition process. A common approach is to represent a text line as a sequence. It is processed in one direction and the language information via n-grams is directly included in the decoding. This approach, however, only uses context on one side to estimate a word’s probability. Therefore, we propose a bidirectional recognition in this paper, using distinct forward and backward language models. By combining decoding hypotheses from both directions, we achieve a significant increase in recognition accuracy for the off-line writer-independent handwriting recognition task. Both language models are of the same type and can be estimated on the same corpus. Hence, the increase in recognition accuracy comes without any additional need for training data or language modeling complexity.
|
|
|
Ekaterina Zaytseva, & Jordi Vitria. (2012). A search based approach to non maximum suppression in face detection. In 19th IEEE International Conference on Image Processing.
Abstract: Face detectors typically produce a large number of false positives and this leads to the need to have a further non-maximum suppression stage to eliminate multiple and spurious responses. This stage is based on considering spatial heuristics: true positive responses are selected by implicitly considering several restrictions on the spatial distribution of detector responses in natural images. In this paper we analyze the limitations of this approach and propose an efficient search method to overcome them. Results show how the application of this new non-maximum suppression approach to a simple face detector boosts its performance to state-of-the-art results.
Note: Poster, paper TA.P5.12.
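Illustrative note (not from the paper): a common heuristic form of the suppression stage mentioned in the abstract is greedy, overlap-based non-maximum suppression. The sketch below shows that conventional baseline only; it is not the search method the paper proposes. The function names, the `(x1, y1, x2, y2)` box layout, and the `iou_threshold` default are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(detections, iou_threshold=0.5):
    """Keep the highest-scoring boxes and drop boxes that overlap a kept one.

    `detections` is a list of (box, score) pairs. A detection is discarded
    when its IoU with any already-kept box exceeds `iou_threshold`.
    """
    kept = []
    ranked = sorted(detections, key=lambda d: d[1], reverse=True)
    for box, score in ranked:
        if all(iou(box, kept_box) <= iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


# Made-up detector responses: two overlapping boxes and one isolated box.
detections = [
    ((0, 0, 10, 10), 0.9),
    ((1, 1, 11, 11), 0.8),
    ((50, 50, 60, 60), 0.7),
]
kept = greedy_nms(detections)
```

In this toy input the second box overlaps the first heavily and is suppressed, so only the first and third responses survive. The fixed overlap threshold is exactly the kind of spatial heuristic whose limitations the paper sets out to analyze.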
|
|
|
Angel Sappa, David Geronimo, Fadi Dornaika, Mohammad Rouhani, & Antonio Lopez. (2012). Moving object detection from mobile platforms using stereo data registration. In Marek R. Ogiela, & Lakhmi C. Jain (Eds.), Computational Intelligence paradigms in advanced pattern classification (Vol. 386, pp. 25–37). Springer Berlin Heidelberg.
Abstract: This chapter describes a robust approach for detecting moving objects from on-board stereo vision systems. It relies on a feature point quaternion-based registration, which avoids common problems that appear when computationally expensive iterative algorithms are used in dynamic environments. The proposed approach consists of three main stages. Initially, feature points are extracted and tracked through consecutive 2D frames. Then, a RANSAC-based approach is used for registering two point sets with known correspondences in 3D space. The computed 3D rigid displacement is used to map two consecutive 3D point clouds into the same coordinate system by means of the quaternion method. Finally, moving objects correspond to those areas with large 3D registration errors. Experimental results show the viability of the proposed approach to detect moving objects like vehicles or pedestrians in different urban scenarios.
Keywords: pedestrian detection
|
|
|
Sergio Escalera, Josep Moya, Laura Igual, Veronica Violant, & Maria Teresa Anguera. (2012). Análisis Comportamental Automatizado de TDAH: la Influencia de la Variable Motivación. In IPSI – Cosmocaixa, Jornadas "Empremtes del present, efectes en la psicoanàlisi, la cultura i la societat".
|
|
|
Laura Igual, Joan Carles Soliva, Antonio Hernandez, Sergio Escalera, Oscar Vilarroya, & Petia Radeva. (2012). A Supervised Graph-cut Deformable Model for Brain MRI Segmentation. Deformation models: tracking, animation and applications. In Computational Vision and Biomechanics. LNCS. Springer Netherlands.
|
|
|
Angel Sappa, & George A. Triantafyllid. (2012). Computer Graphics and Imaging.
|
|
|
Theo Gevers, Arjan Gijsenij, Joost Van de Weijer, & J.M. Geusebroek. (2012). Color in Computer Vision: Fundamentals and Applications. The Wiley-IS&T Series in Imaging Science and Technology.
|
|
|
Michal Drozdzal, Petia Radeva, Santiago Segui, Laura Igual, Carolina Malagelada, Fernando Azpiroz, et al. (2012). System and method for automatic detection of in vivo contraction video sequences.
Publication date: 2012/3/8
|
|
|
Marçal Rusiñol, Lluis Pere de las Heras, Joan Mas, Oriol Ramos Terrades, Dimosthenis Karatzas, Anjan Dutta, et al. (2012). CVC-UAB's participation in the Flowchart Recognition Task of CLEF-IP 2012. In Conference and Labs of the Evaluation Forum.
|
|
|
Shida Beigpour. (2013). Illumination and object reflectance modeling (Joost Van de Weijer, & Ernest Valveny, Eds.). Ph.D. thesis, Ediciones Graficas Rey.
Abstract: More realistic and accurate models of the scene illumination and object reflectance can greatly improve the quality of many computer vision and computer graphics tasks. Using such models, a more profound knowledge about the interaction of light with object surfaces can be established, which proves crucial to a variety of computer vision applications. In the current work, we investigate the various existing approaches to illumination and reflectance modeling and analyze their shortcomings in capturing the complexity of real-world scenes. Based on this analysis we propose improvements to different aspects of reflectance and illumination estimation in order to more realistically model real-world scenes in the presence of complex lighting phenomena (i.e., multiple illuminants, interreflections and shadows). Moreover, we captured our own multi-illuminant dataset, which consists of complex scenes and illumination conditions both outdoors and in laboratory conditions. In addition, we investigate the use of synthetic data to facilitate the construction of datasets and improve the process of obtaining ground-truth information.
|
|