Marçal Rusiñol, & Josep Llados. (2012). The Role of the Users in Handwritten Word Spotting Applications: Query Fusion and Relevance Feedback. In 13th International Conference on Frontiers in Handwriting Recognition (pp. 55–60).
Abstract: In this paper we present the importance of including the user in the loop in a handwritten word spotting framework. Several off-the-shelf query fusion and relevance feedback strategies have been tested in the handwritten word spotting context. The increase in terms of precision when the user is included in the loop is assessed using two datasets of historical handwritten documents and a baseline word spotting approach based on a bag-of-visual-words model.
|
Volkmar Frinken, Markus Baumgartner, Andreas Fischer, & Horst Bunke. (2012). Semi-Supervised Learning for Cursive Handwriting Recognition using Keyword Spotting. In 13th International Conference on Frontiers in Handwriting Recognition (pp. 49–54).
Abstract: State-of-the-art handwriting recognition systems are learning-based systems that require large sets of training data. The creation of training data, and consequently the creation of a well-performing recognition system, therefore requires a substantial amount of human work. This can be reduced with semi-supervised learning, which uses unlabeled text lines for training as well. Current approaches estimate the correct transcription of the unlabeled data via handwriting recognition, which is not only extremely demanding in terms of computational cost but also requires a good model of the target language. In this paper, we propose a different approach that makes use of keyword spotting, which is significantly faster and does not need any language model. In a set of experiments we demonstrate its superiority over existing approaches.
|
Emanuel Indermühle, Volkmar Frinken, & Horst Bunke. (2012). Mode Detection in Online Handwritten Documents using BLSTM Neural Networks. In 13th International Conference on Frontiers in Handwriting Recognition (pp. 302–307).
Abstract: Mode detection in online handwritten documents refers to the process of distinguishing different types of content, such as text, formulas, diagrams, or tables, from one another. In this paper a new approach to mode detection is proposed that uses bidirectional long short-term memory (BLSTM) neural networks. The BLSTM neural network is a novel type of recurrent neural network that has been successfully applied in speech and handwriting recognition. In this paper we show that it has the potential to significantly outperform traditional methods for mode detection, which are usually based on stroke classification. As a further advantage over previous approaches, the proposed system is trainable and does not rely on user-defined heuristics. Moreover, it can be easily adapted to new or additional types of modes by just providing the system with new training data.
|
David Fernandez, Josep Llados, Alicia Fornes, & R. Manmatha. (2012). On Influence of Line Segmentation in Efficient Word Segmentation in Old Manuscripts. In 13th International Conference on Frontiers in Handwriting Recognition (pp. 763–768).
Abstract: The objective of this work is to show the importance of a good line segmentation to obtain better results in the segmentation of words of historical documents. We have used the approach developed by Manmatha and Rothfeder [1] to segment words in old handwritten documents. In their work the lines of the documents are extracted using projections. In this work, we have developed an approach to segment lines more efficiently. The new line segmentation algorithm handles skewed, touching and noisy lines, so it significantly improves word segmentation. Experiments using Spanish documents from the Marriages Database of the Barcelona Cathedral show that this approach reduces the error rate by more than 20%.
Keywords: document image processing;handwritten character recognition;history;image segmentation;Spanish document;historical document;line segmentation;old handwritten document;old manuscript;word segmentation;Bifurcation;Dynamic programming;Handwriting recognition;Image segmentation;Measurement;Noise;Skeleton;Segmentation;document analysis;document and text processing;handwriting analysis;heuristics;path-finding
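Note: the projection-based line extraction mentioned in the abstract as the earlier baseline can be illustrated roughly as follows. This is a minimal sketch, not the authors' method or the Manmatha and Rothfeder implementation; the function name `segment_lines`, the `min_ink` parameter and the toy binary page are assumptions made for illustration only.

```python
def segment_lines(image, min_ink=1):
    """Split a binary page into text lines using a horizontal projection.

    `image` is a list of rows, each row a list of 0/1 pixels (1 = ink).
    Returns (start_row, end_row) pairs with end_row exclusive.
    `min_ink` is a hypothetical threshold: rows with less ink count as gaps.
    """
    profile = [sum(row) for row in image]  # ink pixels per row
    lines = []
    start = None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                      # a text line begins
        elif ink < min_ink and start is not None:
            lines.append((start, y))       # a text line ends at a gap
            start = None
    if start is not None:                  # line touching the bottom edge
        lines.append((start, len(profile)))
    return lines


# Toy page with two text lines separated by blank rows.
page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
]
print(segment_lines(page))
```

A plain projection like this merges lines that are skewed or touching, which is the kind of case the abstract says the new line segmentation algorithm is designed to handle.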
|
Laura Igual, Joan Carles Soliva, Antonio Hernandez, Sergio Escalera, Oscar Vilarroya, & Petia Radeva. (2012). Supervised Brain Segmentation and Classification in Diagnostic of Attention-Deficit/Hyperactivity Disorder. In High Performance Computing and Simulation, International Conference on (pp. 182–187). IEEE Xplore.
Abstract: This paper presents an automatic method for external and internal segmentation of the caudate nucleus in Magnetic Resonance Images (MRI) based on statistical and structural machine learning approaches. This method is applied in Attention-Deficit/Hyperactivity Disorder (ADHD) diagnosis. The external segmentation method adapts the Graph Cut energy-minimization model to make it suitable for segmenting small, low-contrast structures, such as the caudate nucleus. In particular, new energy function data and boundary potentials are defined and a supervised energy term based on contextual brain structures is added. Furthermore, the internal segmentation method learns a classifier based on shape features of the Region of Interest (ROI) in MRI slices. The results show accurate external and internal caudate segmentation in a real data set and similar performance of ADHD diagnostic test to manual annotation.
|
Petia Radeva, Michal Drozdzal, Santiago Segui, Laura Igual, Carolina Malagelada, Fernando Azpiroz, et al. (2012). Active labeling: Application to wireless endoscopy analysis. In High Performance Computing and Simulation, International Conference on (pp. 174–181).
Abstract: Today, robust learners trained in a real supervised machine learning application should rely on a rich collection of positive and negative examples. Although in many applications it is not difficult to obtain huge amounts of data, labeling those data can be a very expensive process, especially when dealing with data of high variability and complexity. Good examples of such cases are data from medical imaging applications, where annotating anomalies like tumors, polyps, atherosclerotic plaque or informative frames in wireless endoscopy needs highly trained experts. Building a representative set of training data from medical videos (e.g. Wireless Capsule Endoscopy) means that thousands of frames have to be labeled by an expert. It is quite normal that data in new videos are different and thus not represented by the training set. In this paper, we review the main approaches to active learning and illustrate how active learning can help to reduce expert effort in constructing the training sets. We show that by applying active learning criteria, the number of human interventions can be significantly reduced. The proposed system allows the annotation of informative/non-informative frames of Wireless Capsule Endoscopy videos, each containing more than 30000 frames, with fewer than 100 expert "clicks".
|
Ekaterina Zaytseva, & Jordi Vitria. (2012). A search based approach to non maximum suppression in face detection. In 19th IEEE International Conference on Image Processing.
Abstract: Poster, paper TA.P5.12. Face detectors typically produce a large number of false positives and this leads to the need to have a further non-maximum suppression stage to eliminate multiple and spurious responses. This stage is based on considering spatial heuristics: true positive responses are selected by implicitly considering several restrictions on the spatial distribution of detector responses in natural images. In this paper we analyze the limitations of this approach and propose an efficient search method to overcome them. Results show how the application of this new non-maximum suppression approach to a simple face detector boosts its performance to state-of-the-art results.
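Note: for orientation, the conventional heuristic non-maximum suppression stage that the abstract takes as its starting point is commonly implemented as a greedy overlap filter. The sketch below shows that common baseline only; it is not the search-based method proposed in the paper. The names `iou` and `greedy_nms`, the box format `(x1, y1, x2, y2)` and the 0.5 threshold are assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))  # overlap width
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))  # overlap height
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(detections, iou_threshold=0.5):
    """Keep the highest-scoring boxes, dropping any box that overlaps
    an already kept box by more than `iou_threshold`.

    `detections` is a list of (box, score) pairs.
    """
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, k[0]) <= iou_threshold for k in kept):
            kept.append((box, score))
    return kept


detections = [
    ((0, 0, 10, 10), 0.9),    # strongest response on a face
    ((1, 1, 11, 11), 0.8),    # near-duplicate of the first box
    ((50, 50, 60, 60), 0.7),  # separate response elsewhere
]
print(greedy_nms(detections))
```

The fixed overlap threshold is one example of the spatial heuristics whose limitations the paper sets out to analyze.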
|
Jiaolong Xu, David Vazquez, Antonio Lopez, Javier Marin, & Daniel Ponsa. (2013). Learning a Multiview Part-based Model in Virtual World for Pedestrian Detection. In IEEE Intelligent Vehicles Symposium (pp. 467–472). IEEE.
Abstract: State-of-the-art deformable part-based models based on latent SVM have shown excellent results on human detection. In this paper, we propose to train a multiview deformable part-based model with automatically generated part examples from virtual-world data. The method is efficient as: (i) the part detectors are trained with precisely extracted virtual examples, thus no latent learning is needed, (ii) the multiview pedestrian detector enhances the performance of the pedestrian root model, (iii) a top-down approach is used for part detection, which reduces the search space. We evaluate our model on the Daimler and Karlsruhe Pedestrian Benchmarks with the publicly available Caltech pedestrian detection evaluation framework, and the result outperforms the state-of-the-art latent SVM V4.0 on both average miss rate and speed (our detector is ten times faster).
Keywords: Pedestrian Detection; Virtual World; Part based
|
Jiaolong Xu, David Vazquez, Antonio Lopez, Javier Marin, & Daniel Ponsa. (2014). Learning a Part-based Pedestrian Detector in Virtual World. TITS - IEEE Transactions on Intelligent Transportation Systems, 15(5), 2121–2131.
Abstract: Detecting pedestrians with on-board vision systems is of paramount interest for assisting drivers to prevent vehicle-to-pedestrian accidents. The core of a pedestrian detector is its classification module, which aims at deciding if a given image window contains a pedestrian. Given the difficulty of this task, many classifiers have been proposed during the last fifteen years. Among them, the so-called (deformable) part-based classifiers including multi-view modeling are usually top ranked in accuracy. Training such classifiers is not trivial since a proper aspect clustering and spatial part alignment of the pedestrian training samples are crucial for obtaining an accurate classifier. In this paper, first we perform automatic aspect clustering and part alignment by using virtual-world pedestrians, i.e., human annotations are not required. Second, we use a mixture-of-parts approach that allows part sharing among different aspects. Third, these proposals are integrated in a learning framework which also allows incorporating real-world training data to perform domain adaptation between virtual- and real-world cameras. Overall, the obtained results on four popular on-board datasets show that our proposal clearly outperforms the state-of-the-art deformable part-based detector known as latent SVM.
Keywords: Domain Adaptation; Pedestrian Detection; Virtual Worlds
|
German Ros, J. Guerrero, Angel Sappa, & Antonio Lopez. (2013). VSLAM pose initialization via Lie groups and Lie algebras optimization. In Proceedings of IEEE International Conference on Robotics and Automation (pp. 5740–5747).
Abstract: We present a novel technique for estimating initial 3D poses in the context of localization and Visual SLAM problems. The presented approach can deal with noise, outliers and a large amount of input data and still performs in real time on a standard CPU. Our method produces solutions with an accuracy comparable to those produced by RANSAC but can be much faster when the percentage of outliers is high or for large amounts of input data. In the current work we propose to formulate the pose estimation as an optimization problem on Lie groups, considering their manifold structure as well as their associated Lie algebras. This allows us to perform a fast and simple optimization while conserving all the constraints imposed by the Lie group SE(3). Additionally, we present several key design concepts related to the cost function and its Jacobian, aspects that are critical for the good performance of the algorithm.
Keywords: SLAM
|
Javier Marin, David Geronimo, David Vazquez, & Antonio Lopez. (2012). Pedestrian Detection: Exploring Virtual Worlds. In Handbook of Pattern Recognition: Methods and Application (Vol. 5, pp. 145–162). iConcept Press.
Abstract: Handbook of pattern recognition will include contributions from university educators and active research experts. This Handbook is intended to serve as a basic reference on methods and applications of pattern recognition. The primary aim of this handbook is providing the community of pattern recognition with a readable, easy to understand resource that covers introductory, intermediate and advanced topics with equal clarity. Therefore, the Handbook of pattern recognition can serve equally well as reference resource and as classroom textbook. Contributions cover all methods, techniques and applications of pattern recognition. A tentative list of relevant topics might include: 1- Statistical, structural, syntactic pattern recognition. 2- Neural networks, machine learning, data mining. 3- Discrete geometry, algebraic, graph-based techniques for pattern recognition. 4- Face recognition, Signal analysis, image coding and processing, shape and texture analysis. 5- Document processing, text and graphics recognition, digital libraries. 6- Speech recognition, music analysis, multimedia systems. 7- Natural language analysis, information retrieval. 8- Biometrics, biomedical pattern analysis and information systems. 9- Other scientific, engineering, social and economical applications of pattern recognition. 10- Special hardware architectures, software packages for pattern recognition.
Keywords: Virtual worlds; Pedestrian Detection; Domain Adaptation
|
David Roche, Debora Gil, & Jesus Giraldo. (2013). Detecting loss of diversity for an efficient termination of EAs. In 15th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (pp. 561–566).
Abstract: Termination of Evolutionary Algorithms (EA) at their steady state, so that useless iterations are not performed, is a main point for their efficient application to black-box problems. Many EA algorithms evolve while there is still diversity in their population and, thus, they could be terminated by analyzing the behavior of some measures of EA population diversity. This paper presents a numeric approximation to steady states that can be used to detect the moment the EA population has lost its diversity for EA termination. Our condition has been applied to 3 EA paradigms based on diversity and a selection of functions covering the properties most relevant for EA convergence. Experiments show that our condition works regardless of the search space dimension and function landscape.
Keywords: EA termination; EA population diversity; EA steady state
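Note: the general idea of stopping an EA once its population has lost diversity can be sketched as below. This is only an illustration under stated assumptions, not the numeric steady-state condition proposed in the paper: the diversity measure (mean per-dimension standard deviation), the function names and the tolerance `tol` are all hypothetical choices.

```python
from statistics import pstdev


def population_diversity(population):
    """Mean per-dimension standard deviation of a non-empty population.

    `population` is a list of individuals, each a list of floats of equal
    length. This particular measure is an assumption for illustration.
    """
    dims = list(zip(*population))  # one tuple of values per dimension
    return sum(pstdev(d) for d in dims) / len(dims)


def has_lost_diversity(population, tol=1e-6):
    """Hypothetical stopping test: True once diversity drops below `tol`."""
    return population_diversity(population) < tol


spread = [[0.0, 0.0], [2.0, 4.0]]
collapsed = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
print(population_diversity(spread))
print(has_lost_diversity(collapsed))
```

In an EA loop such a test would typically be evaluated once per generation, with the run ending when it first returns True.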
|
David Fernandez, R. Manmatha, Josep Llados, & Alicia Fornes. (2014). Sequential Word Spotting in Historical Handwritten Documents. In 11th IAPR International Workshop on Document Analysis and Systems (pp. 101–105).
Abstract: In this work we present a handwritten word spotting approach that takes advantage of the a priori known order of appearance of the query words. Given an ordered sequence of query word instances, the proposed approach performs a sequence alignment with the words in the target collection. Although the alignment is quite sparse, i.e. the number of words in the database is higher than in the query set, the overall performance is noticeably higher than that of isolated word spotting. As application dataset, we use a collection of handwritten marriage licenses, taking advantage of the ordered index pages of family names.
|
Christophe Rigaud, Dimosthenis Karatzas, Jean-Christophe Burie, & Jean-Marc Ogier. (2014). Color descriptor for content-based drawing retrieval. In 11th IAPR International Workshop on Document Analysis and Systems (pp. 267–271).
Abstract: Human detection is an active field of research in computer vision. Extending this to human-like drawings such as the main characters in comic book stories is not trivial. Comics analysis is a very recent field of research at the intersection of graphics, texts, objects and people recognition. The detection of the main comic characters is an essential step towards a fully automatic comic book understanding. This paper presents a color-based approach for comics character retrieval using content-based drawing retrieval and color palette.
|
Dimosthenis Karatzas, Sergi Robles, & Lluis Gomez. (2014). An on-line platform for ground truthing and performance evaluation of text extraction systems. In 11th IAPR International Workshop on Document Analysis and Systems (pp. 242–246).
Abstract: This paper presents a set of on-line software tools for creating ground truth and calculating performance evaluation metrics for text extraction tasks such as localization, segmentation and recognition. The platform supports the definition of comprehensive ground truth information at different text representation levels while it offers centralised management and quality control of the ground truthing effort. It implements a range of state of the art performance evaluation algorithms and offers functionality for the definition of evaluation scenarios, on-line calculation of various performance metrics and visualisation of the results. The presented platform, which comprises the backbone of the ICDAR 2011 (challenge 1) and 2013 (challenges 1 and 2) Robust Reading competitions, is now made available for public use.
|