|
Mohammad Ali Bagheri, Qigang Gao, & Sergio Escalera. (2012). Three-Dimensional Design of Error Correcting Output Codes. In 8th International Conference on Machine Learning and Data Mining (pp. 29–).
|
|
|
Mohammad Ali Bagheri, Qigang Gao, & Sergio Escalera. (2012). Error Correcting Output Codes for multiclass classification: Application to two image vision problems. In 16th symposium on Artificial Intelligence & Signal Processing (pp. 508–513). IEEE Xplore.
Abstract: Error-correcting output codes (ECOC) represent a powerful framework to deal with multiclass classification problems based on combining binary classifiers. The key factor affecting the performance of ECOC methods is the independence of binary classifiers, without which the ECOC method would be ineffective. Despite its ability to classify problems with a relatively large number of classes, it has been applied to few real-world problems. In this paper, we investigate the behavior of the ECOC approach on two image vision problems: logo recognition and shape classification using Decision Tree and AdaBoost as the base learners. The results show that the ECOC method can be used to improve the classification performance in comparison with the classical multiclass approaches.
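Illustrative note: the ECOC idea summarized in this abstract can be sketched as Hamming-distance decoding. Each class is assigned a binary codeword, one binary classifier predicts each bit, and the class whose codeword is closest to the predicted bits is chosen. The 4-class, 7-bit code matrix and the predicted bits below are made-up values for illustration and are not taken from the paper.

```python
# Minimal ECOC decoding sketch. CODE_MATRIX and the example bits are
# illustrative assumptions, not data from the cited paper.

def hamming_distance(a, b):
    """Number of positions where two equal-length bit sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def ecoc_decode(code_matrix, predicted_bits):
    """Return the index of the class whose codeword is closest to the predicted bits."""
    distances = [hamming_distance(codeword, predicted_bits) for codeword in code_matrix]
    return min(range(len(distances)), key=distances.__getitem__)

# One row per class, one column per binary classifier.
# Every pair of rows differs in 4 positions, so a single wrong bit can be corrected.
CODE_MATRIX = [
    [1, 1, 1, 1, 1, 1, 1],  # class 0
    [0, 0, 0, 0, 1, 1, 1],  # class 1
    [0, 0, 1, 1, 0, 0, 1],  # class 2
    [0, 1, 0, 1, 0, 1, 0],  # class 3
]

# Outputs of the 7 binary classifiers for one sample:
# the codeword of class 2 with its first bit flipped by a classifier error.
predicted_bits = [1, 0, 1, 1, 0, 0, 1]
predicted_class = ecoc_decode(CODE_MATRIX, predicted_bits)
```

Here the single erroneous bit does not change the decoded class, which is the error-correcting property the framework relies on when the binary classifiers make independent mistakes.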
|
|
|
Mohammad Ali Bagheri, Qigang Gao, & Sergio Escalera. (2012). Efficient pairwise classification using Local Cross Off strategy. In 25th Canadian Conference on Artificial Intelligence (Vol. 7310, pp. 25–36). LNCS.
Abstract: The pairwise classification approach tends to perform better than other well-known approaches when dealing with multiclass classification problems. In the pairwise approach, however, the nuisance votes of many irrelevant classifiers may result in a wrong prediction class. To overcome this problem, a novel method, Local Crossing Off (LCO), is presented and evaluated in this paper. The proposed LCO system takes advantage of the nearest neighbor classification algorithm because of its simplicity and speed, as well as the strength of two other powerful binary classifiers to discriminate between two classes. This paper provides a set of experimental results on 20 datasets using two base learners: Neural Networks and Support Vector Machines. The results show that the proposed technique not only achieves better classification accuracy, but also is computationally more efficient for tackling classification problems which have a relatively large number of target classes.
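Illustrative note: the standard pairwise (one-vs-one) voting scheme that this abstract starts from can be sketched as follows. This is not the LCO method itself. The `binary_predict` function is a hypothetical stand-in for trained binary classifiers, and the toy rule inside it is an assumption made only so the example runs.

```python
from collections import Counter
from itertools import combinations

def pairwise_vote(classes, binary_predict, x):
    """Collect one vote from every pairwise classifier, relevant to x or not."""
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[binary_predict(a, b, x)] += 1
    return votes

def binary_predict(a, b, x):
    """Hypothetical pairwise classifier: picks whichever class label is numerically closer to x."""
    return a if abs(a - x) <= abs(b - x) else b

classes = [0, 1, 2, 3]
votes = pairwise_vote(classes, binary_predict, x=2.2)
winner = votes.most_common(1)[0][0]
```

With 4 classes there are 6 pairwise classifiers, and half of them were never trained on the winning class yet still cast votes. These are the nuisance votes the abstract refers to.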
|
|
|
Ekaterina Zaytseva, Santiago Segui, & Jordi Vitria. (2012). Sketchable Histograms of Oriented Gradients for Object Detection. In 17th Iberoamerican Conference on Pattern Recognition (Vol. 7441, pp. 374–381). Springer Berlin Heidelberg.
Abstract: In this paper we investigate a new representation approach for visual object recognition. The new representation, called sketchable-HoG, extends the classical histogram of oriented gradients (HoG) feature by adding two different aspects: the stability of the majority orientation and the continuity of gradient orientations. In this way, the sketchable-HoG locally characterizes the complexity of an object model and introduces global structure information while still keeping simplicity, compactness and robustness. We evaluated the proposed image descriptor on the public Caltech 101 dataset. The obtained results outperform the classical HoG descriptor as well as other reported descriptors in the literature.
|
|
|
Albert Gordo, Florent Perronnin, & Ernest Valveny. (2012). Document classification using multiple views. In 10th IAPR International Workshop on Document Analysis Systems (pp. 33–37). IEEE Computer Society Washington.
Abstract: The combination of multiple features or views when representing documents or other kinds of objects usually leads to improved results in classification (and retrieval) tasks. Most systems assume that those views will be available both at training and test time. However, some views may be too 'expensive' to be available at test time. In this paper, we consider the use of Canonical Correlation Analysis to leverage 'expensive' views that are available only at training time. Experimental results show that this information may significantly improve the results in a classification task.
|
|
|
Albert Gordo, Jose Antonio Rodriguez, Florent Perronnin, & Ernest Valveny. (2012). Leveraging category-level labels for instance-level image retrieval. In 25th IEEE Conference on Computer Vision and Pattern Recognition (pp. 3045–3052). IEEE Xplore.
Abstract: In this article, we focus on the problem of large-scale instance-level image retrieval. For efficiency reasons, it is common to represent an image by a fixed-length descriptor which is subsequently encoded into a small number of bits. We note that most encoding techniques include an unsupervised dimensionality reduction step. Our goal in this work is to learn a better subspace in a supervised manner. We especially raise the following question: “can category-level labels be used to learn such a subspace?” To answer this question, we experiment with four learning techniques: the first one is based on a metric learning framework, the second one on attribute representations, the third one on Canonical Correlation Analysis (CCA) and the fourth one on Joint Subspace and Classifier Learning (JSCL). While the first three approaches have been applied in the past to the image retrieval problem, we believe we are the first to show the usefulness of JSCL in this context. In our experiments, we use ImageNet as a source of category-level labels and report retrieval results on two standard datasets: INRIA Holidays and the University of Kentucky benchmark. Our experimental study shows that metric learning and attributes do not lead to any significant improvement in retrieval accuracy, as opposed to CCA and JSCL. As an example, we report on Holidays an increase in accuracy from 39.3% to 48.6% with 32-dimensional representations. Overall JSCL is shown to yield the best results.
|
|
|
Francisco Cruz, & Oriol Ramos Terrades. (2012). Document segmentation using relative location features. In 21st International Conference on Pattern Recognition (pp. 1562–1565).
Abstract: In this paper we evaluate the use of Relative Location Features (RLF) on a historical document segmentation task, and compare the quality of the results obtained on structured and unstructured documents with and without RLF. We show that using these features improves the final segmentation on documents with a strong structure, while their application on unstructured documents does not show significant improvement. Although this paper is not focused on segmenting unstructured documents, results obtained on a benchmark dataset equal or even surpass previous results of similar works.
|
|
|
Volkmar Frinken, Francisco Zamora, Salvador España, Maria Jose Castro, Andreas Fischer, & Horst Bunke. (2012). Long-Short Term Memory Neural Networks Language Modeling for Handwriting Recognition. In 21st International Conference on Pattern Recognition (pp. 701–704).
Abstract: Unconstrained handwritten text recognition systems maximize the combination of two separate probability scores. The first one is the observation probability that indicates how well the returned word sequence matches the input image. The second score is the probability that reflects how likely a word sequence is according to a language model. Current state-of-the-art recognition systems use statistical language models in the form of bigram word probabilities. This paper proposes to model the target language by means of a recurrent neural network with long-short term memory cells. Because the network is recurrent, the considered context is not limited to a fixed size, especially as the memory cells are designed to deal with long-term dependencies. In a set of experiments conducted on the IAM off-line database we show the superiority of the proposed language model over statistical n-gram models.
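Illustrative note: the two-score combination described at the start of this abstract can be sketched as a weighted sum of log-probabilities, here with the bigram baseline rather than the proposed LSTM language model. The candidate hypotheses, their observation scores, the bigram table, the probability floor, and the `lm_weight` parameter are all assumptions for illustration and do not come from the paper.

```python
import math

def bigram_log_prob(words, bigram_probs, floor=1e-6):
    """Log-probability of a word sequence under a bigram model; unseen pairs get a small floor."""
    total = 0.0
    prev = "<s>"  # sentence-start symbol
    for word in words:
        total += math.log(bigram_probs.get((prev, word), floor))
        prev = word
    return total

def best_hypothesis(candidates, bigram_probs, lm_weight=1.0):
    """Pick the word sequence maximizing observation score plus weighted language-model score."""
    def combined(candidate):
        words, observation_log_prob = candidate
        return observation_log_prob + lm_weight * bigram_log_prob(words, bigram_probs)
    return max(candidates, key=combined)[0]

# Made-up recognizer output: (word sequence, log observation probability).
candidates = [
    (["the", "cat"], math.log(0.4)),
    (["the", "cut"], math.log(0.5)),
]
# Made-up bigram probabilities.
bigram_probs = {("<s>", "the"): 0.5, ("the", "cat"): 0.2, ("the", "cut"): 0.01}

with_lm = best_hypothesis(candidates, bigram_probs)
without_lm = best_hypothesis(candidates, bigram_probs, lm_weight=0.0)
```

In this toy case the observation score alone prefers "the cut", while adding the language-model score selects "the cat". Note that the bigram score only sees one word of context, which is the limitation a recurrent language model is meant to lift.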
|
|
|
Marçal Rusiñol, Dimosthenis Karatzas, Andrew Bagdanov, & Josep Llados. (2012). Multipage Document Retrieval by Textual and Visual Representations. In 21st International Conference on Pattern Recognition (pp. 521–524).
Abstract: In this paper we present a multipage administrative document image retrieval system based on textual and visual representations of document pages. Individual pages are represented by textual or visual information using a bag-of-words framework. Different fusion strategies are evaluated which allow the system to perform multipage document retrieval on the basis of a single page retrieval system. Results are reported on a large dataset of document images sampled from a banking workflow.
|
|
|
Marçal Rusiñol, & Josep Llados. (2012). The Role of the Users in Handwritten Word Spotting Applications: Query Fusion and Relevance Feedback. In 13th International Conference on Frontiers in Handwriting Recognition (pp. 55–60).
Abstract: In this paper we present the importance of including the user in the loop in a handwritten word spotting framework. Several off-the-shelf query fusion and relevance feedback strategies have been tested in the handwritten word spotting context. The increase in terms of precision when the user is included in the loop is assessed using two datasets of historical handwritten documents and a baseline word spotting approach based on a bag-of-visual-words model.
|
|
|
Volkmar Frinken, Markus Baumgartner, Andreas Fischer, & Horst Bunke. (2012). Semi-Supervised Learning for Cursive Handwriting Recognition using Keyword Spotting. In 13th International Conference on Frontiers in Handwriting Recognition (pp. 49–54).
Abstract: State-of-the-art handwriting recognition systems are learning-based systems that require large sets of training data. The creation of training data, and consequently the creation of a well-performing recognition system, therefore requires a substantial amount of human work. This can be reduced with semi-supervised learning, which uses unlabeled text lines for training as well. Current approaches estimate the correct transcription of the unlabeled data via handwriting recognition, which is not only computationally very demanding but also requires a good model of the target language. In this paper, we propose a different approach that makes use of keyword spotting, which is significantly faster and does not need any language model. In a set of experiments we demonstrate its superiority over existing approaches.
|
|
|
Emanuel Indermühle, Volkmar Frinken, & Horst Bunke. (2012). Mode Detection in Online Handwritten Documents using BLSTM Neural Networks. In 13th International Conference on Frontiers in Handwriting Recognition (pp. 302–307).
Abstract: Mode detection in online handwritten documents refers to the process of distinguishing different types of contents, such as text, formulas, diagrams, or tables, one from another. In this paper a new approach to mode detection is proposed that uses bidirectional long-short term memory (BLSTM) neural networks. The BLSTM neural network is a novel type of recurrent neural network that has been successfully applied in speech and handwriting recognition. In this paper we show that it has the potential to significantly outperform traditional methods for mode detection, which are usually based on stroke classification. As a further advantage over previous approaches, the proposed system is trainable and does not rely on user-defined heuristics. Moreover, it can be easily adapted to new or additional types of modes by just providing the system with new training data.
|
|
|
Volkmar Frinken, Alicia Fornes, Josep Llados, & Jean-Marc Ogier. (2012). Bidirectional Language Model for Handwriting Recognition. In Structural, Syntactic, and Statistical Pattern Recognition, Joint IAPR International Workshop (Vol. 7626, pp. 611–619). LNCS. Springer Berlin Heidelberg.
Abstract: In order to improve the results of automatically recognized handwritten text, information about the language is commonly included in the recognition process. A common approach is to represent a text line as a sequence. It is processed in one direction and the language information via n-grams is directly included in the decoding. This approach, however, only uses context on one side to estimate a word’s probability. Therefore, we propose a bidirectional recognition in this paper, using distinct forward and backward language models. By combining decoding hypotheses from both directions, we achieve a significant increase in recognition accuracy for the off-line writer independent handwriting recognition task. Both language models are of the same type and can be estimated on the same corpus. Hence, the increase in recognition accuracy comes without any additional need for training data or language modeling complexity.
|
|
|
Laura Igual, Joan Carles Soliva, Roger Gimeno, Sergio Escalera, Oscar Vilarroya, & Petia Radeva. (2012). Automatic Internal Segmentation of Caudate Nucleus for Diagnosis of Attention Deficit Hyperactivity Disorder. In 9th International Conference on Image Analysis and Recognition (Vol. 7325, pp. 222–229). LNCS.
Abstract: Poster
Studies on volumetric brain Magnetic Resonance Imaging (MRI) showed neuroanatomical abnormalities in pediatric Attention-Deficit/Hyperactivity Disorder (ADHD). In particular, the diminished right caudate volume is one of the most replicated findings among ADHD samples in morphometric MRI studies. In this paper, we propose a fully-automatic method for internal caudate nucleus segmentation based on machine learning. Moreover, the ratio between right caudate body volume and the bilateral caudate body volume is applied in an ADHD diagnostic test. We separately validate the automatic internal segmentation of caudate in head and body structures and the diagnostic test using real data from ADHD and control subjects. As a result, we show accurate internal caudate segmentation and similar performance between the proposed automatic diagnostic test and the manual annotation.
|
|
|
Ekaterina Zaytseva, & Jordi Vitria. (2012). A search based approach to non maximum suppression in face detection. In 19th IEEE International Conference on Image Processing.
Abstract: Poster
paper TA.P5.12
Face detectors typically produce a large number of false positives, which creates the need for a further non-maximum suppression stage to eliminate multiple and spurious responses. This stage is based on spatial heuristics: true positive responses are selected by implicitly considering several restrictions on the spatial distribution of detector responses in natural images. In this paper we analyze the limitations of this approach and propose an efficient search method to overcome them. Results show how the application of this new non-maximum suppression approach to a simple face detector boosts its performance to state-of-the-art results.
|
|