|
Debora Gil, & Petia Radeva. (2006). Inhibition of false landmarks. PRL - Pattern Recognition Letters, 27(9), 1022–1030.
Abstract: Corners and junctions are landmarks characterized by the lack of differentiability in the unit tangent to the image level curve. Detectors based on differential operators are, by their own definition, not the best posed, as they require a higher degree of differentiability to yield a reliable response. We argue that a corner detector should be based on the degree of continuity of the tangent vector to the image level sets, work on the image domain, and make no assumptions about either the local image structure or the particular geometry of the corner/junction. An operator measuring the degree of differentiability of the projection matrix on the image gradient fulfills the above requirements. Because using smoothing kernels leads to corner misplacement, we suggest an alternative fake response remover based on the receptive field inhibition of spurious details. The combination of orientation discontinuity detection and noise inhibition produces our inhibition orientation energy (IOE) landmark locator.
|
|
|
Oriol Ramos Terrades, & Ernest Valveny. (2006). A new use of the ridgelets transform for describing linear singularities in images. PRL - Pattern Recognition Letters, 27(6), 587–596.
|
|
|
Jaume Amores, N. Sebe, & Petia Radeva. (2006). Boosting the distance estimation: Application to the K-Nearest Neighbor Classifier. PRL - Pattern Recognition Letters, 27(3), 201–209.
|
|
|
Cristina Cañero, & Petia Radeva. (2003). Vesselness enhancement diffusion. PRL - Pattern Recognition Letters, 24(16), 3141–3151.
|
|
|
Sergio Escalera, Oriol Pujol, & Petia Radeva. (2010). Re-coding ECOCs without retraining. PRL - Pattern Recognition Letters, 31(7), 555–562.
Abstract: A standard way to deal with multi-class categorization problems is to combine binary classifiers in a pairwise voting procedure. Recently, this classical approach has been formalized in the Error-Correcting Output Codes (ECOC) framework. In the ECOC framework, the one-versus-one coding has been shown to achieve higher performance than the other coding designs. The binary problems that we train in the one-versus-one strategy are significantly smaller than in the other designs, and usually easier to learn, given the smaller overlap between classes. However, a high percentage of the positions of the coding matrix are coded by zero, which implies a high degree of sparseness, and these positions do not codify meta-class membership information. In this paper, we show that using the training data we can redefine without re-training, in a problem-dependent way, the one-versus-one coding matrix so that the new coded information helps the system to increase its generalization capability. Moreover, the new re-coding strategy is generalized to be applied over any binary code. The results over several UCI Machine Learning repository data sets and two real multi-class problems show that performance improvements can be obtained by re-coding the classical one-versus-one and Sparse random designs, compared to different state-of-the-art ECOC configurations.
|
|
|
Jose Antonio Rodriguez, Florent Perronnin, Gemma Sanchez, & Josep Llados. (2010). Unsupervised writer adaptation of whole-word HMMs with application to word-spotting. PRL - Pattern Recognition Letters, 31(8), 742–749.
Abstract: In this paper we propose a novel approach for writer adaptation in a handwritten word-spotting task. The method exploits the fact that the semi-continuous hidden Markov model separates the word model parameters into (i) a codebook of shapes and (ii) a set of word-specific parameters.
Our main contribution is to employ this property to derive writer-specific word models by statistically adapting an initial universal codebook to each document. This process is unsupervised and does not even require the appearance of the keyword(s) in the searched document. Experimental results show an increase in performance when this adaptation technique is applied. To the best of our knowledge, this is the first work dealing with adaptation for word-spotting. The preliminary version of this paper obtained an IBM Best Student Paper Award at the 19th International Conference on Pattern Recognition.
Keywords: Word-spotting; Handwriting recognition; Writer adaptation; Hidden Markov model; Document analysis
|
|
|
Fernando Barrera, Felipe Lumbreras, & Angel Sappa. (2013). Multispectral Piecewise Planar Stereo using Manhattan-World Assumption. PRL - Pattern Recognition Letters, 34(1), 52–61.
Abstract: This paper proposes a new framework for extracting dense disparity maps from a multispectral stereo rig. The system is constructed with an infrared and a color camera. It is intended to explore novel multispectral stereo matching approaches that will allow further extraction of semantic information. The proposed framework consists of three stages. Firstly, an initial sparse disparity map is generated by using a cost function based on feature matching in a multiresolution scheme. Then, by looking at the color image, a set of planar hypotheses is defined to describe the surfaces on the scene. Finally, the previous stages are combined by reformulating the disparity computation as a global minimization problem. The paper has two main contributions. The first contribution combines mutual information with a shape descriptor based on gradient in a multiresolution scheme. The second contribution, which is based on the Manhattan-world assumption, extracts a dense disparity representation using the graph cut algorithm. Experimental results in outdoor scenarios are provided showing the validity of the proposed framework.
Keywords: Multispectral stereo rig; Dense disparity maps from multispectral stereo; Color and infrared images
|
|
|
Albert Clapes, Miguel Reyes, & Sergio Escalera. (2013). Multi-modal User Identification and Object Recognition Surveillance System. PRL - Pattern Recognition Letters, 34(7), 799–808.
Abstract: We propose an automatic surveillance system for user identification and object recognition based on multi-modal RGB-Depth data analysis. We model an RGBD environment by learning a pixel-based background Gaussian distribution. Then, user and object candidate regions are detected and recognized using robust statistical approaches. The system robustly recognizes users and updates itself in an online way, identifying and detecting new actors in the scene. Moreover, segmented objects are described, matched, recognized, and updated online using view-point 3D descriptions, which are robust to partial occlusions and local 3D viewpoint rotations. Finally, the system saves the history of user–object assignments, which is especially useful for surveillance scenarios. The system has been evaluated on a novel data set containing different indoor/outdoor scenarios, objects, and users, showing accurate recognition and better performance than standard state-of-the-art approaches.
Keywords: Multi-modal RGB-Depth data analysis; User identification; Object recognition; Intelligent surveillance; Visual features; Statistical learning
|
|
|
Josep Llados, Horst Bunke, & Enric Marti. (1997). Finding rotational symmetries by cyclic string matching. PRL - Pattern Recognition Letters, 18(14), 1435–1442.
Abstract: Symmetry is an important shape feature. In this paper, a simple and fast method to detect perfect and distorted rotational symmetries of 2D objects is described. The boundary of a shape is polygonally approximated and represented as a string. Rotational symmetries are found by cyclic string matching between two identical copies of the shape string. The set of minimum cost edit sequences that transform the shape string to a cyclically shifted version of itself defines the rotational symmetry and its order. Finally, a modification of the algorithm is proposed to detect reflectional symmetries. Some experimental results are presented to show the reliability of the proposed algorithm.
Keywords: Rotational symmetry; Reflectional symmetry; String matching
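Illustrative note: the cyclic-matching idea in the abstract above can be sketched for the perfect (undistorted) case only. The sketch below is not the authors' algorithm: it uses exact substring search instead of minimum cost edit sequences, and the function name `rotational_symmetry_order` and the one-character-per-segment encoding of the boundary are assumptions made for illustration.

```python
def rotational_symmetry_order(shape: str) -> int:
    """Order of exact rotational symmetry of a cyclic shape string.

    The string is matched against a doubled copy of itself; the smallest
    positive cyclic shift that reproduces the string gives the order.
    Distorted symmetries (which need edit distances) are not handled.
    """
    if not shape:
        raise ValueError("shape string must be non-empty")
    doubled = shape + shape
    # Smallest shift k >= 1 at which the string reappears in the doubled
    # copy; k == len(shape) means only the trivial full rotation matches.
    shift = doubled.find(shape, 1)
    return len(shape) // shift
```

For example, a boundary encoded as `"abab"` maps onto itself after a shift of two symbols, so the function reports an order of 2, while `"abcd"` has only the trivial symmetry of order 1.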
|
|
|
Pau Riba, Josep Llados, Alicia Fornes, & Anjan Dutta. (2017). Large-scale graph indexing using binary embeddings of node contexts for information spotting in document image databases. PRL - Pattern Recognition Letters, 87, 203–211.
Abstract: Graph-based representations are experiencing a growing usage in visual recognition and retrieval due to their representational power compared with classical appearance-based representations. However, retrieving a query graph from a large dataset of graphs implies a high computational complexity. The most important property for large-scale retrieval is a search time complexity that is sub-linear in the number of database examples. With this aim, in this paper we propose a graph indexation formalism applied to visual retrieval. A binary embedding is defined as hashing keys for graph nodes. Given a database of labeled graphs, graph nodes are complemented with vectors of attributes representing their local context. Then, each attribute vector is converted to a binary code by applying a binary-valued hash function. Therefore, graph retrieval is formulated in terms of finding target graphs in the database whose nodes have a small Hamming distance from the query nodes, easily computed with bitwise logical operators. As an application example, we validate the performance of the proposed methods in different real scenarios such as handwritten word spotting in images of historical documents or symbol spotting in architectural floor plans.
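Illustrative note: the Hamming distance computation with bitwise operators mentioned in the abstract above can be sketched as follows. This is a minimal sketch, assuming each node's binary code is stored as a Python integer; the function names and the `max_distance` parameter are hypothetical, and the linear scan shown here illustrates only the distance test, not the sub-linear indexing scheme proposed in the paper.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two binary codes differ."""
    # XOR leaves a 1 exactly where the codes disagree; count those bits.
    return bin(a ^ b).count("1")


def nodes_within(query_code: int, database_codes: list[int],
                 max_distance: int) -> list[int]:
    """Indices of database codes within max_distance of the query code.

    Assumption: a plain linear scan, used only to illustrate the test.
    """
    return [
        index
        for index, code in enumerate(database_codes)
        if hamming_distance(query_code, code) <= max_distance
    ]
```

With a query code of `0b1011` and a database of `[0b1011, 0b0010, 0b0100]`, a threshold of 2 keeps the first two entries (distances 0 and 2) and rejects the third (distance 4).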
|
|
|
Mikkel Thogersen, Sergio Escalera, Jordi Gonzalez, & Thomas B. Moeslund. (2016). Segmentation of RGB-D Indoor scenes by Stacking Random Forests and Conditional Random Fields. PRL - Pattern Recognition Letters, 80, 208–215.
Abstract: This paper proposes a technique for RGB-D scene segmentation using the Multi-class Multi-scale Stacked Sequential Learning (MMSSL) paradigm. Following recent trends in the state of the art, a base classifier uses an initial SLIC segmentation to obtain superpixels, which reduce the amount of data while retaining object boundaries. A series of color and depth features are extracted from the superpixels and used in a Conditional Random Field (CRF) to predict superpixel labels. Furthermore, a Random Forest (RF) classifier using random offset features is also used as an input to the CRF, acting as an initial prediction. As a stacked classifier, another Random Forest acts on a spatial multi-scale decomposition of the CRF confidence map to correct the erroneous labels assigned by the previous classifier. The model is tested on the popular NYU-v2 dataset. The approach shows that simple multi-modal features with the power of the MMSSL paradigm can achieve better performance than state-of-the-art results on the same dataset.
|
|
|
Dena Bazazian, Raul Gomez, Anguelos Nicolaou, Lluis Gomez, Dimosthenis Karatzas, & Andrew Bagdanov. (2019). FAST: Facilitated and accurate scene text proposals through FCN guided pruning. PRL - Pattern Recognition Letters, 119, 112–120.
Abstract: Class-specific text proposal algorithms can efficiently reduce the search space for possible text object locations in an image. In this paper we combine the Text Proposals algorithm with Fully Convolutional Networks to efficiently reduce the number of proposals while maintaining the same recall level and thus gaining a significant speed up. Our experiments demonstrate that such text proposal approaches yield significantly higher recall rates than state-of-the-art text localization techniques, while also producing better-quality localizations. Our results on the ICDAR 2015 Robust Reading Competition (Challenge 4) and the COCO-text datasets show that, when combined with strong word classifiers, this recall margin leads to state-of-the-art results in end-to-end scene text recognition.
|
|
|
Thanh Nam Le, Muhammad Muzzamil Luqman, Anjan Dutta, Pierre Heroux, Christophe Rigaud, Clement Guerin, et al. (2018). Subgraph spotting in graph representations of comic book images. PRL - Pattern Recognition Letters, 112, 118–124.
Abstract: Graph-based representations are the most powerful data structures for extracting, representing and preserving the structural information of underlying data. Subgraph spotting is an interesting research problem, especially for studying and investigating the structural information based content-based image retrieval (CBIR) and query by example (QBE) in image databases. In this paper we address the problem of lack of freely available ground-truthed datasets for subgraph spotting and present a new dataset for subgraph spotting in graph representations of comic book images (SSGCI) with its ground-truth and evaluation protocol. Experimental results of two state-of-the-art methods of subgraph spotting are presented on the new SSGCI dataset.
Keywords: Attributed graph; Region adjacency graph; Graph matching; Graph isomorphism; Subgraph isomorphism; Subgraph spotting; Graph indexing; Graph retrieval; Query by example; Dataset and comic book images
|
|
|
Pau Riba, Josep Llados, & Alicia Fornes. (2020). Hierarchical graphs for coarse-to-fine error tolerant matching. PRL - Pattern Recognition Letters, 134, 116–124.
Abstract: In recent years, graph-based representations have experienced a growing usage in visual recognition and retrieval due to their ability to capture both structural and appearance-based information. Thus, they provide a greater representational power than classical statistical frameworks. However, graph-based representations lead to high computational complexities, usually dealt with by graph embeddings or approximate matching techniques. Despite their representational power, they are very sensitive to noise and small variations of the input image. With the aim of coping with the time complexity and the variability present in the generated graphs, in this paper we propose to construct a novel hierarchical graph representation. Graph clustering techniques adapted from social media analysis have been used in order to contract a graph at different abstraction levels while keeping information about the topology. Abstract node attributes summarise information about the contracted graph partition. For the proposed representations, a coarse-to-fine matching technique is defined. Hence, small graphs are used as a filter before more accurate matching methods are applied. This approach has been validated in real scenarios such as classification of colour images or retrieval of handwritten words (i.e. word spotting).
Keywords: Hierarchical graph representation; Coarse-to-fine graph matching; Graph-based retrieval
|
|
|
Arnau Baro, Pau Riba, Jorge Calvo-Zaragoza, & Alicia Fornes. (2019). From Optical Music Recognition to Handwritten Music Recognition: a Baseline. PRL - Pattern Recognition Letters, 123, 1–8.
Abstract: Optical Music Recognition (OMR) is the branch of document image analysis that aims to convert images of musical scores into a computer-readable format. Despite decades of research, the recognition of handwritten music scores, specifically in Western notation, is still an open problem, and the few existing works focus only on a specific stage of OMR. In this work, we propose a full Handwritten Music Recognition (HMR) system based on Convolutional Recurrent Neural Networks, data augmentation and transfer learning, which can serve as a baseline for the research community.
|
|