Miguel Angel Bautista, Sergio Escalera, Xavier Baro, Petia Radeva, Jordi Vitria, & Oriol Pujol. (2011). Minimal Design of Error-Correcting Output Codes. PRL - Pattern Recognition Letters, 33(6), 693–702.
Abstract: IF JCR CCIA 1.303 2009 54/103
The classification of a large number of object categories is a challenging trend in the pattern recognition field. In the literature, this is often addressed using an ensemble of classifiers. In this scope, the Error-Correcting Output Codes (ECOC) framework has proven to be a powerful tool for combining classifiers. However, most state-of-the-art ECOC approaches use a linear or exponential number of classifiers, making the discrimination of a large number of classes infeasible. In this paper, we explore and propose a minimal design of ECOC in terms of the number of classifiers. Evolutionary computation is used for tuning the parameters of the classifiers and looking for the best minimal ECOC code configuration. The results over several public UCI datasets and different multi-class computer vision problems show that the proposed methodology obtains results comparable to (or even better than) those of state-of-the-art ECOC methodologies with far fewer dichotomizers.
Keywords: Multi-class classification; Error-correcting output codes; Ensemble of classifiers
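A minimal sketch of what a code with very few dichotomizers can look like. It assumes that "minimal" means a binary codeword of length ceil(log2 N) for N classes, which the abstract does not state explicitly. The function names are hypothetical, and the evolutionary search over code configurations mentioned in the abstract is not reproduced here.

```python
def minimal_code_matrix(num_classes):
    """Assign each class a distinct binary codeword of length ceil(log2(num_classes)).

    Assumption: the shortest code that still separates all classes is used.
    Each column would correspond to one binary classifier (dichotomizer).
    """
    n_bits = max(1, (num_classes - 1).bit_length())
    return [[(c >> b) & 1 for b in reversed(range(n_bits))]
            for c in range(num_classes)]


def hamming_decode(code_matrix, predicted_bits):
    """Return the index of the class whose codeword is closest in Hamming distance."""
    distances = [sum(a != b for a, b in zip(row, predicted_bits))
                 for row in code_matrix]
    return distances.index(min(distances))


codes = minimal_code_matrix(5)            # 5 classes need 3 dichotomizers
decoded = hamming_decode(codes, [0, 1, 1])
```

With a code this short there is no redundancy, so one wrong dichotomizer output can change the decoded class. This is presumably why the choice of code configuration and classifier parameters matters in such a design.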
|
Sergio Escalera, David Masip, Eloi Puertas, Petia Radeva, & Oriol Pujol. (2011). Online Error-Correcting Output Codes. PRL - Pattern Recognition Letters, 32(3), 458–467.
Abstract: IF JCR CCIA 1.303 2009 54/103
This article proposes a general extension of the error-correcting output codes framework to the online learning scenario. As a result, the final classifier handles the addition of new classes independently of the base classifier used. In particular, this extension supports the use of both online example-incremental and batch classifiers as base learners. The extension of the traditional problem-independent codings one-versus-all and one-versus-one is introduced. Furthermore, two new codings are proposed: unbalanced online ECOC and a problem-dependent online ECOC. This last online coding technique takes advantage of the problem data to minimize the number of dichotomizers used in the ECOC framework while preserving high accuracy. These techniques are validated on an online setting of 11 data sets from the UCI database and applied to two real machine vision applications: traffic sign recognition and face recognition. As a result, the proposed online ECOC techniques provide a feasible and robust way of handling new classes using any base classifier.
|
Carles Fernandez, Pau Baiget, Xavier Roca, & Jordi Gonzalez. (2011). Augmenting Video Surveillance Footage with Virtual Agents for Incremental Event Evaluation. PRL - Pattern Recognition Letters, 32(6), 878–889.
Abstract: The fields of segmentation, tracking and behavior analysis demand challenging video resources to test, in a scalable manner, complex scenarios like crowded environments or scenes with high semantics. Nevertheless, existing public databases cannot scale the number of agents that appear, which would be useful to study long-term occlusions and crowds. Moreover, creating these resources is expensive and often too tailored to specific needs. We propose an augmented reality framework to increase the complexity of image sequences in terms of occlusions and crowds, in a scalable and controllable manner. Existing datasets can be extended with augmented sequences containing virtual agents. Such sequences are automatically annotated, thus facilitating evaluation in terms of segmentation, tracking, and behavior recognition. In order to easily specify the desired contents, we propose a natural language interface to convert input sentences into virtual agent behaviors. Experimental tests and validation in indoor, street, and soccer environments are provided to show the feasibility of the proposed approach in terms of robustness, scalability, and semantics.
|
Marçal Rusiñol, Agnes Borras, & Josep Llados. (2010). Relational Indexing of Vectorial Primitives for Symbol Spotting in Line-Drawing Images. PRL - Pattern Recognition Letters, 31(3), 188–201.
Abstract: This paper presents a symbol spotting approach for indexing by content a database of line-drawing images. As line-drawings are digital-born documents designed with vector graphics software, instead of using a pixel-based approach, we present a spotting method based on vector primitives. Graphical symbols are represented by a set of vectorial primitives which are described by an off-the-shelf shape descriptor. A relational indexing strategy aims to retrieve symbol locations in the target documents by using a combined numerical-relational description of 2D structures. The zones which are likely to contain the queried symbol are validated by a Hough-like voting scheme. In addition, a performance evaluation framework for symbol spotting in graphical documents is proposed. The presented methodology has been evaluated on a benchmark set of architectural documents, achieving good performance.
Keywords: Document image analysis and recognition, Graphics recognition, Symbol spotting, Vectorial representations, Line-drawings
|
Miquel Ferrer, Ernest Valveny, & F. Serratosa. (2009). Median graph: A new exact algorithm using a distance based on the maximum common subgraph. PRL - Pattern Recognition Letters, 30(5), 579–588.
Abstract: Median graphs have been presented as a useful tool for capturing the essential information of a set of graphs. Nevertheless, computation of optimal solutions is a very hard problem. In this work we present a new and more efficient optimal algorithm for the median graph computation. With the use of a particular cost function that permits the definition of the graph edit distance in terms of the maximum common subgraph, and a prediction function in the backtracking algorithm, we reduce the size of the search space, avoiding the evaluation of a large number of states while still obtaining the exact median. We present a set of experiments comparing our new algorithm against the previously existing exact algorithm using synthetic data. In addition, we present the first application of the exact median graph computation to real data and we compare the results against an approximate algorithm based on genetic search. These experimental results show that our algorithm outperforms the previously existing exact algorithm and, in addition, show the potential applicability of the exact solutions to real problems.
|
Fadi Dornaika, & Angel Sappa. (2009). Instantaneous 3D motion from image derivatives using the Least Trimmed Square Regression. PRL - Pattern Recognition Letters, 30(5), 535–543.
Abstract: This paper presents a new technique for instantaneous 3D motion estimation. The main contributions are as follows. First, we show that the 3D camera or scene velocity can be retrieved from image derivatives only, assuming that the scene contains a dominant plane. Second, we propose a new robust algorithm that simultaneously provides the Least Trimmed Square solution and the percentage of inliers (the non-contaminated data). Experiments on both synthetic and real image sequences demonstrate the effectiveness of the developed method. Those experiments show that the new robust approach can outperform classical robust schemes.
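For readers unfamiliar with Least Trimmed Squares, the idea can be sketched on a toy line-fitting problem: minimize the sum of the h smallest squared residuals rather than all of them. The sketch below is a generic illustration, not the algorithm of the paper. It fixes h by hand (whereas the abstract describes estimating the inlier percentage jointly), starts from every pair of points, and refines each start with a few concentration steps. All names are hypothetical.

```python
from itertools import combinations


def fit_line(points):
    """Ordinary least-squares fit of y = a*x + b to a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    a = sxy / sxx
    return a, my - a * mx


def lts_line(points, h, c_steps=10):
    """Approximate the Least Trimmed Squares line using h retained points.

    Assumption: h (the number of presumed inliers) is given by the caller.
    """
    best = None
    for p, q in combinations(points, 2):
        if p[0] == q[0]:
            continue  # a vertical pair cannot initialise y = a*x + b
        a, b = fit_line([p, q])
        for _ in range(c_steps):
            # Concentration step: keep the h best-fitting points, then refit.
            ranked = sorted(points, key=lambda s: (s[1] - (a * s[0] + b)) ** 2)
            a, b = fit_line(ranked[:h])
        cost = sum(sorted((y - (a * x + b)) ** 2 for x, y in points)[:h])
        if best is None or cost < best[0]:
            best = (cost, a, b)
    return best[1], best[2]


# Toy data: y = 2x + 1, with four gross outliers.
data = [(float(x), 100.0 if x in (3, 8, 13, 18) else 2.0 * x + 1.0)
        for x in range(20)]
slope, intercept = lts_line(data, h=15)
```

On this toy data the trimmed fit recovers the line through the uncontaminated points, whereas an ordinary least-squares fit over all 20 points would be pulled toward the outliers.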
|
Sergio Escalera, Oriol Pujol, & Petia Radeva. (2009). Separability of Ternary Codes for Sparse Designs of Error-Correcting Output Codes. PRL - Pattern Recognition Letters, 30(3), 285–297.
Abstract: Error-Correcting Output Codes (ECOC) represent a successful framework to deal with multi-class categorization problems based on combining binary classifiers. In this paper, we present a new formulation of the ternary ECOC distance and the error-correcting capabilities in the ternary ECOC framework. Based on the new measure, we stress how to design coding matrices that prevent codification ambiguity and propose a new Sparse Random coding matrix with ternary distance maximization. The results on the UCI Repository and in a real speed traffic categorization problem show that when the coding design satisfies the new ternary measures, a significant performance improvement is obtained independently of the decoding strategy applied.
|
Fadi Dornaika, & Angel Sappa. (2007). Rigid and Non-rigid Face Motion Tracking by Aligning Texture Maps and Stereo 3D Models. PRL - Pattern Recognition Letters, 28(15), 2116–2126.
|
Xavier Otazu, & Oriol Pujol. (2006). Wavelet based approach to cluster analysis. Application on low dimensional data sets. PRL - Pattern Recognition Letters, 27(14), 1590–1605.
|
Debora Gil, & Petia Radeva. (2006). Inhibition of false landmarks. PRL - Pattern Recognition Letters, 27(9), 1022–1030.
Abstract: Corners and junctions are landmarks characterized by the lack of differentiability in the unit tangent to the image level curve. Detectors based on differential operators are not, by their own definition, the best posed, as they require a higher degree of differentiability to yield a reliable response. We argue that a corner detector should be based on the degree of continuity of the tangent vector to the image level sets, work on the image domain, and need no assumptions on either the local image structure or the particular geometry of the corner/junction. An operator measuring the degree of differentiability of the projection matrix on the image gradient fulfills the above requirements. Because using smoothing kernels leads to corner misplacement, we suggest an alternative fake response remover based on the receptive field inhibition of spurious details. The combination of orientation discontinuity detection and noise inhibition produces our inhibition orientation energy (IOE) landmark locator.
|
Oriol Ramos Terrades, & Ernest Valveny. (2006). A new use of the ridgelets transform for describing linear singularities in images. PRL - Pattern Recognition Letters, 27(6), 587–596.
|
Jaume Amores, N. Sebe, & Petia Radeva. (2006). Boosting the distance estimation: Application to the K-Nearest Neighbor Classifier. PRL - Pattern Recognition Letters, 27(3), 201–209.
|
Cristina Cañero, & Petia Radeva. (2003). Vesselness enhancement diffusion. PRL - Pattern Recognition Letters, 24(16), 3141–3151.
|
Ruben Tito, Dimosthenis Karatzas, & Ernest Valveny. (2023). Hierarchical multimodal transformers for Multi-Page DocVQA. PR - Pattern Recognition, 144, 109834.
Abstract: Document Visual Question Answering (DocVQA) refers to the task of answering questions from document images. Existing work on DocVQA only considers single-page documents. However, in real scenarios documents are mostly composed of multiple pages that should be processed together. In this work we extend DocVQA to the multi-page scenario. For that, we first create a new dataset, MP-DocVQA, where questions are posed over multi-page documents instead of single pages. Second, we propose a new hierarchical method, Hi-VT5, based on the T5 architecture, that overcomes the limitations of current methods in processing long multi-page documents. The proposed method is based on a hierarchical transformer architecture where the encoder summarizes the most relevant information of every page and the decoder then takes this summarized information to generate the final answer. Through extensive experimentation, we demonstrate that our method is able, in a single stage, to answer the questions and provide the page that contains the relevant information to find the answer, which can be used as a kind of explainability measure.
|
Souhail Bakkali, Zuheng Ming, Mickael Coustaty, Marçal Rusiñol, & Oriol Ramos Terrades. (2023). VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification. PR - Pattern Recognition, 139, 109419.
Abstract: Multimodal learning from document data has achieved great success lately, as it allows pre-training semantically meaningful features as a prior for a learnable downstream approach. In this paper, we approach the document classification problem by learning cross-modal representations through language and vision cues, considering intra- and inter-modality relationships. Instead of merging features from different modalities into a common representation space, the proposed method exploits high-level interactions and learns relevant semantic information from effective attention flows within and across modalities. The proposed learning objective is devised between intra- and inter-modality alignment tasks, where the similarity distribution per task is computed by contracting positive sample pairs while simultaneously contrasting negative ones in the common feature representation space. Extensive experiments on public document classification datasets demonstrate the effectiveness and the generalization capacity of our model on both low-scale and large-scale datasets.
|