|
German Ros, J. Guerrero, Angel Sappa, & Antonio Lopez. (2013). VSLAM pose initialization via Lie groups and Lie algebras optimization. In Proceedings of IEEE International Conference on Robotics and Automation (pp. 5740–5747).
Abstract: We present a novel technique for estimating initial 3D poses in the context of localization and Visual SLAM problems. The presented approach can deal with noise, outliers and a large amount of input data and still performs in real time in a standard CPU. Our method produces solutions with an accuracy comparable to those produced by RANSAC but can be much faster when the percentage of outliers is high or for large amounts of input data. On the current work we propose to formulate the pose estimation as an optimization problem on Lie groups, considering their manifold structure as well as their associated Lie algebras. This allows us to perform a fast and simple optimization at the same time that conserve all the constraints imposed by the Lie group SE(3). Additionally, we present several key design concepts related with the cost function and its Jacobian; aspects that are critical for the good performance of the algorithm.
Keywords: SLAM
|
|
|
Carlos Boned Riera, & Oriol Ramos Terrades. (2022). Discriminative Neural Variational Model for Unbalanced Classification Tasks in Knowledge Graph. In 26th International Conference on Pattern Recognition (pp. 2186–2191).
Abstract: Nowadays the paradigm of link discovery problems has shown significant improvements on Knowledge Graphs. However, method performances are harmed by the unbalanced nature of this classification problem, since many methods are easily biased to not find proper links. In this paper we present a discriminative neural variational auto-encoder model, called DNVAE from now on, in which we have introduced latent variables to serve as embedding vectors. As a result, the learnt generative model approximate better the underlying distribution and, at the same time, it better differentiate the type of relations in the knowledge graph. We have evaluated this approach on benchmark knowledge graph and Census records. Results in this last data set are quite impressive since we reach the highest possible score in the evaluation metrics. However, further experiments are still needed to deeper evaluate the performance of the method in more challenging tasks.
Keywords: Measurement; Couplings; Semantics; Ear; Benchmark testing; Data models; Pattern recognition
|
|
|
Mohamed Ali Souibgui, Sanket Biswas, Sana Khamekhem Jemni, Yousri Kessentini, Alicia Fornes, Josep Llados, et al. (2022). DocEnTr: An End-to-End Document Image Enhancement Transformer. In 26th International Conference on Pattern Recognition (pp. 1699–1705).
Abstract: Document images can be affected by many degradation scenarios, which cause recognition and processing difficulties. In this age of digitization, it is important to denoise them for proper usage. To address this challenge, we present a new encoder-decoder architecture based on vision transformers to enhance both machine-printed and handwritten document images, in an end-to-end fashion. The encoder operates directly on the pixel patches with their positional information without the use of any convolutional layers, while the decoder reconstructs a clean image from the encoded patches. Conducted experiments show a superiority of the proposed model compared to the state-of the-art methods on several DIBCO benchmarks. Code and models will be publicly available at: https://github.com/dali92002/DocEnTR
Keywords: Degradation; Head; Optical character recognition; Self-supervised learning; Benchmark testing; Transformers; Magnetic heads
|
|
|
Mohamed Ali Souibgui, Alicia Fornes, Y.Kessentini, & C.Tudor. (2021). A Few-shot Learning Approach for Historical Encoded Manuscript Recognition. In 25th International Conference on Pattern Recognition (pp. 5413–5420).
Abstract: Encoded (or ciphered) manuscripts are a special type of historical documents that contain encrypted text. The automatic recognition of this kind of documents is challenging because: 1) the cipher alphabet changes from one document to another, 2) there is a lack of annotated corpus for training and 3) touching symbols make the symbol segmentation difficult and complex. To overcome these difficulties, we propose a novel method for handwritten ciphers recognition based on few-shot object detection. Our method first detects all symbols of a given alphabet in a line image, and then a decoding step maps the symbol similarity scores to the final sequence of transcribed symbols. By training on synthetic data, we show that the proposed architecture is able to recognize handwritten ciphers with unseen alphabets. In addition, if few labeled pages with the same alphabet are used for fine tuning, our method surpasses existing unsupervised and supervised HTR methods for ciphers recognition.
|
|
|
Xialei Liu, Marc Masana, Luis Herranz, Joost Van de Weijer, Antonio Lopez, & Andrew Bagdanov. (2018). Rotate your Networks: Better Weight Consolidation and Less Catastrophic Forgetting. In 24th International Conference on Pattern Recognition (pp. 2262–2268).
Abstract: In this paper we propose an approach to avoiding catastrophic forgetting in sequential task learning scenarios. Our technique is based on a network reparameterization that approximately diagonalizes the Fisher Information Matrix of the network parameters. This reparameterization takes the form of
a factorized rotation of parameter space which, when used in conjunction with Elastic Weight Consolidation (which assumes a diagonal Fisher Information Matrix), leads to significantly better performance on lifelong learning of sequential tasks. Experimental results on the MNIST, CIFAR-100, CUB-200 and
Stanford-40 datasets demonstrate that we significantly improve the results of standard elastic weight consolidation, and that we obtain competitive results when compared to the state-of-the-art in lifelong learning without forgetting.
|
|
|
Sounak Dey, Anjan Dutta, Suman Ghosh, Ernest Valveny, Josep Llados, & Umapada Pal. (2018). Learning Cross-Modal Deep Embeddings for Multi-Object Image Retrieval using Text and Sketch. In 24th International Conference on Pattern Recognition (pp. 916–921).
Abstract: In this work we introduce a cross modal image retrieval system that allows both text and sketch as input modalities for the query. A cross-modal deep network architecture is formulated to jointly model the sketch and text input modalities as well as the the image output modality, learning a common embedding between text and images and between sketches and images. In addition, an attention model is used to selectively focus the attention on the different objects of the image, allowing for retrieval with multiple objects in the query. Experiments show that the proposed method performs the best in both single and multiple object image retrieval in standard datasets.
|
|
|
Pau Riba, Andreas Fischer, Josep Llados, & Alicia Fornes. (2018). Learning Graph Distances with Message Passing Neural Networks. In 24th International Conference on Pattern Recognition (pp. 2239–2244).
Abstract: Graph representations have been widely used in pattern recognition thanks to their powerful representation formalism and rich theoretical background. A number of error-tolerant graph matching algorithms such as graph edit distance have been proposed for computing a distance between two labelled graphs. However, they typically suffer from a high
computational complexity, which makes it difficult to apply
these matching algorithms in a real scenario. In this paper, we propose an efficient graph distance based on the emerging field of geometric deep learning. Our method employs a message passing neural network to capture the graph structure and learns a metric with a siamese network approach. The performance of the proposed graph distance is validated in two application cases, graph classification and graph retrieval of handwritten words, and shows a promising performance when compared with
(approximate) graph edit distance benchmarks.
Keywords: ★Best Paper Award★
|
|
|
Gemma Rotger, Felipe Lumbreras, Francesc Moreno-Noguer, & Antonio Agudo. (2018). 2D-to-3D Facial Expression Transfer. In 24th International Conference on Pattern Recognition (pp. 2008–2013).
Abstract: Automatically changing the expression and physical features of a face from an input image is a topic that has been traditionally tackled in a 2D domain. In this paper, we bring this problem to 3D and propose a framework that given an
input RGB video of a human face under a neutral expression, initially computes his/her 3D shape and then performs a transfer to a new and potentially non-observed expression. For this purpose, we parameterize the rest shape –obtained from standard factorization approaches over the input video– using a triangular
mesh which is further clustered into larger macro-segments. The expression transfer problem is then posed as a direct mapping between this shape and a source shape, such as the blend shapes of an off-the-shelf 3D dataset of human facial expressions. The mapping is resolved to be geometrically consistent between 3D models by requiring points in specific regions to map on semantic
equivalent regions. We validate the approach on several synthetic and real examples of input faces that largely differ from the source shapes, yielding very realistic expression transfers even in cases with topology changes, such as a synthetic video sequence of a single-eyed cyclops.
|
|
|
Lu Yu, Yongmei Cheng, & Joost Van de Weijer. (2018). Weakly Supervised Domain-Specific Color Naming Based on Attention. In 24th International Conference on Pattern Recognition (pp. 3019–3024).
Abstract: The majority of existing color naming methods focuses on the eleven basic color terms of the English language. However, in many applications, different sets of color names are used for the accurate description of objects. Labeling data to learn these domain-specific color names is an expensive and laborious task. Therefore, in this article we aim to learn color names from weakly labeled data. For this purpose, we add an attention branch to the color naming network. The attention branch is used to modulate the pixel-wise color naming predictions of the network. In experiments, we illustrate that the attention branch correctly identifies the relevant regions. Furthermore, we show that our method obtains state-of-the-art results for pixel-wise and image-wise classification on the EBAY dataset and is able to learn color names for various domains.
|
|
|
Jiaolong Xu, Sebastian Ramos, David Vazquez, & Antonio Lopez. (2014). Cost-sensitive Structured SVM for Multi-category Domain Adaptation. In 22nd International Conference on Pattern Recognition (pp. 3886–3891). IEEE.
Abstract: Domain adaptation addresses the problem of accuracy drop that a classifier may suffer when the training data (source domain) and the testing data (target domain) are drawn from different distributions. In this work, we focus on domain adaptation for structured SVM (SSVM). We propose a cost-sensitive domain adaptation method for SSVM, namely COSS-SSVM. In particular, during the re-training of an adapted classifier based on target and source data, the idea that we explore consists in introducing a non-zero cost even for correctly classified source domain samples. Eventually, we aim to learn a more targetoriented classifier by not rewarding (zero loss) properly classified source-domain training samples. We assess the effectiveness of COSS-SSVM on multi-category object recognition.
Keywords: Domain Adaptation; Pedestrian Detection
|
|
|
Francisco Cruz, & Oriol Ramos Terrades. (2014). EM-Based Layout Analysis Method for Structured Documents. In 22nd International Conference on Pattern Recognition (pp. 315–320).
Abstract: In this paper we present a method to perform layout analysis in structured documents. We proposed an EM-based algorithm to fit a set of Gaussian mixtures to the different regions according to the logical distribution along the page. After the convergence, we estimate the final shape of the regions according
to the parameters computed for each component of the mixture. We evaluated our method in the task of record detection in a collection of historical structured documents and performed a comparison with other previous works in this task.
|
|
|
Lluis Gomez, & Dimosthenis Karatzas. (2014). MSER-based Real-Time Text Detection and Tracking. In 22nd International Conference on Pattern Recognition (pp. 3110–3115).
Abstract: We present a hybrid algorithm for detection and tracking of text in natural scenes that goes beyond the fulldetection approaches in terms of time performance optimization.
A state-of-the-art scene text detection module based on Maximally Stable Extremal Regions (MSER) is used to detect text asynchronously, while on a separate thread detected text objects are tracked by MSER propagation. The cooperation of these two modules yields real time video processing at high frame rates even on low-resource devices.
|
|
|
P. Wang, V. Eglin, C. Garcia, C. Largeron, Josep Llados, & Alicia Fornes. (2014). A Coarse-to-Fine Word Spotting Approach for Historical Handwritten Documents Based on Graph Embedding and Graph Edit Distance. In 22nd International Conference on Pattern Recognition (pp. 3074–3079).
Abstract: Effective information retrieval on handwritten document images has always been a challenging task, especially historical ones. In the paper, we propose a coarse-to-fine handwritten word spotting approach based on graph representation. The presented model comprises both the topological and morphological signatures of the handwriting. Skeleton-based graphs with the Shape Context labelled vertexes are established for connected components. Each word image is represented as a sequence of graphs. Aiming at developing a practical and efficient word spotting approach for large-scale historical handwritten documents, a fast and coarse comparison is first applied to prune the regions that are not similar to the query based on the graph embedding methodology. Afterwards, the query and regions of interest are compared by graph edit distance based on the Dynamic Time Warping alignment. The proposed approach is evaluated on a public dataset containing 50 pages of historical marriage license records. The results show that the proposed approach achieves a compromise between efficiency and accuracy.
Keywords: word spotting; coarse-to-fine mechamism; graphbased representation; graph embedding; graph edit distance
|
|
|
David Fernandez, Jon Almazan, Nuria Cirera, Alicia Fornes, & Josep Llados. (2014). BH2M: the Barcelona Historical Handwritten Marriages database. In 22nd International Conference on Pattern Recognition (pp. 256–261).
Abstract: This paper presents an image database of historical handwritten marriages records stored in the archives of Barcelona cathedral, and the corresponding meta-data addressed to evaluate the performance of document analysis algorithms. The contribution of this paper is twofold. First, it presents a complete ground truth which covers the whole pipeline of handwriting
recognition research, from layout analysis to recognition and understanding. Second, it is the first dataset in the emerging area of genealogical document analysis, where documents are manuscripts pseudo-structured with specific lexicons and the interest is beyond pure transcriptions but context dependent.
|
|
|
Hongxing Gao, Marçal Rusiñol, Dimosthenis Karatzas, & Josep Llados. (2014). Embedding Document Structure to Bag-of-Words through Pair-wise Stable Key-regions. In 22nd International Conference on Pattern Recognition (pp. 2903–2908).
Abstract: Since the document structure carries valuable discriminative information, plenty of efforts have been made for extracting and understanding document structure among which layout analysis approaches are the most commonly used. In this paper, Distance Transform based MSER (DTMSER) is employed to efficiently extract the document structure as a dendrogram of key-regions which roughly correspond to structural elements such as characters, words and paragraphs. Inspired by the Bag
of Words (BoW) framework, we propose an efficient method for structural document matching by representing the document image as a histogram of key-region pairs encoding structural relationships.
Applied to the scenario of document image retrieval, experimental results demonstrate a remarkable improvement when comparing the proposed method with typical BoW and pyramidal BoW methods.
|
|