Author Andres Mafla; Ruben Tito; Sounak Dey; Lluis Gomez; Marçal Rusiñol; Ernest Valveny; Dimosthenis Karatzas
Title Real-time Lexicon-free Scene Text Retrieval Type Journal Article
Year 2021 Publication Pattern Recognition Abbreviated Journal PR
Volume 110 Issue Pages 107656
Keywords
Abstract In this work, we address the task of scene text retrieval: given a text query, the system returns all images containing the queried text. The proposed model uses a single shot CNN architecture that predicts bounding boxes and builds a compact representation of spotted words. In this way, this problem can be modeled as a nearest neighbor search of the textual representation of a query over the outputs of the CNN collected from the totality of an image database. Our experiments demonstrate that the proposed model outperforms previous state-of-the-art, while offering a significant increase in processing speed and unmatched expressiveness with samples never seen at training time. Several experiments to assess the generalization capability of the model are conducted in a multilingual dataset, as well as an application of real-time text spotting in videos.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG; 600.121; 600.129; 601.338 Approved no
Call Number Admin @ si @ MTD2021 Serial 3493
 

 
Author Lei Kang; Pau Riba; Marçal Rusiñol; Alicia Fornes; Mauricio Villegas
Title Pay Attention to What You Read: Non-recurrent Handwritten Text-Line Recognition Type Journal Article
Year 2022 Publication Pattern Recognition Abbreviated Journal PR
Volume 129 Issue Pages 108766
Keywords
Abstract The advent of recurrent neural networks for handwriting recognition marked an important milestone reaching impressive recognition accuracies despite the great variability that we observe across different writing styles. Sequential architectures are a perfect fit to model text lines, not only because of the inherent temporal aspect of text, but also to learn probability distributions over sequences of characters and words. However, using such recurrent paradigms comes at a cost at training stage, since their sequential pipelines prevent parallelization. In this work, we introduce a non-recurrent approach to recognize handwritten text by the use of transformer models. We propose a novel method that bypasses any recurrence. By using multi-head self-attention layers both at the visual and textual stages, we are able to tackle character recognition as well as to learn language-related dependencies of the character sequences to be decoded. Our model is unconstrained to any predefined vocabulary, being able to recognize out-of-vocabulary words, i.e. words that do not appear in the training vocabulary. We significantly advance over prior art and demonstrate that satisfactory recognition accuracies are yielded even in few-shot learning scenarios.
Address Sept. 2022
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG; 600.121; 600.162 Approved no
Call Number Admin @ si @ KRR2022 Serial 3556
 

 
Author Pau Riba; Andreas Fischer; Josep Llados; Alicia Fornes
Title Learning graph edit distance by graph neural networks Type Journal Article
Year 2021 Publication Pattern Recognition Abbreviated Journal PR
Volume 120 Issue Pages 108132
Keywords
Abstract The emergence of geometric deep learning as a novel framework for dealing with graph-based representations has displaced traditional approaches in favor of completely new methodologies. In this paper, we propose a new framework able to combine the advances in deep metric learning with traditional approximations of the graph edit distance. Hence, we propose an efficient graph distance based on the novel field of geometric deep learning. Our method employs a message passing neural network to capture the graph structure, and thus leverages this information in the distance computation. The performance of the proposed graph distance is validated in two different scenarios. On the one hand, in graph retrieval of handwritten words, i.e. keyword spotting, it shows superior performance when compared with (approximate) graph edit distance benchmarks. On the other hand, it demonstrates competitive results for graph similarity learning when compared with the current state of the art on a recent benchmark dataset.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG; 600.140; 600.121 Approved no
Call Number Admin @ si @ RFL2021 Serial 3611
 

 
Author S.K. Jemni; Mohamed Ali Souibgui; Yousri Kessentini; Alicia Fornes
Title Enhance to Read Better: A Multi-Task Adversarial Network for Handwritten Document Image Enhancement Type Journal Article
Year 2022 Publication Pattern Recognition Abbreviated Journal PR
Volume 123 Issue Pages 108370
Keywords
Abstract Handwritten document images can be highly affected by degradation for different reasons: paper ageing, daily-life scenarios (wrinkles, dust, etc.), bad scanning process and so on. These artifacts raise many readability issues for current Handwritten Text Recognition (HTR) algorithms and severely devalue their efficiency. In this paper, we propose an end-to-end architecture based on Generative Adversarial Networks (GANs) to recover the degraded documents into a clean and readable form. Unlike the most well-known document binarization methods, which try to improve the visual quality of the degraded document, the proposed architecture integrates a handwritten text recognizer that promotes the generated document image to be more readable. To the best of our knowledge, this is the first work to use the text information while binarizing handwritten documents. Extensive experiments conducted on degraded Arabic and Latin handwritten documents demonstrate the usefulness of integrating the recognizer within the GAN architecture, which improves both the visual quality and the readability of the degraded document images. Moreover, we outperform the state of the art in H-DIBCO challenges on this task, after fine-tuning our pre-trained model with synthetically degraded Latin handwritten images.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG; 600.124; 600.121; 602.230 Approved no
Call Number Admin @ si @ JSK2022 Serial 3613
 

 
Author Ruben Tito; Dimosthenis Karatzas; Ernest Valveny
Title Hierarchical multimodal transformers for Multi-Page DocVQA Type Journal Article
Year 2023 Publication Pattern Recognition Abbreviated Journal PR
Volume 144 Issue Pages 109834
Keywords
Abstract Document Visual Question Answering (DocVQA) refers to the task of answering questions from document images. Existing work on DocVQA only considers single-page documents. However, in real scenarios documents are mostly composed of multiple pages that should be processed altogether. In this work we extend DocVQA to the multi-page scenario. For that, we first create a new dataset, MP-DocVQA, where questions are posed over multi-page documents instead of single pages. Second, we propose a new hierarchical method, Hi-VT5, based on the T5 architecture, that overcomes the limitations of current methods to process long multi-page documents. The proposed method is based on a hierarchical transformer architecture where the encoder summarizes the most relevant information of every page and then, the decoder takes this summarized information to generate the final answer. Through extensive experimentation, we demonstrate that our method is able, in a single stage, to answer the questions and provide the page that contains the relevant information to find the answer, which can be used as a kind of explainability measure.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 0031-3203 ISBN Medium
Area Expedition Conference
Notes DAG; 600.155; 600.121 Approved no
Call Number Admin @ si @ TKV2023 Serial 3825
 

 
Author Souhail Bakkali; Zuheng Ming; Mickael Coustaty; Marçal Rusiñol; Oriol Ramos Terrades
Title VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification Type Journal Article
Year 2023 Publication Pattern Recognition Abbreviated Journal PR
Volume 139 Issue Pages 109419
Keywords
Abstract Multimodal learning from document data has achieved great success lately as it allows pre-training semantically meaningful features as a prior into a learnable downstream approach. In this paper, we approach the document classification problem by learning cross-modal representations through language and vision cues, considering intra- and inter-modality relationships. Instead of merging features from different modalities into a common representation space, the proposed method exploits high-level interactions and learns relevant semantic information from effective attention flows within and across modalities. The proposed learning objective is devised between intra- and inter-modality alignment tasks, where the similarity distribution per task is computed by contracting positive sample pairs while simultaneously contrasting negative ones in the common feature representation space. Extensive experiments on public document classification datasets demonstrate the effectiveness and the generalization capacity of our model on both low-scale and large-scale datasets.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 0031-3203 ISBN Medium
Area Expedition Conference
Notes DAG; 600.140; 600.121 Approved no
Call Number Admin @ si @ BMC2023 Serial 3826
 

 
Author Ruben Tito; Dimosthenis Karatzas; Ernest Valveny
Title Hierarchical multimodal transformers for Multipage DocVQA Type Journal Article
Year 2023 Publication Pattern Recognition Abbreviated Journal PR
Volume 144 Issue Pages 109834
Keywords
Abstract Existing work on DocVQA only considers single-page documents. However, in real applications documents are mostly composed of multiple pages that should be processed all together. In this work, we propose a new multimodal hierarchical method, Hi-VT5, that overcomes the limitations of current methods to process long multipage documents. In contrast to previous hierarchical methods that focus on different semantic granularity (He et al., 2021) or different subtasks (Zhou et al., 2022) used in image classification, our method is a hierarchical transformer architecture where the encoder learns to summarize the most relevant information of every page and then the decoder uses this summarized representation to generate the final answer, following a bottom-up approach. Moreover, due to the lack of multipage DocVQA datasets, we also introduce MP-DocVQA, an extension of SP-DocVQA where questions are posed over multipage documents instead of single pages. Through extensive experimentation, we demonstrate that Hi-VT5 is able, in a single stage, to answer the questions and provide the page that contains the answer, which can be used as a kind of explainability measure.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG Approved no
Call Number Admin @ si @ TKV2023 Serial 3836
 

 
Author Parichehr Behjati; Pau Rodriguez; Carles Fernandez; Isabelle Hupont; Armin Mehri; Jordi Gonzalez
Title Single image super-resolution based on directional variance attention network Type Journal Article
Year 2023 Publication Pattern Recognition Abbreviated Journal PR
Volume 133 Issue Pages 108997
Keywords
Abstract Recent advances in single image super-resolution (SISR) explore the power of deep convolutional neural networks (CNNs) to achieve better performance. However, most of the progress has been made by scaling CNN architectures, which usually raises computational demands and memory consumption. This makes modern architectures less applicable in practice. In addition, most CNN-based SR methods do not fully utilize the informative hierarchical features that are helpful for final image recovery. In order to address these issues, we propose a directional variance attention network (DiVANet), a computationally efficient yet accurate network for SISR. Specifically, we introduce a novel directional variance attention (DiVA) mechanism to capture long-range spatial dependencies and exploit inter-channel dependencies simultaneously for more discriminative representations. Furthermore, we propose a residual attention feature group (RAFG) for parallelizing attention and residual block computation. The output of each residual block is linearly fused at the RAFG output to provide access to the whole feature hierarchy. In parallel, DiVA extracts the most relevant features from the network for improving the final output and preventing information loss along the successive operations inside the network. Experimental results demonstrate the superiority of DiVANet over the state of the art in several datasets, while maintaining relatively low computation and memory footprint. The code is available at https://github.com/pbehjatii/DiVANet.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ISE Approved no
Call Number Admin @ si @ BPF2023 Serial 3861
 

 
Author Xavier Soria; Angel Sappa; Patricio Humanante; Arash Akbarinia
Title Dense extreme inception network for edge detection Type Journal Article
Year 2023 Publication Pattern Recognition Abbreviated Journal PR
Volume 139 Issue Pages 109461
Keywords
Abstract Edge detection is the basis of many computer vision applications. State of the art predominantly relies on deep learning with two decisive factors: dataset content and network architecture. Most of the publicly available datasets are not curated for edge detection tasks. Here, we address this limitation. First, we argue that edges, contours and boundaries, despite their overlaps, are three distinct visual features requiring separate benchmark datasets. To this end, we present a new dataset of edges. Second, we propose a novel architecture, termed Dense Extreme Inception Network for Edge Detection (DexiNed), that can be trained from scratch without any pre-trained weights. DexiNed outperforms other algorithms in the presented dataset. It also generalizes well to other datasets without any fine-tuning. The higher quality of DexiNed is also perceptually evident thanks to the sharper and finer edges it outputs.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MSIAU Approved no
Call Number Admin @ si @ SSH2023 Serial 3982
 

 
Author A. Pujol; Jordi Vitria; Felipe Lumbreras; Juan J. Villanueva
Title Topological principal component analysis for face encoding and recognition Type Journal Article
Year 2001 Publication Pattern Recognition Letters Abbreviated Journal PRL
Volume 22 Issue 6-7 Pages 769–776
Keywords
Abstract IF: 0.552
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ADAS;OR;MV Approved no
Call Number ADAS @ adas @ PVL2001 Serial 155
 

 
Author Gemma Sanchez; Josep Llados; K. Tombre
Title A mean string algorithm to compute the average among a set of 2D shapes Type Journal Article
Year 2002 Publication Pattern Recognition Letters Abbreviated Journal PRL
Volume 23 Issue 1-3 Pages 203–214
Keywords
Abstract
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG; IF: 0.409 Approved no
Call Number DAG @ dag @ SLT2002 Serial 275
 

 
Author A. Martinez; Jordi Vitria
Title Learning mixture models using a genetic version of the EM algorithm Type Journal Article
Year 2000 Publication Pattern Recognition Letters Abbreviated Journal PRL
Volume 21 Issue 8 Pages 759–769
Keywords
Abstract
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes OR;MV Approved no
Call Number BCNPCL @ bcnpcl @ MVi2000 Serial 335
 

 
Author M. Bressan; Jordi Vitria
Title Nonparametric Discriminant Analysis and Nearest Neighbor Classification Type Journal Article
Year 2003 Publication Pattern Recognition Letters Abbreviated Journal PRL
Volume 24 Issue 15 Pages 2743–2749
Keywords
Abstract IF: 0.809
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes OR;MV Approved no
Call Number BCNPCL @ bcnpcl @ BrV2003b Serial 367
 

 
Author Cristina Cañero; Petia Radeva
Title Vesselness enhancement diffusion Type Journal Article
Year 2003 Publication Pattern Recognition Letters Abbreviated Journal PRL
Volume 24 Issue 16 Pages 3141–3151
Keywords
Abstract IF: 0.809
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MILAB Approved no
Call Number BCNPCL @ bcnpcl @ CaR2003 Serial 371
 

 
Author David Guillamet; Jordi Vitria
Title Evaluation of distance metrics for recognition based on non-negative matrix factorization Type Journal Article
Year 2003 Publication Pattern Recognition Letters Abbreviated Journal PRL
Volume 24 Issue 9-10 Pages 1599–1605
Keywords
Abstract IF: 0.809
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes OR;MV Approved no
Call Number BCNPCL @ bcnpcl @ GuV2003b Serial 380