Author |
Lei Kang; Pau Riba; Marçal Rusiñol; Alicia Fornes; Mauricio Villegas |
|
|
Title |
Pay Attention to What You Read: Non-recurrent Handwritten Text-Line Recognition |
Type |
Journal Article |
|
Year |
2022 |
Publication |
Pattern Recognition |
Abbreviated Journal |
PR |
|
|
Volume |
129 |
Issue |
|
Pages |
108766 |
|
|
Keywords |
|
|
|
Abstract |
The advent of recurrent neural networks for handwriting recognition marked an important milestone reaching impressive recognition accuracies despite the great variability that we observe across different writing styles. Sequential architectures are a perfect fit to model text lines, not only because of the inherent temporal aspect of text, but also to learn probability distributions over sequences of characters and words. However, using such recurrent paradigms comes at a cost at training stage, since their sequential pipelines prevent parallelization. In this work, we introduce a non-recurrent approach to recognize handwritten text by the use of transformer models. We propose a novel method that bypasses any recurrence. By using multi-head self-attention layers both at the visual and textual stages, we are able to tackle character recognition as well as to learn language-related dependencies of the character sequences to be decoded. Our model is unconstrained to any predefined vocabulary, being able to recognize out-of-vocabulary words, i.e. words that do not appear in the training vocabulary. We significantly advance over prior art and demonstrate that satisfactory recognition accuracies are yielded even in few-shot learning scenarios. |
|
|
Address |
Sept. 2022 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG; 600.121; 600.162 |
Approved |
no |
|
|
Call Number |
Admin @ si @ KRR2022 |
Serial |
3556 |
|
|
|
|
|
Author |
Fernando Vilariño |
|
|
Title |
Unveiling the Social Impact of AI |
Type |
Conference Article |
|
Year |
2020 |
Publication |
Workshop at Digital Living Lab Days Conference |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
September 2020 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
MV; DAG; 600.121; 600.140; SIAI |
Approved |
no |
|
|
Call Number |
Admin @ si @ Vil2020 |
Serial |
3459 |
|
|
|
|
|
Author |
Ali Furkan Biten; Ruben Tito; Andres Mafla; Lluis Gomez; Marçal Rusiñol; C.V. Jawahar; Ernest Valveny; Dimosthenis Karatzas |
|
|
Title |
Scene Text Visual Question Answering |
Type |
Conference Article |
|
Year |
2019 |
Publication |
18th IEEE International Conference on Computer Vision |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
4291-4301 |
|
|
Keywords |
|
|
|
Abstract |
Current visual question answering datasets do not consider the rich semantic information conveyed by text within an image. In this work, we present a new dataset, ST-VQA, that aims to highlight the importance of exploiting high-level semantic information present in images as textual cues in the Visual Question Answering process. We use this dataset to define a series of tasks of increasing difficulty for which reading the scene text in the context provided by the visual information is necessary to reason and generate an appropriate answer. We propose a new evaluation metric for these tasks to account both for reasoning errors and for shortcomings of the text recognition module. In addition, we put forward a series of baseline methods, which provide further insight into the newly released dataset and set the scene for further research. |
|
|
Address |
Seoul; Korea; October 2019 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICCV |
|
|
Notes |
DAG; 600.129; 600.135; 601.338; 600.121 |
Approved |
no |
|
|
Call Number |
Admin @ si @ BTM2019b |
Serial |
3285 |
|
|
|
|
|
Author |
Mohammed Al Rawi; Ernest Valveny |
|
|
Title |
Compact and Efficient Multitask Learning in Vision, Language and Speech |
Type |
Conference Article |
|
Year |
2019 |
Publication |
IEEE International Conference on Computer Vision Workshops |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
2933-2942 |
|
|
Keywords |
|
|
|
Abstract |
Across-domain multitask learning is a challenging area of computer vision and machine learning due to the intra-similarities among class distributions. Addressing this problem to cope with the human cognition system by considering inter and intra-class categorization and recognition complicates the problem even further. We propose in this work an effective holistic and hierarchical learning by using a text embedding layer on top of a deep learning model. We also propose a novel sensory discriminator approach to resolve the collisions between different tasks and domains. We then train the model concurrently on textual sentiment analysis, speech recognition, image classification, action recognition from video, and handwriting word spotting of two different scripts (Arabic and English). The model we propose successfully learned different tasks across multiple domains. |
|
|
Address |
Seoul; Korea; October 2019 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICCVW |
|
|
Notes |
DAG; 600.121; 600.129 |
Approved |
no |
|
|
Call Number |
Admin @ si @ RaV2019 |
Serial |
3365 |
|
|
|
|
|
Author |
Arnau Baro; Pau Riba; Alicia Fornes |
|
|
Title |
Towards the recognition of compound music notes in handwritten music scores |
Type |
Conference Article |
|
Year |
2016 |
Publication |
15th International Conference on Frontiers in Handwriting Recognition |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
The recognition of handwritten music scores still remains an open problem. The existing approaches can only deal with very simple handwritten scores mainly because of the variability in the handwriting style and the variability in the composition of groups of music notes (i.e. compound music notes). In this work we focus on this second problem and propose a method based on perceptual grouping for the recognition of compound music notes. Our method has been tested using several handwritten music scores of the CVC-MUSCIMA database and compared with a commercial Optical Music Recognition (OMR) software. Given that our method is learning-free, the obtained results are promising. |
|
|
Address |
Shenzhen; China; October 2016 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
2167-6445 |
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICFHR |
|
|
Notes |
DAG; 600.097 |
Approved |
no |
|
|
Call Number |
Admin @ si @ BRF2016 |
Serial |
2903 |
|
|
|
|
|
Author |
Veronica Romero; Alicia Fornes; Enrique Vidal; Joan Andreu Sanchez |
|
|
Title |
Using the MGGI Methodology for Category-based Language Modeling in Handwritten Marriage Licenses Books |
Type |
Conference Article |
|
Year |
2016 |
Publication |
15th International Conference on Frontiers in Handwriting Recognition |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
Handwritten marriage licenses books have been used for centuries by ecclesiastical and secular institutions to register marriages. The information contained in these historical documents is useful for demography studies and genealogical research, among others. Despite the generally simple structure of the text in these documents, automatic transcription and semantic information extraction is difficult due to the distinct and evolutionary vocabulary, which is composed mainly of proper names that change over time. In previous works we studied the use of category-based language models to both improve the automatic transcription accuracy and ease the extraction of semantic information. Here we analyze the main causes of the semantic errors observed in previous results and apply a Grammatical Inference technique known as MGGI to improve the semantic accuracy of the language model obtained. Using this language model, full handwritten text recognition experiments have been carried out, with results supporting the interest of the proposed approach. |
|
|
Address |
Shenzhen; China; October 2016 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICFHR |
|
|
Notes |
DAG; 600.097; 602.006 |
Approved |
no |
|
|
Call Number |
Admin @ si @ RFV2016 |
Serial |
2909 |
|
|
|
|
|
Author |
Oriol Ramos Terrades; Salvatore Tabbone; L. Wendling; Ernest Valveny |
|
|
Title |
Symbol Recognition based on a Multiresolution Analysis of the Radon Transform |
Type |
Miscellaneous |
|
Year |
2004 |
Publication |
The International Workshop on Multidisciplinary Image, Video, and Audio Retrieval and Mining |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
Sherbrooke (Canada) |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
DAG @ dag @ RTW2004 |
Serial |
500 |
|
|
|
|
|
Author |
Albert Berenguel; Oriol Ramos Terrades; Josep Llados; Cristina Cañero |
|
|
Title |
Recurrent Comparator with attention models to detect counterfeit documents |
Type |
Conference Article |
|
Year |
2019 |
Publication |
15th International Conference on Document Analysis and Recognition |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
This paper focuses on the detection of counterfeit documents via the recurrent comparison of the security textured background regions of two images. The main contributions are twofold: first, we apply and adapt a recurrent comparator architecture with an attention mechanism to the counterfeit detection task, which constructs a representation of the background regions by recurrently conditioning the next observation, learning the difference between genuine and counterfeit images through iterative glimpses. Second, we propose a new counterfeit document dataset to ensure the generalization of the learned model towards the detection of the lack of resolution during the counterfeit manufacturing. The presented network outperforms state-of-the-art classification approaches for counterfeit detection, as demonstrated in the evaluation. |
|
|
Address |
Sydney; Australia; September 2019 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICDAR |
|
|
Notes |
DAG; 600.140; 600.121; 601.269 |
Approved |
no |
|
|
Call Number |
Admin @ si @ BRL2019 |
Serial |
3456 |
|
|
|
|
|
Author |
Partha Pratim Roy; Umapada Pal; Josep Llados |
|
|
Title |
Seal Object Detection in Document Images using GHT of Local Component Shapes |
Type |
Conference Article |
|
Year |
2010 |
Publication |
10th ACM Symposium On Applied Computing |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
23–27 |
|
|
Keywords |
|
|
|
Abstract |
Due to noise, overlapping text/signatures, and their multi-oriented nature, seal (stamp) object detection is a difficult challenge. This paper deals with the automatic detection of seals in documents with cluttered backgrounds. Here, a seal object is characterized by scale- and rotation-invariant spatial feature descriptors (distance and angular position) computed from the recognition results of individual connected components (characters). Recognition of multi-scale and multi-oriented components is done using a Support Vector Machine classifier. The Generalized Hough Transform (GHT) is used to detect the seal, and votes are cast for possible locations of the seal object in a document based on the spatial feature descriptors of component pairs. The peak of votes in the GHT accumulator validates the hypothesis to locate the seal object in a document. Experimental results show that the method is efficient at locating seal instances of arbitrary shape and orientation in documents. |
|
|
Address |
Sierre, Switzerland |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
SAC |
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
DAG @ dag @ RPL2010a |
Serial |
1291 |
|
|
|
|
|
Author |
Muhammad Muzzamil Luqman; Thierry Brouard; Jean-Yves Ramel; Josep Llados |
|
|
Title |
Vers une approche floue d'encapsulation de graphes: application à la reconnaissance de symboles |
Type |
Conference Article |
|
Year |
2010 |
Publication |
Colloque International Francophone sur l'Écrit et le Document |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
169-184 |
|
|
Keywords |
Fuzzy interval; Graph embedding; Bayesian network; Symbol recognition |
|
|
Abstract |
We present a new methodology for symbol recognition, by employing a structural approach for representing visual associations in symbols and a statistical classifier for recognition. A graphic symbol is vectorized, its topological and geometrical details are encoded by an attributed relational graph and a signature is computed for it. Data adapted fuzzy intervals have been introduced for addressing the sensitivity of structural representations to noise. The joint probability distribution of signatures is encoded by a Bayesian network, which serves as a mechanism for pruning irrelevant features and choosing a subset of interesting features from structural signatures of underlying symbol set, and is deployed in a supervised learning scenario for recognizing query symbols. Experimental results on pre-segmented 2D linear architectural and electronic symbols from GREC databases are presented. |
|
|
Address |
Sousse, Tunisia |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
CIFED |
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
DAG @ dag @ LBR2010a |
Serial |
1293 |
|