Records |
Links |
Author |
Manuel Carbonell; Alicia Fornes; Mauricio Villegas; Josep Llados |
Title |
A Neural Model for Text Localization, Transcription and Named Entity Recognition in Full Pages |
Type |
Journal Article |
Year |
2020 |
Publication |
Pattern Recognition Letters |
Abbreviated Journal |
Volume |
136 |
Issue |
Pages |
219-227 |
Keywords |
Abstract |
In recent years, the consolidation of deep neural network architectures for information extraction in document images has brought large improvements in the performance of each of the tasks involved in this process: text localization, transcription, and named entity recognition. However, this process is traditionally performed with separate methods for each task. In this work we propose an end-to-end model that combines a one-stage object detection network with branches for the recognition of text and named entities, respectively, so that shared features can be learned simultaneously from the training error of each task. In this way, the model jointly performs handwritten text detection, transcription, and named entity recognition at page level in a single feed-forward step. We exhaustively evaluate our approach on different datasets, discussing its advantages and limitations compared to sequential approaches. The results show that the model is capable of benefiting from shared features by simultaneously solving interdependent tasks. |
Address |
Corporate Author |
Thesis |
Publisher |
Place of Publication |
Editor |
Language |
Summary Language |
Original Title |
Series Editor |
Series Title |
Abbreviated Series Title |
Series Volume |
Series Issue |
Edition |
Medium |
Area |
Expedition |
Conference |
Notes |
DAG; 600.140; 601.311; 600.121 |
Approved |
no |
Call Number |
Admin @ si @ CFV2020 |
Serial |
3451 |
Permanent link to this record |
Author |
Manuel Carbonell; Joan Mas; Mauricio Villegas; Alicia Fornes; Josep Llados |
Title |
End-to-End Handwritten Text Detection and Transcription in Full Pages |
Type |
Conference Article |
Year |
2019 |
Publication |
2nd International Workshop on Machine Learning |
Abbreviated Journal |
Volume |
5 |
Issue |
Pages |
29-34 |
Keywords |
Handwritten Text Recognition; Layout Analysis; Text segmentation; Deep Neural Networks; Multi-task learning |
Abstract |
When transcribing handwritten document images, inaccuracies in the text segmentation step often cause errors in the subsequent transcription step. For this reason, some recent methods propose to perform the recognition at paragraph level. Even so, errors in the segmentation of paragraphs can affect the transcription performance. In this work, we propose an end-to-end framework to transcribe full pages. Joint text detection and transcription removes the need for layout analysis at test time. The experimental results show that our approach can achieve results comparable to those of models that assume segmented paragraphs, and suggest that joining the two tasks brings an improvement over performing them separately. |
Address |
Sydney; Australia; September 2019 |
Corporate Author |
Thesis |
Publisher |
Place of Publication |
Editor |
Language |
Summary Language |
Original Title |
Series Editor |
Series Title |
Abbreviated Series Title |
Series Volume |
Series Issue |
Edition |
Medium |
Area |
Expedition |
Conference |
Notes |
DAG; 600.140; 601.311; 600.140 |
Approved |
no |
Call Number |
Admin @ si @ CMV2019 |
Serial |
3353 |
Permanent link to this record |
Author |
Manuel Carbonell; Mauricio Villegas; Alicia Fornes; Josep Llados |
Title |
Joint Recognition of Handwritten Text and Named Entities with a Neural End-to-end Model |
Type |
Conference Article |
Year |
2018 |
Publication |
13th IAPR International Workshop on Document Analysis Systems |
Abbreviated Journal |
Volume |
Issue |
Pages |
399-404 |
Keywords |
Named entity recognition; Handwritten Text Recognition; neural networks |
Abstract |
When extracting information from handwritten documents, text transcription and named entity recognition are usually treated as separate, sequential tasks. This has the disadvantage that errors in the first module heavily affect the performance of the second module. In this work we propose to do both tasks jointly, using a single neural network with a common architecture used for plain text recognition. The approach has been tested experimentally on a collection of historical marriage records. Experimental results are presented to show the effect on performance of different configurations: different ways of encoding the information, with or without transfer learning, and processing at text line or multi-line region level. The results are comparable to the state of the art reported in the ICDAR 2017 Information Extraction competition, even though the proposed technique does not use any dictionaries, language modeling or post-processing. |
Address |
Vienna; Austria; April 2018 |
Corporate Author |
Thesis |
Publisher |
Place of Publication |
Editor |
Language |
Summary Language |
Original Title |
Series Editor |
Series Title |
Abbreviated Series Title |
Series Volume |
Series Issue |
Edition |
Medium |
Area |
Expedition |
Conference |
Notes |
DAG; 600.097; 603.057; 601.311; 600.121 |
Approved |
no |
Call Number |
Admin @ si @ CVF2018 |
Serial |
3170 |
Permanent link to this record |
Author |
Manuel Carbonell; Pau Riba; Mauricio Villegas; Alicia Fornes; Josep Llados |
Title |
Named Entity Recognition and Relation Extraction with Graph Neural Networks in Semi Structured Documents |
Type |
Conference Article |
Year |
2020 |
Publication |
25th International Conference on Pattern Recognition |
Abbreviated Journal |
Volume |
Issue |
Pages |
Keywords |
Abstract |
The use of administrative documents to communicate and keep a record of business information requires methods able to automatically extract and understand the content of such documents in a robust and efficient way. In addition, the semi-structured nature of these reports is especially suited to graph-based representations, which are flexible enough to adapt to the deformations of the different document templates. Moreover, Graph Neural Networks provide the proper methodology to learn relations among the data elements in these documents. In this work we study the use of Graph Neural Network architectures to tackle the problem of entity recognition and relation extraction in semi-structured documents. Our approach achieves state-of-the-art results in the three tasks involved in the process. Additionally, the experimentation with two datasets of different nature demonstrates the good generalization ability of our approach. |
Address |
Virtual; January 2021 |
Corporate Author |
Thesis |
Publisher |
Place of Publication |
Editor |
Language |
Summary Language |
Original Title |
Series Editor |
Series Title |
Abbreviated Series Title |
Series Volume |
Series Issue |
Edition |
Medium |
Area |
Expedition |
Conference |
Notes |
DAG; 600.121 |
Approved |
no |
Call Number |
Admin @ si @ CRV2020 |
Serial |
3509 |
Permanent link to this record |
Author |
Marçal Rusiñol |
Title |
A Model of Vectorial Signatures in Terms of Expressive Sub-Shapes: Symbol Indexation in Technical Documents |
Type |
Report |
Year |
2006 |
Publication |
CVC Technical Report #94 |
Abbreviated Journal |
Volume |
Issue |
Pages |
Keywords |
Abstract |
Address |
Corporate Author |
Thesis |
Publisher |
Place of Publication |
Editor |
Language |
Summary Language |
Original Title |
Series Editor |
Series Title |
Abbreviated Series Title |
Series Volume |
Series Issue |
Edition |
Medium |
Area |
Expedition |
Conference |
Notes |
Approved |
no |
Call Number |
DAG @ dag @ Rus2006 |
Serial |
668 |
Permanent link to this record |
Author |
Marçal Rusiñol |
Title |
Geometric and Structural-based Symbol Spotting. Application to Focused Retrieval in Graphic Document Collections |
Type |
Book Whole |
Year |
2009 |
Publication |
PhD Thesis, Universitat Autonoma de Barcelona-CVC |
Abbreviated Journal |
Volume |
Issue |
Pages |
Keywords |
Abstract |
Usually, pattern recognition systems consist of two main parts: on the one hand, data acquisition and, on the other hand, the classification of this data into a certain category. In order to recognize which category a certain query element belongs to, a set of pattern models must be provided beforehand. An off-line learning stage is needed to train the classifier and to offer a robust classification of the patterns. Within the pattern recognition field, we are interested in the recognition of graphics and, in particular, in the analysis of documents rich in graphical information. In this context, one of the main concerns is whether the proposed systems remain scalable with respect to the data volume, so that they can handle growing numbers of symbol models. To avoid working with a database of reference symbols, symbol spotting and on-the-fly symbol recognition methods have been introduced in recent years.
Generally speaking, the symbol spotting problem can be defined as the identification of a set of regions of interest in a document image which are likely to contain an instance of a certain queried symbol, without explicitly applying the whole pattern recognition scheme. Our application framework consists of indexing a collection of graphic-rich document images. This collection is queried by example with a single instance of the symbol to look for and, by means of symbol spotting methods, we retrieve the regions of interest where the symbol is likely to appear within the documents. Applications of this kind are known as focused retrieval methods.
For the focused retrieval application to handle large collections of documents, efficient access must be provided to the large volume of information that might be stored. We use indexing strategies in order to efficiently retrieve by similarity the locations where a certain part of the symbol appears. In that scenario, graphical patterns should be used as indices for accessing and navigating the collection of documents. These indexing mechanisms allow the user to search for similar elements using graphical information rather than textual queries.
Throughout this thesis we present a spotting architecture and different methods aiming to build a complete focused retrieval application dealing with graphic-rich document collections. In addition, a protocol to evaluate the performance of symbol spotting systems in terms of recognition abilities, location accuracy and scalability is proposed. |
Address |
Barcelona (Spain) |
Corporate Author |
Thesis |
Ph.D. thesis |
Publisher |
Ediciones Graficas Rey |
Place of Publication |
Editor |
Josep Llados |
Language |
Summary Language |
Original Title |
Series Editor |
Series Title |
Abbreviated Series Title |
Series Volume |
Series Issue |
Edition |
Medium |
Area |
Expedition |
Conference |
Notes |
Approved |
no |
Call Number |
DAG @ dag @ Rus2009 |
Serial |
1264 |
Permanent link to this record |
Author |
Marçal Rusiñol |
Title |
Classificació semàntica i visual de documents digitals |
Type |
Journal |
Year |
2019 |
Publication |
Revista de biblioteconomia i documentacio |
Abbreviated Journal |
Volume |
Issue |
Pages |
75-86 |
Keywords |
Abstract |
This article analyzes automatic processing systems that operate on digitized documents with the aim of describing their contents. Such systems help to facilitate access, enable automatic indexing, and make documents accessible to search engines. The goal of these technologies is to train computational models capable of classifying, clustering, or searching digital documents. Accordingly, the tasks of classification, clustering, and search are described. When artificial intelligence technologies are used in classification systems, the tool is expected to return semantic labels; in clustering systems, documents grouped into meaningful clusters; and in search systems, given a query, a list of documents ranked by relevance. Next, an overview is given of the methods that allow digital documents to be described both visually (what they look like) and in terms of their semantic content (what they are about). Regarding the visual description of documents, the state of the art in numerical representations of digitized documents is reviewed, covering both classical methods and methods based on deep learning. Regarding the semantic description of contents, the techniques analyzed include optical character recognition (OCR); the computation of basic statistics on the occurrence of the different words in a text (bag-of-words model); and deep learning methods such as word2vec, based on a neural network that, given a few words of a text, must predict the next word. Knowledge from the engineering fields is being transferred and integrated into products and services in archival science, library science, documentation, and mass-market platforms; however, the algorithms must be efficient enough not only for recognition and literal transcription but also for the interpretation of contents. |
Address |
Corporate Author |
Thesis |
Publisher |
Place of Publication |
Editor |
Language |
Summary Language |
Original Title |
Series Editor |
Series Title |
Abbreviated Series Title |
Series Volume |
Series Issue |
Edition |
Medium |
Area |
Expedition |
Conference |
Notes |
DAG; 600.084; 600.135; 600.121; 600.129 |
Approved |
no |
Call Number |
Admin @ si @ Rus2019 |
Serial |
3282 |
Permanent link to this record |
Author |
Marçal Rusiñol; Agnes Borras; Josep Llados |
Title |
Relational Indexing of Vectorial Primitives for Symbol Spotting in Line-Drawing Images |
Type |
Journal Article |
Year |
2010 |
Publication |
Pattern Recognition Letters |
Abbreviated Journal |
Volume |
31 |
Issue |
3 |
Pages |
188–201 |
Keywords |
Document image analysis and recognition; Graphics recognition; Symbol spotting; Vectorial representations; Line-drawings |
Abstract |
This paper presents a symbol spotting approach for indexing by content a database of line-drawing images. As line-drawings are digital-born documents designed with vector graphics software, instead of using a pixel-based approach, we present a spotting method based on vector primitives. Graphical symbols are represented by a set of vectorial primitives which are described by an off-the-shelf shape descriptor. A relational indexing strategy aims to retrieve symbol locations in the target documents by using a combined numerical-relational description of 2D structures. The zones which are likely to contain the queried symbol are validated by a Hough-like voting scheme. In addition, a performance evaluation framework for symbol spotting in graphical documents is proposed. The presented methodology has been evaluated on a benchmark set of architectural documents, achieving good performance results. |
Address |
Corporate Author |
Thesis |
Publisher |
Elsevier |
Place of Publication |
Editor |
Language |
Summary Language |
Original Title |
Series Editor |
Series Title |
Abbreviated Series Title |
Series Volume |
Series Issue |
Edition |
Medium |
Area |
Expedition |
Conference |
Notes |
Approved |
no |
Call Number |
DAG @ dag @ RBL2010 |
Serial |
1177 |
Permanent link to this record |
Author |
Marçal Rusiñol; David Aldavert; Dimosthenis Karatzas; Ricardo Toledo; Josep Llados |
Title |
Interactive Trademark Image Retrieval by Fusing Semantic and Visual Content. Advances in Information Retrieval |
Type |
Conference Article |
Year |
2011 |
Publication |
33rd European Conference on Information Retrieval |
Abbreviated Journal |
Volume |
6611 |
Issue |
Pages |
314-325 |
Keywords |
Abstract |
In this paper we propose an efficient query-by-example retrieval system which is able to retrieve trademark images by similarity from patent and trademark offices' digital libraries. Logo images are described by both their semantic content, by means of the Vienna codes, and their visual content, by using shape and color as visual cues. The trademark descriptors are then indexed by a locality-sensitive hashing data structure aiming to perform approximate k-NN search in high-dimensional spaces in sub-linear time. The resulting ranked lists are combined by using the Condorcet method, and a relevance feedback step helps to iteratively revise the query and refine the obtained results. The experiments demonstrate the effectiveness and efficiency of this system on a realistic and large dataset. |
Address |
Dublin, Ireland |
Corporate Author |
Thesis |
Publisher |
Springer |
Place of Publication |
Berlin |
Editor |
P. Clough; C. Foley; C. Gurrin; G.J.F. Jones; W. Kraaij; H. Lee; V. Murdoch |
Language |
Summary Language |
Original Title |
Series Editor |
Series Title |
Abbreviated Series Title |
Series Volume |
Series Issue |
Edition |
978-3-642-20160-8 |
Medium |
Area |
Expedition |
Conference |
Notes |
Approved |
no |
Call Number |
Admin @ si @ RAK2011 |
Serial |
1737 |
Permanent link to this record |
Author |
Marçal Rusiñol; David Aldavert; Ricardo Toledo; Josep Llados |
Title |
Browsing Heterogeneous Document Collections by a Segmentation-Free Word Spotting Method |
Type |
Conference Article |
Year |
2011 |
Publication |
11th International Conference on Document Analysis and Recognition |
Abbreviated Journal |
Volume |
Issue |
Pages |
63-67 |
Keywords |
Abstract |
In this paper, we present a segmentation-free word spotting method that is able to deal with heterogeneous document image collections. We propose a patch-based framework where patches are represented by a bag-of-visual-words model powered by SIFT descriptors. A later refinement of the feature vectors is performed by applying the latent semantic indexing technique. The proposed method performs well on both handwritten and typewritten historical document images. We have also tested our method on documents written in non-Latin scripts. |
Address |
Beijing, China |
Corporate Author |
Thesis |
Publisher |
Place of Publication |
Editor |
Language |
Summary Language |
Original Title |
Series Editor |
Series Title |
Abbreviated Series Title |
Series Volume |
Series Issue |
Edition |
Medium |
Area |
Expedition |
Conference |
Notes |
Approved |
no |
Call Number |
Admin @ si @ RAT2011 |
Serial |
1788 |
Permanent link to this record |