Records |
Author |
Lei Kang; Pau Riba; Mauricio Villegas; Alicia Fornes; Marçal Rusiñol |

Title |
Candidate Fusion: Integrating Language Modelling into a Sequence-to-Sequence Handwritten Word Recognition Architecture |
Type |
Journal Article |
Year |
2021 |
Publication |
Pattern Recognition |
Abbreviated Journal |
PR |
Volume |
112 |
Issue |
Pages |
107790 |
Keywords |
Abstract  |
Sequence-to-sequence models have recently become very popular for tackling handwritten word recognition problems. However, how to effectively integrate an external language model into such a recognizer is still a challenging problem. The main challenge faced when training a language model is to deal with the language model corpus, which is usually different from the one used for training the handwritten word recognition system. Thus, the bias between both word corpora leads to incorrect transcriptions, providing similar or even worse performance on the recognition task. In this work, we introduce Candidate Fusion, a novel way to integrate an external language model into a sequence-to-sequence architecture. Moreover, it provides suggestions from external language knowledge as a new input to the sequence-to-sequence recognizer. Hence, Candidate Fusion provides two improvements. On the one hand, the sequence-to-sequence recognizer has the flexibility not only to combine the information from itself and the language model, but also to choose the importance of the information provided by the language model. On the other hand, the external language model has the ability to adapt itself to the training corpus and even learn the most common errors produced by the recognizer. Finally, by conducting comprehensive experiments, Candidate Fusion proves to outperform the state-of-the-art language models for handwritten word recognition tasks. |
Address |
Corporate Author |
Thesis |
Publisher |
Place of Publication |
Editor |
Language |
Summary Language |
Original Title |
Series Editor |
Series Title |
Abbreviated Series Title |
Series Volume |
Series Issue |
Edition |
Medium |
Area |
Expedition |
Conference |
Notes |
DAG; 600.140; 601.302; 601.312; 600.121 |
Approved |
no |
Call Number |
Admin @ si @ KRV2021 |
Serial |
3343 |
Permanent link to this record |
Author |
Marçal Rusiñol |

Title |
Classificació semàntica i visual de documents digitals |
Type |
Journal |
Year |
2019 |
Publication |
Revista de biblioteconomia i documentacio |
Abbreviated Journal |
Volume |
Issue |
Pages |
75-86 |
Keywords |
Abstract  |
This article analyzes automatic processing systems that work on digitized documents with the aim of describing their contents. In this way they help to facilitate access, enable automatic indexing, and make documents accessible to search engines. The goal of these technologies is to train computational models capable of classifying, clustering, or searching digital documents. Accordingly, the tasks of classification, clustering, and search are described. When artificial intelligence technologies are used in classification systems, we expect the tool to return semantic labels; in clustering systems, to return documents grouped into meaningful clusters; and in search systems, given a query, to return a list of documents ranked by relevance. An overview is then given of the methods that allow digital documents to be described, both visually (what they look like) and from their semantic contents (what they are about). For the visual description of documents, the article covers the state of the art in numerical representations of digitized documents, both by classical methods and by methods based on deep learning. For the semantic description of contents, it analyzes techniques such as optical character recognition (OCR); the computation of basic statistics on the occurrence of the different words in a text (bag-of-words model); and deep learning methods such as word2vec, based on a neural network that, given a few words of a text, must predict the next word. Knowledge is being transferred from the engineering fields and integrated into products or services in the areas of archival science, library science, documentation, and mass-market platforms; however, the algorithms must be efficient enough not only for recognition and literal transcription but also for interpreting the contents. |
Address |
Corporate Author |
Thesis |
Publisher |
Place of Publication |
Editor |
Language |
Summary Language |
Original Title |
Series Editor |
Series Title |
Abbreviated Series Title |
Series Volume |
Series Issue |
Edition |
Medium |
Area |
Expedition |
Conference |
Notes |
DAG; 600.084; 600.135; 600.121; 600.129 |
Approved |
no |
Call Number |
Admin @ si @ Rus2019 |
Serial |
3282 |
Permanent link to this record |
Author |
Sangheeta Roy; Palaiahnakote Shivakumara; Namita Jain; Vijeta Khare; Anjan Dutta; Umapada Pal; Tong Lu |

Title |
Rough-Fuzzy based Scene Categorization for Text Detection and Recognition in Video |
Type |
Journal Article |
Year |
2018 |
Publication |
Pattern Recognition |
Abbreviated Journal |
PR |
Volume |
80 |
Issue |
Pages |
64-82 |
Keywords |
Rough set; Fuzzy set; Video categorization; Scene image classification; Video text detection; Video text recognition |
Abstract  |
Scene image or video understanding is a challenging task, especially when the number of video types increases drastically with high variations in background and foreground. This paper proposes a new method for categorizing scene videos into different classes, namely, Animation, Outlet, Sports, e-Learning, Medical, Weather, Defense, Economics, Animal Planet and Technology, for the performance improvement of text detection and recognition, which is an effective approach for scene image or video understanding. For this purpose, we first present a new combination of rough and fuzzy concepts to study irregular shapes of edge components in input scene videos, which helps to classify edge components into several groups. Next, the proposed method explores gradient direction information of each pixel in each edge component group to extract stroke-based features by dividing each group into several intra and inter planes. We further extract correlation and covariance features to encode semantic features located inside planes or between planes. Features of intra and inter planes of groups are then concatenated to get a feature matrix. Finally, the feature matrix is verified with temporal frames and fed to a neural network for categorization. Experimental results show that the proposed method outperforms the existing state-of-the-art methods; at the same time, the performance of text detection and recognition methods also improves significantly due to categorization. |
Address |
Corporate Author |
Thesis |
Publisher |
Place of Publication |
Editor |
Language |
Summary Language |
Original Title |
Series Editor |
Series Title |
Abbreviated Series Title |
Series Volume |
Series Issue |
Edition |
Medium |
Area |
Expedition |
Conference |
Notes |
DAG; 600.097; 600.121 |
Approved |
no |
Call Number |
Admin @ si @ RSJ2018 |
Serial |
3096 |
Permanent link to this record |
Author |
Jaume Gibert; Ernest Valveny; Horst Bunke |

Title |
Feature Selection on Node Statistics Based Embedding of Graphs |
Type |
Journal Article |
Year |
2012 |
Publication |
Pattern Recognition Letters |
Abbreviated Journal |
Volume |
33 |
Issue |
15 |
Pages |
1980–1990 |
Keywords |
Structural pattern recognition; Graph embedding; Feature ranking; PCA; Graph classification |
Abstract  |
Representing a graph with a feature vector is a common way of making statistical machine learning algorithms applicable to the domain of graphs. Such a transition from graphs to vectors is known as graph embedding. A key issue in graph embedding is to select a proper set of features in order to make the vectorial representation of graphs as strong and discriminative as possible. In this article, we propose features that are constructed out of frequencies of node label representatives. We first build a large set of features and then select the most discriminative ones according to different ranking criteria and feature transformation algorithms. On different classification tasks, we experimentally show that only a small significant subset of these features is needed to achieve the same classification rates as competing state-of-the-art methods. |
Address |
Corporate Author |
Thesis |
Publisher |
Place of Publication |
Editor |
Language |
Summary Language |
Original Title |
Series Editor |
Series Title |
Abbreviated Series Title |
Series Volume |
Series Issue |
Edition |
Medium |
Area |
Expedition |
Conference |
Notes |
Approved |
no |
Call Number |
Admin @ si @ GVB2012b |
Serial |
1993 |
Permanent link to this record |
Author |
Marçal Rusiñol; Lluis Pere de las Heras; Oriol Ramos Terrades |

Title |
Flowchart Recognition for Non-Textual Information Retrieval in Patent Search |
Type |
Journal Article |
Year |
2014 |
Publication |
Information Retrieval |
Abbreviated Journal |
IR |
Volume |
17 |
Issue |
5-6 |
Pages |
545-562 |
Keywords |
Flowchart recognition; Patent documents; Text/graphics separation; Raster-to-vector conversion; Symbol recognition |
Abstract  |
Relatively little research has been done on the topic of patent image retrieval, and in most approaches the retrieval is performed in terms of a similarity measure between the query image and the images in the corpus. However, systems aimed at overcoming the semantic gap between the visual description of patent images and their conveyed concepts would be very helpful for patent professionals. In this paper we present a flowchart recognition method aimed at achieving a structured representation of flowchart images that can be further queried semantically. The proposed method was submitted to the CLEF-IP 2012 flowchart recognition task. We report the results obtained on this dataset. |
Address |
Corporate Author |
Thesis |
Publisher |
Place of Publication |
Editor |
Language |
Summary Language |
Original Title |
Series Editor |
Series Title |
Abbreviated Series Title |
Series Volume |
Series Issue |
Edition |
1386-4564 |
Medium |
Area |
Expedition |
Conference |
Notes |
DAG; 600.077 |
Approved |
no |
Call Number |
Admin @ si @ RHR2013 |
Serial |
2342 |
Permanent link to this record |