Publicacions CVC -- Query Results

[11–20] << 21 22 23 24 25 26 27 28 29 30 >> [31–40]

Details

	Records
	Author	Thanh Ha Do; Salvatore Tabbone; Oriol Ramos Terrades
	Title	Noise suppression over bi-level graphical documents using a sparse representation			Type	Conference Article
	Year	2012	Publication	Colloque International Francophone sur l'Écrit et le Document	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract
	Address	Bordeaux
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	CIFED
	Notes	DAG			Approved	no
	Call Number	Admin @ si @ DTR2012b			Serial	2136
Permanent link to this record



	Author	Thanh Ha Do; Salvatore Tabbone; Oriol Ramos Terrades
	Title	New Approach for Symbol Recognition Combining Shape Context of Interest Points with Sparse Representation			Type	Conference Article
	Year	2013	Publication	12th International Conference on Document Analysis and Recognition	Abbreviated Journal
	Volume		Issue		Pages	265-269
	Keywords
	Abstract	In this paper, we propose a new approach for symbol description. Our method is built based on the combination of shape context of interest points descriptor and sparse representation. More specifically, we first learn a dictionary describing shape context of interest point descriptors. Then, based on information retrieval techniques, we build a vector model for each symbol based on its sparse representation in a visual vocabulary whose visual words are columns in the learneddictionary. The retrieval task is performed by ranking symbols based on similarity between vector models. Evaluation of our method, using benchmark datasets, demonstrates the validity of our approach and shows that it outperforms related state-of-theart methods.
	Address	Washington; USA; August 2013
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN	1520-5363	ISBN		Medium
	Area		Expedition		Conference	ICDAR
	Notes	DAG			Approved	no
	Call Number	Admin @ si @ DTR2013b			Serial	2331
Permanent link to this record



	Author	Manuel Carbonell
	Title	Neural Information Extraction from Semi-structured Documents A			Type	Book Whole
	Year	2020	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Sectors as fintech, legaltech or insurance process an inflow of millions of forms, invoices, id documents, claims or similar every day. Together with these, historical archives provide gigantic amounts of digitized documents containing useful information that needs to be stored in machine encoded text with a meaningful structure. This procedure, known as information extraction (IE) comprises the steps of localizing and recognizing text, identifying named entities contained in it and optionally finding relationships among its elements. In this work we explore multi-task neural models at image and graph level to solve all steps in a unified way. While doing so we find benefits and limitations of these end-to-end approaches in comparison with sequential separate methods. More specifically, we first propose a method to produce textual as well as semantic labels with a unified model from handwritten text line images. We do so with the use of a convolutional recurrent neural model trained with connectionist temporal classification to predict the textual as well as semantic information encoded in the images. Secondly, motivated by the success of this approach we investigate the unification of the localization and recognition tasks of handwritten text in full pages with an end-to-end model, observing benefits in doing so. Having two models that tackle information extraction subsequent task pairs in an end-to-end to end manner, we lastly contribute with a method to put them all together in a single neural network to solve the whole information extraction pipeline in a unified way. Doing so we observe some benefits and some limitations in the approach, suggesting that in certain cases it is beneficial to train specialized models that excel at a single challenging task of the information extraction process, as it can be the recognition of named entities or the extraction of relationships between them. For this reason we lastly study the use of the recently arrived graph neural network architectures for the semantic tasks of the information extraction process, which are recognition of named entities and relation extraction, achieving promising results on the relation extraction part.
	Address
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Alicia Fornes;Mauricio Villegas;Josep Llados
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-122714-1-6	Medium
	Area		Expedition		Conference
	Notes	DAG; 600.121			Approved	no
	Call Number	Admin @ si @ Car20			Serial	3483
Permanent link to this record



	Author	Anjan Dutta; Josep Llados; Horst Bunke; Umapada Pal
	Title	Near Convex Region Adjacency Graph and Approximate Neighborhood String Matching for Symbol Spotting in Graphical Documents			Type	Conference Article
	Year	2013	Publication	12th International Conference on Document Analysis and Recognition	Abbreviated Journal
	Volume		Issue		Pages	1078-1082
	Keywords
	Abstract	This paper deals with a subgraph matching problem in Region Adjacency Graph (RAG) applied to symbol spotting in graphical documents. RAG is a very important, efﬁcient and natural way of representing graphical information with a graph but this is limited to cases where the information is well deﬁned with perfectly delineated regions. What if the information we are interested in is not conﬁned within well deﬁned regions? This paper addresses this particular problem and solves it by deﬁning near convex grouping of oriented line segments which results in near convex regions. Pure convexity imposes hard constraints and can not handle all the cases efﬁciently. Hence to solve this problem we have deﬁned a new type of convexity of regions, which allows convex regions to have concavity to some extend. We call this kind of regions Near Convex Regions (NCRs). These NCRs are then used to create the Near Convex Region Adjacency Graph (NCRAG) and with this representation we have formulated the problem of symbol spotting in graphical documents as a subgraph matching problem. For subgraph matching we have used the Approximate Edit Distance Algorithm (AEDA) on the neighborhood string, which starts working after ﬁnding a key node in the input or target graph and iteratively identiﬁes similar nodes of the query graph in the neighborhood of the key node. The experiments are performed on artiﬁcial, real and distorted datasets.
	Address	Washington; USA; August 2013
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN	1520-5363	ISBN		Medium
	Area		Expedition		Conference	ICDAR
	Notes	DAG; 600.045; 600.056; 600.061; 601.152			Approved	no
	Call Number	Admin @ si @ DLB2013a			Serial	2358
Permanent link to this record



	Author	Manuel Carbonell; Pau Riba; Mauricio Villegas; Alicia Fornes; Josep Llados
	Title	Named Entity Recognition and Relation Extraction with Graph Neural Networks in Semi Structured Documents			Type	Conference Article
	Year	2020	Publication	25th International Conference on Pattern Recognition	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	The use of administrative documents to communicate and leave record of business information requires of methods able to automatically extract and understand the content from such documents in a robust and efficient way. In addition, the semi-structured nature of these reports is specially suited for the use of graph-based representations which are flexible enough to adapt to the deformations from the different document templates. Moreover, Graph Neural Networks provide the proper methodology to learn relations among the data elements in these documents. In this work we study the use of Graph Neural Network architectures to tackle the problem of entity recognition and relation extraction in semi-structured documents. Our approach achieves state of the art results in the three tasks involved in the process. Additionally, the experimentation with two datasets of different nature demonstrates the good generalization ability of our approach.
	Address	Virtual; January 2021
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICPR
	Notes	DAG; 600.121			Approved	no
	Call Number	Admin @ si @ CRV2020			Serial	3509
Permanent link to this record



	Author	Emanuele Vivoli; Ali Furkan Biten; Andres Mafla; Dimosthenis Karatzas; Lluis Gomez
	Title	MUST-VQA: MUltilingual Scene-text VQA			Type	Conference Article
	Year	2022	Publication	Proceedings European Conference on Computer Vision Workshops	Abbreviated Journal
	Volume	13804	Issue		Pages	345–358
	Keywords	Visual question answering; Scene text; Translation robustness; Multilingual models; Zero-shot transfer; Power of language models
	Abstract	In this paper, we present a framework for Multilingual Scene Text Visual Question Answering that deals with new languages in a zero-shot fashion. Specifically, we consider the task of Scene Text Visual Question Answering (STVQA) in which the question can be asked in different languages and it is not necessarily aligned to the scene text language. Thus, we first introduce a natural step towards a more generalized version of STVQA: MUST-VQA. Accounting for this, we discuss two evaluation scenarios in the constrained setting, namely IID and zero-shot and we demonstrate that the models can perform on a par on a zero-shot setting. We further provide extensive experimentation and show the effectiveness of adapting multilingual language models into STVQA tasks.
	Address	Tel-Aviv; Israel; October 2022
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title	LNCS
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ECCVW
	Notes	DAG; 302.105; 600.155; 611.002			Approved	no
	Call Number	Admin @ si @ VBM2022			Serial	3770
Permanent link to this record



	Author	Arnau Baro; Pau Riba; Alicia Fornes
	Title	Musigraph: Optical Music Recognition Through Object Detection and Graph Neural Network			Type	Conference Article
	Year	2022	Publication	Frontiers in Handwriting Recognition. International Conference on Frontiers in Handwriting Recognition (ICFHR2022)	Abbreviated Journal
	Volume	13639	Issue		Pages	171-184
	Keywords	Object detection; Optical music recognition; Graph neural network
	Abstract	During the last decades, the performance of optical music recognition has been increasingly improving. However, and despite the 2-dimensional nature of music notation (e.g. notes have rhythm and pitch), most works treat musical scores as a sequence of symbols in one dimension, which make their recognition still a challenge. Thus, in this work we explore the use of graph neural networks for musical score recognition. First, because graphs are suited for n-dimensional representations, and second, because the combination of graphs with deep learning has shown a great performance in similar applications. Our methodology consists of: First, we will detect each isolated/atomic symbols (those that can not be decomposed in more graphical primitives) and the primitives that form a musical symbol. Then, we will build the graph taking as root node the notehead and as leaves those primitives or symbols that modify the note’s rhythm (stem, beam, flag) or pitch (flat, sharp, natural). Finally, the graph is translated into a human-readable character sequence for a final transcription and evaluation. Our method has been tested on more than five thousand measures, showing promising results.
	Address	December 04 – 07, 2022; Hyderabad, India
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title	LNCS
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICFHR
	Notes	DAG; 600.162; 600.140; 602.230			Approved	no
	Call Number	Admin @ si @ BRF2022b			Serial	3740
Permanent link to this record



	Author	Jaume Gibert; Ernest Valveny; Oriol Ramos Terrades; Horst Bunke
	Title	Multiple Classifiers for Graph of Words Embedding			Type	Conference Article
	Year	2011	Publication	10th International Conference on Multiple Classifier Systems	Abbreviated Journal
	Volume	6713	Issue		Pages	36-45
	Keywords
	Abstract	During the last years, there has been an increasing interest in applying the multiple classifier framework to the domain of structural pattern recognition. Constructing base classifiers when the input patterns are graph based representations is not an easy problem. In this work, we make use of the graph embedding methodology in order to construct different feature vector representations for graphs. The graph of words embedding assigns a feature vector to every graph by counting unary and binary relations between node representatives and combining these pieces of information into a single vector. Selecting different node representatives leads to different vectorial representations and therefore to different base classifiers that can be combined. We experimentally show how this methodology significantly improves the classification of graphs with respect to single base classifiers.
	Address	Napoles, Italy
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor	Carlo Sansone; Josef Kittler; Fabio Roli
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title	LNCS
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-3-642-21556-8	Medium
	Area		Expedition		Conference	MCS
	Notes	DAG			Approved	no
	Call Number	Admin @ si @GVR2011			Serial	1745
Permanent link to this record



	Author	Marçal Rusiñol; Dimosthenis Karatzas; Andrew Bagdanov; Josep Llados
	Title	Multipage Document Retrieval by Textual and Visual Representations			Type	Conference Article
	Year	2012	Publication	21st International Conference on Pattern Recognition	Abbreviated Journal
	Volume		Issue		Pages	521-524
	Keywords
	Abstract	In this paper we present a multipage administrative document image retrieval system based on textual and visual representations of document pages. Individual pages are represented by textual or visual information using a bag-of-words framework. Different fusion strategies are evaluated which allow the system to perform multipage document retrieval on the basis of a single page retrieval system. Results are reported on a large dataset of document images sampled from a banking workflow.
	Address	Tsukuba Science City, Japan
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN	1051-4651	ISBN	978-1-4673-2216-4	Medium
	Area		Expedition		Conference	ICPR
	Notes	DAG			Approved	no
	Call Number	Admin @ si @ RKB2012			Serial	2053
Permanent link to this record



	Author	Marçal Rusiñol; Volkmar Frinken; Dimosthenis Karatzas; Andrew Bagdanov; Josep Llados
	Title	Multimodal page classification in administrative document image streams			Type	Journal Article
	Year	2014	Publication	International Journal on Document Analysis and Recognition	Abbreviated Journal	IJDAR
	Volume	17	Issue	4	Pages	331-341
	Keywords	Digital mail room; Multimodal page classification; Visual and textual document description
	Abstract	In this paper, we present a page classification application in a banking workflow. The proposed architecture represents administrative document images by merging visual and textual descriptions. The visual description is based on a hierarchical representation of the pixel intensity distribution. The textual description uses latent semantic analysis to represent document content as a mixture of topics. Several off-the-shelf classifiers and different strategies for combining visual and textual cues have been evaluated. A final step uses an n-gram model of the page stream allowing a finer-grained classification of pages. The proposed method has been tested in a real large-scale environment and we report results on a dataset of 70,000 pages.
	Address
	Corporate Author				Thesis
	Publisher	Springer Berlin Heidelberg	Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN	1433-2833	ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG; LAMP; 600.056; 600.061; 601.240; 601.223; 600.077; 600.079			Approved	no
	Call Number	Admin @ si @ RFK2014			Serial	2523
Permanent link to this record

Select All Deselect All

[11–20] << 21 22 23 24 25 26 27 28 29 30 >> [31–40]

List View

Citations

Details

All Found Records Selected Records:

Save Citations: Format:

Export Records: Format: