Publicacions CVC -- Query Results

Details

	Records
	Author	Lei Kang; Pau Riba; Marçal Rusiñol; Alicia Fornes; Mauricio Villegas
	Title	Pay Attention to What You Read: Non-recurrent Handwritten Text-Line Recognition			Type	Journal Article
	Year	2022	Publication	Pattern Recognition	Abbreviated Journal	PR
	Volume	129	Issue		Pages	108766
	Keywords
	Abstract	The advent of recurrent neural networks for handwriting recognition marked an important milestone reaching impressive recognition accuracies despite the great variability that we observe across different writing styles. Sequential architectures are a perfect fit to model text lines, not only because of the inherent temporal aspect of text, but also to learn probability distributions over sequences of characters and words. However, using such recurrent paradigms comes at a cost at training stage, since their sequential pipelines prevent parallelization. In this work, we introduce a non-recurrent approach to recognize handwritten text by the use of transformer models. We propose a novel method that bypasses any recurrence. By using multi-head self-attention layers both at the visual and textual stages, we are able to tackle character recognition as well as to learn language-related dependencies of the character sequences to be decoded. Our model is unconstrained to any predefined vocabulary, being able to recognize out-of-vocabulary words, i.e. words that do not appear in the training vocabulary. We significantly advance over prior art and demonstrate that satisfactory recognition accuracies are yielded even in few-shot learning scenarios.
	Address	Sept. 2022
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG; 600.121; 600.162			Approved	no
	Call Number	Admin @ si @ KRR2022			Serial	3556



	Author	Pau Riba; Andreas Fischer; Josep Llados; Alicia Fornes
	Title	Learning graph edit distance by graph neural networks			Type	Journal Article
	Year	2021	Publication	Pattern Recognition	Abbreviated Journal	PR
	Volume	120	Issue		Pages	108132
	Keywords
	Abstract	The emergence of geometric deep learning as a novel framework to deal with graph-based representations has faded away traditional approaches in favor of completely new methodologies. In this paper, we propose a new framework able to combine the advances on deep metric learning with traditional approximations of the graph edit distance. Hence, we propose an efficient graph distance based on the novel field of geometric deep learning. Our method employs a message passing neural network to capture the graph structure, and thus, leveraging this information for its use on a distance computation. The performance of the proposed graph distance is validated on two different scenarios. On the one hand, in a graph retrieval of handwritten words i.e. keyword spotting, showing its superior performance when compared with (approximate) graph edit distance benchmarks. On the other hand, demonstrating competitive results for graph similarity learning when compared with the current state-of-the-art on a recent benchmark dataset.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG; 600.140; 600.121			Approved	no
	Call Number	Admin @ si @ RFL2021			Serial	3611



	Author	S.K. Jemni; Mohamed Ali Souibgui; Yousri Kessentini; Alicia Fornes
	Title	Enhance to Read Better: A Multi-Task Adversarial Network for Handwritten Document Image Enhancement			Type	Journal Article
	Year	2022	Publication	Pattern Recognition	Abbreviated Journal	PR
	Volume	123	Issue		Pages	108370
	Keywords
	Abstract	Handwritten document images can be highly affected by degradation for different reasons: Paper ageing, daily-life scenarios (wrinkles, dust, etc.), bad scanning process and so on. These artifacts raise many readability issues for current Handwritten Text Recognition (HTR) algorithms and severely devalue their efficiency. In this paper, we propose an end to end architecture based on Generative Adversarial Networks (GANs) to recover the degraded documents into a clean and readable form. Unlike the most well-known document binarization methods, which try to improve the visual quality of the degraded document, the proposed architecture integrates a handwritten text recognizer that promotes the generated document image to be more readable. To the best of our knowledge, this is the first work to use the text information while binarizing handwritten documents. Extensive experiments conducted on degraded Arabic and Latin handwritten documents demonstrate the usefulness of integrating the recognizer within the GAN architecture, which improves both the visual quality and the readability of the degraded document images. Moreover, we outperform the state of the art in H-DIBCO challenges, after fine tuning our pre-trained model with synthetically degraded Latin handwritten images, on this task.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG; 600.124; 600.121; 602.230			Approved	no
	Call Number	Admin @ si @ JSK2022			Serial	3613



	Author	Ruben Tito; Dimosthenis Karatzas; Ernest Valveny
	Title	Hierarchical multimodal transformers for Multi-Page DocVQA			Type	Journal Article
	Year	2023	Publication	Pattern Recognition	Abbreviated Journal	PR
	Volume	144	Issue		Pages	109834
	Keywords
	Abstract	Document Visual Question Answering (DocVQA) refers to the task of answering questions from document images. Existing work on DocVQA only considers single-page documents. However, in real scenarios documents are mostly composed of multiple pages that should be processed altogether. In this work we extend DocVQA to the multi-page scenario. For that, we first create a new dataset, MP-DocVQA, where questions are posed over multi-page documents instead of single pages. Second, we propose a new hierarchical method, Hi-VT5, based on the T5 architecture, that overcomes the limitations of current methods to process long multi-page documents. The proposed method is based on a hierarchical transformer architecture where the encoder summarizes the most relevant information of every page and then, the decoder takes this summarized information to generate the final answer. Through extensive experimentation, we demonstrate that our method is able, in a single stage, to answer the questions and provide the page that contains the relevant information to find the answer, which can be used as a kind of explainability measure.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN	ISSN 0031-3203	ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG; 600.155; 600.121			Approved	no
	Call Number	Admin @ si @ TKV2023			Serial	3825



	Author	Souhail Bakkali; Zuheng Ming; Mickael Coustaty; Marçal Rusiñol; Oriol Ramos Terrades
	Title	VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification			Type	Journal Article
	Year	2023	Publication	Pattern Recognition	Abbreviated Journal	PR
	Volume	139	Issue		Pages	109419
	Keywords
	Abstract	Multimodal learning from document data has achieved great success lately as it allows to pre-train semantically meaningful features as a prior into a learnable downstream approach. In this paper, we approach the document classification problem by learning cross-modal representations through language and vision cues, considering intra- and inter-modality relationships. Instead of merging features from different modalities into a common representation space, the proposed method exploits high-level interactions and learns relevant semantic information from effective attention flows within and across modalities. The proposed learning objective is devised between intra- and inter-modality alignment tasks, where the similarity distribution per task is computed by contracting positive sample pairs while simultaneously contrasting negative ones in the common feature representation space. Extensive experiments on public document classification datasets demonstrate the effectiveness and the generalization capacity of our model on both low-scale and large-scale datasets.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN	ISSN 0031-3203	ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG; 600.140; 600.121			Approved	no
	Call Number	Admin @ si @ BMC2023			Serial	3826



	Author	Ruben Tito; Dimosthenis Karatzas; Ernest Valveny
	Title	Hierarchical multimodal transformers for Multipage DocVQA			Type	Journal Article
	Year	2023	Publication	Pattern Recognition	Abbreviated Journal	PR
	Volume	144	Issue		Pages	109834
	Keywords
	Abstract	Existing work on DocVQA only considers single-page documents. However, in real applications documents are mostly composed of multiple pages that should be processed altogether. In this work, we propose a new multimodal hierarchical method Hi-VT5, that overcomes the limitations of current methods to process long multipage documents. In contrast to previous hierarchical methods that focus on different semantic granularity (He et al., 2021) or different subtasks (Zhou et al., 2022) used in image classification, our method is a hierarchical transformer architecture where the encoder learns to summarize the most relevant information of every page and then, the decoder uses this summarized representation to generate the final answer, following a bottom-up approach. Moreover, due to the lack of multipage DocVQA datasets, we also introduce MP-DocVQA, an extension of SP-DocVQA where questions are posed over multipage documents instead of single pages. Through extensive experimentation, we demonstrate that Hi-VT5 is able, in a single stage, to answer the questions and provide the page that contains the answer, which can be used as a kind of explainability measure.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG			Approved	no
	Call Number	Admin @ si @ TKV2023			Serial	3836



	Author	Gemma Sanchez; Josep Llados; K. Tombre
	Title	A mean string algorithm to compute the average among a set of 2D shapes			Type	Journal Article
	Year	2002	Publication	Pattern Recognition Letters	Abbreviated Journal	PRL
	Volume	23	Issue	1-3	Pages	203–214
	Keywords
	Abstract
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG; IF: 0.409			Approved	no
	Call Number	DAG @ dag @ SLT2002			Serial	275



	Author	Oriol Ramos Terrades; Ernest Valveny
	Title	A new use of the ridgelets transform for describing linear singularities in images			Type	Journal Article
	Year	2006	Publication	Pattern Recognition Letters	Abbreviated Journal	PRL
	Volume	27	Issue	6	Pages	587–596
	Keywords
	Abstract
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG			Approved	no
	Call Number	DAG @ dag @ RaV2006a			Serial	635



	Author	Miquel Ferrer; Ernest Valveny; F. Serratosa
	Title	Median graph: A new exact algorithm using a distance based on the maximum common subgraph			Type	Journal Article
	Year	2009	Publication	Pattern Recognition Letters	Abbreviated Journal	PRL
	Volume	30	Issue	5	Pages	579–588
	Keywords
	Abstract	Median graphs have been presented as a useful tool for capturing the essential information of a set of graphs. Nevertheless, computation of optimal solutions is a very hard problem. In this work we present a new and more efficient optimal algorithm for the median graph computation. With the use of a particular cost function that permits the definition of the graph edit distance in terms of the maximum common subgraph, and a prediction function in the backtracking algorithm, we reduce the size of the search space, avoiding the evaluation of a great amount of states and still obtaining the exact median. We present a set of experiments comparing our new algorithm against the previous existing exact algorithm using synthetic data. In addition, we present the first application of the exact median graph computation to real data and we compare the results against an approximate algorithm based on genetic search. These experimental results show that our algorithm outperforms the previous existing exact algorithm and in addition show the potential applicability of the exact solutions to real problems.
	Address
	Corporate Author				Thesis
	Publisher	Elsevier Science Inc.	Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN	0167-8655	ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG			Approved	no
	Call Number	DAG @ dag @ FVS2009a			Serial	1114



	Author	Marçal Rusiñol; Agnes Borras; Josep Llados
	Title	Relational Indexing of Vectorial Primitives for Symbol Spotting in Line-Drawing Images			Type	Journal Article
	Year	2010	Publication	Pattern Recognition Letters	Abbreviated Journal	PRL
	Volume	31	Issue	3	Pages	188–201
	Keywords	Document image analysis and recognition, Graphics recognition, Symbol spotting, Vectorial representations, Line-drawings
	Abstract	This paper presents a symbol spotting approach for indexing by content a database of line-drawing images. As line-drawings are digital-born documents designed by vectorial softwares, instead of using a pixel-based approach, we present a spotting method based on vector primitives. Graphical symbols are represented by a set of vectorial primitives which are described by an off-the-shelf shape descriptor. A relational indexing strategy aims to retrieve symbol locations into the target documents by using a combined numerical-relational description of 2D structures. The zones which are likely to contain the queried symbol are validated by a Hough-like voting scheme. In addition, a performance evaluation framework for symbol spotting in graphical documents is proposed. The presented methodology has been evaluated with a benchmarking set of architectural documents achieving good performance results.
	Address
	Corporate Author				Thesis
	Publisher	Elsevier	Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG			Approved	no
	Call Number	DAG @ dag @ RBL2010			Serial	1177
