Publicacions CVC -- Query Results

	Records
	Author	Souhail Bakkali; Zuheng Ming; Mickael Coustaty; Marçal Rusiñol; Oriol Ramos Terrades
	Title	VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification			Type	Journal Article
	Year	2023	Publication	Pattern Recognition	Abbreviated Journal	PR
	Volume	139	Issue		Pages	109419
	Keywords
	Abstract	Multimodal learning from document data has achieved great success lately as it allows pre-training semantically meaningful features as a prior into a learnable downstream approach. In this paper, we approach the document classification problem by learning cross-modal representations through language and vision cues, considering intra- and inter-modality relationships. Instead of merging features from different modalities into a common representation space, the proposed method exploits high-level interactions and learns relevant semantic information from effective attention flows within and across modalities. The proposed learning objective is devised between intra- and inter-modality alignment tasks, where the similarity distribution per task is computed by contracting positive sample pairs while simultaneously contrasting negative ones in the common feature representation space. Extensive experiments on public document classification datasets demonstrate the effectiveness and the generalization capacity of our model on both low-scale and large-scale datasets.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN	ISSN 0031-3203	ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG; 600.140; 600.121			Approved	no
	Call Number	Admin @ si @ BMC2023			Serial	3826



	Author	Ruben Tito; Dimosthenis Karatzas; Ernest Valveny
	Title	Hierarchical multimodal transformers for Multipage DocVQA			Type	Journal Article
	Year	2023	Publication	Pattern Recognition	Abbreviated Journal	PR
	Volume	144	Issue		Pages	109834
	Keywords
	Abstract	Existing work on DocVQA only considers single-page documents. However, in real applications documents are mostly composed of multiple pages that should be processed altogether. In this work, we propose a new multimodal hierarchical method, Hi-VT5, that overcomes the limitations of current methods to process long multipage documents. In contrast to previous hierarchical methods that focus on different semantic granularity (He et al., 2021) or different subtasks (Zhou et al., 2022) used in image classification, our method is a hierarchical transformer architecture where the encoder learns to summarize the most relevant information of every page and then the decoder uses this summarized representation to generate the final answer, following a bottom-up approach. Moreover, due to the lack of multipage DocVQA datasets, we also introduce MP-DocVQA, an extension of SP-DocVQA where questions are posed over multipage documents instead of single pages. Through extensive experimentation, we demonstrate that Hi-VT5 is able, in a single stage, to answer the questions and provide the page that contains the answer, which can be used as a kind of explainability measure.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG			Approved	no
	Call Number	Admin @ si @ TKV2023			Serial	3836



	Author	Ayan Banerjee; Sanket Biswas; Josep Llados; Umapada Pal
	Title	SemiDocSeg: Harnessing Semi-Supervised Learning for Document Layout Analysis			Type	Journal Article
	Year	2024	Publication	International Journal on Document Analysis and Recognition	Abbreviated Journal	IJDAR
	Volume		Issue		Pages
	Keywords	Document layout analysis; Semi-supervised learning; Co-Occurrence matrix; Instance segmentation; Swin transformer
	Abstract	Document Layout Analysis (DLA) is the process of automatically identifying and categorizing the structural components (e.g. Text, Figure, Table, etc.) within a document to extract meaningful content and establish the page's layout structure. It is a crucial stage in document parsing, contributing to their comprehension. However, traditional DLA approaches often demand a significant volume of labeled training data, and the labor-intensive task of generating high-quality annotated training data poses a substantial challenge. In order to address this challenge, we propose a semi-supervised setting that aims to perform learning on limited annotated categories by eliminating exhaustive and expensive mask annotations. The proposed setting is expected to be generalizable to novel categories as it learns the underlying positional information through a support set and class information through Co-Occurrence that can be generalized from annotated categories to novel categories. Here, we first extract features from the input image and support set with a shared multi-scale feature acquisition backbone. Then, the extracted feature representation is fed to the transformer encoder as a query. Later on, we utilize a semantic embedding network before the decoder to capture the underlying semantic relationships and similarities between different instances, enabling the model to make accurate predictions or classifications with only a limited amount of labeled data. Extensive experimentation on competitive benchmarks like PRIMA, DocLayNet, and Historical Japanese (HJ) demonstrates that this generalized setup obtains significant performance compared to the conventional supervised approach.
	Address	June 2024
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG			Approved	no
	Call Number	Admin @ si @ BBL2024a			Serial	4001



	Author	Mohamed Ali Souibgui; Asma Bensalah; Jialuo Chen; Alicia Fornes; Michelle Waldispühl
	Title	A User Perspective on HTR methods for the Automatic Transcription of Rare Scripts: The Case of Codex Runicus			Type	Journal Article
	Year	2023	Publication	ACM Journal on Computing and Cultural Heritage	Abbreviated Journal	JOCCH
	Volume	15	Issue	4	Pages	1-18
	Keywords
	Abstract	Recent breakthroughs in Artificial Intelligence, Deep Learning and Document Image Analysis and Recognition have significantly eased the creation of digital libraries and the transcription of historical documents. However, for documents in rare scripts with little labelled training data available, current Handwritten Text Recognition (HTR) systems are too constrained. Moreover, research on HTR often focuses on technical aspects only, and rarely puts emphasis on implementing software tools for scholars in Humanities. In this article, we describe, compare and analyse different transcription methods for rare scripts. We evaluate their performance in a real use case of a medieval manuscript written in the runic script (Codex Runicus) and discuss advantages and disadvantages of each method from the user perspective. From this exhaustive analysis and comparison with a fully manual transcription, we draw conclusions and provide recommendations to scholars interested in using automatic transcription tools.
	Address
	Corporate Author				Thesis
	Publisher	ACM	Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG; 600.121; 600.162; 602.230; 600.140			Approved	no
	Call Number	Admin @ si @ SBC2023			Serial	3732



	Author	Marçal Rusiñol; Agnes Borras; Josep Llados
	Title	Relational Indexing of Vectorial Primitives for Symbol Spotting in Line-Drawing Images			Type	Journal Article
	Year	2010	Publication	Pattern Recognition Letters	Abbreviated Journal	PRL
	Volume	31	Issue	3	Pages	188–201
	Keywords	Document image analysis and recognition; Graphics recognition; Symbol spotting; Vectorial representations; Line-drawings
	Abstract	This paper presents a symbol spotting approach for indexing by content a database of line-drawing images. As line-drawings are digital-born documents designed with vectorial software, instead of using a pixel-based approach, we present a spotting method based on vector primitives. Graphical symbols are represented by a set of vectorial primitives which are described by an off-the-shelf shape descriptor. A relational indexing strategy aims to retrieve symbol locations in the target documents by using a combined numerical-relational description of 2D structures. The zones which are likely to contain the queried symbol are validated by a Hough-like voting scheme. In addition, a performance evaluation framework for symbol spotting in graphical documents is proposed. The presented methodology has been evaluated with a benchmarking set of architectural documents, achieving good performance results.
	Address
	Corporate Author				Thesis
	Publisher	Elsevier	Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG			Approved	no
	Call Number	DAG @ dag @ RBL2010			Serial	1177