Publicacions CVC -- Query Results



	Records
	Author	Ruben Perez Tito
	Title	Exploring the role of Text in Visual Question Answering on Natural Scenes and Documents			Type	Book Whole
	Year	2023	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Visual Question Answering (VQA) is the task where, given an image and a natural language question, the objective is to generate a natural language answer. At the intersection between computer vision and natural language processing, this task can be seen as a measure of image understanding capabilities, as it requires reasoning about objects, actions, colors, positions, the relations between the different elements as well as commonsense reasoning, world knowledge, arithmetic skills and natural language understanding. However, even though the text present in the images conveys important semantically rich information that is explicit and not available in any other form, most VQA methods remained illiterate, largely ignoring the text despite its potential significance. In this thesis, we set out on a journey to bring reading capabilities to computer vision models applied to the VQA task, creating new datasets and methods that can read, reason and integrate the text with other visual cues in natural scene images and documents. In Chapter 3, we address the combination of scene text with visual information to fully understand all the nuances of natural scene images. To achieve this objective, we define a new sub-task of VQA that requires reading the text in the image, and highlight the limitations of the current methods. In addition, we propose a new architecture that integrates both modalities and jointly reasons about textual and visual features. In Chapter 5, we shift the domain of VQA with reading capabilities and apply it to scanned industry document images, providing a high-level end-purpose perspective to Document Understanding, which has been primarily focused on digitizing the document’s contents and extracting key values without considering the ultimate purpose of the extracted information. For this, we create a dataset which requires methods to reason about the unique and challenging elements of documents, such as text, images, tables, graphs and complex layouts, to provide accurate answers in natural language. However, we observed that explicit visual features contribute only slightly to the overall performance, since the main information is usually conveyed within the text and its position. Consequently, in Chapter 6, we propose VQA on infographic images, seeking document images with more visually rich elements that require fully exploiting visual information in order to answer the questions. We show the performance gap of different methods when used over industry scanned and infographic images, and propose a new method that integrates the visual features in early stages, which allows the transformer architecture to exploit the visual features during the self-attention operation. In Chapter 7, we instead apply VQA to a large collection of single-page documents, where the methods must find which documents are relevant to answer the question, and provide the answer itself. Finally, in Chapter 8, mimicking real-world application problems where systems must process documents with multiple pages, we address the multipage document visual question answering task. We demonstrate the limitations of existing methods, including models specifically designed to process long sequences. To overcome these limitations, we propose a hierarchical architecture that can process long documents, answer questions, and provide the index of the page where the information to answer the question is located as an explainability measure.
	Address
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	IMPRIMA	Place of Publication		Editor	Ernest Valveny
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-124793-5-5	Medium
	Area		Expedition		Conference
	Notes	DAG			Approved	no
	Call Number	Admin @ si @ Per2023			Serial	3967



	Author	Marçal Rusiñol; Josep Llados
	Title	Logo Spotting by a Bag-of-words Approach for Document Categorization			Type	Conference Article
	Year	2009	Publication	10th International Conference on Document Analysis and Recognition	Abbreviated Journal
	Volume		Issue		Pages	111–115
	Keywords
	Abstract	In this paper we present a method for document categorization which processes incoming document images such as invoices or receipts. The categorization of these document images is done in terms of the presence of a certain graphical logo detected without segmentation. The graphical logos are described by a set of local features and the categorization of the documents is performed by the use of a bag-of-words model. Spatial coherence rules are added to reinforce the correct category hypothesis, aiming also to spot the logo inside the document image. Experiments which demonstrate the effectiveness of this system on a large set of real data are presented.
	Address	Barcelona; Spain
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN	1520-5363	ISBN	978-1-4244-4500-4	Medium
	Area		Expedition		Conference	ICDAR
	Notes	DAG			Approved	no
	Call Number	DAG @ dag @ RuL2009b			Serial	1179



	Author	Veronica Romero; Emilio Granell; Alicia Fornes; Enrique Vidal; Joan Andreu Sanchez
	Title	Information Extraction in Handwritten Marriage Licenses Books			Type	Conference Article
	Year	2019	Publication	5th International Workshop on Historical Document Imaging and Processing	Abbreviated Journal
	Volume		Issue		Pages	66-71
	Keywords
	Abstract	Handwritten marriage licenses books are characterized by a simple structure of the text in the records with an evolutionary vocabulary, mainly composed of proper names that change over time. This distinct vocabulary makes automatic transcription and semantic information extraction difficult tasks. Previous works have shown that the use of category-based language models and a Grammatical Inference technique known as MGGI can improve the accuracy of these tasks. However, the application of the MGGI algorithm requires a priori knowledge to label the words of the training strings, which is not always easy to obtain. In this paper we study how to automatically obtain the information required by the MGGI algorithm using a technique based on Confusion Networks. Using the resulting language model, full handwritten text recognition and information extraction experiments have been carried out with results supporting the proposed approach.
	Address	Sydney; Australia; September 2019
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	HIP
	Notes	DAG; 600.140; 600.121			Approved	no
	Call Number	Admin @ si @ RGF2019			Serial	3352



	Author	Antonio Clavelli; Dimosthenis Karatzas
	Title	Text Segmentation in Colour Posters from the Spanish Civil War Era			Type	Conference Article
	Year	2009	Publication	10th International Conference on Document Analysis and Recognition	Abbreviated Journal
	Volume		Issue		Pages	181–185
	Keywords
	Abstract	The extraction of textual content from colour documents of a graphical nature is a complicated task. The text can be rendered in any colour, size and orientation while the existence of complex background graphics with repetitive patterns can make its localization and segmentation extremely difficult. Here, we propose a new method for extracting textual content from such colour images that makes no assumption as to the size of the characters, their orientation or colour, while it is tolerant to characters that do not follow a straight baseline. We evaluate this method on a collection of documents with historical connotations: the Posters from the Spanish Civil War.
	Address	Barcelona, Spain
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN	1520-5363	ISBN	978-1-4244-4500-4	Medium
	Area		Expedition		Conference	ICDAR
	Notes	DAG			Approved	no
	Call Number	DAG @ dag @ ClK2009			Serial	1172



	Author	Partha Pratim Roy; Umapada Pal; Josep Llados
	Title	Seal Object Detection in Document Images using GHT of Local Component Shapes			Type	Conference Article
	Year	2010	Publication	10th ACM Symposium On Applied Computing	Abbreviated Journal
	Volume		Issue		Pages	23–27
	Keywords
	Abstract	Due to noise, overlapped text/signature and multi-oriented nature, seal (stamp) object detection is a difficult challenge. This paper deals with automatic detection of seals in documents with cluttered background. Here, a seal object is characterized by scale and rotation invariant spatial feature descriptors (distance and angular position) computed from the recognition result of individual connected components (characters). Recognition of multi-scale and multi-oriented components is done using a Support Vector Machine classifier. The Generalized Hough Transform (GHT) is used to detect the seal, and votes are cast for finding possible locations of the seal object in a document based on these spatial feature descriptors of component pairs. The peak of votes in the GHT accumulator validates the hypothesis to locate the seal object in a document. Experimental results show that the method efficiently locates seal instances of arbitrary shape and orientation in documents.
	Address	Sierre, Switzerland
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	SAC
	Notes	DAG			Approved	no
	Call Number	DAG @ dag @ RPL2010a			Serial	1291



	Author	Sergio Escalera; Alicia Fornes; Oriol Pujol; Alberto Escudero; Petia Radeva
	Title	Circular Blurred Shape Model for Symbol Spotting in Documents			Type	Conference Article
	Year	2009	Publication	16th IEEE International Conference on Image Processing	Abbreviated Journal
	Volume		Issue		Pages	1985-1988
	Keywords
	Abstract	The symbol spotting problem requires feature extraction strategies able to generalize from training samples and to localize the target object while discarding most of the image. In the case of document analysis, symbol spotting techniques have to deal with a high variability of symbols' appearance. In this paper, we propose the Circular Blurred Shape Model descriptor. Feature extraction is performed capturing the spatial arrangement of significant object characteristics in a correlogram structure. Shape information from objects is shared among correlogram regions, being tolerant to irregular deformations. Descriptors are learnt using a cascade of classifiers and Adaboost as the base classifier. Finally, symbol spotting is performed by means of a windowing strategy using the learnt cascade over plan and old musical score documents. Spotting and multi-class categorization results show better performance compared with the state-of-the-art descriptors.
	Address	Cairo, Egypt
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-1-4244-5653-6	Medium
	Area		Expedition		Conference	ICIP
	Notes	MILAB;HuPBA;DAG			Approved	no
	Call Number	BCNPCL @ bcnpcl @ EFP2009b			Serial	1184



	Author	Ernest Valveny; Enric Marti
	Title	Application of deformable template matching to symbol recognition in hand-written architectural draw			Type	Conference Article
	Year	1999	Publication	Proceedings of the Fifth International Conference on	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	We propose to use deformable template matching as a new approach to recognize characters and lineal symbols in hand-written line drawings, instead of traditional methods based on vectorization and feature extraction. Bayesian formulation of the deformable template matching allows combining fidelity to the ideal shape of the symbol with maximum flexibility to get the best fit to the input image. Lineal nature of symbols can be exploited to define a suitable representation of models and the set of deformations to be applied to them. Matching, however, is done over the original binary image to avoid losing relevant features during vectorization. We have applied this method to hand-written architectural drawings and experimental results demonstrate that symbols with high distortions from ideal shape can be accurately identified.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication	Bangalore (India)	Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG;IAM;			Approved	no
	Call Number	IAM @ iam @ VAM1999a			Serial	1657



	Author	Jon Almazan; Alicia Fornes; Ernest Valveny
	Title	A Non-Rigid Feature Extraction Method for Shape Recognition			Type	Conference Article
	Year	2011	Publication	11th International Conference on Document Analysis and Recognition	Abbreviated Journal
	Volume		Issue		Pages	987-991
	Keywords
	Abstract	This paper presents a methodology for shape recognition that focuses on dealing with the difficult problem of large deformations. The proposed methodology consists in a novel feature extraction technique, which uses a non-rigid representation adaptable to the shape. This technique employs a deformable grid based on the computation of geometrical centroids that follows a region partitioning algorithm. Then, a feature vector is extracted by computing pixel density measures around these geometrical centroids. The result is a shape descriptor that adapts its representation to the given shape and encodes the pixel density distribution. The validity of the method when dealing with large deformations has been experimentally shown over datasets composed of handwritten shapes. It has been applied to signature verification and shape recognition tasks demonstrating high accuracy and low computational cost.
	Address	Beijing; China; September 2011
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-0-7695-4520-2	Medium
	Area		Expedition		Conference	ICDAR
	Notes	DAG			Approved	no
	Call Number	Admin @ si @ AFV2011			Serial	1763



	Author	Lluis Pere de las Heras; Joan Mas; Gemma Sanchez; Ernest Valveny
	Title	Wall Patch-Based Segmentation in Architectural Floorplans			Type	Conference Article
	Year	2011	Publication	11th International Conference on Document Analysis and Recognition	Abbreviated Journal
	Volume		Issue		Pages	1270-1274
	Keywords
	Abstract	Segmentation of architectural floor plans is a challenging task, mainly because of the large variability in the notation between different plans. In general, traditional techniques, usually based on analyzing and grouping structural primitives obtained by vectorization, are only able to handle a reduced range of similar notations. In this paper we propose an alternative patch-based segmentation approach working at pixel level, without need of vectorization. The image is divided into a set of patches and a set of features is extracted for every patch. Then, each patch is assigned to a visual word of a previously learned vocabulary and given a probability of belonging to each class of objects. Finally, a post-process assigns the final label for every pixel. This approach has been applied to the detection of walls on two datasets of architectural floor plans with different notations, achieving high accuracy rates.
	Address	Beijing, China
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN	1520-5363	ISBN	978-0-7695-4520-2	Medium
	Area		Expedition		Conference	ICDAR
	Notes	DAG			Approved	no
	Call Number	Admin @ si @ HMS2011a			Serial	1792



	Author	Miquel Ferrer; Ernest Valveny; F. Serratosa; K. Riesen; Horst Bunke
	Title	Generalized Median Graph Computation by Means of Graph Embedding in Vector Spaces			Type	Journal Article
	Year	2010	Publication	Pattern Recognition	Abbreviated Journal	PR
	Volume	43	Issue	4	Pages	1642–1655
	Keywords	Graph matching; Weighted mean of graphs; Median graph; Graph embedding; Vector spaces
	Abstract	The median graph has been presented as a useful tool to represent a set of graphs. Nevertheless, its computation is very complex and the existing algorithms are restricted to limited amounts of data. In this paper we propose a new approach for the computation of the median graph based on graph embedding. Graphs are embedded into a vector space and the median is computed in the vector domain. We have designed a procedure based on the weighted mean of a pair of graphs to go from the vector domain back to the graph domain in order to obtain a final approximation of the median graph. Experiments on three different databases containing large graphs show that we succeed in computing good approximations of the median graph. We have also applied the median graph to perform some basic classification tasks, achieving reasonably good results. These experiments on real data open the door to the application of the median graph to a number of more complex machine learning algorithms where a representative of a set of graphs is needed.
	Address
	Corporate Author				Thesis
	Publisher	Elsevier	Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG			Approved	no
	Call Number	DAG @ dag @ FVS2010			Serial	1294
