Publicacions CVC -- Query Results



	Records
	Author	Manuel Carbonell; Joan Mas; Mauricio Villegas; Alicia Fornes; Josep Llados
	Title	End-to-End Handwritten Text Detection and Transcription in Full Pages			Type	Conference Article
	Year	2019	Publication	2nd International Workshop on Machine Learning	Abbreviated Journal
	Volume	5	Issue		Pages	29-34
	Keywords	Handwritten Text Recognition; Layout Analysis; Text segmentation; Deep Neural Networks; Multi-task learning
	Abstract	When transcribing handwritten document images, inaccuracies in the text segmentation step often cause errors in the subsequent transcription step. For this reason, some recent methods propose to perform the recognition at paragraph level. But still, errors in the segmentation of paragraphs can affect the transcription performance. In this work, we propose an end-to-end framework to transcribe full pages. The joint text detection and transcription removes the layout analysis requirement at test time. The experimental results show that our approach can achieve comparable results to models that assume segmented paragraphs, and suggest that joining the two tasks brings an improvement over doing the two tasks separately.
	Address	Sydney; Australia; September 2019
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICDAR WML
	Notes	DAG; 600.140; 601.311; 600.140			Approved	no
	Call Number	Admin @ si @ CMV2019			Serial	3353



	Author	Asma Bensalah; Pau Riba; Alicia Fornes; Josep Llados
	Title	Shoot less and Sketch more: An Efficient Sketch Classification via Joining Graph Neural Networks and Few-shot Learning			Type	Conference Article
	Year	2019	Publication	13th IAPR International Workshop on Graphics Recognition	Abbreviated Journal
	Volume		Issue		Pages	80-85
	Keywords	Sketch classification; Convolutional Neural Network; Graph Neural Network; Few-shot learning
	Abstract	With the emergence of touchpad devices and drawing tablets, a new era of sketching started afresh. However, the recognition of sketches is still a tough task due to the variability of drawing styles. Moreover, in some application scenarios there is little labelled data available for training, which imposes a limitation for deep learning architectures. In addition, in many cases there is a need to generate models able to adapt to new classes. In order to cope with these limitations, we propose a method based on few-shot learning and graph neural networks for classifying sketches, aiming for an efficient neural model. We test our approach with several databases of sketches, showing promising results.
	Address	Sydney; Australia; September 2019
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	GREC
	Notes	DAG; 600.140; 601.302; 600.121			Approved	no
	Call Number	Admin @ si @ BRF2019			Serial	3354



	Author	Pau Riba; Anjan Dutta; Lutz Goldmann; Alicia Fornes; Oriol Ramos Terrades; Josep Llados
	Title	Table Detection in Invoice Documents by Graph Neural Networks			Type	Conference Article
	Year	2019	Publication	15th International Conference on Document Analysis and Recognition	Abbreviated Journal
	Volume		Issue		Pages	122-127
	Keywords
	Abstract	Tabular structures in documents offer a complementary dimension to the raw textual data, representing logical or quantitative relationships among pieces of information. In digital mail room applications, where a large amount of administrative documents must be processed with reasonable accuracy, the detection and interpretation of tables is crucial. Table recognition has gained interest in document image analysis, in particular in unconstrained formats (absence of rule lines, unknown information of rows and columns). In this work, we propose a graph-based approach for detecting tables in document images. Instead of using the raw content (recognized text), we make use of the location, context and content type, thus it is purely a structure perception approach, not dependent on the language and the quality of the text reading. Our framework makes use of Graph Neural Networks (GNNs) in order to describe the local repetitive structural information of tables in invoice documents. Our proposed model has been experimentally validated in two invoice datasets and achieved encouraging results. Additionally, due to the scarcity of benchmark datasets for this task, we have contributed to the community a novel dataset derived from the RVL-CDIP invoice data. It will be publicly released to facilitate future research.
	Address	Sydney; Australia; September 2019
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICDAR
	Notes	DAG; 600.140; 601.302; 602.167; 600.121; 600.141			Approved	no
	Call Number	Admin @ si @ RDG2019			Serial	3355



	Author	Ekta Vats; Anders Hast; Alicia Fornes
	Title	Training-Free and Segmentation-Free Word Spotting using Feature Matching and Query Expansion			Type	Conference Article
	Year	2019	Publication	15th International Conference on Document Analysis and Recognition	Abbreviated Journal
	Volume		Issue		Pages	1294-1299
	Keywords	Word spotting; Segmentation-free; Training-free; Query expansion; Feature matching
	Abstract	Historical handwritten text recognition is an interesting yet challenging problem. In recent times, deep learning based methods have achieved significant performance in handwritten text recognition. However, handwriting recognition using deep learning needs training data, and often, text must be previously segmented into lines (or even words). These limitations constrain the application of HTR techniques in document collections, because training data or segmented words are not always available. Therefore, this paper proposes a training-free and segmentation-free word spotting approach that can be applied in unconstrained scenarios. The proposed word spotting framework is based on document query word expansion and a relaxed feature matching algorithm, which can easily be parallelised. Since handwritten words possess distinct shape and characteristics, this work uses a combination of different keypoint detectors and Fourier-based descriptors to obtain a sufficient degree of relaxed matching. The effectiveness of the proposed method is empirically evaluated on well-known benchmark datasets using standard evaluation measures. The use of informative features along with query expansion contributed significantly to the efficient performance of the proposed method.
	Address	Sydney; Australia; September 2019
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICDAR
	Notes	DAG; 600.140; 600.121			Approved	no
	Call Number	Admin @ si @ VHF2019			Serial	3356



	Author	Arka Ujjal Dey; Suman Ghosh; Ernest Valveny; Gaurav Harit
	Title	Beyond Visual Semantics: Exploring the Role of Scene Text in Image Understanding			Type	Journal Article
	Year	2021	Publication	Pattern Recognition Letters	Abbreviated Journal	PRL
	Volume	149	Issue		Pages	164-171
	Keywords
	Abstract	Images with visual and scene text content are ubiquitous in everyday life. However, current image interpretation systems are mostly limited to using only the visual features, neglecting to leverage the scene text content. In this paper, we propose to jointly use scene text and visual channels for robust semantic interpretation of images. We not only extract and encode visual and scene text cues, but also model their interplay to generate a contextual joint embedding with richer semantics. The contextual embedding thus generated is applied to retrieval and classification tasks on multimedia images with scene text content, to demonstrate its effectiveness. In the retrieval framework, we augment our learned text-visual semantic representation with scene text cues, to mitigate vocabulary misses that may have occurred during the semantic embedding. To deal with irrelevant or erroneous recognition of scene text, we also apply query-based attention to our text channel. We show how the multi-channel approach, involving visual semantics and scene text, improves upon the state of the art.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG; 600.121			Approved	no
	Call Number	Admin @ si @ DGV2021			Serial	3364



	Author	Mohammed Al Rawi; Ernest Valveny
	Title	Compact and Efficient Multitask Learning in Vision, Language and Speech			Type	Conference Article
	Year	2019	Publication	IEEE International Conference on Computer Vision Workshops	Abbreviated Journal
	Volume		Issue		Pages	2933-2942
	Keywords
	Abstract	Across-domain multitask learning is a challenging area of computer vision and machine learning due to the intra-similarities among class distributions. Addressing this problem to cope with the human cognition system by considering inter and intra-class categorization and recognition complicates the problem even further. We propose in this work an effective holistic and hierarchical learning by using a text embedding layer on top of a deep learning model. We also propose a novel sensory discriminator approach to resolve the collisions between different tasks and domains. We then train the model concurrently on textual sentiment analysis, speech recognition, image classification, action recognition from video, and handwriting word spotting of two different scripts (Arabic and English). The model we propose successfully learned different tasks across multiple domains.
	Address	Seoul; Korea; October 2019
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICCVW
	Notes	DAG; 600.121; 600.129			Approved	no
	Call Number	Admin @ si @ RaV2019			Serial	3365



	Author	Juan Ignacio Toledo
	Title	Information Extraction from Heterogeneous Handwritten Documents			Type	Book Whole
	Year	2019	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	In this thesis we explore information extraction from totally or partially handwritten documents. Basically, we are dealing with two different application scenarios. The first scenario is modern, highly structured documents such as forms. In this kind of document, the semantic information is encoded in different fields with a pre-defined location in the document; therefore, information extraction becomes roughly equivalent to transcription. The second application scenario is loosely structured, totally handwritten documents: besides transcribing them, we need to assign a semantic label, from a set of known values, to the handwritten words. In both scenarios, transcription is an important part of the information extraction. For that reason, in this thesis we present two methods based on Neural Networks to transcribe handwritten text. In order to tackle the challenge of loosely structured documents, we have produced a benchmark, consisting of a dataset, a defined set of tasks and a metric, that was presented to the community as an international competition. Also, we propose different models based on Convolutional and Recurrent neural networks that are able to transcribe and assign different semantic labels to each handwritten word, that is, able to perform Information Extraction.
	Address	July 2019
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Alicia Fornes;Josep Llados
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-948531-7-3	Medium
	Area		Expedition		Conference
	Notes	DAG; 600.140; 600.121			Approved	no
	Call Number	Admin @ si @ Tol2019			Serial	3389



	Author	Albert Berenguel
	Title	Analysis of background textures in banknotes and identity documents for counterfeit detection			Type	Book Whole
	Year	2019	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Counterfeiting and piracy are a form of theft that has been steadily growing in recent years. A counterfeit is an unauthorized reproduction of an authentic/genuine object. Banknotes and identity documents are two common objects of counterfeiting. The former is used by organized criminal groups to finance a variety of illegal activities or even to destabilize entire countries due to the inflation effect. Generally, in order to run their illicit businesses, counterfeiters establish companies and bank accounts using fraudulent identity documents. The illegal activities generated by counterfeit banknotes and identity documents have a damaging effect on business, the economy and the general population. To fight against counterfeiters, governments and authorities around the globe cooperate and develop security features to protect their security documents. Many of the security features in identity documents can also be found in banknotes. In this dissertation we focus our efforts on detecting counterfeit banknotes and identity documents by analyzing the security features of the background printing. Background areas on secure documents contain fine-line patterns and designs that are difficult to reproduce without the manufacturer's cutting-edge printing equipment. Our objective is to find the loss of resolution between the genuine security document and the counterfeit version printed with a publicly available commercial printer. We first present the most complete survey to date of identity and banknote security features. The compared algorithms and systems are based on computer vision and machine learning. Then we present the banknote and identity counterfeit dataset we have built and use throughout this thesis. Afterwards, we evaluate and adapt algorithms in the literature for the security background texture analysis. We study this problem from the point of view of robustness, computational efficiency and applicability to a real and non-controlled industrial scenario, proposing key insights to use these algorithms. Next, within the industrial environment of this thesis, we build a complete service-oriented architecture to detect counterfeit documents. The mobile application and the server framework are intended to be used even by non-expert document examiners to spot counterfeits. Later, we re-frame the problem of background texture counterfeit detection as a full-reference game of spotting the differences, by alternating glimpses between a counterfeit and a genuine background using recurrent neural networks. Finally, we deal with the lack of counterfeit samples, studying different approaches based on anomaly detection.
	Address	November 2019
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Oriol Ramos Terrades;Josep Llados
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-121011-2-6	Medium
	Area		Expedition		Conference
	Notes	DAG; 600.140; 600.121			Approved	no
	Call Number	Admin @ si @ Ber2019			Serial	3395



	Author	Sangeeth Reddy; Minesh Mathew; Lluis Gomez; Marçal Rusiñol; Dimosthenis Karatzas; C.V. Jawahar
	Title	RoadText-1K: Text Detection and Recognition Dataset for Driving Videos			Type	Conference Article
	Year	2020	Publication	IEEE International Conference on Robotics and Automation	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Perceiving text is crucial to understand semantics of outdoor scenes and hence is a critical requirement to build intelligent systems for driver assistance and self-driving. Most of the existing datasets for text detection and recognition comprise still images and are mostly compiled keeping text in mind. This paper introduces a new “RoadText-1K” dataset for text in driving videos. The dataset is 20 times larger than the existing largest dataset for text in videos. Our dataset comprises 1000 video clips of driving without any bias towards text and with annotations for text bounding boxes and transcriptions in every frame. State of the art methods for text detection, recognition and tracking are evaluated on the new dataset and the results signify the challenges in unconstrained driving videos compared to existing datasets. This suggests that RoadText-1K is suited for research and development of reading systems, robust enough to be incorporated into more complex downstream tasks like driver assistance and self-driving. The dataset can be found at http://cvit.iiit.ac.in/research/projects/cvit-projects/roadtext-1k
	Address	Paris; France; ???
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICRA
	Notes	DAG; 600.121; 600.129			Approved	no
	Call Number	Admin @ si @ RMG2020			Serial	3400



	Author	Raul Gomez; Jaume Gibert; Lluis Gomez; Dimosthenis Karatzas
	Title	Location Sensitive Image Retrieval and Tagging			Type	Conference Article
	Year	2020	Publication	16th European Conference on Computer Vision	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	People from different parts of the globe describe objects and concepts in distinct manners. Visual appearance can thus vary across different geographic locations, which makes location relevant contextual information when analysing visual data. In this work, we address the task of image retrieval related to a given tag conditioned on a certain location on Earth. We present LocSens, a model that learns to rank triplets of images, tags and coordinates by plausibility, and two training strategies to balance the location influence in the final ranking. LocSens learns to fuse textual and location information of multimodal queries to retrieve related images at different levels of location granularity, and successfully utilizes location information to improve image tagging.
	Address	Virtual; August 2020
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ECCV
	Notes	DAG; 600.121; 600.129			Approved	no
	Call Number	Admin @ si @ GGG2020b			Serial	3420
