Publicacions CVC -- Query Results



	Records
	Author	Francesc Net; Marc Folia; Pep Casals; Lluis Gomez
	Title	Transductive Learning for Near-Duplicate Image Detection in Scanned Photo Collections			Type	Conference Article
	Year	2023	Publication	17th International Conference on Document Analysis and Recognition	Abbreviated Journal
	Volume	14191	Issue		Pages	3-17
	Keywords	Image deduplication; Near-duplicate images detection; Transductive Learning; Photographic Archives; Deep Learning
	Abstract	This paper presents a comparative study of near-duplicate image detection techniques in a real-world use case scenario, where a document management company is commissioned to manually annotate a collection of scanned photographs. Detecting duplicate and near-duplicate photographs can reduce the time spent on manual annotation by archivists. This real use case differs from laboratory settings as the deployment dataset is available in advance, allowing the use of transductive learning. We propose a transductive learning approach that leverages state-of-the-art deep learning architectures such as convolutional neural networks (CNNs) and Vision Transformers (ViTs). Our approach involves pre-training a deep neural network on a large dataset and then fine-tuning the network on the unlabeled target collection with self-supervised learning. The results show that the proposed approach outperforms the baseline methods in the task of near-duplicate image detection in the UKBench and an in-house private dataset.
	Address	San Jose; CA; USA; August 2023
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title	LNCS
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICDAR
	Notes	DAG			Approved	no
	Call Number	Admin @ si @ NFC2023			Serial	3859



	Author	Farshad Nourbakhsh
	Title	Colour logo recognition			Type	Report
	Year	2009	Publication	CVC Technical Report	Abbreviated Journal
	Volume	145	Issue		Pages
	Keywords
	Abstract
	Address
	Corporate Author	Computer Vision Center			Thesis	Master's thesis
	Publisher		Place of Publication	Bellaterra, Barcelona	Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG			Approved	no
	Call Number	Admin @ si @ Nou2009			Serial	2399



	Author	Nibal Nayef; Yash Patel; Michal Busta; Pinaki Nath Chowdhury; Dimosthenis Karatzas; Wafa Khlif; Jiri Matas; Umapada Pal; Jean-Christophe Burie; Cheng-lin Liu; Jean-Marc Ogier
	Title	ICDAR2019 Robust Reading Challenge on Multi-lingual Scene Text Detection and Recognition — RRC-MLT-2019			Type	Conference Article
	Year	2019	Publication	15th International Conference on Document Analysis and Recognition	Abbreviated Journal
	Volume		Issue		Pages	1582-1587
	Keywords
	Abstract	With the growing cosmopolitan culture of modern cities, the need of robust Multi-Lingual scene Text (MLT) detection and recognition systems has never been more immense. With the goal to systematically benchmark and push the state-of-the-art forward, the proposed competition builds on top of the RRC-MLT-2017 with an additional end-to-end task, an additional language in the real images dataset, a large scale multi-lingual synthetic dataset to assist the training, and a baseline End-to-End recognition method. The real dataset consists of 20,000 images containing text from 10 languages. The challenge has 4 tasks covering various aspects of multi-lingual scene text: (a) text detection, (b) cropped word script classification, (c) joint text detection and script classification and (d) end-to-end detection and recognition. In total, the competition received 60 submissions from the research and industrial communities. This paper presents the dataset, the tasks and the findings of the presented RRC-MLT-2019 challenge.
	Address	Sydney; Australia; September 2019
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICDAR
	Notes	DAG; 600.121; 600.129			Approved	no
	Call Number	Admin @ si @ NPB2019			Serial	3341



	Author	T.O. Nguyen; Salvatore Tabbone; Oriol Ramos Terrades
	Title	Symbol Descriptor Based on Shape Context and Vector Model of Information Retrieval			Type	Conference Article
	Year	2008	Publication	Proceedings of the 8th IAPR International Workshop on Document Analysis Systems	Abbreviated Journal
	Volume		Issue		Pages	191-197
	Keywords
	Abstract
	Address	Nara, Japan
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	DAS
	Notes	DAG			Approved	no
	Call Number	Admin @ si @ NTR2008a			Serial	1873



	Author	T.O. Nguyen; Salvatore Tabbone; Oriol Ramos Terrades; A.T. Thierry
	Title	Proposition d'un descripteur de formes et du modèle vectoriel pour la recherche de symboles			Type	Conference Article
	Year	2008	Publication	Colloque International Francophone sur l'Ecrit et le Document	Abbreviated Journal
	Volume		Issue		Pages	79-84
	Keywords
	Abstract
	Address	Rouen, France
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	CIFED
	Notes	DAG			Approved	no
	Call Number	Admin @ si @ NTR2008b			Serial	1875



	Author	N. Nayef; F. Yin; I. Bizid; H. Choi; Y. Feng; Dimosthenis Karatzas; Z. Luo; Umapada Pal; Christophe Rigaud; J. Chazalon; W. Khlif; Muhammad Muzzamil Luqman; Jean-Christophe Burie; C.L. Liu; Jean-Marc Ogier
	Title	ICDAR2017 Robust Reading Challenge on Multi-Lingual Scene Text Detection and Script Identification – RRC-MLT			Type	Conference Article
	Year	2017	Publication	14th International Conference on Document Analysis and Recognition	Abbreviated Journal
	Volume		Issue		Pages	1454-1459
	Keywords
	Abstract	Text detection and recognition in a natural environment are key components of many applications, ranging from business card digitization to shop indexation in a street. This competition aims at assessing the ability of state-of-the-art methods to detect Multi-Lingual Text (MLT) in scene images, such as in contents gathered from the Internet media and in modern cities where multiple cultures live and communicate together. This competition is an extension of the Robust Reading Competition (RRC) which has been held since 2003 both in ICDAR and in an online context. The proposed competition is presented as a new challenge of the RRC. The dataset built for this challenge largely extends the previous RRC editions in many aspects: the multi-lingual text, the size of the dataset, the multi-oriented text, the wide variety of scenes. The dataset is comprised of 18,000 images which contain text belonging to 9 languages. The challenge is comprised of three tasks related to text detection and script classification. We have received a total of 16 participations from the research and industrial communities. This paper presents the dataset, the tasks and the findings of this RRC-MLT challenge.
	Address	Kyoto; Japan; November 2017
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-1-5386-3586-5	Medium
	Area		Expedition		Conference	ICDAR
	Notes	DAG; 600.121			Approved	no
	Call Number	Admin @ si @ NYB2017			Serial	3097



	Author	Jean-Marc Ogier; Wenyin Liu; Josep Llados (eds)
	Title	Graphics Recognition: Achievements, Challenges, and Evolution			Type	Book Whole
	Year	2010	Publication	8th International Workshop GREC 2009	Abbreviated Journal
	Volume	6020	Issue		Pages
	Keywords
	Abstract
	Address	La Rochelle
	Corporate Author				Thesis
	Publisher	Springer Link	Place of Publication		Editor	Jean-Marc Ogier; Wenyin Liu; Josep Llados
	Language		Summary Language		Original Title
	Series Editor		Series Title	Lecture Notes in Computer Science	Abbreviated Series Title	LNCS
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-3-642-13727-3	Medium
	Area		Expedition		Conference	GREC
	Notes	DAG			Approved	no
	Call Number	Admin @ si @ OLL2010			Serial	1976



	Author	Ruben Perez Tito
	Title	Exploring the role of Text in Visual Question Answering on Natural Scenes and Documents			Type	Book Whole
	Year	2023	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Visual Question Answering (VQA) is the task where given an image and a natural language question, the objective is to generate a natural language answer. At the intersection between computer vision and natural language processing, this task can be seen as a measure of image understanding capabilities, as it requires to reason about objects, actions, colors, positions, the relations between the different elements as well as commonsense reasoning, world knowledge, arithmetic skills and natural language understanding. However, even though the text present in the images conveys important semantically rich information that is explicit and not available in any other form, most VQA methods remained illiterate, largely ignoring the text despite its potential significance. In this thesis, we set out on a journey to bring reading capabilities to computer vision models applied to the VQA task, creating new datasets and methods that can read, reason and integrate the text with other visual cues in natural scene images and documents. In Chapter 3, we address the combination of scene text with visual information to fully understand all the nuances of natural scene images. To achieve this objective, we define a new sub-task of VQA that requires reading the text in the image, and highlight the limitations of the current methods. In addition, we propose a new architecture that integrates both modalities and jointly reasons about textual and visual features. In Chapter 5, we shift the domain of VQA with reading capabilities and apply it on scanned industry document images, providing a high-level end-purpose perspective to Document Understanding, which has been primarily focused on digitizing the document’s contents and extracting key values without considering the ultimate purpose of the extracted information. 
For this, we create a dataset which requires methods to reason about the unique and challenging elements of documents, such as text, images, tables, graphs and complex layouts, to provide accurate answers in natural language. However, we observed that explicit visual features provide a slight contribution in the overall performance, since the main information is usually conveyed within the text and its position. In consequence, in Chapter 6, we propose VQA on infographic images, seeking for document images with more visually rich elements that require to fully exploit visual information in order to answer the questions. We show the performance gap of different methods when used over industry scanned and infographic images, and propose a new method that integrates the visual features in early stages, which allows the transformer architecture to exploit the visual features during the self-attention operation. Instead, in Chapter 7, we apply VQA on a big collection of single-page documents, where the methods must find which documents are relevant to answer the question, and provide the answer itself. Finally, in Chapter 8, mimicking real-world application problems where systems must process documents with multiple pages, we address the multipage document visual question answering task. We demonstrate the limitations of existing methods, including models specifically designed to process long sequences. To overcome these limitations, we propose a hierarchical architecture that can process long documents, answer questions, and provide the index of the page where the information to answer the question is located as an explainability measure.
	Address
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	IMPRIMA	Place of Publication		Editor	Ernest Valveny
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-124793-5-5	Medium
	Area		Expedition		Conference
	Notes	DAG			Approved	no
	Call Number	Admin @ si @ Per2023			Serial	3967



	Author	Joana Maria Pujadas-Mora; Alicia Fornes; Josep Llados; Anna Cabre
	Title	Bridging the gap between historical demography and computing: tools for computer-assisted transcription and the analysis of demographic sources			Type	Book Chapter
	Year	2016	Publication	The future of historical demography. Upside down and inside out	Abbreviated Journal
	Volume		Issue		Pages	127-131
	Keywords
	Abstract
	Address
	Corporate Author				Thesis
	Publisher	Acco Publishers	Place of Publication		Editor	K.Matthijs; S.Hin; H.Matsuo; J.Kok
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-94-6292-722-3	Medium
	Area		Expedition		Conference
	Notes	DAG; 600.097			Approved	no
	Call Number	Admin @ si @ PFL2016			Serial	2907



	Author	Joana Maria Pujadas-Mora; Alicia Fornes; Josep Llados; Gabriel Brea-Martinez; Miquel Valls-Figols
	Title	The Baix Llobregat (BALL) Demographic Database, between Historical Demography and Computer Vision (nineteenth–twentieth centuries)			Type	Book Chapter
	Year	2019	Publication	Nominative Data in Demographic Research in the East and the West: monograph	Abbreviated Journal
	Volume		Issue		Pages	29-61
	Keywords
	Abstract	The Baix Llobregat (BALL) Demographic Database is an ongoing database project containing individual census data from the Catalan region of Baix Llobregat (Spain) during the nineteenth and twentieth centuries. The BALL Database is built within the project ‘NETWORKS: Technology and citizen innovation for building historical social networks to understand the demographic past’ directed by Alícia Fornés from the Center for Computer Vision and Joana Maria Pujadas-Mora from the Center for Demographic Studies, both at the Universitat Autònoma de Barcelona, funded by the Recercaixa program (2017–2019). Its webpage is http://dag.cvc.uab.es/xarxes/. The aim of the project is to develop technologies facilitating massive digitalization of demographic sources, and more specifically the padrones (local censuses), in order to reconstruct historical ‘social’ networks employing computer vision technology. Such virtual networks can be created thanks to the linkage of nominative records compiled in the local censuses across time and space. Thus, digitized versions of individual and family lifespans are established, and individuals and families can be located spatially.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-5-7996-2656-3	Medium
	Area		Expedition		Conference
	Notes	DAG; 600.121			Approved	no
	Call Number	Admin @ si @ PFL2019			Serial	3351
