Publicacions CVC -- Query Results


Details

	Records
	Author	Andrea Gemelli; Sanket Biswas; Enrico Civitelli; Josep Llados; Simone Marinai
	Title	Doc2Graph: A Task Agnostic Document Understanding Framework Based on Graph Neural Networks			Type	Conference Article
	Year	2022	Publication	17th European Conference on Computer Vision Workshops	Abbreviated Journal
	Volume	13804	Issue		Pages	329–344
	Keywords
	Abstract	Geometric Deep Learning has recently attracted significant interest in a wide range of machine learning fields, including document analysis. The application of Graph Neural Networks (GNNs) has become crucial in various document-related tasks since they can unravel important structural patterns, fundamental in key information extraction processes. Previous works in the literature propose task-driven models and do not take into account the full power of graphs. We propose Doc2Graph, a task-agnostic document understanding framework based on a GNN model, to solve different tasks given different types of documents. We evaluated our approach on two challenging datasets for key information extraction in form understanding, invoice layout analysis and table detection.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title	LNCS
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-3-031-25068-2	Medium
	Area		Expedition		Conference	ECCV-TiE
	Notes	DAG; 600.162; 600.140; 110.312			Approved	no
	Call Number	Admin @ si @ GBC2022			Serial	3795



	Author	Y. Patel; Lluis Gomez; Marçal Rusiñol; Dimosthenis Karatzas
	Title	Dynamic Lexicon Generation for Natural Scene Images			Type	Conference Article
	Year	2016	Publication	14th European Conference on Computer Vision Workshops	Abbreviated Journal
	Volume		Issue		Pages	395-410
	Keywords	scene text; photo OCR; scene understanding; lexicon generation; topic modeling; CNN
	Abstract	Many scene text understanding methods approach the end-to-end recognition problem from a word-spotting perspective and take huge benefit from using small per-image lexicons. Such customized lexicons are normally assumed as given and their source is rarely discussed. In this paper we propose a method that generates contextualized lexicons for scene images using only visual information. For this, we exploit the correlation between visual and textual information in a dataset consisting of images and textual content associated with them. Using the topic modeling framework to discover a set of latent topics in such a dataset allows us to re-rank a fixed dictionary in a way that prioritizes the words that are more likely to appear in a given image. Moreover, we train a CNN that is able to reproduce those word rankings but using only the image raw pixels as input. We demonstrate that the quality of the automatically obtained custom lexicons is superior to a generic frequency-based baseline.
	Address	Amsterdam; The Netherlands; October 2016
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ECCVW
	Notes	DAG; 600.084			Approved	no
	Call Number	Admin @ si @ PGR2016			Serial	2825



	Author	Raul Gomez; Lluis Gomez; Jaume Gibert; Dimosthenis Karatzas
	Title	Learning to Learn from Web Data through Deep Semantic Embeddings			Type	Conference Article
	Year	2018	Publication	15th European Conference on Computer Vision Workshops	Abbreviated Journal
	Volume	11134	Issue		Pages	514-529
	Keywords
	Abstract	In this paper we propose to learn a multimodal image and text embedding from Web and Social Media data, aiming to leverage the semantic knowledge learnt in the text domain and transfer it to a visual model for semantic image retrieval. We demonstrate that the pipeline can learn from images with associated text without supervision and perform a thorough analysis of five different text embeddings in three different benchmarks. We show that the embeddings learnt with Web and Social Media data perform competitively with supervised methods in the text-based image retrieval task, and we clearly outperform the state of the art in the MIRFlickr dataset when training in the target data. Further, we demonstrate how semantic multimodal image retrieval can be performed using the learnt embeddings, going beyond classical instance-level retrieval problems. Finally, we present a new dataset, InstaCities1M, composed of Instagram images and their associated texts, that can be used for fair comparison of image-text embeddings.
	Address	Munich; Germany; September 2018
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title	LNCS
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ECCVW
	Notes	DAG; 600.129; 601.338; 600.121			Approved	no
	Call Number	Admin @ si @ GGG2018a			Serial	3175



	Author	Dena Bazazian; Dimosthenis Karatzas; Andrew Bagdanov
	Title	Soft-PHOC Descriptor for End-to-End Word Spotting in Egocentric Scene Images			Type	Conference Article
	Year	2018	Publication	International Workshop on Egocentric Perception, Interaction and Computing at ECCV	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Word spotting in natural scene images has many applications in scene understanding and visual assistance. We propose Soft-PHOC, an intermediate representation of images based on character probability maps. Our representation extends the concept of the Pyramidal Histogram Of Characters (PHOC) by exploiting Fully Convolutional Networks to derive a pixel-wise mapping of the character distribution within candidate word regions. We show how to use our descriptors for word spotting tasks in egocentric camera streams through an efficient text line proposal algorithm. This is based on the Hough Transform over character attribute maps followed by scoring using Dynamic Time Warping (DTW). We evaluate our results on the ICDAR 2015 Challenge 4 dataset of incidental scene text captured by an egocentric camera.
	Address	Munich; Germany; September 2018
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ECCVW
	Notes	DAG; 600.129; 600.121;			Approved	no
	Call Number	Admin @ si @ BKB2018b			Serial	3174



	Author	Raul Gomez; Lluis Gomez; Jaume Gibert; Dimosthenis Karatzas
	Title	Learning from #Barcelona Instagram data what Locals and Tourists post about its Neighbourhoods			Type	Conference Article
	Year	2018	Publication	15th European Conference on Computer Vision Workshops	Abbreviated Journal
	Volume	11134	Issue		Pages	530-544
	Keywords
	Abstract	Massive tourism is becoming a big problem for some cities, such as Barcelona, due to its concentration in some neighborhoods. In this work we gather Instagram data related to Barcelona consisting of image-caption pairs and, using the text as a supervisory signal, we learn relations between images, words and neighborhoods. Our goal is to learn which visual elements appear in photos when people post about each neighborhood. We perform a language-separate treatment of the data and show that it can be extrapolated to a separate analysis of tourists and locals, and that tourism is reflected in Social Media at a neighborhood level. The presented pipeline allows analyzing the differences between the images that tourists and locals associate with the different neighborhoods. The proposed method, which can be extended to other cities or subjects, proves that Instagram data can be used to train multi-modal (image and text) machine learning models that are useful to analyze publications about a city at a neighborhood level. We publish the collected dataset, InstaBarcelona, and the code used in the analysis.
	Address	Munich; Germany; September 2018
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title	LNCS
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ECCVW
	Notes	DAG; 600.129; 601.338; 600.121			Approved	no
	Call Number	Admin @ si @ GGG2018b			Serial	3176



	Author	Emanuele Vivoli; Ali Furkan Biten; Andres Mafla; Dimosthenis Karatzas; Lluis Gomez
	Title	MUST-VQA: MUltilingual Scene-text VQA			Type	Conference Article
	Year	2022	Publication	Proceedings European Conference on Computer Vision Workshops	Abbreviated Journal
	Volume	13804	Issue		Pages	345–358
	Keywords	Visual question answering; Scene text; Translation robustness; Multilingual models; Zero-shot transfer; Power of language models
	Abstract	In this paper, we present a framework for Multilingual Scene Text Visual Question Answering that deals with new languages in a zero-shot fashion. Specifically, we consider the task of Scene Text Visual Question Answering (STVQA) in which the question can be asked in different languages and is not necessarily aligned to the scene text language. Thus, we first introduce a natural step towards a more generalized version of STVQA: MUST-VQA. Accounting for this, we discuss two evaluation scenarios in the constrained setting, namely IID and zero-shot, and we demonstrate that the models can perform on par in a zero-shot setting. We further provide extensive experimentation and show the effectiveness of adapting multilingual language models into STVQA tasks.
	Address	Tel-Aviv; Israel; October 2022
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title	LNCS
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ECCVW
	Notes	DAG; 302.105; 600.155; 611.002			Approved	no
	Call Number	Admin @ si @ VBM2022			Serial	3770



	Author	Sergi Garcia Bordils; Andres Mafla; Ali Furkan Biten; Oren Nuriel; Aviad Aberdam; Shai Mazor; Ron Litman; Dimosthenis Karatzas
	Title	Out-of-Vocabulary Challenge Report			Type	Conference Article
	Year	2022	Publication	Proceedings European Conference on Computer Vision Workshops	Abbreviated Journal
	Volume	13804	Issue		Pages	359–375
	Keywords
	Abstract	This paper presents the final results of the Out-Of-Vocabulary 2022 (OOV) challenge. The OOV contest introduces an important aspect that is not commonly studied by Optical Character Recognition (OCR) models, namely, the recognition of scene text instances unseen at training time. The competition compiles a collection of public scene text datasets comprising 326,385 images with 4,864,405 scene text instances, thus covering a wide range of data distributions. A new and independent validation and test set is formed with scene text instances that are out of vocabulary at training time. The competition was structured in two tasks, end-to-end and cropped scene text recognition respectively. A thorough analysis of results from baselines and different participants is presented. Interestingly, current state-of-the-art models show a significant performance gap under the newly studied setting. We conclude that the OOV dataset proposed in this challenge will be an essential area to be explored in order to develop scene text models that achieve more robust and generalized predictions.
	Address	Tel-Aviv; Israel; October 2022
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title	LNCS
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ECCVW
	Notes	DAG; 600.155; 302.105; 611.002			Approved	no
	Call Number	Admin @ si @ GMB2022			Serial	3771



	Author	Marçal Rusiñol; David Aldavert; Dimosthenis Karatzas; Ricardo Toledo; Josep Llados
	Title	Interactive Trademark Image Retrieval by Fusing Semantic and Visual Content. Advances in Information Retrieval			Type	Conference Article
	Year	2011	Publication	33rd European Conference on Information Retrieval	Abbreviated Journal
	Volume	6611	Issue		Pages	314-325
	Keywords
	Abstract	In this paper we propose an efficient query-by-example retrieval system which is able to retrieve trademark images by similarity from patent and trademark offices' digital libraries. Logo images are described by both their semantic content, by means of the Vienna codes, and their visual contents, by using shape and color as visual cues. The trademark descriptors are then indexed by a locality-sensitive hashing data structure aiming to perform approximate k-NN search in high dimensional spaces in sub-linear time. The resulting ranked lists are combined by using the Condorcet method and a relevance feedback step helps to iteratively revise the query and refine the obtained results. The experiments demonstrate the effectiveness and efficiency of this system on a realistic and large dataset.
	Address	Dublin; Ireland
	Corporate Author				Thesis
	Publisher	Springer	Place of Publication	Berlin	Editor	P. Clough; C. Foley; C. Gurrin; G.J.F. Jones; W. Kraaij; H. Lee; V. Murdoch
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title	LNCS
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-3-642-20160-8	Medium
	Area		Expedition		Conference	ECIR
	Notes	DAG; RV; ADAS			Approved	no
	Call Number	Admin @ si @ RAK2011			Serial	1737



	Author	Mohammed Al Rawi; Dimosthenis Karatzas
	Title	On the Labeling Correctness in Computer Vision Datasets			Type	Conference Article
	Year	2018	Publication	Proceedings of the Workshop on Interactive Adaptive Learning, co-located with European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Image datasets have been heavily used to build computer vision systems. These datasets are either manually or automatically labeled, which is a problem as both labeling methods are prone to errors. To investigate this problem, we use a majority voting ensemble that combines the results from several Convolutional Neural Networks (CNNs). Majority voting ensembles not only enhance the overall performance, but can also be used to estimate the confidence level of each sample. We also examined Softmax as another way to estimate posterior probability. We have designed various experiments with a range of different ensembles built from one or different, or temporal/snapshot CNNs, which have been trained multiple times stochastically. We analyzed CIFAR10, CIFAR100, EMNIST, and SVHN datasets and we found quite a few incorrect labels, both in the training and testing sets. We also present detailed confidence analysis on these datasets and we found that the ensemble is better than the Softmax when used to estimate the per-sample confidence. This work thus proposes an approach that can be used to scrutinize and verify the labeling of computer vision datasets, which can later be applied to weakly/semi-supervised learning. We propose a measure, based on the Odds-Ratio, to quantify how many of these incorrectly classified labels are actually incorrectly labeled and how many of these are confusing. The proposed methods are easily scalable to larger datasets, like ImageNet, LSUN and SUN, as each CNN instance is trained for 60 epochs; or even faster, by implementing a temporal (snapshot) ensemble.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ECML-PKDDW
	Notes	DAG; 600.121; 600.129			Approved	no
	Call Number	Admin @ si @ RaK2018			Serial	3144



	Author	Fernando Vilariño; Dimosthenis Karatzas
	Title	A Living Lab approach for Citizen Science in Libraries			Type	Conference Article
	Year	2016	Publication	1st International ECSA Conference	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract
	Address	Berlin; Germany; May 2016
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ECSA
	Notes	MV; DAG; 600.084; 600.097; SIAI			Approved	no
	Call Number	Admin @ si @ ViK2016			Serial	2804