Publicacions CVC -- Query Results

[1–10] << 11 12 13 14 15 16 17 18 19 20 >> [21–30]

Details

	Records
	Author	Albert Gordo; Alicia Fornes; Ernest Valveny
	Title	Writer identification in handwritten musical scores with bags of notes			Type	Journal Article
	Year	2013	Publication	Pattern Recognition	Abbreviated Journal	PR
	Volume	46	Issue	5	Pages	1337-1345
	Keywords
	Abstract	Writer Identification is an important task for the automatic processing of documents. However, the identification of the writer in graphical documents is still challenging. In this work, we adapt the Bag of Visual Words framework to the task of writer identification in handwritten musical scores. A vanilla implementation of this method already performs comparably to the state-of-the-art. Furthermore, we analyze the effect of two improvements of the representation: a Bhattacharyya embedding, which improves the results at virtually no extra cost, and a Fisher Vector representation that very significantly improves the results at the cost of a more complex and costly representation. Experimental evaluation shows results more than 20 points above the state-of-the-art in a new, challenging dataset.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN	0031-3203	ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG			Approved	no
	Call Number	Admin @ si @ GFV2013			Serial	2307
Permanent link to this record



	Author	Raul Gomez; Lluis Gomez; Jaume Gibert; Dimosthenis Karatzas
	Title	Learning to Learn from Web Data through Deep Semantic Embeddings			Type	Conference Article
	Year	2018	Publication	15th European Conference on Computer Vision Workshops	Abbreviated Journal
	Volume	11134	Issue		Pages	514-529
	Keywords
	Abstract	In this paper we propose to learn a multimodal image and text embedding from Web and Social Media data, aiming to leverage the semantic knowledge learnt in the text domain and transfer it to a visual model for semantic image retrieval. We demonstrate that the pipeline can learn from images with associated text without supervision and perform a thourough analysis of five different text embeddings in three different benchmarks. We show that the embeddings learnt with Web and Social Media data have competitive performances over supervised methods in the text based image retrieval task, and we clearly outperform state of the art in the MIRFlickr dataset when training in the target data. Further we demonstrate how semantic multimodal image retrieval can be performed using the learnt embeddings, going beyond classical instance-level retrieval problems. Finally, we present a new dataset, InstaCities1M, composed by Instagram images and their associated texts that can be used for fair comparison of image-text embeddings.
	Address	Munich; Alemanya; September 2018
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title	LNCS
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ECCVW
	Notes	DAG; 600.129; 601.338; 600.121			Approved	no
	Call Number	Admin @ si @ GGG2018a			Serial	3175
Permanent link to this record



	Author	Raul Gomez; Lluis Gomez; Jaume Gibert; Dimosthenis Karatzas
	Title	Learning from# Barcelona Instagram data what Locals and Tourists post about its Neighbourhoods			Type	Conference Article
	Year	2018	Publication	15th European Conference on Computer Vision Workshops	Abbreviated Journal
	Volume	11134	Issue		Pages	530-544
	Keywords
	Abstract	Massive tourism is becoming a big problem for some cities, such as Barcelona, due to its concentration in some neighborhoods. In this work we gather Instagram data related to Barcelona consisting on images-captions pairs and, using the text as a supervisory signal, we learn relations between images, words and neighborhoods. Our goal is to learn which visual elements appear in photos when people is posting about each neighborhood. We perform a language separate treatment of the data and show that it can be extrapolated to a tourists and locals separate analysis, and that tourism is reflected in Social Media at a neighborhood level. The presented pipeline allows analyzing the differences between the images that tourists and locals associate to the different neighborhoods. The proposed method, which can be extended to other cities or subjects, proves that Instagram data can be used to train multi-modal (image and text) machine learning models that are useful to analyze publications about a city at a neighborhood level. We publish the collected dataset, InstaBarcelona and the code used in the analysis.
	Address	Munich; Alemanya; September 2018
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title	LNCS
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ECCVW
	Notes	DAG; 600.129; 601.338; 600.121			Approved	no
	Call Number	Admin @ si @ GGG2018b			Serial	3176
Permanent link to this record



	Author	Raul Gomez; Lluis Gomez; Jaume Gibert; Dimosthenis Karatzas
	Title	Self-Supervised Learning from Web Data for Multimodal Retrieval			Type	Book Chapter
	Year	2019	Publication	Multi-Modal Scene Understanding Book	Abbreviated Journal
	Volume		Issue		Pages	279-306
	Keywords	self-supervised learning; webly supervised learning; text embeddings; multimodal retrieval; multimodal embedding
	Abstract	Self-Supervised learning from multimodal image and text data allows deep neural networks to learn powerful features with no need of human annotated data. Web and Social Media platforms provide a virtually unlimited amount of this multimodal data. In this work we propose to exploit this free available data to learn a multimodal image and text embedding, aiming to leverage the semantic knowledge learnt in the text domain and transfer it to a visual model for semantic image retrieval. We demonstrate that the proposed pipeline can learn from images with associated text without supervision and analyze the semantic structure of the learnt joint image and text embeddingspace. Weperformathoroughanalysisandperformancecomparisonofﬁvedifferentstateof the art text embeddings in three different benchmarks. We show that the embeddings learnt with Web and Social Media data have competitive performances over supervised methods in the text basedimageretrievaltask,andweclearlyoutperformstateoftheartintheMIRFlickrdatasetwhen training in the target data. Further, we demonstrate how semantic multimodal image retrieval can be performed using the learnt embeddings, going beyond classical instance-level retrieval problems. Finally, we present a new dataset, InstaCities1M, composed by Instagram images and their associated texts that can be used for fair comparison of image-text embeddings.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG; 600.129; 601.338; 601.310			Approved	no
	Call Number	Admin @ si @ GGG2019			Serial	3266
Permanent link to this record



	Author	Raul Gomez; Jaume Gibert; Lluis Gomez; Dimosthenis Karatzas
	Title	Exploring Hate Speech Detection in Multimodal Publications			Type	Conference Article
	Year	2020	Publication	IEEE Winter Conference on Applications of Computer Vision	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	In this work we target the problem of hate speech detection in multimodal publications formed by a text and an image. We gather and annotate a large scale dataset from Twitter, MMHS150K, and propose different models that jointly analyze textual and visual information for hate speech detection, comparing them with unimodal detection. We provide quantitative and qualitative results and analyze the challenges of the proposed task. We find that, even though images are useful for the hate speech detection task, current multimodal models cannot outperform models analyzing only text. We discuss why and open the field and the dataset for further research.
	Address	Aspen; March 2020
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	WACV
	Notes	DAG; 600.121; 600.129			Approved	no
	Call Number	Admin @ si @ GGG2020a			Serial	3280
Permanent link to this record



	Author	Raul Gomez; Jaume Gibert; Lluis Gomez; Dimosthenis Karatzas
	Title	Location Sensitive Image Retrieval and Tagging			Type	Conference Article
	Year	2020	Publication	16th European Conference on Computer Vision	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	People from different parts of the globe describe objects and concepts in distinct manners. Visual appearance can thus vary across different geographic locations, which makes location a relevant contextual information when analysing visual data. In this work, we address the task of image retrieval related to a given tag conditioned on a certain location on Earth. We present LocSens, a model that learns to rank triplets of images, tags and coordinates by plausibility, and two training strategies to balance the location influence in the final ranking. LocSens learns to fuse textual and location information of multimodal queries to retrieve related images at different levels of location granularity, and successfully utilizes location information to improve image tagging.
	Address	Virtual; August 2020
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ECCV
	Notes	DAG; 600.121; 600.129			Approved	no
	Call Number	Admin @ si @ GGG2020b			Serial	3420
Permanent link to this record



	Author	Suman Ghosh; Lluis Gomez; Dimosthenis Karatzas; Ernest Valveny
	Title	Efficient indexing for Query By String text retrieval			Type	Conference Article
	Year	2015	Publication	6th IAPR International Workshop on Camera Based Document Analysis and Recognition CBDAR2015	Abbreviated Journal
	Volume		Issue		Pages	1236 - 1240
	Keywords
	Abstract	This paper deals with Query By String word spotting in scene images. A hierarchical text segmentation algorithm based on text specific selective search is used to find text regions. These regions are indexed per character n-grams present in the text region. An attribute representation based on Pyramidal Histogram of Characters (PHOC) is used to compare text regions with the query text. For generation of the index a similar attribute space based Pyramidal Histogram of character n-grams is used. These attribute models are learned using linear SVMs over the Fisher Vector [1] representation of the images along with the PHOC labels of the corresponding strings.
	Address	Nancy; France; August 2015
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	CBDAR
	Notes	DAG; 600.077			Approved	no
	Call Number	Admin @ si @ GGK2015			Serial	2693
Permanent link to this record



	Author	Suman Ghosh
	Title	Word Spotting and Recognition in Images from Heterogeneous Sources A			Type	Book Whole
	Year	2018	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Text is the most common way of information sharing from ages. With recent development of personal images databases and handwritten historic manuscripts the demand for algorithms to make these databases accessible for browsing and indexing are in rise. Enabling search or understanding large collection of manuscripts or image databases needs fast and robust methods. Researchers have found different ways to represent cropped words for understanding and matching, which works well when words are already segmented. However there is no trivial way to extend these for non-segmented documents. In this thesis we explore different methods for text retrieval and recognition from unsegmented document and scene images. Two different ways of representation exist in literature, one uses a fixed length representation learned from cropped words and another a sequence of features of variable length. Throughout this thesis, we have studied both these representation for their suitability in segmentation free understanding of text. In the first part we are focused on segmentation free word spotting using a fixed length representation. We extended the use of the successful PHOC (Pyramidal Histogram of Character) representation to segmentation free retrieval. In the second part of the thesis, we explore sequence based features and finally, we propose a unified solution where the same framework can generate both kind of representations.
	Address	November 2018
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Ernest Valveny
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-948531-0-4	Medium
	Area		Expedition		Conference
	Notes	DAG; 600.121			Approved	no
	Call Number	Admin @ si @ Gho2018			Serial	3217
Permanent link to this record



	Author	Suman Ghosh; Ernest Valveny
	Title	Query by String word spotting based on character bi-gram indexing			Type	Conference Article
	Year	2015	Publication	13th International Conference on Document Analysis and Recognition ICDAR2015	Abbreviated Journal
	Volume		Issue		Pages	881-885
	Keywords
	Abstract	In this paper we propose a segmentation-free query by string word spotting method. Both the documents and query strings are encoded using a recently proposed word representa- tion that projects images and strings into a common atribute space based on a pyramidal histogram of characters(PHOC). These attribute models are learned using linear SVMs over the Fisher Vector representation of the images along with the PHOC labels of the corresponding strings. In order to search through the whole page, document regions are indexed per character bi- gram using a similar attribute representation. On top of that, we propose an integral image representation of the document using a simplified version of the attribute model for efficient computation. Finally we introduce a re-ranking step in order to boost retrieval performance. We show state-of-the-art results for segmentation-free query by string word spotting in single-writer and multi-writer standard datasets
	Address	Nancy; France; August 2015
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICDAR
	Notes	DAG; 600.077			Approved	no
	Call Number	Admin @ si @ GhV2015a			Serial	2715
Permanent link to this record



	Author	Suman Ghosh; Ernest Valveny
	Title	A Sliding Window Framework for Word Spotting Based on Word Attributes			Type	Conference Article
	Year	2015	Publication	Pattern Recognition and Image Analysis, Proceedings of 7th Iberian Conference , ibPRIA 2015	Abbreviated Journal
	Volume	9117	Issue		Pages	652-661
	Keywords	Word spotting; Sliding window; Word attributes
	Abstract	In this paper we propose a segmentation-free approach to word spotting. Word images are first encoded into feature vectors using Fisher Vector. Then, these feature vectors are used together with pyramidal histogram of characters labels (PHOC) to learn SVM-based attribute models. Documents are represented by these PHOC based word attributes. To efficiently compute the word attributes over a sliding window, we propose to use an integral image representation of the document using a simplified version of the attribute model. Finally we re-rank the top word candidates using the more discriminative full version of the word attributes. We show state-of-the-art results for segmentation-free query-by-example word spotting in single-writer and multi-writer standard datasets.
	Address	Santiago de Compostela; June 2015
	Corporate Author				Thesis
	Publisher	Springer International Publishing	Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title	LNCS
	Series Volume		Series Issue		Edition
	ISSN	0302-9743	ISBN	978-3-319-19389-2	Medium
	Area		Expedition		Conference	IbPRIA
	Notes	DAG; 600.077			Approved	no
	Call Number	Admin @ si @ GhV2015b			Serial	2716
Permanent link to this record

Select All Deselect All

[1–10] << 11 12 13 14 15 16 17 18 19 20 >> [21–30]

List View

Citations

Details

All Found Records Selected Records:

Save Citations: Format:

Export Records: Format: