|
Records |
Links |
|
Author |
Y. Patel; Lluis Gomez; Marçal Rusiñol; Dimosthenis Karatzas; C.V. Jawahar |
|
|
Title |
Self-Supervised Visual Representations for Cross-Modal Retrieval |
Type |
Conference Article |
|
Year |
2019 |
Publication |
ACM International Conference on Multimedia Retrieval |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
182–186 |
|
|
Keywords |
|
|
|
Abstract |
Cross-modal retrieval methods have been significantly improved in last years with the use of deep neural networks and large-scale annotated datasets such as ImageNet and Places. However, collecting and annotating such datasets requires a tremendous amount of human effort and, besides, their annotations are limited to discrete sets of popular visual classes that may not be representative of the richer semantics found on large-scale cross-modal retrieval datasets. In this paper, we present a self-supervised cross-modal retrieval framework that leverages as training data the correlations between images and text on the entire set of Wikipedia articles. Our method consists in training a CNN to predict: (1) the semantic context of the article in which an image is more probable to appear as an illustration, and (2) the semantic context of its caption. Our experiments demonstrate that the proposed method is not only capable of learning discriminative visual representations for solving vision tasks like classification, but that the learned representations are better for cross-modal retrieval when compared to supervised pre-training of the network on the ImageNet dataset. |
|
|
Address |
Otawa; Canada; june 2019 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICMR |
|
|
Notes |
DAG; 600.121; 600.129 |
Approved |
no |
|
|
Call Number |
Admin @ si @ PGR2019 |
Serial |
3288 |
|
Permanent link to this record |
|
|
|
|
Author |
Miquel Ferrer; Ernest Valveny; F. Serratosa; Horst Bunke |
|
|
Title |
Exact Median Graph Computation via Graph Embedding |
Type |
Conference Article |
|
Year |
2008 |
Publication |
12th International Workshop on Structural and Syntactic Pattern Recognition |
Abbreviated Journal |
|
|
|
Volume |
5324 |
Issue |
|
Pages |
15–24 |
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
Orlando – Florida (USA) |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
SSPR |
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
DAG @ dag @ FVS2008b |
Serial |
1076 |
|
Permanent link to this record |
|
|
|
|
Author |
Albert Berenguel |
|
|
Title |
Analysis of background textures in banknotes and identity documents for counterfeit detection |
Type |
Book Whole |
|
Year |
2019 |
Publication |
PhD Thesis, Universitat Autonoma de Barcelona-CVC |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
Counterfeiting and piracy are a form of theft that has been steadily growing in recent years. A counterfeit is an unauthorized reproduction of an authentic/genuine object. Banknotes and identity documents are two common objects of counterfeiting. The former is used by organized criminal groups to finance a variety of illegal activities or even to destabilize entire countries due the inflation effect. Generally, in order to run their illicit businesses, counterfeiters establish companies and bank accounts using fraudulent identity documents. The illegal activities generated by counterfeit banknotes and identity documents has a damaging effect on business, the economy and the general population. To fight against counterfeiters, governments and authorities around the globe cooperate and develop security features to protect their security documents. Many of the security features in identity documents can also be found in banknotes. In this dissertation we focus our efforts in detecting the counterfeit banknotes and identity documents by analyzing the security features at the background printing. Background areas on secure documents contain fine-line patterns and designs that are difficult to reproduce without the manufacturers cutting-edge printing equipment. Our objective is to find the loose of resolution between the genuine security document and the printed counterfeit version with a publicly available commercial printer. We first present the most complete survey to date in identity and banknote security features. The compared algorithms and systems are based on computer vision and machine learning. Then we advance to present the banknote and identity counterfeit dataset we have built and use along all this thesis. Afterwards, we evaluate and adapt algorithms in the literature for the security background texture analysis. We study this problem from the point of view of robustness, computational efficiency and applicability into a real and non-controlled industrial scenario, proposing key insights to use these algorithms. Next, within the industrial environment of this thesis, we build a complete service oriented architecture to detect counterfeit documents. The mobile application and the server framework intends to be used even by non-expert document examiners to spot counterfeits. Later, we re-frame the problem of background texture counterfeit detection as a full-reference game of spotting the differences, by alternating glimpses between a counterfeit and a genuine background using recurrent neural networks. Finally, we deal with the lack of counterfeit samples, studying different approaches based on anomaly detection. |
|
|
Address |
November 2019 |
|
|
Corporate Author |
|
Thesis |
Ph.D. thesis |
|
|
Publisher |
Ediciones Graficas Rey |
Place of Publication |
|
Editor |
Oriol Ramos Terrades;Josep Llados |
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
978-84-121011-2-6 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG; 600.140; 600.121 |
Approved |
no |
|
|
Call Number |
Admin @ si @ Ber2019 |
Serial |
3395 |
|
Permanent link to this record |
|
|
|
|
Author |
Dena Bazazian |
|
|
Title |
Fully Convolutional Networks for Text Understanding in Scene Images |
Type |
Book Whole |
|
Year |
2018 |
Publication |
PhD Thesis, Universitat Autonoma de Barcelona-CVC |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
Text understanding in scene images has gained plenty of attention in the computer vision community and it is an important task in many applications as text carries semantically rich information about scene content and context. For instance, reading text in a scene can be applied to autonomous driving, scene understanding or assisting visually impaired people. The general aim of scene text understanding is to localize and recognize text in scene images. Text regions are first localized in the original image by a trained detector model and afterwards fed into a recognition module. The tasks of localization and recognition are highly correlated since an inaccurate localization can affect the recognition task.
The main purpose of this thesis is to devise efficient methods for scene text understanding. We investigate how the latest results on deep learning can advance text understanding pipelines. Recently, Fully Convolutional Networks (FCNs) and derived methods have achieved a significant performance on semantic segmentation and pixel level classification tasks. Therefore, we took benefit of the strengths of FCN approaches in order to detect text in natural scenes. In this thesis we have focused on two challenging tasks of scene text understanding which are Text Detection and Word Spotting. For the task of text detection, we have proposed an efficient text proposal technique in scene images. We have considered the Text Proposals method as the baseline which is an approach to reduce the search space of possible text regions in an image. In order to improve the Text Proposals method we combined it with Fully Convolutional Networks to efficiently reduce the number of proposals while maintaining the same level of accuracy and thus gaining a significant speed up. Our experiments demonstrate that this text proposal approach yields significantly higher recall rates than the line based text localization techniques, while also producing better-quality localization. We have also applied this technique on compressed images such as videos from wearable egocentric cameras. For the task of word spotting, we have introduced a novel mid-level word representation method. We have proposed a technique to create and exploit an intermediate representation of images based on text attributes which roughly correspond to character probability maps. Our representation extends the concept of Pyramidal Histogram Of Characters (PHOC) by exploiting Fully Convolutional Networks to derive a pixel-wise mapping of the character distribution within candidate word regions. We call this representation the Soft-PHOC. Furthermore, we show how to use Soft-PHOC descriptors for word spotting tasks through an efficient text line proposal algorithm. To evaluate the detected text, we propose a novel line based evaluation along with the classic bounding box based approach. We test our method on incidental scene text images which comprises real-life scenarios such as urban scenes. The importance of incidental scene text images is due to the complexity of backgrounds, perspective, variety of script and language, short text and little linguistic context. All of these factors together makes the incidental scene text images challenging. |
|
|
Address |
November 2018 |
|
|
Corporate Author |
|
Thesis |
Ph.D. thesis |
|
|
Publisher |
Ediciones Graficas Rey |
Place of Publication |
|
Editor |
Dimosthenis Karatzas;Andrew Bagdanov |
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
978-84-948531-1-1 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG; 600.121 |
Approved |
no |
|
|
Call Number |
Admin @ si @ Baz2018 |
Serial |
3220 |
|
Permanent link to this record |
|
|
|
|
Author |
Suman Ghosh |
|
|
Title |
Word Spotting and Recognition in Images from Heterogeneous Sources A |
Type |
Book Whole |
|
Year |
2018 |
Publication |
PhD Thesis, Universitat Autonoma de Barcelona-CVC |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
Text is the most common way of information sharing from ages. With recent development of personal images databases and handwritten historic manuscripts the demand for algorithms to make these databases accessible for browsing and indexing are in rise. Enabling search or understanding large collection of manuscripts or image databases needs fast and robust methods. Researchers have found different ways to represent cropped words for understanding and matching, which works well when words are already segmented. However there is no trivial way to extend these for non-segmented documents. In this thesis we explore different methods for text retrieval and recognition from unsegmented document and scene images. Two different ways of representation exist in literature, one uses a fixed length representation learned from cropped words and another a sequence of features of variable length. Throughout this thesis, we have studied both these representation for their suitability in segmentation free understanding of text. In the first part we are focused on segmentation free word spotting using a fixed length representation. We extended the use of the successful PHOC (Pyramidal Histogram of Character) representation to segmentation free retrieval. In the second part of the thesis, we explore sequence based features and finally, we propose a unified solution where the same framework can generate both kind of representations. |
|
|
Address |
November 2018 |
|
|
Corporate Author |
|
Thesis |
Ph.D. thesis |
|
|
Publisher |
Ediciones Graficas Rey |
Place of Publication |
|
Editor |
Ernest Valveny |
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
978-84-948531-0-4 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG; 600.121 |
Approved |
no |
|
|
Call Number |
Admin @ si @ Gho2018 |
Serial |
3217 |
|
Permanent link to this record |
|
|
|
|
Author |
Pau Torras; Arnau Baro; Alicia Fornes; Lei Kang |
|
|
Title |
Improving Handwritten Music Recognition through Language Model Integration |
Type |
Conference Article |
|
Year |
2022 |
Publication |
4th International Workshop on Reading Music Systems (WoRMS2022) |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
42-46 |
|
|
Keywords |
optical music recognition; historical sources; diversity; music theory; digital humanities |
|
|
Abstract |
Handwritten Music Recognition, especially in the historical domain, is an inherently challenging endeavour; paper degradation artefacts and the ambiguous nature of handwriting make recognising such scores an error-prone process, even for the current state-of-the-art Sequence to Sequence models. In this work we propose a way of reducing the production of statistically implausible output sequences by fusing a Language Model into a recognition Sequence to Sequence model. The idea is leveraging visually-conditioned and context-conditioned output distributions in order to automatically find and correct any mistakes that would otherwise break context significantly. We have found this approach to improve recognition results to 25.15 SER (%) from a previous best of 31.79 SER (%) in the literature. |
|
|
Address |
November 18, 2022 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
WoRMS |
|
|
Notes |
DAG; 600.121; 600.162; 602.230 |
Approved |
no |
|
|
Call Number |
Admin @ si @ TBF2022 |
Serial |
3735 |
|
Permanent link to this record |
|
|
|
|
Author |
Jialuo Chen; Pau Riba; Alicia Fornes; Juan Mas; Josep Llados; Joana Maria Pujadas-Mora |
|
|
Title |
Word-Hunter: A Gamesourcing Experience to Validate the Transcription of Historical Manuscripts |
Type |
Conference Article |
|
Year |
2018 |
Publication |
16th International Conference on Frontiers in Handwriting Recognition |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
528-533 |
|
|
Keywords |
Crowdsourcing; Gamification; Handwritten documents; Performance evaluation |
|
|
Abstract |
Nowadays, there are still many handwritten historical documents in archives waiting to be transcribed and indexed. Since manual transcription is tedious and time consuming, the automatic transcription seems the path to follow. However, the performance of current handwriting recognition techniques is not perfect, so a manual validation is mandatory. Crowdsourcing is a good strategy for manual validation, however it is a tedious task. In this paper we analyze experiences based in gamification
in order to propose and design a gamesourcing framework that increases the interest of users. Then, we describe and analyze our experience when validating the automatic transcription using the gamesourcing application. Moreover, thanks to the combination of clustering and handwriting recognition techniques, we can speed up the validation while maintaining the performance. |
|
|
Address |
Niagara Falls, USA; August 2018 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICFHR |
|
|
Notes |
DAG; 600.097; 603.057; 600.121 |
Approved |
no |
|
|
Call Number |
Admin @ si @ CRF2018 |
Serial |
3169 |
|
Permanent link to this record |
|
|
|
|
Author |
N. Zakaria; Jean-Marc Ogier; Josep Llados |
|
|
Title |
The Fuzzy-Spatial Descriptor for the Online Graphic Recognition: Overlapping Matrix Algorithm |
Type |
Book Chapter |
|
Year |
2006 |
Publication |
7th International Workshop, Document Analysis Systems VII (DAS´06), LNCS 3872: 616–627 |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
Nelson (New Zealand) |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
DAG @ dag @ ZOL2006 |
Serial |
629 |
|
Permanent link to this record |
|
|
|
|
Author |
T.O. Nguyen; Salvatore Tabbone; Oriol Ramos Terrades |
|
|
Title |
Symbol Descriptor Based on Shape Context and Vector Model of Information Retrieval |
Type |
Conference Article |
|
Year |
2008 |
Publication |
Proceedings of the 8th IAPR International Workshop on Document Analysis Systems, |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
191-197 |
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
Nara, Japan |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
DAS |
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
Admin @ si @ NTR2008a |
Serial |
1873 |
|
Permanent link to this record |
|
|
|
|
Author |
Partha Pratim Roy; Umapada Pal; Josep Llados |
|
|
Title |
Multi-oriented English Text Line Extraction using Background and Foreground Information |
Type |
Conference Article |
|
Year |
2008 |
Publication |
Proceedings of the 8th IAPR International Workshop on Document Analysis Systems, |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
315–322 |
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
Nara (Japo) |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
DAS |
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
DAG @ dag @ RPL2008b |
Serial |
1047 |
|
Permanent link to this record |