|
Records |
Links |
|
Author |
Raul Gomez; Yahui Liu; Marco de Nadai; Dimosthenis Karatzas; Bruno Lepri; Nicu Sebe |
|
|
Title |
Retrieval Guided Unsupervised Multi-domain Image to Image Translation |
Type |
Conference Article |
|
Year |
2020 |
Publication |
28th ACM International Conference on Multimedia |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
Image-to-image translation aims to learn a mapping that transforms an image from one visual domain to another. Recent works assume that image descriptors can be disentangled into a domain-invariant content representation and a domain-specific style representation. Thus, translation models seek to preserve the content of source images while changing the style to a target visual domain. However, synthesizing new images is extremely challenging, especially in multi-domain translations, as the network has to compose content and style to generate reliable and diverse images in multiple domains. In this paper we propose the use of an image retrieval system to assist the image-to-image translation task. First, we train an image-to-image translation model to map images to multiple domains. Then, we train an image retrieval model using real and generated images to find images similar to a query one in content but in a different domain. Finally, we exploit the image retrieval system to fine-tune the image-to-image translation model and generate higher quality images. Our experiments show the effectiveness of the proposed solution and highlight the contribution of the retrieval network, which can benefit from additional unlabeled data and help image-to-image translation models in the presence of scarce data. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ACM |
|
|
Notes |
DAG; 600.121 |
Approved |
no |
|
|
Call Number |
Admin @ si @ GLN2020 |
Serial |
3497 |
|
Permanent link to this record |
|
|
|
|
Author |
Sounak Dey; Anjan Dutta; Suman Ghosh; Ernest Valveny; Josep Llados |
|
|
Title |
Aligning Salient Objects to Queries: A Multi-modal and Multi-object Image Retrieval Framework |
Type |
Conference Article |
|
Year |
2018 |
Publication |
14th Asian Conference on Computer Vision |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
In this paper we propose an approach for multi-modal image retrieval in multi-labelled images. A multi-modal deep network architecture is formulated to jointly model sketches and text as input query modalities into a common embedding space, which is then further aligned with the image feature space. Our architecture also relies on salient object detection through a supervised LSTM-based visual attention model learned from convolutional features. Both the alignment between the queries and the image and the supervision of the attention on the images are obtained by generalizing the Hungarian Algorithm using different loss functions. This permits encoding the object-based features and their alignment with the query irrespective of the availability of the co-occurrence of different objects in the training set. We validate the performance of our approach on standard single/multi-object datasets, showing state-of-the-art performance on every dataset. |
|
|
Address |
Perth; Australia; December 2018 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ACCV |
|
|
Notes |
DAG; 600.097; 600.121; 600.129 |
Approved |
no |
|
|
Call Number |
Admin @ si @ DDG2018a |
Serial |
3151 |
|
Permanent link to this record |
|
|
|
|
Author |
Mohamed Ali Souibgui; Sanket Biswas; Andres Mafla; Ali Furkan Biten; Alicia Fornes; Yousri Kessentini; Josep Llados; Lluis Gomez; Dimosthenis Karatzas |
|
|
Title |
Text-DIAE: a self-supervised degradation invariant autoencoder for text recognition and document enhancement |
Type |
Conference Article |
|
Year |
2023 |
Publication |
Proceedings of the 37th AAAI Conference on Artificial Intelligence |
Abbreviated Journal |
|
|
|
Volume |
37 |
Issue |
2 |
Pages |
|
|
|
Keywords |
Representation Learning for Vision; CV Applications; CV Language and Vision; ML Unsupervised; Self-Supervised Learning |
|
|
Abstract |
In this paper, we propose a Text-Degradation Invariant Auto Encoder (Text-DIAE), a self-supervised model designed to tackle two tasks, text recognition (handwritten or scene-text) and document image enhancement. We start by employing a transformer-based architecture that incorporates three pretext tasks as learning objectives to be optimized during pre-training without the usage of labelled data. Each of the pretext objectives is specifically tailored for the final downstream tasks. We conduct several ablation experiments that confirm the design choice of the selected pretext tasks. Importantly, the proposed model does not exhibit limitations of previous state-of-the-art methods based on contrastive losses, while at the same time requiring substantially fewer data samples to converge. Finally, we demonstrate that our method surpasses the state-of-the-art in existing supervised and self-supervised settings in handwritten and scene text recognition and document image enhancement. Our code and trained models will be made publicly available at https://github.com/dali92002/SSL-OCR |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
AAAI |
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
Admin @ si @ SBM2023 |
Serial |
3848 |
|
Permanent link to this record |
|
|
|
|
Author |
Khanh Nguyen; Ali Furkan Biten; Andres Mafla; Lluis Gomez; Dimosthenis Karatzas |
|
|
Title |
Show, Interpret and Tell: Entity-Aware Contextualised Image Captioning in Wikipedia |
Type |
Conference Article |
|
Year |
2023 |
Publication |
Proceedings of the 37th AAAI Conference on Artificial Intelligence |
Abbreviated Journal |
|
|
|
Volume |
37 |
Issue |
2 |
Pages |
1940-1948 |
|
|
Keywords |
|
|
|
Abstract |
Humans exploit prior knowledge to describe images, and are able to adapt their explanation to specific contextual information given, even to the extent of inventing plausible explanations when contextual information and images do not match. In this work, we propose the novel task of captioning Wikipedia images by integrating contextual knowledge. Specifically, we produce models that jointly reason over Wikipedia articles, Wikimedia images and their associated descriptions to produce contextualized captions. The same Wikimedia image can be used to illustrate different articles, and the produced caption needs to be adapted to the specific context, allowing us to explore the limits of the model to adjust captions to different contextual information. Dealing with out-of-dictionary words and Named Entities is a challenging task in this domain. To address this, we propose a pre-training objective, Masked Named Entity Modeling (MNEM), and show that this pretext task results in significantly improved models. Furthermore, we verify that a model pre-trained on Wikipedia generalizes well to News Captioning datasets. We further define two different test splits according to the difficulty of the captioning task. We offer insights on the role and the importance of each modality and highlight the limitations of our model. |
|
|
Address |
Washington; USA; February 2023 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
AAAI |
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
Admin @ si @ NBM2023 |
Serial |
3860 |
|
Permanent link to this record |
|
|
|
|
Author |
Josep Llados; J. Lopez-Krahe; Enric Marti |
|
|
Title |
A Hough-based method for hatched pattern detection in maps and diagrams. |
Type |
Miscellaneous |
|
Year |
1999 |
Publication |
Proceedings of the International Conference on Document Analysis and Recognition. |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
Bangalore-India |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
DAG @ dag @ LlM1999b |
Serial |
1 |
|
Permanent link to this record |
|
|
|
|
Author |
Josep Llados; Felipe Lumbreras; X. Varona |
|
|
Title |
A multidocument platform for automatic reading of identity cards. |
Type |
Miscellaneous |
|
Year |
1999 |
Publication |
Proceedings of the VIII Symposium Nacional de Reconocimiento de Formas y Analisis de Imagenes. |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
Bilbao |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
ADAS;DAG |
Approved |
no |
|
|
Call Number |
ADAS @ adas @ LLV1999 |
Serial |
7 |
|
Permanent link to this record |
|
|
|
|
Author |
A. Pujol; Jordi Vitria; Petia Radeva; X. Binefa; Robert Benavente; Ernest Valveny; Craig Von Land |
|
|
Title |
Real time pharmaceutical product recognition using color and shape indexing. |
Type |
Conference Article |
|
Year |
1999 |
Publication |
Proceedings of the 2nd International Workshop on European Scientific and Industrial Collaboration (WESIC'99), Promoting Advanced Technologies in Manufacturing. |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
Wales |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
OR;MILAB;DAG;CIC;MV |
Approved |
no |
|
|
Call Number |
BCNPCL @ bcnpcl @ PVR1999 |
Serial |
24 |
|
Permanent link to this record |
|
|
|
|
Author |
Josep Llados; Gemma Sanchez; Enric Marti |
|
|
Title |
A String-Based Method to Recognize Symbols and Structural Textures in Architectural Plans. |
Type |
Miscellaneous |
|
Year |
1997 |
Publication |
Second IAPR Workshop on Graphics Recognition, pp. 287–294. |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
DAG @ dag @ LSM1997 |
Serial |
44 |
|
Permanent link to this record |
|
|
|
|
Author |
Jordi Vitria; Petia Radeva; X. Binefa; A. Pujol; Ernest Valveny; Robert Benavente; Craig Von Land |
|
|
Title |
Real time recognition of pharmaceutical products by subspace methods |
Type |
Report |
|
Year |
1999 |
Publication |
CVC Technical Report #35 |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
CVC (UAB) |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
OR;MILAB;DAG;CIC;MV |
Approved |
no |
|
|
Call Number |
BCNPCL @ bcnpcl @ VRB1999b |
Serial |
54 |
|
Permanent link to this record |
|
|
|
|
Author |
V. Chapaprieta; Ernest Valveny |
|
|
Title |
Handwritten Digit Recognition Using Point Distribution Models. |
Type |
Miscellaneous |
|
Year |
2001 |
Publication |
Proceedings of the IX Spanish Symposium on Pattern Recognition and Image Analysis, 1:49–54. |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
DAG @ dag @ ChV2001 |
Serial |
83 |
|
Permanent link to this record |