|
Records |
Links |
|
Author |
Raul Gomez; Jaume Gibert; Lluis Gomez; Dimosthenis Karatzas |

|
|
Title |
Location Sensitive Image Retrieval and Tagging |
Type |
Conference Article |
|
Year |
2020 |
Publication |
16th European Conference on Computer Vision |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
People from different parts of the globe describe objects and concepts in distinct manners. Visual appearance can thus vary across different geographic locations, which makes location a relevant contextual information when analysing visual data. In this work, we address the task of image retrieval related to a given tag conditioned on a certain location on Earth. We present LocSens, a model that learns to rank triplets of images, tags and coordinates by plausibility, and two training strategies to balance the location influence in the final ranking. LocSens learns to fuse textual and location information of multimodal queries to retrieve related images at different levels of location granularity, and successfully utilizes location information to improve image tagging. |
|
|
Address  |
Virtual; August 2020 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ECCV |
|
|
Notes |
DAG; 600.121; 600.129 |
Approved |
no |
|
|
Call Number |
Admin @ si @ GGG2020b |
Serial |
3420 |
|
Permanent link to this record |
|
|
|
|
Author |
Lei Kang; Pau Riba; Yaxing Wang; Marçal Rusiñol; Alicia Fornes; Mauricio Villegas |

|
|
Title |
GANwriting: Content-Conditioned Generation of Styled Handwritten Word Images |
Type |
Conference Article |
|
Year |
2020 |
Publication |
16th European Conference on Computer Vision |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
Although current image generation methods have reached impressive quality levels, they are still unable to produce plausible yet diverse images of handwritten words. On the contrary, when writing by hand, a great variability is observed across different writers, and even when analyzing words scribbled by the same individual, involuntary variations are conspicuous. In this work, we take a step closer to producing realistic and varied artificially rendered handwritten words. We propose a novel method that is able to produce credible handwritten word images by conditioning the generative process with both calligraphic style features and textual content. Our generator is guided by three complementary learning objectives: to produce realistic images, to imitate a certain handwriting style and to convey a specific textual content. Our model is unconstrained to any predefined vocabulary, being able to render whatever input word. Given a sample writer, it is also able to mimic its calligraphic features in a few-shot setup. We significantly advance over prior art and demonstrate with qualitative, quantitative and human-based evaluations the realistic aspect of our synthetically produced images. |
|
|
Address  |
Virtual; August 2020 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ECCV |
|
|
Notes |
DAG; 600.140; 600.121; 600.129 |
Approved |
no |
|
|
Call Number |
Admin @ si @ KPW2020 |
Serial |
3426 |
|
Permanent link to this record |
|
|
|
|
Author |
Kai Wang; Luis Herranz; Anjan Dutta; Joost Van de Weijer |

|
|
Title |
Bookworm continual learning: beyond zero-shot learning and continual learning |
Type |
Conference Article |
|
Year |
2020 |
Publication |
Workshop TASK-CV 2020 |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
We propose bookworm continual learning(BCL), a flexible setting where unseen classes can be inferred via a semantic model, and the visual model can be updated continually. Thus BCL generalizes both continual learning (CL) and zero-shot learning (ZSL). We also propose the bidirectional imagination (BImag) framework to address BCL where features of both past and future classes are generated. We observe that conditioning the feature generator on attributes can actually harm the continual learning ability, and propose two variants (joint class-attribute conditioning and asymmetric generation) to alleviate this problem. |
|
|
Address  |
Virtual; August 2020 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ECCVW |
|
|
Notes |
LAMP; 600.141; 600.120;DAG;CIC |
Approved |
no |
|
|
Call Number |
Admin @ si @ WHD2020 |
Serial |
3466 |
|
Permanent link to this record |
|
|
|
|
Author |
Mohamed Ali Souibgui; Y.Kessentini; Alicia Fornes |

|
|
Title |
A conditional GAN based approach for distorted camera captured documents recovery |
Type |
Conference Article |
|
Year |
2020 |
Publication |
4th Mediterranean Conference on Pattern Recognition and Artificial Intelligence |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address  |
Virtual; December 2020 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
MedPRAI |
|
|
Notes |
DAG; 600.121 |
Approved |
no |
|
|
Call Number |
Admin @ si @ SKF2020 |
Serial |
3450 |
|
Permanent link to this record |
|
|
|
|
Author |
Mohamed Ali Souibgui; Alicia Fornes; Y.Kessentini; C.Tudor |


|
|
Title |
A Few-shot Learning Approach for Historical Encoded Manuscript Recognition |
Type |
Conference Article |
|
Year |
2021 |
Publication |
25th International Conference on Pattern Recognition |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
5413-5420 |
|
|
Keywords |
|
|
|
Abstract |
Encoded (or ciphered) manuscripts are a special type of historical documents that contain encrypted text. The automatic recognition of this kind of documents is challenging because: 1) the cipher alphabet changes from one document to another, 2) there is a lack of annotated corpus for training and 3) touching symbols make the symbol segmentation difficult and complex. To overcome these difficulties, we propose a novel method for handwritten ciphers recognition based on few-shot object detection. Our method first detects all symbols of a given alphabet in a line image, and then a decoding step maps the symbol similarity scores to the final sequence of transcribed symbols. By training on synthetic data, we show that the proposed architecture is able to recognize handwritten ciphers with unseen alphabets. In addition, if few labeled pages with the same alphabet are used for fine tuning, our method surpasses existing unsupervised and supervised HTR methods for ciphers recognition. |
|
|
Address  |
Virtual; January 2021 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICPR |
|
|
Notes |
DAG; 600.121; 600.140 |
Approved |
no |
|
|
Call Number |
Admin @ si @ SFK2021 |
Serial |
3449 |
|
Permanent link to this record |
|
|
|
|
Author |
Andres Mafla; Sounak Dey; Ali Furkan Biten; Lluis Gomez; Dimosthenis Karatzas |


|
|
Title |
Multi-modal reasoning graph for scene-text based fine-grained image classification and retrieval |
Type |
Conference Article |
|
Year |
2021 |
Publication |
IEEE Winter Conference on Applications of Computer Vision |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
4022-4032 |
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address  |
Virtual; January 2021 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
WACV |
|
|
Notes |
DAG; 600.121 |
Approved |
no |
|
|
Call Number |
Admin @ si @ MDB2021 |
Serial |
3491 |
|
Permanent link to this record |
|
|
|
|
Author |
Andres Mafla; Rafael S. Rezende; Lluis Gomez; Diana Larlus; Dimosthenis Karatzas |


|
|
Title |
StacMR: Scene-Text Aware Cross-Modal Retrieval |
Type |
Conference Article |
|
Year |
2021 |
Publication |
IEEE Winter Conference on Applications of Computer Vision |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
2219-2229 |
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address  |
Virtual; January 2021 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
WACV |
|
|
Notes |
DAG; 600.121 |
Approved |
no |
|
|
Call Number |
Admin @ si @ MRG2021a |
Serial |
3492 |
|
Permanent link to this record |
|
|
|
|
Author |
Minesh Mathew; Dimosthenis Karatzas; C.V. Jawahar |

|
|
Title |
DocVQA: A Dataset for VQA on Document Images |
Type |
Conference Article |
|
Year |
2021 |
Publication |
IEEE Winter Conference on Applications of Computer Vision |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
2200-2209 |
|
|
Keywords |
|
|
|
Abstract |
We present a new dataset for Visual Question Answering (VQA) on document images called DocVQA. The dataset consists of 50,000 questions defined on 12,000+ document images. Detailed analysis of the dataset in comparison with similar datasets for VQA and reading comprehension is presented. We report several baseline results by adopting existing VQA and reading comprehension models. Although the existing models perform reasonably well on certain types of questions, there is large performance gap compared to human performance (94.36% accuracy). The models need to improve specifically on questions where understanding structure of the document is crucial. The dataset, code and leaderboard are available at docvqa. org |
|
|
Address  |
Virtual; January 2021 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
WACV |
|
|
Notes |
DAG; 600.121 |
Approved |
no |
|
|
Call Number |
Admin @ si @ MKJ2021 |
Serial |
3498 |
|
Permanent link to this record |
|
|
|
|
Author |
Asma Bensalah; Jialuo Chen; Alicia Fornes; Cristina Carmona_Duarte; Josep Llados; Miguel A. Ferrer |


|
|
Title |
Towards Stroke Patients' Upper-limb Automatic Motor Assessment Using Smartwatches. |
Type |
Conference Article |
|
Year |
2020 |
Publication |
International Workshop on Artificial Intelligence for Healthcare Applications |
Abbreviated Journal |
|
|
|
Volume |
12661 |
Issue |
|
Pages |
476-489 |
|
|
Keywords |
|
|
|
Abstract |
Assessing the physical condition in rehabilitation scenarios is a challenging problem, since it involves Human Activity Recognition (HAR) and kinematic analysis methods. In addition, the difficulties increase in unconstrained rehabilitation scenarios, which are much closer to the real use cases. In particular, our aim is to design an upper-limb assessment pipeline for stroke patients using smartwatches. We focus on the HAR task, as it is the first part of the assessing pipeline. Our main target is to automatically detect and recognize four key movements inspired by the Fugl-Meyer assessment scale, which are performed in both constrained and unconstrained scenarios. In addition to the application protocol and dataset, we propose two detection and classification baseline methods. We believe that the proposed framework, dataset and baseline results will serve to foster this research field. |
|
|
Address  |
Virtual; January 2021 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICPRW |
|
|
Notes |
DAG; 600.121; 600.140; |
Approved |
no |
|
|
Call Number |
Admin @ si @ BCF2020 |
Serial |
3508 |
|
Permanent link to this record |
|
|
|
|
Author |
Manuel Carbonell; Pau Riba; Mauricio Villegas; Alicia Fornes; Josep Llados |

|
|
Title |
Named Entity Recognition and Relation Extraction with Graph Neural Networks in Semi Structured Documents |
Type |
Conference Article |
|
Year |
2020 |
Publication |
25th International Conference on Pattern Recognition |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
The use of administrative documents to communicate and leave record of business information requires of methods
able to automatically extract and understand the content from
such documents in a robust and efficient way. In addition,
the semi-structured nature of these reports is specially suited
for the use of graph-based representations which are flexible
enough to adapt to the deformations from the different document
templates. Moreover, Graph Neural Networks provide the proper
methodology to learn relations among the data elements in
these documents. In this work we study the use of Graph
Neural Network architectures to tackle the problem of entity
recognition and relation extraction in semi-structured documents.
Our approach achieves state of the art results in the three
tasks involved in the process. Additionally, the experimentation
with two datasets of different nature demonstrates the good
generalization ability of our approach. |
|
|
Address  |
Virtual; January 2021 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICPR |
|
|
Notes |
DAG; 600.121 |
Approved |
no |
|
|
Call Number |
Admin @ si @ CRV2020 |
Serial |
3509 |
|
Permanent link to this record |