Records | |||||
---|---|---|---|---|---|
Author | Ali Furkan Biten; Andres Mafla; Lluis Gomez; Dimosthenis Karatzas | ||||
Title | Is An Image Worth Five Sentences? A New Look into Semantics for Image-Text Matching | Type | Conference Article | ||
Year | 2022 | Publication | Winter Conference on Applications of Computer Vision | Abbreviated Journal | |
Volume | Issue | Pages | 1391-1400 | ||
Keywords | Measurement; Training; Integrated circuits; Annotations; Semantics; Training data; Semisupervised learning | ||||
Abstract | The task of image-text matching aims to map representations from different modalities into a common joint visual-textual embedding. However, the most widely used datasets for this task, MSCOCO and Flickr30K, are actually image captioning datasets that offer a very limited set of relationships between images and sentences in their ground-truth annotations. This limited ground truth information forces us to use evaluation metrics based on binary relevance: given a sentence query we consider only one image as relevant. However, many other relevant images or captions may be present in the dataset. In this work, we propose two metrics that evaluate the degree of semantic relevance of retrieved items, independently of their annotated binary relevance. Additionally, we incorporate a novel strategy that uses an image captioning metric, CIDEr, to define a Semantic Adaptive Margin (SAM) to be optimized in a standard triplet loss. By incorporating our formulation into existing models, a large improvement is obtained in scenarios where available training data is limited. We also demonstrate that the performance on the annotated image-caption pairs is maintained while improving on other non-annotated relevant items when employing the full training set. The code for our new metric can be found at github.com/furkanbiten/ncsmetric and the model implementation at github.com/andrespmd/semanticadaptive_margin. | ||||
Address | Virtual; Waikoloa; Hawaii; USA; January 2022 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | WACV | ||
Notes | DAG; 600.155; 302.105; | Approved | no | ||
Call Number | Admin @ si @ BMG2022 | Serial | 3663 | ||
Permanent link to this record | |||||
Author | Ali Furkan Biten; Lluis Gomez; Dimosthenis Karatzas | ||||
Title | Let there be a clock on the beach: Reducing Object Hallucination in Image Captioning | Type | Conference Article | ||
Year | 2022 | Publication | Winter Conference on Applications of Computer Vision | Abbreviated Journal | |
Volume | Issue | Pages | 1381-1390 | ||
Keywords | Measurement; Training; Visualization; Analytical models; Computer vision; Computational modeling; Training data | ||||
Abstract | Explaining an image with missing or non-existent objects is known as object bias (hallucination) in image captioning. This behaviour is quite common in state-of-the-art captioning models and is not desirable to humans. To decrease object hallucination in captioning, we propose three simple yet efficient training augmentation methods for sentences which require no new training data or increase in the model size. By extensive analysis, we show that the proposed methods can significantly diminish our models’ object bias on hallucination metrics. Moreover, we experimentally demonstrate that our methods decrease the dependency on the visual features. All of our code, configuration files and model weights are available online. | ||||
Address | Virtual; Waikoloa; Hawaii; USA; January 2022 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | WACV | ||
Notes | DAG; 600.155; 302.105 | Approved | no | ||
Call Number | Admin @ si @ BGK2022 | Serial | 3662 | ||
Permanent link to this record | |||||
Author | Ali Furkan Biten; Lluis Gomez; Marçal Rusiñol; Dimosthenis Karatzas | ||||
Title | Good News, Everyone! Context driven entity-aware captioning for news images | Type | Conference Article | ||
Year | 2019 | Publication | 32nd IEEE Conference on Computer Vision and Pattern Recognition | Abbreviated Journal | |
Volume | Issue | Pages | 12458-12467 | ||
Keywords | |||||
Abstract | Current image captioning systems perform at a merely descriptive level, essentially enumerating the objects in the scene and their relations. Humans, on the contrary, interpret images by integrating several sources of prior knowledge of the world. In this work, we aim to take a step closer to producing captions that offer a plausible interpretation of the scene, by integrating such contextual information into the captioning pipeline. For this we focus on the captioning of images used to illustrate news articles. We propose a novel captioning method that is able to leverage contextual information provided by the text of news articles associated with an image. Our model is able to selectively draw information from the article guided by visual cues, and to dynamically extend the output dictionary to out-of-vocabulary named entities that appear in the context source. Furthermore, we introduce “GoodNews”, the largest news image captioning dataset in the literature, and demonstrate state-of-the-art results. | ||||
Address | Long Beach; California; USA; June 2019 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | CVPR | ||
Notes | DAG; 600.129; 600.135; 601.338; 600.121 | Approved | no | ||
Call Number | Admin @ si @ BGR2019 | Serial | 3289 | ||
Permanent link to this record | |||||
Author | Ali Furkan Biten; Ruben Tito; Andres Mafla; Lluis Gomez; Marçal Rusiñol; C.V. Jawahar; Ernest Valveny; Dimosthenis Karatzas | ||||
Title | Scene Text Visual Question Answering | Type | Conference Article | ||
Year | 2019 | Publication | 18th IEEE International Conference on Computer Vision | Abbreviated Journal | |
Volume | Issue | Pages | 4291-4301 | ||
Keywords | |||||
Abstract | Current visual question answering datasets do not consider the rich semantic information conveyed by text within an image. In this work, we present a new dataset, ST-VQA, that aims to highlight the importance of exploiting high-level semantic information present in images as textual cues in the Visual Question Answering process. We use this dataset to define a series of tasks of increasing difficulty for which reading the scene text in the context provided by the visual information is necessary to reason and generate an appropriate answer. We propose a new evaluation metric for these tasks to account both for reasoning errors as well as shortcomings of the text recognition module. In addition we put forward a series of baseline methods, which provide further insight to the newly released dataset, and set the scene for further research. | ||||
Address | Seoul; Korea; October 2019 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICCV | ||
Notes | DAG; 600.129; 600.135; 601.338; 600.121 | Approved | no | ||
Call Number | Admin @ si @ BTM2019b | Serial | 3285 | ||
Permanent link to this record | |||||
Author | Ali Furkan Biten; Ruben Tito; Andres Mafla; Lluis Gomez; Marçal Rusiñol; M. Mathew; C.V. Jawahar; Ernest Valveny; Dimosthenis Karatzas | ||||
Title | ICDAR 2019 Competition on Scene Text Visual Question Answering | Type | Conference Article | ||
Year | 2019 | Publication | 3rd Workshop on Closing the Loop Between Vision and Language, in conjunction with ICCV2019 | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | This paper presents final results of ICDAR 2019 Scene Text Visual Question Answering competition (ST-VQA). ST-VQA introduces an important aspect that is not addressed by any Visual Question Answering system up to date, namely the incorporation of scene text to answer questions asked about an image. The competition introduces a new dataset comprising 23,038 images annotated with 31,791 question / answer pairs where the answer is always grounded on text instances present in the image. The images are taken from 7 different public computer vision datasets, covering a wide range of scenarios. The competition was structured in three tasks of increasing difficulty, that require reading the text in a scene and understanding it in the context of the scene, to correctly answer a given question. A novel evaluation metric is presented, which elegantly assesses both key capabilities expected from an optimal model: text recognition and image understanding. A detailed analysis of results from different participants is showcased, which provides insight into the current capabilities of VQA systems that can read. We firmly believe the dataset proposed in this challenge will be an important milestone to consider towards a path of more robust and general models that can exploit scene text to achieve holistic image understanding. | ||||
Address | Sydney; Australia; September 2019 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | CLVL | ||
Notes | DAG; 600.129; 601.338; 600.135; 600.121 | Approved | no | ||
Call Number | Admin @ si @ BTM2019a | Serial | 3284 | ||
Permanent link to this record | |||||
Author | Ali Furkan Biten; Ruben Tito; Andres Mafla; Lluis Gomez; Marçal Rusiñol; M. Mathew; C.V. Jawahar; Ernest Valveny; Dimosthenis Karatzas | ||||
Title | ICDAR 2019 Competition on Scene Text Visual Question Answering | Type | Conference Article | ||
Year | 2019 | Publication | 15th International Conference on Document Analysis and Recognition | Abbreviated Journal | |
Volume | Issue | Pages | 1563-1570 | ||
Keywords | |||||
Abstract | This paper presents final results of ICDAR 2019 Scene Text Visual Question Answering competition (ST-VQA). ST-VQA introduces an important aspect that is not addressed by any Visual Question Answering system up to date, namely the incorporation of scene text to answer questions asked about an image. The competition introduces a new dataset comprising 23,038 images annotated with 31,791 question / answer pairs where the answer is always grounded on text instances present in the image. The images are taken from 7 different public computer vision datasets, covering a wide range of scenarios. The competition was structured in three tasks of increasing difficulty, that require reading the text in a scene and understanding it in the context of the scene, to correctly answer a given question. A novel evaluation metric is presented, which elegantly assesses both key capabilities expected from an optimal model: text recognition and image understanding. A detailed analysis of results from different participants is showcased, which provides insight into the current capabilities of VQA systems that can read. We firmly believe the dataset proposed in this challenge will be an important milestone to consider towards a path of more robust and general models that can exploit scene text to achieve holistic image understanding. | ||||
Address | Sydney; Australia; September 2019 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICDAR | ||
Notes | DAG; 600.129; 601.338; 600.121 | Approved | no | ||
Call Number | Admin @ si @ BTM2019c | Serial | 3286 | ||
Permanent link to this record | |||||
Author | Ali Furkan Biten; Ruben Tito; Lluis Gomez; Ernest Valveny; Dimosthenis Karatzas | ||||
Title | OCR-IDL: OCR Annotations for Industry Document Library Dataset | Type | Conference Article | ||
Year | 2022 | Publication | ECCV Workshop on Text in Everything | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | Pretraining has proven successful in Document Intelligence tasks, where a deluge of documents is used to pretrain the models, only later to be finetuned on downstream tasks. One of the problems of the pretraining approaches is the inconsistent usage of pretraining data with different OCR engines, leading to incomparable results between models. In other words, it is not obvious whether the performance gain comes from differing amounts of data and distinct OCR engines or from the proposed models. To remedy the problem, we make public the OCR annotations for IDL documents using a commercial OCR engine, given their superior performance over open source OCR models. The contributed dataset (OCR-IDL) has an estimated monetary value over 20K US$. It is our hope that OCR-IDL can be a starting point for future works on Document Intelligence. All of our data and its collection process with the annotations can be found in this https URL. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ECCV | ||
Notes | DAG; no proj | Approved | no | ||
Call Number | Admin @ si @ BTG2022 | Serial | 3817 | ||
Permanent link to this record | |||||
Author | Alicia Fornes; Anjan Dutta; Albert Gordo; Josep Llados | ||||
Title | The ICDAR 2011 Music Scores Competition: Staff Removal and Writer Identification | Type | Conference Article | ||
Year | 2011 | Publication | 11th International Conference on Document Analysis and Recognition | Abbreviated Journal | |
Volume | Issue | Pages | 1511-1515 | ||
Keywords | |||||
Abstract | In the last years, there has been a growing interest in the analysis of handwritten music scores. In this sense, our goal has been to foster the interest in the analysis of handwritten music scores by the proposal of two different competitions: Staff removal and Writer Identification. Both competitions have been tested on the CVC-MUSCIMA database: a ground-truth of handwritten music score images. This paper describes the competition details, including the dataset and ground-truth, the evaluation metrics, and a short description of the participants, their methods, and the obtained results. | ||||
Address | Beijing, China | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | 978-0-7695-4520-2 | Medium | ||
Area | Expedition | Conference | ICDAR | ||
Notes | DAG | Approved | no | ||
Call Number | Admin @ si @ FDG2011b | Serial | 1794 | ||
Permanent link to this record | |||||
Author | Alicia Fornes; Asma Bensalah; Cristina Carmona-Duarte; Jialuo Chen; Miguel A. Ferrer; Andreas Fischer; Josep Llados; Cristina Martin; Eloy Opisso; Rejean Plamondon; Anna Scius-Bertrand; Josep Maria Tormos | ||||
Title | The RPM3D Project: 3D Kinematics for Remote Patient Monitoring | Type | Conference Article | ||
Year | 2022 | Publication | Intertwining Graphonomics with Human Movements. 20th International Conference of the International Graphonomics Society, IGS 2022 | Abbreviated Journal | |
Volume | 13424 | Issue | Pages | 217-226 | |
Keywords | Healthcare applications; Kinematic; Theory of Rapid Human Movements; Human activity recognition; Stroke rehabilitation; 3D kinematics | ||||
Abstract | This project explores the feasibility of remote patient monitoring based on the analysis of 3D movements captured with smartwatches. We base our analysis on the Kinematic Theory of Rapid Human Movement. We have validated our research in a real case scenario for stroke rehabilitation at the Guttmann Institute (https://www.guttmann.com/en/) (neurorehabilitation hospital), showing promising results. Our work could have a great impact in remote healthcare applications, improving the medical efficiency and reducing the healthcare costs. Future steps include more clinical validation, developing multi-modal analysis architectures (analysing data from sensors, images, audio, etc.), and exploring the application of our technology to monitor other neurodegenerative diseases. | ||||
Address | June 7-9, 2022, Las Palmas de Gran Canaria, Spain | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | LNCS | ||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | IGS | ||
Notes | DAG; 600.121; 600.162; 602.230; 600.140 | Approved | no | ||
Call Number | Admin @ si @ FBC2022 | Serial | 3739 | ||
Permanent link to this record | |||||
Author | Alicia Fornes; Beata Megyesi; Joan Mas | ||||
Title | Transcription of Encoded Manuscripts with Image Processing Techniques | Type | Conference Article | ||
Year | 2017 | Publication | Digital Humanities Conference | Abbreviated Journal | |
Volume | Issue | Pages | 441-443 | ||
Keywords | |||||
Abstract | |||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | DH | ||
Notes | DAG; 600.097; 600.121 | Approved | no | ||
Call Number | Admin @ si @ FMM2017 | Serial | 3061 | ||
Permanent link to this record | |||||
Author | Alicia Fornes; Josep Llados | ||||
Title | A Symbol-dependent Writer Identification Approach in Old Handwritten Music Scores | Type | Conference Article | ||
Year | 2010 | Publication | 12th International Conference on Frontiers in Handwriting Recognition | Abbreviated Journal | |
Volume | Issue | Pages | 634-639 | ||
Keywords | |||||
Abstract | Writer identification consists in determining the writer of a piece of handwriting from a set of writers. In this paper we introduce a symbol-dependent approach for identifying the writer of old music scores, which is based on two symbol recognition methods. The main idea is to use the Blurred Shape Model descriptor and a DTW-based method for detecting, recognizing and describing the music clefs and notes. The proposed approach has been evaluated in a database of old music scores, achieving very high writer identification rates. | ||||
Address | Kolkata (India) | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | 978-1-4244-8353-2 | Medium | ||
Area | Expedition | Conference | ICFHR | ||
Notes | DAG | Approved | no | ||
Call Number | DAG @ dag @ FoL2010 | Serial | 1321 | ||
Permanent link to this record | |||||
Author | Alicia Fornes; Josep Llados; Gemma Sanchez | ||||
Title | Old Handwritten Musical Symbol Classification by a Dynamic Time Warping Based Method | Type | Conference Article | ||
Year | 2007 | Publication | Seventh IAPR International Workshop on Graphics Recognition | Abbreviated Journal | |
Volume | Issue | Pages | 26–27 | ||
Keywords | |||||
Abstract | |||||
Address | Curitiba (Brazil) | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | GREC | ||
Notes | DAG | Approved | no | ||
Call Number | DAG @ dag @ FLS2007 | Serial | 887 | ||
Permanent link to this record | |||||
Author | Alicia Fornes; Josep Llados; Gemma Sanchez; Horst Bunke | ||||
Title | Writer Identification in Old Handwritten Music Scores | Type | Conference Article | ||
Year | 2008 | Publication | Proceedings of the 8th International Workshop on Document Analysis Systems | Abbreviated Journal | |
Volume | Issue | Pages | 347–353 | ||
Keywords | |||||
Abstract | |||||
Address | Nara (Japan) | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | DAS | ||
Notes | DAG | Approved | no | ||
Call Number | DAG @ dag @ FLS2008b | Serial | 1078 | ||
Permanent link to this record | |||||
Author | Alicia Fornes; Josep Llados; Gemma Sanchez; Horst Bunke | ||||
Title | Symbol-independent writer identification in old handwritten music scores | Type | Conference Article | ||
Year | 2009 | Publication | Proceedings of the 8th IAPR International Workshop on Graphics Recognition | Abbreviated Journal | |
Volume | Issue | Pages | 186–197 | ||
Keywords | |||||
Abstract | |||||
Address | La Rochelle, France | ||||
Corporate Author | Thesis | ||||
Publisher | Springer Berlin Heidelberg | Place of Publication | Editor | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 0302-9743 | ISBN | 978-3-642-13727-3 | Medium | |
Area | Expedition | Conference | GREC | ||
Notes | DAG | Approved | no | ||
Call Number | DAG @ dag @ FLS2009a | Serial | 1222 | ||
Permanent link to this record | |||||
Author | Alicia Fornes; Josep Llados; Gemma Sanchez; Horst Bunke | ||||
Title | On the use of textural features for writer identification in old handwritten music scores | Type | Conference Article | ||
Year | 2009 | Publication | 10th International Conference on Document Analysis and Recognition | Abbreviated Journal | |
Volume | Issue | Pages | 996-1000 | ||
Keywords | |||||
Abstract | Writer identification consists in determining the writer of a piece of handwriting from a set of writers. In this paper we present a system for writer identification in old handwritten music scores which uses only music notation to determine the author. The steps of the proposed system are the following. First of all, the music sheet is preprocessed for obtaining a music score without the staff lines. Afterwards, four different methods for generating texture images from music symbols are applied. Every approach uses a different spatial variation when combining the music symbols to generate the textures. Finally, Gabor filters and Grey-scale Co-occurrence matrices are used to obtain the features. The classification is performed using a k-NN classifier based on Euclidean distance. The proposed method has been tested on a database of old music scores from the 17th to 19th centuries, achieving encouraging identification rates. | ||||
Address | Barcelona | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 1520-5363 | ISBN | 978-1-4244-4500-4 | Medium | |
Area | Expedition | Conference | ICDAR | ||
Notes | DAG | Approved | no | ||
Call Number | DAG @ dag @ FLS2009b | Serial | 1223 | ||
Permanent link to this record |