Publicacions CVC -- Query Results

	Publicacions CVC Home \| Show All \| Simple Search \| Advanced Search \| Add Record \| Import	Login Quick Search: Field: contains: ...
	256–270 of 1425 records found matching your query (RSS \| history):

Search & Display Options

Select All Deselect All

[1–10] << 11 12 13 14 15 16 17 18 19 20 >> [21–30]

List View

Citations

Details

	Records
	Author	Ali Furkan Biten; Ruben Tito; Andres Mafla; Lluis Gomez; Marçal Rusiñol; M. Mathew; C.V. Jawahar; Ernest Valveny; Dimosthenis Karatzas
	Title	ICDAR 2019 Competition on Scene Text Visual Question Answering			Type	Conference Article
	Year	2019	Publication	3rd Workshop on Closing the Loop Between Vision and Language, in conjunction with ICCV2019	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	This paper presents final results of ICDAR 2019 Scene Text Visual Question Answering competition (ST-VQA). ST-VQA introduces an important aspect that is not addressed by any Visual Question Answering system up to date, namely the incorporation of scene text to answer questions asked about an image. The competition introduces a new dataset comprising 23, 038 images annotated with 31, 791 question / answer pairs where the answer is always grounded on text instances present in the image. The images are taken from 7 different public computer vision datasets, covering a wide range of scenarios. The competition was structured in three tasks of increasing difficulty, that require reading the text in a scene and understanding it in the context of the scene, to correctly answer a given question. A novel evaluation metric is presented, which elegantly assesses both key capabilities expected from an optimal model: text recognition and image understanding. A detailed analysis of results from different participants is showcased, which provides insight into the current capabilities of VQA systems that can read. We firmly believe the dataset proposed in this challenge will be an important milestone to consider towards a path of more robust and general models that can exploit scene text to achieve holistic image understanding.
	Address	Sydney; Australia; September 2019
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	CLVL
	Notes	DAG; 600.129; 601.338; 600.135; 600.121			Approved	no
	Call Number	Admin @ si @ BTM2019a			Serial	3284
Permanent link to this record



	Author	Ali Furkan Biten; Ruben Tito; Andres Mafla; Lluis Gomez; Marçal Rusiñol; C.V. Jawahar; Ernest Valveny; Dimosthenis Karatzas
	Title	Scene Text Visual Question Answering			Type	Conference Article
	Year	2019	Publication	18th IEEE International Conference on Computer Vision	Abbreviated Journal
	Volume		Issue		Pages	4291-4301
	Keywords
	Abstract	Current visual question answering datasets do not consider the rich semantic information conveyed by text within an image. In this work, we present a new dataset, ST-VQA, that aims to highlight the importance of exploiting highlevel semantic information present in images as textual cues in the Visual Question Answering process. We use this dataset to define a series of tasks of increasing difficulty for which reading the scene text in the context provided by the visual information is necessary to reason and generate an appropriate answer. We propose a new evaluation metric for these tasks to account both for reasoning errors as well as shortcomings of the text recognition module. In addition we put forward a series of baseline methods, which provide further insight to the newly released dataset, and set the scene for further research.
	Address	Seul; Corea; October 2019
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICCV
	Notes	DAG; 600.129; 600.135; 601.338; 600.121			Approved	no
	Call Number	Admin @ si @ BTM2019b			Serial	3285
Permanent link to this record



	Author	Ali Furkan Biten; Ruben Tito; Andres Mafla; Lluis Gomez; Marçal Rusiñol; M. Mathew; C.V. Jawahar; Ernest Valveny; Dimosthenis Karatzas
	Title	ICDAR 2019 Competition on Scene Text Visual Question Answering			Type	Conference Article
	Year	2019	Publication	15th International Conference on Document Analysis and Recognition	Abbreviated Journal
	Volume		Issue		Pages	1563-1570
	Keywords
	Abstract	This paper presents final results of ICDAR 2019 Scene Text Visual Question Answering competition (ST-VQA). ST-VQA introduces an important aspect that is not addressed by any Visual Question Answering system up to date, namely the incorporation of scene text to answer questions asked about an image. The competition introduces a new dataset comprising 23,038 images annotated with 31,791 question / answer pairs where the answer is always grounded on text instances present in the image. The images are taken from 7 different public computer vision datasets, covering a wide range of scenarios. The competition was structured in three tasks of increasing difficulty, that require reading the text in a scene and understanding it in the context of the scene, to correctly answer a given question. A novel evaluation metric is presented, which elegantly assesses both key capabilities expected from an optimal model: text recognition and image understanding. A detailed analysis of results from different participants is showcased, which provides insight into the current capabilities of VQA systems that can read. We firmly believe the dataset proposed in this challenge will be an important milestone to consider towards a path of more robust and general models that can exploit scene text to achieve holistic image understanding.
	Address	Sydney; Australia; September 2019
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICDAR
	Notes	DAG; 600.129; 601.338; 600.121			Approved	no
	Call Number	Admin @ si @ BTM2019c			Serial	3286
Permanent link to this record



	Author	Claudio Baecchi; Francesco Turchini; Lorenzo Seidenari; Andrew Bagdanov; Alberto del Bimbo
	Title	Fisher vectors over random density forest for object recognition			Type	Conference Article
	Year	2014	Publication	22nd International Conference on Pattern Recognition	Abbreviated Journal
	Volume		Issue		Pages	4328-4333
	Keywords
	Abstract
	Address	Stockholm; Sweden; August 2014
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICPR
	Notes	LAMP; 600.079			Approved	no
	Call Number	Admin @ si @ BTS2014			Serial	2518
Permanent link to this record



	Author	Chris Bahnsen; David Vazquez; Antonio Lopez; Thomas B. Moeslund
	Title	Learning to Remove Rain in Traffic Surveillance by Using Synthetic Data			Type	Conference Article
	Year	2019	Publication	14th International Conference on Computer Vision Theory and Applications	Abbreviated Journal
	Volume		Issue		Pages	123-130
	Keywords	Rain Removal; Traffic Surveillance; Image Denoising
	Abstract	Rainfall is a problem in automated traffic surveillance. Rain streaks occlude the road users and degrade the overall visibility which in turn decrease object detection performance. One way of alleviating this is by artificially removing the rain from the images. This requires knowledge of corresponding rainy and rain-free images. Such images are often produced by overlaying synthetic rain on top of rain-free images. However, this method fails to incorporate the fact that rain fall in the entire three-dimensional volume of the scene. To overcome this, we introduce training data from the SYNTHIA virtual world that models rain streaks in the entirety of a scene. We train a conditional Generative Adversarial Network for rain removal and apply it on traffic surveillance images from SYNTHIA and the AAU RainSnow datasets. To measure the applicability of the rain-removed images in a traffic surveillance context, we run the YOLOv2 object detection algorithm on the original and rain-removed frames. The results on SYNTHIA show an 8% increase in detection accuracy compared to the original rain image. Interestingly, we find that high PSNR or SSIM scores do not imply good object detection performance.
	Address	Praga; Czech Republic; February 2019
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	VISIGRAPP
	Notes	ADAS; 600.118			Approved	no
	Call Number	Admin @ si @ BVL2019			Serial	3256
Permanent link to this record



	Author	Jorge Bernal; Fernando Vilariño; F. Javier Sanchez; M. Arnold; Anarta Ghosh; Gerard Lacey
	Title	Experts vs Novices: Applying Eye-tracking Methodologies in Colonoscopy Video Screening for Polyp Search			Type	Conference Article
	Year	2014	Publication	2014 Symposium on Eye Tracking Research and Applications	Abbreviated Journal
	Volume		Issue		Pages	223-226
	Keywords
	Abstract	We present in this paper a novel study aiming at identifying the differences in visual search patterns between physicians of diverse levels of expertise during the screening of colonoscopy videos. Physicians were clustered into two groups -experts and novices- according to the number of procedures performed, and fixations were captured by an eye-tracker device during the task of polyp search in different video sequences. These fixations were integrated into heat maps, one for each cluster. The obtained maps were validated over a ground truth consisting of a mask of the polyp, and the comparison between experts and novices was performed by using metrics such as reaction time, dwelling time and energy concentration ratio. Experimental results show a statistically significant difference between experts and novices, and the obtained maps show to be a useful tool for the characterisation of the behaviour of each group.
	Address	USA; March 2014
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-1-4503-2751-0	Medium
	Area		Expedition		Conference	ETRA
	Notes	MV; 600.047; 600.060;SIAI			Approved	no
	Call Number	Admin @ si @ BVS2014			Serial	2448
Permanent link to this record



	Author	Marco Buzzelli; Joost Van de Weijer; Raimondo Schettini
	Title	Learning Illuminant Estimation from Object Recognition			Type	Conference Article
	Year	2018	Publication	25th International Conference on Image Processing	Abbreviated Journal
	Volume		Issue		Pages	3234 - 3238
	Keywords	Illuminant estimation; computational color constancy; semi-supervised learning; deep learning; convolutional neural networks
	Abstract	In this paper we present a deep learning method to estimate the illuminant of an image. Our model is not trained with illuminant annotations, but with the objective of improving performance on an auxiliary task such as object recognition. To the best of our knowledge, this is the first example of a deep learning architecture for illuminant estimation that is trained without ground truth illuminants. We evaluate our solution on standard datasets for color constancy, and compare it with state of the art methods. Our proposal is shown to outperform most deep learning methods in a cross-dataset evaluation setup, and to present competitive results in a comparison with parametric solutions.
	Address	Athens; Greece; October 2018
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICIP
	Notes	LAMP; 600.109; 600.120			Approved	no
	Call Number	Admin @ si @ BWS2018			Serial	3157
Permanent link to this record



	Author	Ozan Caglayan; Walid Aransa; Adrien Bardet; Mercedes Garcia-Martinez; Fethi Bougares; Loic Barrault; Marc Masana; Luis Herranz; Joost Van de Weijer
	Title	LIUM-CVC Submissions for WMT17 Multimodal Translation Task			Type	Conference Article
	Year	2017	Publication	2nd Conference on Machine Translation	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	This paper describes the monomodal and multimodal Neural Machine Translation systems developed by LIUM and CVC for WMT17 Shared Task on Multimodal Translation. We mainly explored two multimodal architectures where either global visual features or convolutional feature maps are integrated in order to benefit from visual context. Our final systems ranked first for both En-De and En-Fr language pairs according to the automatic evaluation metrics METEOR and BLEU.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	WMT
	Notes	LAMP; 600.106; 600.120			Approved	no
	Call Number	Admin @ si @ CAB2017			Serial	3035
Permanent link to this record



	Author	M. Cruz; Cristhian A. Aguilera-Carrasco; Boris X. Vintimilla; Ricardo Toledo; Angel Sappa
	Title	Cross-spectral image registration and fusion: an evaluation study			Type	Conference Article
	Year	2015	Publication	2nd International Conference on Machine Vision and Machine Learning	Abbreviated Journal
	Volume		Issue		Pages
	Keywords	multispectral imaging; image registration; data fusion; infrared and visible spectra
	Abstract	This paper presents a preliminary study on the registration and fusion of cross-spectral imaging. The objective is to evaluate the validity of widely used computer vision approaches when they are applied at different spectral bands. In particular, we are interested in merging images from the infrared (both long wave infrared: LWIR and near infrared: NIR) and visible spectrum (VS). Experimental results with different data sets are presented.
	Address	Barcelona; July 2015
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	MVML
	Notes	ADAS; 600.076			Approved	no
	Call Number	Admin @ si @ CAV2015			Serial	2629
Permanent link to this record



	Author	Ozan Caglayan; Walid Aransa; Yaxing Wang; Marc Masana; Mercedes Garcıa-Martinez; Fethi Bougares; Loic Barrault; Joost Van de Weijer
	Title	Does Multimodality Help Human and Machine for Translation and Image Captioning?			Type	Conference Article
	Year	2016	Publication	1st conference on machine translation	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	This paper presents the systems developed by LIUM and CVC for the WMT16 Multimodal Machine Translation challenge. We explored various comparative methods, namely phrase-based systems and attentional recurrent neural networks models trained using monomodal or multimodal data. We also performed a human evaluation in order to estimate theusefulness of multimodal data for human machine translation and image description generation. Our systems obtained the best results for both tasks according to the automatic evaluation metrics BLEU and METEOR.
	Address	Berlin; Germany; August 2016
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	WMT
	Notes	LAMP; 600.106 ; 600.068			Approved	no
	Call Number	Admin @ si @ CAW2016			Serial	2761
Permanent link to this record



	Author	Ozan Caglayan; Adrien Bardet; Fethi Bougares; Loic Barrault; Kai Wang; Marc Masana; Luis Herranz; Joost Van de Weijer
	Title	LIUM-CVC Submissions for WMT18 Multimodal Translation Task			Type	Conference Article
	Year	2018	Publication	3rd Conference on Machine Translation	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	This paper describes the multimodal Neural Machine Translation systems developed by LIUM and CVC for WMT18 Shared Task on Multimodal Translation. This year we propose several modifications to our previou multimodal attention architecture in order to better integrate convolutional features and refine them using encoder-side information. Our final constrained submissions ranked first for English→French and second for English→German language pairs among the constrained submissions according to the automatic evaluation metric METEOR.
	Address	Brussels; Belgium; October 2018
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	WMT
	Notes	LAMP; 600.106; 600.120			Approved	no
	Call Number	Admin @ si @ CBB2018			Serial	3240
Permanent link to this record



	Author	Francesco Ciompi; Simone Balocco; Carles Caus; J. Mauri; Petia Radeva
	Title	Stent shape estimation through a comprehensive interpretation of intravascular ultrasound images			Type	Conference Article
	Year	2013	Publication	16th International Conference on Medical Image Computing and Computer Assisted Intervention	Abbreviated Journal
	Volume	8150	Issue	2	Pages	345-352
	Keywords
	Abstract	We present a method for automatic struts detection and stent shape estimation in cross-sectional intravascular ultrasound images. A stent shape is first estimated through a comprehensive interpretation of the vessel morphology, performed using a supervised context-aware multi-class classification scheme. Then, the successive strut identification exploits both local appearance and the defined stent shape. The method is tested on 589 images obtained from 80 patients, achieving a F-measure of 74.1% and an averaged distance between manual and automatic struts of 0.10 mm.
	Address	Nagoya; Japan; September 2013
	Corporate Author				Thesis
	Publisher	Springer Berlin Heidelberg	Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title	LNCS
	Series Volume		Series Issue		Edition
	ISSN	0302-9743	ISBN	978-3-642-40762-8	Medium
	Area		Expedition		Conference	MICCAI
	Notes	MILAB			Approved	no
	Call Number	Admin @ si @ CBC2013			Serial	2258
Permanent link to this record



	Author	Dustin Carrion Ojeda; Hong Chen; Adrian El Baz; Sergio Escalera; Chaoyu Guan; Isabelle Guyon; Ihsan Ullah; Xin Wang; Wenwu Zhu
	Title	NeurIPS’22 Cross-Domain MetaDL competition: Design and baseline results			Type	Conference Article
	Year	2022	Publication	Understanding Social Behavior in Dyadic and Small Group Interactions	Abbreviated Journal
	Volume	191	Issue		Pages	24-37
	Keywords
	Abstract	We present the design and baseline results for a new challenge in the ChaLearn meta-learning series, accepted at NeurIPS'22, focusing on “cross-domain” meta-learning. Meta-learning aims to leverage experience gained from previous tasks to solve new tasks efficiently (i.e., with better performance, little training data, and/or modest computational resources). While previous challenges in the series focused on within-domain few-shot learning problems, with the aim of learning efficiently N-way k-shot tasks (i.e., N class classification problems with k training examples), this competition challenges the participants to solve “any-way” and “any-shot” problems drawn from various domains (healthcare, ecology, biology, manufacturing, and others), chosen for their humanitarian and societal impact. To that end, we created Meta-Album, a meta-dataset of 40 image classification datasets from 10 domains, from which we carve out tasks with any number of “ways” (within the range 2-20) and any number of “shots” (within the range 1-20). The competition is with code submission, fully blind-tested on the CodaLab challenge platform. The code of the winners will be open-sourced, enabling the deployment of automated machine learning solutions for few-shot image classification across several domains.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	PMLR
	Notes	HUPBA; no menciona			Approved	no
	Call Number	Admin @ si @ CCB2022			Serial	3802
Permanent link to this record



	Author	Alvaro Cepero; Albert Clapes; Sergio Escalera
	Title	Quantitative analysis of non-verbal communication for competence analysis			Type	Conference Article
	Year	2013	Publication	16th Catalan Conference on Artificial Intelligence	Abbreviated Journal
	Volume	256	Issue		Pages	105-114
	Keywords
	Abstract
	Address	Vic; October 2013
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	CCIA
	Notes	HUPBA;MILAB			Approved	no
	Call Number	Admin @ si @ CCE2013			Serial	2324
Permanent link to this record



	Author	David Curto; Albert Clapes; Javier Selva; Sorina Smeureanu; Julio C. S. Jacques Junior; David Gallardo-Pujol; Georgina Guilera; David Leiva; Thomas B. Moeslund; Sergio Escalera; Cristina Palmero
	Title	Dyadformer: A Multi-Modal Transformer for Long-Range Modeling of Dyadic Interactions			Type	Conference Article
	Year	2021	Publication	IEEE/CVF International Conference on Computer Vision Workshops	Abbreviated Journal
	Volume		Issue		Pages	2177-2188
	Keywords
	Abstract	Personality computing has become an emerging topic in computer vision, due to the wide range of applications it can be used for. However, most works on the topic have focused on analyzing the individual, even when applied to interaction scenarios, and for short periods of time. To address these limitations, we present the Dyadformer, a novel multi-modal multi-subject Transformer architecture to model individual and interpersonal features in dyadic interactions using variable time windows, thus allowing the capture of long-term interdependencies. Our proposed cross-subject layer allows the network to explicitly model interactions among subjects through attentional operations. This proof-of-concept approach shows how multi-modality and joint modeling of both interactants for longer periods of time helps to predict individual attributes. With Dyadformer, we improve state-of-the-art self-reported personality inference results on individual subjects on the UDIVA v0.5 dataset.
	Address	Virtual; October 2021
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICCVW
	Notes	HUPBA; no proj			Approved	no
	Call Number	Admin @ si @ CCS2021			Serial	3648
Permanent link to this record