Publicacions CVC -- Query Results

Records
Author	Alejandro Cartas; Mariella Dimiccoli; Petia Radeva
Title	Batch-based activity recognition from egocentric photo-streams			Type	Conference Article
Year	2017	Publication	1st International workshop on Egocentric Perception, Interaction and Computing	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	Activity recognition from long unstructured egocentric photo-streams has several applications in assistive technology such as health monitoring and frailty detection, just to name a few. However, one of its main technical challenges is to deal with the low frame rate of wearable photo-cameras, which causes abrupt appearance changes between consecutive frames. Consequently, important discriminatory low-level features from motion such as optical flow cannot be estimated. In this paper, we present a batch-driven approach for training a deep learning architecture that relies strongly on Long Short-Term Memory (LSTM) units to tackle this problem. We propose two different implementations of the same approach that process a photo-stream sequence using batches of fixed size with the goal of capturing the temporal evolution of high-level features. The main difference between these implementations is that one explicitly models consecutive batches by overlapping them. Experimental results over a public dataset acquired by three users demonstrate the validity of the proposed architectures to exploit the temporal evolution of convolutional features over time without relying on event boundaries.
Address	Venice; Italy; October 2017;
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ICCV - EPIC
Notes	MILAB; no menciona			Approved	no
Call Number	Admin @ si @ CDR2017			Serial	3023



Author	Ishaan Gulrajani; Kundan Kumar; Faruk Ahmed; Adrien Ali Taiga; Francesco Visin; David Vazquez; Aaron Courville
Title	PixelVAE: A Latent Variable Model for Natural Images			Type	Conference Article
Year	2017	Publication	5th International Conference on Learning Representations	Abbreviated Journal
Volume		Issue		Pages
Keywords	Deep Learning; Unsupervised Learning
Abstract	Natural image modeling is a landmark challenge of unsupervised learning. Variational Autoencoders (VAEs) learn a useful latent representation and generate samples that preserve global structure but tend to suffer from image blurriness. PixelCNNs model sharp contours and details very well, but lack an explicit latent representation and have difficulty modeling large-scale structure in a computationally efficient way. In this paper, we present PixelVAE, a VAE model with an autoregressive decoder based on PixelCNN. The resulting architecture achieves state-of-the-art log-likelihood on binarized MNIST. We extend PixelVAE to a hierarchy of multiple latent variables at different scales; this hierarchical model achieves competitive likelihood on 64x64 ImageNet and generates high-quality samples on LSUN bedrooms.
Address	Toulon; France; April 2017
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ICLR
Notes	ADAS; 600.085; 600.076; 601.281; 600.118			Approved	no
Call Number	ADAS @ adas @ GKA2017			Serial	2815



Author	Xinhang Song; Luis Herranz; Shuqiang Jiang
Title	Depth CNNs for RGB-D Scene Recognition: Learning from Scratch Better than Transferring from RGB-CNNs			Type	Conference Article
Year	2017	Publication	31st AAAI Conference on Artificial Intelligence	Abbreviated Journal
Volume		Issue		Pages
Keywords	RGB-D scene recognition; weakly supervised; fine tune; CNN
Abstract	Scene recognition with RGB images has been extensively studied and has reached very remarkable recognition levels, thanks to convolutional neural networks (CNN) and large scene datasets. In contrast, current RGB-D scene data is much more limited, so work in this area often leverages large RGB datasets by transferring pretrained RGB CNN models and fine-tuning with the target RGB-D dataset. However, we show that this approach has the limitation of hardly reaching bottom layers, which is key to learning modality-specific features. In contrast, we focus on the bottom layers, and propose an alternative strategy to learn depth features combining local weakly supervised training from patches followed by global fine tuning with images. This strategy is capable of learning very discriminative depth-specific features with limited depth images, without resorting to Places-CNN. In addition, we propose a modified CNN architecture to further match the complexity of the model and the amount of data available. For RGB-D scene recognition, depth and RGB features are combined by projecting them in a common space and further learning a multilayer classifier, which is jointly optimized in an end-to-end network. Our framework achieves state-of-the-art accuracy on NYU2 and SUN RGB-D in both depth only and combined RGB-D data.
Address	San Francisco CA; February 2017
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	AAAI
Notes	LAMP; 600.120			Approved	no
Call Number	Admin @ si @ SHJ2017			Serial	2967



Author	Laura Lopez-Fuentes; Joost Van de Weijer; Marc Bolaños; Harald Skinnemoen
Title	Multi-modal Deep Learning Approach for Flood Detection			Type	Conference Article
Year	2017	Publication	MediaEval Benchmarking Initiative for Multimedia Evaluation	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	In this paper we propose a multi-modal deep learning approach to detect floods in social media posts. Social media posts normally contain some metadata and/or visual information, therefore in order to detect the floods we use this information. The model is based on a Convolutional Neural Network which extracts the visual features and a bidirectional Long Short-Term Memory network to extract the semantic features from the textual metadata. We validate the method on images extracted from Flickr which contain both visual information and metadata and compare the results when using both, visual information only or metadata only. This work has been done in the context of the MediaEval Multimedia Satellite Task.
Address	Dublin; Ireland; September 2017
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	MediaEval
Notes	LAMP; 600.084; 600.109; 600.120			Approved	no
Call Number	Admin @ si @ LWB2017a			Serial	2974



Author	Daniel Hernandez; Lukas Schneider; Antonio Espinosa; David Vazquez; Antonio Lopez; Uwe Franke; Marc Pollefeys; Juan C. Moure
Title	Slanted Stixels: Representing San Francisco's Steepest Streets			Type	Conference Article
Year	2017	Publication	28th British Machine Vision Conference	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	In this work we present a novel compact scene representation based on Stixels that infers geometric and semantic information. Our approach overcomes the previous rather restrictive geometric assumptions for Stixels by introducing a novel depth model to account for non-flat roads and slanted objects. Both semantic and depth cues are used jointly to infer the scene representation in a sound global energy minimization formulation. Furthermore, a novel approximation scheme is introduced that uses an extremely efficient over-segmentation. In doing so, the computational complexity of the Stixel inference algorithm is reduced significantly, achieving real-time computation capabilities with only a slight drop in accuracy. We evaluate the proposed approach in terms of semantic and geometric accuracy as well as run-time on four publicly available benchmark datasets. Our approach maintains accuracy on flat road scene datasets while improving substantially on a novel non-flat road dataset.
Address	London; UK; September 2017
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	BMVC
Notes	ADAS; 600.118			Approved	no
Call Number	ADAS @ adas @ HSE2017a			Serial	2945



Author	Carles Sanchez; Antonio Esteban Lansaque; Agnes Borras; Marta Diez-Ferrer; Antoni Rosell; Debora Gil
Title	Towards a Videobronchoscopy Localization System from Airway Centre Tracking			Type	Conference Article
Year	2017	Publication	12th International Conference on Computer Vision Theory and Applications	Abbreviated Journal
Volume		Issue		Pages	352-359
Keywords	Video-bronchoscopy; Lung cancer diagnosis; Airway lumen detection; Region tracking; Guided bronchoscopy navigation
Abstract	Bronchoscopists use fluoroscopy to guide flexible bronchoscopy to the lesion to be biopsied without any kind of incision. Since fluoroscopy is an imaging technique based on X-rays, the risk of developmental problems and cancer is increased in those subjects exposed to its application, so minimizing radiation is crucial. Alternative guiding systems such as electromagnetic navigation require specific equipment, increase the cost of the clinical procedure and still require fluoroscopy. In this paper we propose an image-based guiding system based on the extraction of airway centres from intra-operative videos. Such anatomical landmarks are matched to the airway centreline extracted from a pre-planned CT to indicate the best path to the nodule. We present a feasibility study of our navigation system using simulated bronchoscopic videos and a multi-expert validation of landmarks extraction in 3 intra-operative ultrathin explorations.
Address	Porto; Portugal; February 2017
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	VISAPP
Notes	IAM; 600.096; 600.075; 600.145			Approved	no
Call Number	Admin @ si @ SEB2017			Serial	2943



Author	Umut Guclu; Yagmur Gucluturk; Meysam Madadi; Sergio Escalera; Xavier Baro; Jordi Gonzalez; Rob van Lier; Marcel A. J. van Gerven
Title	End-to-end semantic face segmentation with conditional random fields as convolutional, recurrent and adversarial networks			Type	Miscellaneous
Year	2017	Publication	Arxiv	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	arXiv:1703.03305 Recent years have seen a sharp increase in the number of related yet distinct advances in semantic segmentation. Here, we tackle this problem by leveraging the respective strengths of these advances. That is, we formulate a conditional random field over a four-connected graph as end-to-end trainable convolutional and recurrent networks, and estimate them via an adversarial process. Importantly, our model learns not only unary potentials but also pairwise potentials, while aggregating multi-scale contexts and controlling higher-order inconsistencies. We evaluate our model on two standard benchmark datasets for semantic face segmentation, achieving state-of-the-art results on both of them.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	HuPBA; ISE; 600.098; 600.119			Approved	no
Call Number	Admin @ si @ GGM2017			Serial	2932



Author	Zhijie Fang; David Vazquez; Antonio Lopez
Title	On-Board Detection of Pedestrian Intentions			Type	Journal Article
Year	2017	Publication	Sensors	Abbreviated Journal	SENS
Volume	17	Issue	10	Pages	2193
Keywords	pedestrian intention; ADAS; self-driving
Abstract	Avoiding vehicle-to-pedestrian crashes is a critical requirement for today's advanced driver assistance systems (ADAS) and future self-driving vehicles. Accordingly, detecting pedestrians from raw sensor data has a history of more than 15 years of research, with vision playing a central role. In recent years, deep learning has boosted the accuracy of image-based pedestrian detectors. However, detection is just the first step towards answering the core question, namely, is the vehicle going to crash with a pedestrian provided preventive actions are not taken? Therefore, knowing as soon as possible if a detected pedestrian has the intention of crossing the road ahead of the vehicle is essential for performing safe and comfortable maneuvers that prevent a crash. However, compared to pedestrian detection, there is relatively little literature on detecting pedestrian intentions. This paper aims to contribute along this line by presenting a new vision-based approach which analyzes the pose of a pedestrian along several frames to determine if he or she is going to enter the road or not. We present experiments showing 750 ms of anticipation for pedestrians crossing the road, which at a typical urban driving speed of 50 km/h can provide 15 additional meters (compared to a pure pedestrian detector) for vehicle automatic reactions or to warn the driver. Moreover, in contrast with state-of-the-art methods, our approach is monocular, neither requiring stereo nor optical flow information.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	ADAS; 600.085; 600.076; 601.223; 600.116; 600.118			Approved	no
Call Number	Admin @ si @ FVL2017			Serial	2983



Author	Cristhian A. Aguilera-Carrasco; Angel Sappa; Cristhian Aguilera; Ricardo Toledo
Title	Cross-Spectral Local Descriptors via Quadruplet Network			Type	Journal Article
Year	2017	Publication	Sensors	Abbreviated Journal	SENS
Volume	17	Issue	4	Pages	873
Keywords
Abstract	This paper presents a novel CNN-based architecture, referred to as Q-Net, to learn local feature descriptors that are useful for matching image patches from two different spectral bands. Given correctly matched and non-matching cross-spectral image pairs, a quadruplet network is trained to map input image patches to a common Euclidean space, regardless of the input spectral band. Our approach is inspired by the recent success of triplet networks in the visible spectrum, but adapted for cross-spectral scenarios, where, for each matching pair, there are always two possible non-matching patches: one for each spectrum. Experimental evaluations on a public cross-spectral VIS-NIR dataset show that the proposed approach improves on the state of the art. Moreover, the proposed technique can also be used in mono-spectral settings, obtaining a similar performance to triplet network descriptors, but requiring less training data.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	ADAS; 600.086; 600.118			Approved	no
Call Number	Admin @ si @ ASA2017			Serial	2914



Author	Xinhang Song; Shuqiang Jiang; Luis Herranz
Title	Combining Models from Multiple Sources for RGB-D Scene Recognition			Type	Conference Article
Year	2017	Publication	26th International Joint Conference on Artificial Intelligence	Abbreviated Journal
Volume		Issue		Pages	4523-4529
Keywords	Robotics and Vision; Vision and Perception
Abstract	Depth can complement RGB with useful cues about object volumes and scene layout. However, RGB-D image datasets are still too small for directly training deep convolutional neural networks (CNNs), in contrast to the massive monomodal RGB datasets. Previous works in RGB-D recognition typically combine two separate networks for RGB and depth data, pretrained with a large RGB dataset and then fine-tuned to the respective target RGB and depth datasets. These approaches have several limitations: 1) they only use low-level filters learned from RGB data, thus failing to properly exploit depth-specific patterns, and 2) RGB and depth features are only combined at high levels but rarely at lower levels. In this paper, we propose a framework that leverages both knowledge acquired from large RGB datasets together with depth-specific cues learned from the limited depth data, obtaining more effective multi-source and multi-modal representations. We propose a multi-modal combination method that selects discriminative combinations of layers from the different source models and target modalities, capturing both high-level properties of the task and intrinsic low-level properties of both modalities.
Address	Melbourne; Australia; August 2017
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	IJCAI
Notes	LAMP; 600.120			Approved	no
Call Number	Admin @ si @ SJH2017b			Serial	2966



Author	Ozan Caglayan; Walid Aransa; Adrien Bardet; Mercedes Garcia-Martinez; Fethi Bougares; Loic Barrault; Marc Masana; Luis Herranz; Joost Van de Weijer
Title	LIUM-CVC Submissions for WMT17 Multimodal Translation Task			Type	Conference Article
Year	2017	Publication	2nd Conference on Machine Translation	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	This paper describes the monomodal and multimodal Neural Machine Translation systems developed by LIUM and CVC for WMT17 Shared Task on Multimodal Translation. We mainly explored two multimodal architectures where either global visual features or convolutional feature maps are integrated in order to benefit from visual context. Our final systems ranked first for both En-De and En-Fr language pairs according to the automatic evaluation metrics METEOR and BLEU.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	WMT
Notes	LAMP; 600.106; 600.120			Approved	no
Call Number	Admin @ si @ CAB2017			Serial	3035



Author	C. Alejandro Parraga
Title	Colours and Colour Vision: An Introductory Survey			Type	Journal Article
Year	2017	Publication	Perception	Abbreviated Journal	PER
Volume	46	Issue	5	Pages	640-641
Keywords
Abstract
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	NEUROBIT; no menciona			Approved	no
Call Number	Par2017			Serial	3101



Author	Arash Akbarinia; Karl R. Gegenfurtner
Title	Metameric Mismatching in Natural and Artificial Reflectances			Type	Journal Article
Year	2017	Publication	Journal of Vision	Abbreviated Journal	JV
Volume	17	Issue	10	Pages	390-390
Keywords	Metamer; colour perception; spectral discrimination; photoreceptors
Abstract	The human visual system and most digital cameras sample the continuous spectral power distribution through three classes of receptors. This implies that two distinct spectral reflectances can result in identical tristimulus values under one illuminant and differ under another – the problem of metamer mismatching. It is still debated how frequently this issue arises in the real world, using naturally occurring reflectance functions and common illuminants. We gathered more than ten thousand spectral reflectance samples from various sources, covering a wide range of environments (e.g., flowers, plants, Munsell chips) and evaluated their responses under a number of natural and artificial sources of light. For each pair of reflectance functions, we estimated the perceived difference using the CIE-defined distance ΔE2000 metric in Lab color space. The degree of metamer mismatching depended on the lower threshold value l when two samples would be considered to lead to equal sensor excitations (ΔE < l), and on the higher threshold value h when they would be considered different. For example, for l=h=1, we found that 43,129 comparisons out of a total of 6×10⁷ pairs would be considered metameric (1 in 10⁴). For l=1 and h=5, this number reduced to 705 metameric pairs (2 in 10⁶). Extreme metamers, for instance l=1 and h=10, were rare (22 pairs or 6 in 10⁸), as were instances where the two members of a metameric pair would be assigned to different color categories. Not unexpectedly, we observed variations among different reflectance databases and illuminant spectra with more frequency under artificial illuminants than natural ones. Overall, our numbers are not very different from those obtained earlier (Foster et al, JOSA A, 2006). However, our results also show that the degree of metamerism is typically not very strong and that category switches hardly ever occur.
Address	Florida, USA; May 2017
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	NEUROBIT; no menciona			Approved	no
Call Number	Admin @ si @ AkG2017			Serial	2899



Author	Aniol Lidon; Marc Bolaños; Mariella Dimiccoli; Petia Radeva; Maite Garolera; Xavier Giro
Title	Semantic Summarization of Egocentric Photo-Stream Events			Type	Conference Article
Year	2017	Publication	2nd Workshop on Lifelogging Tools and Applications	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract
Address	San Francisco; USA; October 2017
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN	978-1-4503-5503-2	Medium
Area		Expedition		Conference	ACMW (LTA)
Notes	MILAB; no proj			Approved	no
Call Number	Admin @ si @ LBD2017			Serial	3024



Author	Pierdomenico Fiadino; Victor Ponce; Juan Antonio Torrero-Gonzalez; Marc Torrent-Moreno
Title	Call Detail Records for Human Mobility Studies: Taking Stock of the Situation in the “Always Connected Era”			Type	Conference Article
Year	2017	Publication	Workshop on Big Data Analytics and Machine Learning for Data Communication Networks	Abbreviated Journal
Volume		Issue		Pages	43-48
Keywords	mobile networks; call detail records; human mobility
Abstract	The exploitation of cellular network data for studying human mobility has been a popular research topic in the last decade. Indeed, mobile terminals could be considered ubiquitous sensors that allow the observation of human movements on a large scale without relying on non-scalable techniques, such as surveys, or dedicated and expensive monitoring infrastructures. In particular, Call Detail Records (CDRs), collected by operators for billing purposes, have been extensively employed due to their rather large availability, compared to other types of cellular data (e.g., signaling). Despite the interest aroused around this topic, the research community has generally agreed about the scarcity of information provided by CDRs: the position of mobile terminals is logged when some kind of activity (calls, SMS, data connections) occurs, which translates into a picture of mobility somewhat biased by the activity degree of users. By studying two datasets collected by a nationwide operator in 2014 and 2016, we show that the situation has drastically changed in terms of data volume and quality. The increase of flat data plans and the higher penetration of “always connected” terminals have driven up the number of recorded CDRs, providing higher temporal accuracy for users’ locations.
Address	UCLA; USA; August 2017
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN	978-1-4503-5054-9	Medium
Area		Expedition		Conference	ACMW (SIGCOMM)
Notes	HuPBA; no menciona			Approved	no
Call Number	Admin @ si @ FPT2017			Serial	2980