Publicacions CVC -- Query Results

<< 1 2 3 4 5 6 7 8 9 10 >> [11–11]

Details

Records
Author	Cesar de Souza
Title	Action Recognition in Videos: Data-efficient approaches for supervised learning of human action classification models for video			Type	Book Whole
Year	2018	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	In this dissertation, we explore different ways to perform human action recognition in video clips. We focus on data efficiency, proposing new approaches that alleviate the need for laborious and time-consuming manual data annotation. In the first part of this dissertation, we start by analyzing previous state-of-the-art models, comparing their differences and similarities in order to pinpoint where their real strengths come from. Leveraging this information, we then proceed to boost the classification accuracy of shallow models to levels that rival deep neural networks. We introduce hybrid video classification architectures based on carefully designed unsupervised representations of handcrafted spatiotemporal features classified by supervised deep networks. We show in our experiments that our hybrid model combine the best of both worlds: it is data efficient (trained on 150 to 10,000 short clips) and yet improved significantly on the state of the art, including deep models trained on millions of manually labeled images and videos. In the second part of this research, we investigate the generation of synthetic training data for action recognition, as it has recently shown promising results for a variety of other computer vision tasks. We propose an interpretable parametric generative model of human action videos that relies on procedural generation and other computer graphics techniques of modern game engines. We generate a diverse, realistic, and physically plausible dataset of human action videos, called PHAV for “Procedural Human Action Videos”. It contains a total of 39,982 videos, with more than 1,000 examples for each action of 35 categories. Our approach is not limited to existing motion capture sequences, and we procedurally define 14 synthetic actions. We then introduce deep multi-task representation learning architectures to mix synthetic and real videos, even if the action categories differ. Our experiments on the UCF-101 and HMDB-51 benchmarks suggest that combining our large set of synthetic videos with small real-world datasets can boost recognition performance, outperforming fine-tuning state-of-the-art unsupervised generative models of videos.
Address	April 2018
Corporate Author				Thesis	Ph.D. thesis
Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Antonio Lopez;Naila Murray
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	ADAS; 600.118			Approved	no
Call Number	Admin @ si @ Sou2018			Serial	3127
Permanent link to this record



Author	Dena Bazazian; Dimosthenis Karatzas; Andrew Bagdanov
Title	Word Spotting in Scene Images based on Character Recognition			Type	Conference Article
Year	2018	Publication	IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops	Abbreviated Journal
Volume		Issue		Pages	1872-1874
Keywords
Abstract	In this paper we address the problem of unconstrained Word Spotting in scene images. We train a Fully Convolutional Network to produce heatmaps of all the character classes. Then, we employ the Text Proposals approach and, via a rectangle classifier, detect the most likely rectangle for each query word based on the character attribute maps. We evaluate the proposed method on ICDAR2015 and show that it is capable of identifying and recognizing query words in natural scene images.
Address	Salt Lake City; USA; June 2018
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	CVPRW
Notes	DAG; 600.129; 600.121			Approved	no
Call Number	BKB2018a			Serial	3179
Permanent link to this record



Author	Marco Buzzelli; Joost Van de Weijer; Raimondo Schettini
Title	Learning Illuminant Estimation from Object Recognition			Type	Conference Article
Year	2018	Publication	25th International Conference on Image Processing	Abbreviated Journal
Volume		Issue		Pages	3234 - 3238
Keywords	Illuminant estimation; computational color constancy; semi-supervised learning; deep learning; convolutional neural networks
Abstract	In this paper we present a deep learning method to estimate the illuminant of an image. Our model is not trained with illuminant annotations, but with the objective of improving performance on an auxiliary task such as object recognition. To the best of our knowledge, this is the first example of a deep learning architecture for illuminant estimation that is trained without ground truth illuminants. We evaluate our solution on standard datasets for color constancy, and compare it with state of the art methods. Our proposal is shown to outperform most deep learning methods in a cross-dataset evaluation setup, and to present competitive results in a comparison with parametric solutions.
Address	Athens; Greece; October 2018
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ICIP
Notes	LAMP; 600.109; 600.120			Approved	no
Call Number	Admin @ si @ BWS2018			Serial	3157
Permanent link to this record



Author	Lluis Gomez; Marçal Rusiñol; Dimosthenis Karatzas
Title	Cutting Sayre's Knot: Reading Scene Text without Segmentation. Application to Utility Meters			Type	Conference Article
Year	2018	Publication	13th IAPR International Workshop on Document Analysis Systems	Abbreviated Journal
Volume		Issue		Pages	97-102
Keywords	Robust Reading; End-to-end Systems; CNN; Utility Meters
Abstract	In this paper we present a segmentation-free system for reading text in natural scenes. A CNN architecture is trained in an end-to-end manner, and is able to directly output readings without any explicit text localization step. In order to validate our proposal, we focus on the specific case of reading utility meters. We present our results in a large dataset of images acquired by different users and devices, so text appears in any location, with different sizes, fonts and lengths, and the images present several distortions such as dirt, illumination highlights or blur.
Address	Viena; Austria; April 2018
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	DAS
Notes	DAG; 600.084; 600.121; 600.129			Approved	no
Call Number	Admin @ si @ GRK2018			Serial	3102
Permanent link to this record



Author	Sounak Dey; Anjan Dutta; Suman Ghosh; Ernest Valveny; Josep Llados
Title	Aligning Salient Objects to Queries: A Multi-modal and Multi-object Image Retrieval Framework			Type	Conference Article
Year	2018	Publication	14th Asian Conference on Computer Vision	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	In this paper we propose an approach for multi-modal image retrieval in multi-labelled images. A multi-modal deep network architecture is formulated to jointly model sketches and text as input query modalities into a common embedding space, which is then further aligned with the image feature space. Our architecture also relies on a salient object detection through a supervised LSTM-based visual attention model learned from convolutional features. Both the alignment between the queries and the image and the supervision of the attention on the images are obtained by generalizing the Hungarian Algorithm using different loss functions. This permits encoding the object-based features and its alignment with the query irrespective of the availability of the co-occurrence of different objects in the training set. We validate the performance of our approach on standard single/multi-object datasets, showing state-of-the art performance in every dataset.
Address	Perth; Australia; December 2018
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ACCV
Notes	DAG; 600.097; 600.121; 600.129			Approved	no
Call Number	Admin @ si @ DDG2018a			Serial	3151
Permanent link to this record



Author	Xialei Liu; Marc Masana; Luis Herranz; Joost Van de Weijer; Antonio Lopez; Andrew Bagdanov
Title	Rotate your Networks: Better Weight Consolidation and Less Catastrophic Forgetting			Type	Conference Article
Year	2018	Publication	24th International Conference on Pattern Recognition	Abbreviated Journal
Volume		Issue		Pages	2262-2268
Keywords
Abstract	In this paper we propose an approach to avoiding catastrophic forgetting in sequential task learning scenarios. Our technique is based on a network reparameterization that approximately diagonalizes the Fisher Information Matrix of the network parameters. This reparameterization takes the form of a factorized rotation of parameter space which, when used in conjunction with Elastic Weight Consolidation (which assumes a diagonal Fisher Information Matrix), leads to significantly better performance on lifelong learning of sequential tasks. Experimental results on the MNIST, CIFAR-100, CUB-200 and Stanford-40 datasets demonstrate that we significantly improve the results of standard elastic weight consolidation, and that we obtain competitive results when compared to the state-of-the-art in lifelong learning without forgetting.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ICPR
Notes	LAMP; ADAS; 601.305; 601.109; 600.124; 600.106; 602.200; 600.120; 600.118			Approved	no
Call Number	Admin @ si @ LMH2018			Serial	3160
Permanent link to this record



Author	Raul Gomez; Lluis Gomez; Jaume Gibert; Dimosthenis Karatzas
Title	Learning to Learn from Web Data through Deep Semantic Embeddings			Type	Conference Article
Year	2018	Publication	15th European Conference on Computer Vision Workshops	Abbreviated Journal
Volume	11134	Issue		Pages	514-529
Keywords
Abstract	In this paper we propose to learn a multimodal image and text embedding from Web and Social Media data, aiming to leverage the semantic knowledge learnt in the text domain and transfer it to a visual model for semantic image retrieval. We demonstrate that the pipeline can learn from images with associated text without supervision and perform a thourough analysis of five different text embeddings in three different benchmarks. We show that the embeddings learnt with Web and Social Media data have competitive performances over supervised methods in the text based image retrieval task, and we clearly outperform state of the art in the MIRFlickr dataset when training in the target data. Further we demonstrate how semantic multimodal image retrieval can be performed using the learnt embeddings, going beyond classical instance-level retrieval problems. Finally, we present a new dataset, InstaCities1M, composed by Instagram images and their associated texts that can be used for fair comparison of image-text embeddings.
Address	Munich; Alemanya; September 2018
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title	LNCS
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ECCVW
Notes	DAG; 600.129; 601.338; 600.121			Approved	no
Call Number	Admin @ si @ GGG2018a			Serial	3175
Permanent link to this record



Author	Laura Lopez-Fuentes; Alessandro Farasin; Harald Skinnemoen; Paolo Garza
Title	Deep Learning models for passability detection of flooded roads			Type	Conference Article
Year	2018	Publication	MediaEval 2018 Multimedia Benchmark Workshop	Abbreviated Journal
Volume	2283	Issue		Pages
Keywords
Abstract	In this paper we study and compare several approaches to detect floods and evidence for passability of roads by conventional means in Twitter. We focus on tweets containing both visual information (a picture shared by the user) and metadata, a combination of text and related extra information intrinsic to the Twitter API. This work has been done in the context of the MediaEval 2018 Multimedia Satellite Task.
Address	Sophia Antipolis; France; October 2018
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	MediaEval
Notes	LAMP; 600.084; 600.109; 600.120			Approved	no
Call Number	Admin @ si @ LFS2018			Serial	3224
Permanent link to this record



Author	Katerine Diaz; Jesus Martinez del Rincon; Aura Hernandez-Sabate; Debora Gil
Title	Continuous head pose estimation using manifold subspace embedding and multivariate regression			Type	Journal Article
Year	2018	Publication	IEEE Access	Abbreviated Journal	ACCESS
Volume	6	Issue		Pages	18325 - 18334
Keywords	Head Pose estimation; HOG features; Generalized Discriminative Common Vectors; B-splines; Multiple linear regression
Abstract	In this paper, a continuous head pose estimation system is proposed to estimate yaw and pitch head angles from raw facial images. Our approach is based on manifold learningbased methods, due to their promising generalization properties shown for face modelling from images. The method combines histograms of oriented gradients, generalized discriminative common vectors and continuous local regression to achieve successful performance. Our proposal was tested on multiple standard face datasets, as well as in a realistic scenario. Results show a considerable performance improvement and a higher consistence of our model in comparison with other state-of-art methods, with angular errors varying between 9 and 17 degrees.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN	2169-3536	ISBN		Medium
Area		Expedition		Conference
Notes	ADAS; 600.118			Approved	no
Call Number	Admin @ si @ DMH2018b			Serial	3091
Permanent link to this record



Author	Razieh Rastgoo; Kourosh Kiani; Sergio Escalera
Title	Multi-Modal Deep Hand Sign Language Recognition in Still Images Using Restricted Boltzmann Machine			Type	Journal Article
Year	2018	Publication	Entropy	Abbreviated Journal	ENTROPY
Volume	20	Issue	11	Pages	809
Keywords	hand sign language; deep learning; restricted Boltzmann machine (RBM); multi-modal; profoundly deaf; noisy image
Abstract	In this paper, a deep learning approach, Restricted Boltzmann Machine (RBM), is used to perform automatic hand sign language recognition from visual data. We evaluate how RBM, as a deep generative model, is capable of generating the distribution of the input data for an enhanced recognition of unseen data. Two modalities, RGB and Depth, are considered in the model input in three forms: original image, cropped image, and noisy cropped image. Five crops of the input image are used and the hand of these cropped images are detected using Convolutional Neural Network (CNN). After that, three types of the detected hand images are generated for each modality and input to RBMs. The outputs of the RBMs for two modalities are fused in another RBM in order to recognize the output sign label of the input image. The proposed multi-modal model is trained on all and part of the American alphabet and digits of four publicly available datasets. We also evaluate the robustness of the proposal against noise. Experimental results show that the proposed multi-modal model, using crops and the RBM fusing methodology, achieves state-of-the-art results on Massey University Gesture Dataset 2012, American Sign Language (ASL). and Fingerspelling Dataset from the University of Surrey’s Center for Vision, Speech and Signal Processing, NYU, and ASL Fingerspelling A datasets.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	HUPBA; no proj			Approved	no
Call Number	Admin @ si @ RKE2018			Serial	3198
Permanent link to this record



Author	David Aldavert; Marçal Rusiñol
Title	Manuscript text line detection and segmentation using second-order derivatives analysis			Type	Conference Article
Year	2018	Publication	13th IAPR International Workshop on Document Analysis Systems	Abbreviated Journal
Volume		Issue		Pages	293 - 298
Keywords	text line detection; text line segmentation; text region detection; second-order derivatives
Abstract	In this paper, we explore the use of second-order derivatives to detect text lines on handwritten document images. Taking advantage that the second derivative gives a minimum response when a dark linear element over a bright background has the same orientation as the filter, we use this operator to create a map with the local orientation and strength of putative text lines in the document. Then, we detect line segments by selecting and merging the filter responses that have a similar orientation and scale. Finally, text lines are found by merging the segments that are within the same text region. The proposed segmentation algorithm, is learning-free while showing a performance similar to the state of the art methods in publicly available datasets.
Address	Viena; Austria; April 2018
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	DAS
Notes	DAG; 600.084; 600.129; 302.065; 600.121			Approved	no
Call Number	Admin @ si @ AlR2018a			Serial	3104
Permanent link to this record



Author	V. Poulain d'Andecy; Emmanuel Hartmann; Marçal Rusiñol
Title	Field Extraction by hybrid incremental and a-priori structural templates			Type	Conference Article
Year	2018	Publication	13th IAPR International Workshop on Document Analysis Systems	Abbreviated Journal
Volume		Issue		Pages	251 - 256
Keywords	Layout Analysis; information extraction; incremental learning
Abstract	In this paper, we present an incremental framework for extracting information fields from administrative documents. First, we demonstrate some limits of the existing state-of-the-art methods such as the delay of the system efficiency. This is a concern in industrial context when we have only few samples of each document class. Based on this analysis, we propose a hybrid system combining incremental learning by means of itf-df statistics and a-priori generic models. We report in the experimental section our results obtained with a dataset of real invoices.
Address	Viena; Austria; April 2018
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	DAS
Notes	DAG; 600.084; 600.129; 600.121			Approved	no
Call Number	Admin @ si @ PHR2018			Serial	3106
Permanent link to this record



Author	Shanxin Yuan; Guillermo Garcia-Hernando; Bjorn Stenger; Gyeongsik Moon; Ju Yong Chang; Kyoung Mu Lee; Pavlo Molchanov; Jan Kautz; Sina Honari; Liuhao Ge; Junsong Yuan; Xinghao Chen; Guijin Wang; Fan Yang; Kai Akiyama; Yang Wu; Qingfu Wan; Meysam Madadi; Sergio Escalera; Shile Li; Dongheui Lee; Iason Oikonomidis; Antonis Argyros; Tae-Kyun Kim
Title	Depth-Based 3D Hand Pose Estimation: From Current Achievements to Future Goals			Type	Conference Article
Year	2018	Publication	31st IEEE Conference on Computer Vision and Pattern Recognition	Abbreviated Journal
Volume		Issue		Pages	2636 - 2645
Keywords	Three-dimensional displays; Task analysis; Pose estimation; Two dimensional displays; Joints; Training; Solid modeling
Abstract	In this paper, we strive to answer two questions: What is the current state of 3D hand pose estimation from depth images? And, what are the next challenges that need to be tackled? Following the successful Hands In the Million Challenge (HIM2017), we investigate the top 10 state-of-the-art methods on three tasks: single frame 3D pose estimation, 3D hand tracking, and hand pose estimation during object interaction. We analyze the performance of different CNN structures with regard to hand shape, joint visibility, view point and articulation distributions. Our findings include: (1) isolated 3D hand pose estimation achieves low mean errors (10 mm) in the view point range of [70, 120] degrees, but it is far from being solved for extreme view points; (2) 3D volumetric representations outperform 2D CNNs, better capturing the spatial structure of the depth data; (3) Discriminative methods still generalize poorly to unseen hand shapes; (4) While joint occlusions pose a challenge for most methods, explicit modeling of structure constraints can significantly narrow the gap between errors on visible and occluded joints.
Address	Salt Lake City; USA; June 2018
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	CVPR
Notes	HUPBA; no proj			Approved	no
Call Number	Admin @ si @ YGS2018			Serial	3115
Permanent link to this record



Author	Sounak Dey; Anjan Dutta; Suman Ghosh; Ernest Valveny; Josep Llados; Umapada Pal
Title	Learning Cross-Modal Deep Embeddings for Multi-Object Image Retrieval using Text and Sketch			Type	Conference Article
Year	2018	Publication	24th International Conference on Pattern Recognition	Abbreviated Journal
Volume		Issue		Pages	916 - 921
Keywords
Abstract	In this work we introduce a cross modal image retrieval system that allows both text and sketch as input modalities for the query. A cross-modal deep network architecture is formulated to jointly model the sketch and text input modalities as well as the the image output modality, learning a common embedding between text and images and between sketches and images. In addition, an attention model is used to selectively focus the attention on the different objects of the image, allowing for retrieval with multiple objects in the query. Experiments show that the proposed method performs the best in both single and multiple object image retrieval in standard datasets.
Address	Beijing; China; August 2018
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ICPR
Notes	DAG; 602.167; 602.168; 600.097; 600.084; 600.121; 600.129			Approved	no
Call Number	Admin @ si @ DDG2018b			Serial	3152
Permanent link to this record



Author	Cristhian A. Aguilera-Carrasco; C. Aguilera; Angel Sappa
Title	Melamine Faced Panels Defect Classification beyond the Visible Spectrum			Type	Journal Article
Year	2018	Publication	Sensors	Abbreviated Journal	SENS
Volume	18	Issue	11	Pages	1-10
Keywords	industrial application; infrared; machine learning
Abstract	In this work, we explore the use of images from different spectral bands to classify defects in melamine faced panels, which could appear through the production process. Through experimental evaluation, we evaluate the use of images from the visible (VS), near-infrared (NIR), and long wavelength infrared (LWIR), to classify the defects using a feature descriptor learning approach together with a support vector machine classifier. Two descriptors were evaluated, Extended Local Binary Patterns (E-LBP) and SURF using a Bag of Words (BoW) representation. The evaluation was carried on with an image set obtained during this work, which contained five different defect categories that currently occurs in the industry. Results show that using images from beyond the visual spectrum helps to improve classification performance in contrast with a single visible spectrum solution.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	MSIAU; 600.122			Approved	no
Call Number	Admin @ si @ AAS2018			Serial	3191
Permanent link to this record