Author Albert Gordo; Florent Perronnin
Title Asymmetric Distances for Binary Embeddings
Type Conference Article
Year 2011
Publication IEEE Conference on Computer Vision and Pattern Recognition
Pages 729-736
  Abstract In large-scale query-by-example retrieval, embedding image signatures in a binary space offers two benefits: data compression and search efficiency. While most embedding algorithms binarize both query and database signatures, it has been noted that this is not strictly a requirement. Indeed, asymmetric schemes which binarize the database signatures but not the query still enjoy the same two benefits but may provide superior accuracy. In this work, we propose two general asymmetric distances which are applicable to a wide variety of embedding techniques including Locality Sensitive Hashing (LSH), Locality Sensitive Binary Codes (LSBC), Spectral Hashing (SH) and Semi-Supervised Hashing (SSH). We experiment on four public benchmarks containing up to 1M images and show that the proposed asymmetric distances consistently lead to large improvements over the symmetric Hamming distance for all binary embedding techniques. We also propose a novel simple binary embedding technique – PCA Embedding (PCAE) – which is shown to yield competitive results with respect to more complex algorithms such as SH and SSH.  
Address Providence, RI
ISBN 978-1-4577-0394-2
Conference CVPR
Notes DAG
Approved no
Call Number Admin @ si @ GoP2011; IAM @ iam @ GoP2011
Serial 1817
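The asymmetric-distance idea summarized in the abstract above lends itself to a short sketch. The NumPy code below is a minimal illustration, assuming a simple PCA-style embedding (in the spirit of PCAE) and a Euclidean comparison between the real-valued query projection and a ±1 reconstruction of the binary database codes; the names and the exact distance definition are illustrative, not taken from the paper's implementation.

```python
import numpy as np

def pcae_fit(X, n_bits=64):
    """Learn a PCA-style binary embedding: data mean plus top principal directions."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:n_bits].T                    # projection matrix W, shape (dim, n_bits)

def encode_binary(X, mu, W):
    """Database side: project and binarize (sign of each projected coordinate)."""
    return (X - mu) @ W > 0                     # boolean codes, shape (n, n_bits)

def asymmetric_distances(q, codes, mu, W):
    """Query side: keep the real-valued projection and compare it against a
    +1/-1 reconstruction of the binary database codes (no Hamming distance)."""
    qp = (q - mu) @ W
    recon = np.where(codes, 1.0, -1.0)
    return np.linalg.norm(recon - qp, axis=1)

rng = np.random.default_rng(0)
db = rng.normal(size=(1000, 128))               # toy database signatures
mu, W = pcae_fit(db, n_bits=32)
codes = encode_binary(db, mu, W)
dist = asymmetric_distances(rng.normal(size=128), codes, mu, W)
print(dist.argsort()[:5])                       # indices of the 5 closest database items
```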
 

 
Author Antonio Hernandez; Nadezhda Zlateva; Alexander Marinov; Miguel Reyes; Petia Radeva; Dimo Dimov; Sergio Escalera
Title Graph Cuts Optimization for Multi-Limb Human Segmentation in Depth Maps
Type Conference Article
Year 2012
Publication 25th IEEE Conference on Computer Vision and Pattern Recognition
Pages 726-732
Abstract We present a generic framework for object segmentation using depth maps based on Random Forest and Graph-cuts theory, and apply it to the segmentation of human limbs in depth maps. First, from a set of random depth features, Random Forest is used to infer a set of label probabilities for each data sample. This vector of probabilities is used as the unary term in an α-β swap Graph-cuts algorithm. Moreover, the depth of spatio-temporal neighboring data points is used for the boundary potentials. Results on a new multi-label human depth dataset show that the proposed methodology achieves high performance in terms of segmentation overlap compared to classical approaches.
Address Providence, Rhode Island
Publisher IEEE Xplore
ISSN 1063-6919
ISBN 978-1-4673-1226-4
Conference CVPR
Notes MILAB; HuPBA
Approved no
Call Number Admin @ si @ HZM2012b
Serial 2046
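A rough sketch of the pipeline summarized above, with scikit-learn's RandomForestClassifier supplying the per-pixel label probabilities that would serve as unary potentials; the α-β swap graph-cut step is only indicated in a comment, and the depth map, labels and offsets are synthetic placeholders rather than the paper's data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def depth_offset_features(depth, pixels, offsets):
    """Random depth-comparison features: depth difference between each pixel
    and a set of fixed 2D offsets (clamped at the image border)."""
    h, w = depth.shape
    feats = []
    for dy, dx in offsets:
        y = np.clip(pixels[:, 0] + dy, 0, h - 1)
        x = np.clip(pixels[:, 1] + dx, 0, w - 1)
        feats.append(depth[y, x] - depth[pixels[:, 0], pixels[:, 1]])
    return np.stack(feats, axis=1)

# synthetic stand-ins for a depth map and its per-pixel limb labels
rng = np.random.default_rng(0)
depth = rng.normal(size=(64, 64))
labels = rng.integers(0, 4, size=(64, 64))          # 4 hypothetical limb classes
pixels = np.argwhere(np.ones_like(depth, dtype=bool))
offsets = rng.integers(-8, 9, size=(20, 2))

X = depth_offset_features(depth, pixels, offsets)
rf = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, labels.ravel())

proba = rf.predict_proba(X)                         # per-pixel label probabilities
unary = -np.log(proba + 1e-9)                       # unary potentials for the graph cut
# boundary weight between neighbouring pixels: small when depth differs a lot
w_right = np.exp(-np.abs(depth[:, :-1] - depth[:, 1:]))
# the paper feeds 'unary' and such boundary weights to an alpha-beta swap solver;
# as a crude stand-in we just take the per-pixel argmax of the probabilities
seg = proba.argmax(axis=1).reshape(depth.shape)
print(seg.shape, unary.shape, w_right.shape)
```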
 

 
Author Fahad Shahbaz Khan; Muhammad Anwer Rao; Joost Van de Weijer; Andrew Bagdanov; Maria Vanrell; Antonio Lopez
Title Color Attributes for Object Detection
Type Conference Article
Year 2012
Publication 25th IEEE Conference on Computer Vision and Pattern Recognition
Pages 3306-3313
Keywords pedestrian detection
Abstract State-of-the-art object detectors typically use shape information as a low level feature representation to capture the local structure of an object. This paper shows that early fusion of shape and color, as is popular in image classification, leads to a significant drop in performance for object detection. Moreover, such approaches also yield suboptimal results for object categories with varying importance of color and shape. In this paper we propose the use of color attributes as an explicit color representation for object detection. Color attributes are compact, computationally efficient, and when combined with traditional shape features provide state-of-the-art results for object detection. Our method is tested on the PASCAL VOC 2007 and 2009 datasets and results clearly show that our method improves over state-of-the-art techniques despite its simplicity. We also introduce a new dataset consisting of cartoon character images in which color plays a pivotal role. On this dataset, our approach yields a significant gain of 14% in mean AP over conventional state-of-the-art methods.
Address Providence; Rhode Island; USA
Publisher IEEE Xplore
ISSN 1063-6919
ISBN 978-1-4673-1226-4
Conference CVPR
Notes ADAS; CIC
Approved no
Call Number Admin @ si @ KRW2012
Serial 1935
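The late-fusion idea, shape and color described separately and only then combined, can be sketched as follows. The RGB-to-color-name mapping is stubbed with a random lookup table (the paper relies on a learned mapping), so this is an assumption-laden illustration rather than the authors' descriptor.

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.feature import hog

# Stub for the learned RGB -> 11 colour-name mapping used in the paper;
# a random lookup table stands in for it here.
rng = np.random.default_rng(0)
w2c = rng.dirichlet(np.ones(11), size=32 * 32 * 32)        # (32768, 11)

def color_name_histogram(rgb):
    """Soft histogram of the 11 colour attributes over an image window."""
    idx = (rgb[..., 0] // 8) * 1024 + (rgb[..., 1] // 8) * 32 + (rgb[..., 2] // 8)
    probs = w2c[idx.reshape(-1)]
    h = probs.sum(axis=0)
    return h / h.sum()

def window_descriptor(rgb_window):
    """Late fusion: shape (HOG) and colour attributes are computed separately
    and only then concatenated into a single window descriptor."""
    shape_feat = hog(rgb2gray(rgb_window), orientations=9,
                     pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    color_feat = color_name_histogram(rgb_window.astype(np.int64))
    return np.concatenate([shape_feat, color_feat])

window = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
print(window_descriptor(window).shape)
```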
 

 
Author Naila Murray; Luca Marchesotti; Florent Perronnin
Title AVA: A Large-Scale Database for Aesthetic Visual Analysis
Type Conference Article
Year 2012
Publication 25th IEEE Conference on Computer Vision and Pattern Recognition
Pages 2408-2415
Abstract With the ever-expanding volume of visual content available, the ability to organize and navigate such content by aesthetic preference is becoming increasingly important. While still in its nascent stage, research into computational models of aesthetic preference already shows great potential. However, to advance research, realistic, diverse and challenging databases are needed. To this end, we introduce a new large-scale database for conducting Aesthetic Visual Analysis: AVA. It contains over 250,000 images along with a rich variety of meta-data including a large number of aesthetic scores for each image, semantic labels for over 60 categories as well as labels related to photographic style. We show the advantages of AVA with respect to existing databases in terms of scale, diversity, and heterogeneity of annotations. We then describe several key insights into aesthetic preference afforded by AVA. Finally, we demonstrate, through three applications, how the large scale of AVA can be leveraged to improve performance on existing preference tasks.
Address Providence, Rhode Island
Publisher IEEE Xplore
ISSN 1063-6919
ISBN 978-1-4673-1226-4
Conference CVPR
Notes CIC
Approved no
Call Number Admin @ si @ MMP2012a
Serial 2025
 

 
Author Marc Serra; Olivier Penacchio; Robert Benavente; Maria Vanrell
Title Names and Shades of Color for Intrinsic Image Estimation
Type Conference Article
Year 2012
Publication 25th IEEE Conference on Computer Vision and Pattern Recognition
Pages 278-285
  Abstract In the last years, intrinsic image decomposition has gained attention. Most of the state-of-the-art methods are based on the assumption that reflectance changes come along with strong image edges. Recently, user intervention in the recovery problem has proved to be a remarkable source of improvement. In this paper, we propose a novel approach that aims to overcome the shortcomings of pure edge-based methods by introducing strong surface descriptors, such as the color-name descriptor which introduces high-level considerations resembling top-down intervention. We also use a second surface descriptor, termed color-shade, which allows us to include physical considerations derived from the image formation model capturing gradual color surface variations. Both color cues are combined by means of a Markov Random Field. The method is quantitatively tested on the MIT ground truth dataset using different error metrics, achieving state-of-the-art performance.  
Address Providence, Rhode Island
Publisher IEEE Xplore
ISSN 1063-6919
ISBN 978-1-4673-1226-4
Conference CVPR
Notes CIC
Approved no
Call Number Admin @ si @ SPB2012
Serial 2026
 

 
Author Murad Al Haj; Jordi Gonzalez; Larry S. Davis
Title On Partial Least Squares in Head Pose Estimation: How to simultaneously deal with misalignment
Type Conference Article
Year 2012
Publication 25th IEEE Conference on Computer Vision and Pattern Recognition
Pages 2602-2609
Abstract Head pose estimation is a critical problem in many computer vision applications. These include human computer interaction, video surveillance, face and expression recognition. In most prior work on head pose estimation, the positions of the faces on which the pose is to be estimated are specified manually. Therefore, the results are reported without studying the effect of misalignment. We propose a method based on partial least squares (PLS) regression to estimate pose and solve the alignment problem simultaneously. The contributions of this paper are two-fold: 1) we show that the kernel version of PLS (kPLS) achieves better than state-of-the-art results on the estimation problem and 2) we develop a technique to reduce misalignment based on the learned PLS factors.
Address Providence, Rhode Island
Publisher IEEE Xplore
ISSN 1063-6919
ISBN 978-1-4673-1226-4
Conference CVPR
Notes ISE
Approved no
Call Number Admin @ si @ HGD2012
Serial 2029
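Plain PLS regression from image features to pose angles, the building block this paper starts from, is available in scikit-learn; the sketch below uses synthetic features and angles and does not reproduce the paper's kernel PLS variant or its misalignment-reduction step.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

# Toy stand-ins: image features (e.g. HOG of a face crop) and pose angles.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1764))            # 500 training crops, HOG-sized features
pose = rng.uniform(-90, 90, size=(500, 2))  # yaw and pitch in degrees (synthetic)

# PLS finds a low-dimensional latent subspace that correlates features and pose.
pls = PLSRegression(n_components=20)
pls.fit(X, pose)

# Predict pose for new crops; in the paper the learned PLS factors are also
# used to score how well a crop is aligned, which is not reproduced here.
X_test = rng.normal(size=(10, 1764))
print(pls.predict(X_test).shape)            # (10, 2): yaw, pitch per crop
```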
 

 
Author Jose Carlos Rubio; Joan Serrat; Antonio Lopez
Title Unsupervised co-segmentation through region matching
Type Conference Article
Year 2012
Publication 25th IEEE Conference on Computer Vision and Pattern Recognition
Pages 749-756
Abstract Co-segmentation is defined as jointly partitioning multiple images depicting the same or similar object into foreground and background. Our method consists of a multiple-scale multiple-image generative model, which jointly estimates the foreground and background appearance distributions from several images, in a non-supervised manner. In contrast to other co-segmentation methods, our approach does not require the images to have similar foregrounds and different backgrounds to function properly. Region matching is applied to exploit inter-image information by establishing correspondences between the common objects that appear in the scene. Moreover, computing many-to-many associations of regions allows further applications, like recognition of object parts across images. We report results on iCoseg, a challenging dataset that presents extreme variability in camera viewpoint, illumination and object deformations and poses. We also show that our method is robust against large intra-class variability in the MSRC database.
Address Providence, Rhode Island
Publisher IEEE Xplore
ISSN 1063-6919
ISBN 978-1-4673-1226-4
Conference CVPR
Notes ADAS
Approved no
Call Number Admin @ si @ RSL2012b; ADAS @ adas @
Serial 2033
 

 
Author Albert Gordo; Jose Antonio Rodriguez; Florent Perronnin; Ernest Valveny
Title Leveraging category-level labels for instance-level image retrieval
Type Conference Article
Year 2012
Publication 25th IEEE Conference on Computer Vision and Pattern Recognition
Pages 3045-3052
Abstract In this article, we focus on the problem of large-scale instance-level image retrieval. For efficiency reasons, it is common to represent an image by a fixed-length descriptor which is subsequently encoded into a small number of bits. We note that most encoding techniques include an unsupervised dimensionality reduction step. Our goal in this work is to learn a better subspace in a supervised manner. We especially raise the following question: “can category-level labels be used to learn such a subspace?” To answer this question, we experiment with four learning techniques: the first one is based on a metric learning framework, the second one on attribute representations, the third one on Canonical Correlation Analysis (CCA) and the fourth one on Joint Subspace and Classifier Learning (JSCL). While the first three approaches have been applied in the past to the image retrieval problem, we believe we are the first to show the usefulness of JSCL in this context. In our experiments, we use ImageNet as a source of category-level labels and report retrieval results on two standard datasets: INRIA Holidays and the University of Kentucky benchmark. Our experimental study shows that metric learning and attributes do not lead to any significant improvement in retrieval accuracy, as opposed to CCA and JSCL. As an example, we report on Holidays an increase in accuracy from 39.3% to 48.6% with 32-dimensional representations. Overall JSCL is shown to yield the best results.
Address Providence, Rhode Island
Publisher IEEE Xplore
ISSN 1063-6919
ISBN 978-1-4673-1226-4
Conference CVPR
Notes DAG
Approved no
Call Number Admin @ si @ GRP2012
Serial 2050
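Of the four techniques compared in the abstract above, CCA is the easiest to sketch: learn a joint subspace from image descriptors and category labels, then retrieve with cosine similarity in that subspace. The data below are random stand-ins and the descriptor dimensionality is arbitrary; this is not the paper's evaluation pipeline.

```python
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.preprocessing import normalize

# Toy stand-ins: image descriptors plus category-level labels (one-hot),
# mimicking the idea of learning a retrieval subspace with CCA.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 256))                 # image descriptors
y = rng.integers(0, 50, size=2000)               # category labels (e.g. ImageNet classes)
Y = np.eye(50)[y]                                # one-hot label matrix

cca = CCA(n_components=32, max_iter=500)
cca.fit(X, Y)

def embed(desc):
    """Project descriptors into the learned subspace and L2-normalise them."""
    return normalize(cca.transform(desc))

db = embed(X)
queries = embed(rng.normal(size=(5, 256)))
scores = queries @ db.T                          # cosine similarity in the subspace
print(scores.argmax(axis=1))                     # top-1 retrieved index per query
```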
 

 
Author Rahat Khan; Joost Van de Weijer; Fahad Shahbaz Khan; Damien Muselet; Christophe Ducottet; Cecile Barat
Title Discriminative Color Descriptors
Type Conference Article
Year 2013
Publication IEEE Conference on Computer Vision and Pattern Recognition
Pages 2866-2873
  Abstract Color description is a challenging task because of large variations in RGB values which occur due to scene accidental events, such as shadows, shading, specularities, illuminant color changes, and changes in viewing geometry. Traditionally, this challenge has been addressed by capturing the variations in physics-based models, and deriving invariants for the undesired variations. The drawback of this approach is that sets of distinguishable colors in the original color space are mapped to the same value in the photometric invariant space. This results in a drop of discriminative power of the color description. In this paper we take an information theoretic approach to color description. We cluster color values together based on their discriminative power in a classification problem. The clustering has the explicit objective to minimize the drop of mutual information of the final representation. We show that such a color description automatically learns a certain degree of photometric invariance. We also show that a universal color representation, which is based on other data sets than the one at hand, can obtain competing performance. Experiments show that the proposed descriptor outperforms existing photometric invariants. Furthermore, we show that combined with shape description these color descriptors obtain excellent results on four challenging datasets, namely, PASCAL VOC 2007, Flowers-102, Stanford dogs-120 and Birds-200.  
Address Portland; Oregon; June 2013
ISSN 1063-6919
Conference CVPR
Notes CIC; 600.048
Approved no
Call Number Admin @ si @ KWK2013a
Serial 2262
 

 
Author Marc Serra; Olivier Penacchio; Robert Benavente; Maria Vanrell; Dimitris Samaras
Title The Photometry of Intrinsic Images
Type Conference Article
Year 2014
Publication 27th IEEE Conference on Computer Vision and Pattern Recognition
Pages 1494-1501
  Abstract Intrinsic characterization of scenes is often the best way to overcome the illumination variability artifacts that complicate most computer vision problems, from 3D reconstruction to object or material recognition. This paper examines the deficiency of existing intrinsic image models to accurately account for the effects of illuminant color and sensor characteristics in the estimation of intrinsic images and presents a generic framework which incorporates insights from color constancy research to the intrinsic image decomposition problem. The proposed mathematical formulation includes information about the color of the illuminant and the effects of the camera sensors, both of which modify the observed color of the reflectance of the objects in the scene during the acquisition process. By modeling these effects, we get a “truly intrinsic” reflectance image, which we call absolute reflectance, which is invariant to changes of illuminant or camera sensors. This model allows us to represent a wide range of intrinsic image decompositions depending on the specific assumptions on the geometric properties of the scene configuration and the spectral properties of the light source and the acquisition system, thus unifying previous models in a single general framework. We demonstrate that even partial information about sensors improves significantly the estimated reflectance images, thus making our method applicable for a wide range of sensors. We validate our general intrinsic image framework experimentally with both synthetic data and natural images.  
Address Columbus; Ohio; USA; June 2014
Conference CVPR
Notes CIC; 600.052; 600.051; 600.074
Approved no
Call Number Admin @ si @ SPB2014
Serial 2506
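For context, the classical intrinsic image model that this paper generalizes can be written as follows; this is only the textbook starting point, not the paper's full formulation with spectral illuminant and sensor terms.

```latex
% Classical intrinsic image decomposition, per pixel x and colour channel k:
% the observed image is the product of reflectance (albedo) and shading.
\[
  I_k(x) \;=\; R_k(x)\, S(x)
\]
% With a non-white illuminant e_k and narrow-band camera sensitivities, the
% observation is commonly approximated per channel as
\[
  I_k(x) \;\approx\; S(x)\, e_k\, R_k(x),
\]
% so recovering a reflectance invariant to e_k requires knowing (or estimating)
% the illuminant and sensor terms, which is the gap the paper addresses.
```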
 

 
Author M. Danelljan; Fahad Shahbaz Khan; Michael Felsberg; Joost Van de Weijer
Title Adaptive color attributes for real-time visual tracking
Type Conference Article
Year 2014
Publication 27th IEEE Conference on Computer Vision and Pattern Recognition
Pages 1090-1097
Abstract Visual tracking is a challenging problem in computer vision. Most state-of-the-art visual trackers either rely on luminance information or use simple color representations for image description. Contrary to visual tracking, for object recognition and detection, sophisticated color features combined with luminance have been shown to provide excellent performance. Due to the complexity of the tracking problem, the desired color feature should be computationally efficient, and possess a certain amount of photometric invariance while maintaining high discriminative power. This paper investigates the contribution of color in a tracking-by-detection framework. Our results suggest that color attributes provide superior performance for visual tracking. We further propose an adaptive low-dimensional variant of color attributes. Both quantitative and attribute-based evaluations are performed on 41 challenging benchmark color sequences. The proposed approach improves the baseline intensity-based tracker by 24% in median distance precision. Furthermore, we show that our approach outperforms state-of-the-art tracking methods while running at more than 100 frames per second.
Address Columbus; Ohio; USA; June 2014
Conference CVPR
Notes CIC; LAMP; 600.074; 600.079
Approved no
Call Number Admin @ si @ DKF2014
Serial 2509
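The dimensionality-reduction part of the method can be sketched as projecting the 11-D color attributes of the current target patch onto a few principal axes. The per-pixel color-name probabilities are random stand-ins here, and the paper's smooth online update of the projection and its correlation-filter tracker are not reproduced.

```python
import numpy as np

# Assume 'cn' already holds per-pixel colour-name probabilities for the
# current target patch (an 11-D colour attribute per pixel); random here.
rng = np.random.default_rng(0)
cn = rng.dirichlet(np.ones(11), size=(50, 50))           # (50, 50, 11)

def adapt_dimensions(cn, n_dims=2):
    """Adaptive low-dimensional colour attributes (sketch): project the 11-D
    attributes of the current patch onto their leading principal axes."""
    flat = cn.reshape(-1, 11)
    mu = flat.mean(axis=0)
    _, _, Vt = np.linalg.svd(flat - mu, full_matrices=False)
    return ((flat - mu) @ Vt[:n_dims].T).reshape(*cn.shape[:2], n_dims)

low_dim = adapt_dimensions(cn)
print(low_dim.shape)                                     # (50, 50, 2): compressed colour channels
```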
 

 
Author German Ros; Laura Sellart; Joanna Materzynska; David Vazquez; Antonio Lopez
Title The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes
Type Conference Article
Year 2016
Publication 29th IEEE Conference on Computer Vision and Pattern Recognition
Pages 3234-3243
Keywords Domain Adaptation; Autonomous Driving; Virtual Data; Semantic Segmentation
Abstract Vision-based semantic segmentation in urban scenarios is a key functionality for autonomous driving. The irruption of deep convolutional neural networks (DCNNs) makes it possible to foresee obtaining reliable classifiers to perform such a visual task. However, DCNNs require learning many parameters from raw images; thus, a sufficient amount of diversified images with class annotations is needed. These annotations are obtained through cumbersome human labour, which is especially challenging for semantic segmentation, since pixel-level annotations are required. In this paper, we propose to use a virtual world for automatically generating realistic synthetic images with pixel-level annotations. Then, we address the question of how useful such data can be for the task of semantic segmentation, in particular when using a DCNN paradigm. In order to answer this question we have generated a synthetic diversified collection of urban images, named SynthCity, with automatically generated class annotations. We use SynthCity in combination with publicly available real-world urban images with manually provided annotations. Then, we conduct experiments on a DCNN setting that show how the inclusion of SynthCity in the training stage significantly improves the performance of the semantic segmentation task.
Address Las Vegas; USA; June 2016
Conference CVPR
Notes ADAS; 600.085; 600.082; 600.076
Approved no
Call Number ADAS @ adas @ RSM2016
Serial 2739
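The training recipe implied by the abstract, mixing synthetic annotated images with a smaller set of real ones, reduces in its simplest form to training the same network on the concatenation of both datasets. A toy PyTorch sketch with random tensors standing in for images and label maps; sizes and class counts are illustrative only.

```python
import torch
from torch.utils.data import TensorDataset, ConcatDataset, DataLoader

# Toy stand-ins: a "synthetic" and a "real" segmentation dataset with
# images (3xHxW) and per-pixel class labels.
synth = TensorDataset(torch.randn(200, 3, 64, 64), torch.randint(0, 12, (200, 64, 64)))
real = TensorDataset(torch.randn(50, 3, 64, 64), torch.randint(0, 12, (50, 64, 64)))

# Train a DCNN on the union of synthetic and real annotated images rather
# than on the real ones alone; here we only build the mixed data loader.
loader = DataLoader(ConcatDataset([synth, real]), batch_size=8, shuffle=True)
images, labels = next(iter(loader))
print(images.shape, labels.shape)      # torch.Size([8, 3, 64, 64]) torch.Size([8, 64, 64])
```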
 

 
Author Lluis Gomez; Y. Patel; Marçal Rusiñol; C.V. Jawahar; Dimosthenis Karatzas
Title Self-supervised learning of visual features through embedding images into text topic spaces
Type Conference Article
Year 2017
Publication 30th IEEE Conference on Computer Vision and Pattern Recognition
  Abstract End-to-end training from scratch of current deep architectures for new computer vision problems would require Imagenet-scale datasets, and this is not always possible. In this paper we present a method that is able to take advantage of freely available multi-modal content to train computer vision algorithms without human supervision. We put forward the idea of performing self-supervised learning of visual features by mining a large scale corpus of multi-modal (text and image) documents. We show that discriminative visual features can be learnt efficiently by training a CNN to predict the semantic context in which a particular image is more probable to appear as an illustration. For this we leverage the hidden semantic structures discovered in the text corpus with a well-known topic modeling technique. Our experiments demonstrate state of the art performance in image classification, object detection, and multi-modal retrieval compared to recent self-supervised or natural-supervised approaches.  
Address Honolulu; Hawaii; July 2017
Conference CVPR
Notes DAG; 600.084; 600.121
Approved no
Call Number Admin @ si @ GPR2017
Serial 2889
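The text side of the method described above can be sketched with scikit-learn: discover topics in the accompanying text with a topic model such as LDA, and use each document's topic distribution as the target the image CNN must predict. The corpus and the CNN below are placeholders, not the paper's data or network.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy corpus standing in for the article text that accompanies each image.
docs = ["a red racing car on the track",
        "parliament votes on the new budget law",
        "the striker scores in the final minute",
        "stock markets fall after the rate decision"]

# 1) Discover hidden topics in the text corpus.
counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(counts)
topic_targets = lda.transform(counts)        # per-document topic distribution

# 2) Those distributions become the regression targets for a CNN that only
#    sees the illustration of each document; sketched here as a stub.
def cnn_predict_topics(images):
    # placeholder for a CNN trained to map an image to its text's topics
    return np.full((len(images), 3), 1.0 / 3)

print(topic_targets.round(2))
print(cnn_predict_topics(["img0"]))
```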
 

 
Author Cesar de Souza; Adrien Gaidon; Yohann Cabon; Antonio Lopez
Title Procedural Generation of Videos to Train Deep Action Recognition Networks
Type Conference Article
Year 2017
Publication 30th IEEE Conference on Computer Vision and Pattern Recognition
Pages 2594-2604
Abstract Deep learning for human action recognition in videos is making significant progress, but is slowed down by its dependency on expensive manual labeling of large video collections. In this work, we investigate the generation of synthetic training data for action recognition, as it has recently shown promising results for a variety of other computer vision tasks. We propose an interpretable parametric generative model of human action videos that relies on procedural generation and other computer graphics techniques of modern game engines. We generate a diverse, realistic, and physically plausible dataset of human action videos, called PHAV for “Procedural Human Action Videos”. It contains a total of 39,982 videos, with more than 1,000 examples for each action of 35 categories. Our approach is not limited to existing motion capture sequences, and we procedurally define 14 synthetic actions. We introduce a deep multi-task representation learning architecture to mix synthetic and real videos, even if the action categories differ. Our experiments on the UCF101 and HMDB51 benchmarks suggest that combining our large set of synthetic videos with small real-world datasets can boost recognition performance, significantly outperforming fine-tuning state-of-the-art unsupervised generative models of videos.
Address Honolulu; Hawaii; July 2017
Conference CVPR
Notes ADAS; 600.076; 600.085; 600.118
Approved no
Call Number Admin @ si @ SGC2017
Serial 3051
 

 
Author Shanxin Yuan; Guillermo Garcia-Hernando; Bjorn Stenger; Gyeongsik Moon; Ju Yong Chang; Kyoung Mu Lee; Pavlo Molchanov; Jan Kautz; Sina Honari; Liuhao Ge; Junsong Yuan; Xinghao Chen; Guijin Wang; Fan Yang; Kai Akiyama; Yang Wu; Qingfu Wan; Meysam Madadi; Sergio Escalera; Shile Li; Dongheui Lee; Iason Oikonomidis; Antonis Argyros; Tae-Kyun Kim
Title Depth-Based 3D Hand Pose Estimation: From Current Achievements to Future Goals
Type Conference Article
Year 2018
Publication 31st IEEE Conference on Computer Vision and Pattern Recognition
Pages 2636-2645
Keywords Three-dimensional displays; Task analysis; Pose estimation; Two dimensional displays; Joints; Training; Solid modeling
  Abstract In this paper, we strive to answer two questions: What is the current state of 3D hand pose estimation from depth images? And, what are the next challenges that need to be tackled? Following the successful Hands In the Million Challenge (HIM2017), we investigate the top 10 state-of-the-art methods on three tasks: single frame 3D pose estimation, 3D hand tracking, and hand pose estimation during object interaction. We analyze the performance of different CNN structures with regard to hand shape, joint visibility, view point and articulation distributions. Our findings include: (1) isolated 3D hand pose estimation achieves low mean errors (10 mm) in the view point range of [70, 120] degrees, but it is far from being solved for extreme view points; (2) 3D volumetric representations outperform 2D CNNs, better capturing the spatial structure of the depth data; (3) Discriminative methods still generalize poorly to unseen hand shapes; (4) While joint occlusions pose a challenge for most methods, explicit modeling of structure constraints can significantly narrow the gap between errors on visible and occluded joints.  
Address Salt Lake City; USA; June 2018
Conference CVPR
Notes HUPBA; no proj
Approved no
Call Number Admin @ si @ YGS2018
Serial 3115