Records
Author | Jianzhu Guo; Zhen Lei; Jun Wan; Egils Avots; Noushin Hajarolasvadi; Boris Knyazev; Artem Kuharenko; Julio C. S. Jacques Junior; Xavier Baro; Hasan Demirel; Sergio Escalera; Juri Allik; Gholamreza Anbarjafari | ||||
Title | Dominant and Complementary Emotion Recognition from Still Images of Faces | Type | Journal Article | ||
Year | 2018 | Publication | IEEE Access | Abbreviated Journal | ACCESS |
Volume | 6 | Issue | Pages | 26391 - 26403 | |
Keywords | |||||
Abstract | Emotion recognition has a key role in affective computing. Recently, fine-grained emotion analysis, such as compound facial expressions of emotions, has attracted growing interest from researchers working on affective computing. A compound facial emotion combines a dominant and a complementary emotion (e.g., happily-disgusted and sadly-fearful), which is more detailed than the seven classical facial emotions (e.g., happy, disgusted, and so on). Current studies on compound emotions rely on datasets with a limited number of categories and unbalanced data distributions, with labels obtained automatically by machine learning-based algorithms, which can lead to inaccuracies. To address these problems, we released the iCV-MEFED dataset, which includes 50 classes of compound emotions with labels assessed by psychologists. The task is challenging due to the high similarity of compound facial emotions from different categories. In addition, we organized a challenge based on the proposed iCV-MEFED dataset, held at the FG 2017 workshop. In this paper, we analyze the top three winning methods and perform further detailed experiments on the proposed dataset. Experiments indicate that pairs of compound emotions (e.g., surprisingly-happy vs. happily-surprised) are harder to recognize than the seven basic emotions. We hope the proposed dataset helps pave the way for further research on compound facial emotion recognition. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | HUPBA; no proj | Approved | no | ||
Call Number | Admin @ si @ GLW2018 | Serial | 3122 | ||
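The 50-class compound label space described in the record above can be illustrated with a short sketch. Below is one plausible way to enumerate dominant-complementary pairs from the seven basic emotions; the class names and composition are illustrative assumptions, not the actual iCV-MEFED label list.

```python
# Enumerate a compound-emotion label space from seven basic emotions:
# each compound pairs a dominant emotion with a different complementary
# one (e.g. "happily-disgusted"). Illustrative only; the real iCV-MEFED
# label set is defined by the dataset authors.
BASIC = ["happy", "sad", "fearful", "angry", "surprised", "disgusted", "contempt"]
ADVERB = {"happy": "happily", "sad": "sadly", "fearful": "fearfully",
          "angry": "angrily", "surprised": "surprisingly",
          "disgusted": "disgustedly", "contempt": "contemptuously"}

compound = [f"{ADVERB[dom]}-{comp}"
            for dom in BASIC for comp in BASIC if dom != comp]
labels = BASIC + compound  # 7 basics + 42 ordered compound pairs

print(len(compound), "compound classes")  # 7 * 6 = 42
print(compound[:3])
```

Adding a neutral class to the 7 basics and 42 compounds would give 50 classes, which matches the count reported in the abstract, but that composition is an assumption here.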
Author | Bojana Gajic; Ramon Baldrich | ||||
Title | Cross-domain fashion image retrieval | Type | Conference Article | ||
Year | 2018 | Publication | CVPR 2018 Workshop on Women in Computer Vision (WiCV 2018, 4th Edition) | Abbreviated Journal | |
Volume | Issue | Pages | 19500-19502 | ||
Keywords | |||||
Abstract | Cross-domain image retrieval is a challenging task that involves matching images from one domain to their counterparts in another domain. In this paper we focus on fashion image retrieval: matching an image of a fashion item taken by a user to images of the same item taken under controlled conditions, usually by a professional photographer. In this setting, the products seen at training and test time differ, and we use a triplet loss to train the network. We stress the importance of properly training a simple architecture, as well as adapting general models to the specific task. | ||||
Address | Salt Lake City, USA; 22 June 2018 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | CVPRW | ||
Notes | CIC; 600.087 | Approved | no | ||
Call Number | Admin @ si @ | Serial | 3709 | ||
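A minimal PyTorch sketch of the triplet-loss training described in the abstract above: a user photo serves as the anchor, a shop photo of the same product as the positive, and a shop photo of a different product as the negative. The embedding network, margin, and tensor shapes are placeholder assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

# Placeholder embedding network; the paper adapts a general CNN to the task.
embed = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 64),
)

triplet = nn.TripletMarginLoss(margin=0.2)

# anchor: user photo, positive: shop photo of the same item,
# negative: shop photo of a different item (dummy tensors here).
anchor = torch.randn(8, 3, 64, 64)
positive = torch.randn(8, 3, 64, 64)
negative = torch.randn(8, 3, 64, 64)

loss = triplet(embed(anchor), embed(positive), embed(negative))
loss.backward()
print(float(loss))
```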
Author | Katerine Diaz; Jesus Martinez del Rincon; Aura Hernandez-Sabate; Debora Gil | ||||
Title | Continuous head pose estimation using manifold subspace embedding and multivariate regression | Type | Journal Article | ||
Year | 2018 | Publication | IEEE Access | Abbreviated Journal | ACCESS |
Volume | 6 | Issue | Pages | 18325 - 18334 | |
Keywords | Head Pose estimation; HOG features; Generalized Discriminative Common Vectors; B-splines; Multiple linear regression | ||||
Abstract | In this paper, a continuous head pose estimation system is proposed to estimate yaw and pitch head angles from raw facial images. Our approach is based on manifold learning-based methods, due to their promising generalization properties shown for face modelling from images. The method combines histograms of oriented gradients, generalized discriminative common vectors and continuous local regression to achieve successful performance. Our proposal was tested on multiple standard face datasets, as well as in a realistic scenario. Results show a considerable performance improvement and a higher consistency of our model in comparison with other state-of-the-art methods, with angular errors varying between 9 and 17 degrees. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 2169-3536 | ISBN | Medium | ||
Area | Expedition | Conference | |||
Notes | ADAS; 600.118 | Approved | no | ||
Call Number | Admin @ si @ DMH2018b | Serial | 3091 | ||
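A toy sketch of the pipeline outlined in the abstract above: HOG features from face crops feeding a multivariate linear regression onto (yaw, pitch). The generalized-discriminative-common-vectors subspace and B-spline stages of the actual method are omitted, and the data here is random, so this only illustrates the feature-to-angles regression idea.

```python
import numpy as np
from skimage.feature import hog
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
faces = rng.random((50, 64, 64))         # stand-in for aligned face crops
angles = rng.uniform(-90, 90, (50, 2))   # ground-truth (yaw, pitch) degrees

# One HOG descriptor per face, as in the paper's feature extraction stage.
X = np.array([hog(f, orientations=9, pixels_per_cell=(8, 8),
                  cells_per_block=(2, 2)) for f in faces])

# Multivariate linear regression from features to the two head angles
# (the paper uses continuous local regression on a learned subspace).
model = LinearRegression().fit(X, angles)
yaw, pitch = model.predict(X[:1])[0]
print(f"yaw={yaw:.1f}, pitch={pitch:.1f}")
```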
Author | Laura Lopez-Fuentes; Joost Van de Weijer; Manuel Gonzalez-Hidalgo; Harald Skinnemoen; Andrew Bagdanov | ||||
Title | Review on computer vision techniques in emergency situations | Type | Journal Article | ||
Year | 2018 | Publication | Multimedia Tools and Applications | Abbreviated Journal | MTAP |
Volume | 77 | Issue | 13 | Pages | 17069–17107 |
Keywords | Emergency management; Computer vision; Decision makers; Situational awareness; Critical situation | ||||
Abstract | In emergency situations, actions that save lives and limit the impact of hazards are crucial. In order to act, situational awareness is needed to decide what to do. Geolocalized photos and video of situations as they evolve can be crucial for understanding them better and making decisions faster. Cameras are almost everywhere these days, whether in smartphones, installed CCTV systems, UAVs or others. However, this poses challenges of big data and information overload. Moreover, most of the time there are no disasters at any given location, so human observers aiming to detect sudden incidents may not be as alert as needed at all times. Consequently, computer vision tools can provide excellent decision support. The range of emergencies in which computer vision tools have been considered or used is very wide, and there is great overlap across related emergency research. Researchers tend to focus on state-of-the-art systems that cover the same emergency they are studying, overlooking important research in other fields. In order to unveil this overlap, the survey is divided along four main axes: the types of emergencies that have been studied in computer vision, the objectives that the algorithms can address, the type of hardware needed, and the algorithms used. Therefore, this review provides a broad overview of the progress of computer vision covering all sorts of emergencies. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | LAMP; 600.068; 600.120 | Approved | no | ||
Call Number | Admin @ si @ LWG2018 | Serial | 3041 | ||
Author | Marçal Rusiñol; J. Chazalon; Katerine Diaz | ||||
Title | Augmented Songbook: an Augmented Reality Educational Application for Raising Music Awareness | Type | Journal Article | ||
Year | 2018 | Publication | Multimedia Tools and Applications | Abbreviated Journal | MTAP |
Volume | 77 | Issue | 11 | Pages | 13773-13798 |
Keywords | Augmented reality; Document image matching; Educational applications | ||||
Abstract | This paper presents the development of an Augmented Reality mobile application which aims at sensitizing young children to abstract concepts of music, such as musical notation or the idea of rhythm. Recent studies in Augmented Reality for education suggest that such technologies have multiple benefits for students, including younger ones. As mobile document image acquisition and processing gain maturity on mobile platforms, we explore how it is possible to build a markerless, real-time application that augments physical documents with didactic animations and interactive virtual content. Given a standard image processing pipeline, we compare the performance of different local descriptors at two key stages of the process. Results suggest alternatives to SIFT local descriptors, regarding both result quality and computational efficiency, for document model identification as well as perspective transform estimation. All experiments are performed on an original and public dataset we introduce here. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | DAG; ADAS; 600.084; 600.121; 600.118; 600.129 | Approved | no | ||
Call Number | Admin @ si @ RCD2018 | Serial | 2996 | ||
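A minimal OpenCV sketch of the two stages the abstract above compares descriptors for: document model identification via local-descriptor matching, and perspective-transform estimation. ORB is used here purely as one possible SIFT alternative; the file names are placeholders, not the paper's dataset.

```python
import cv2
import numpy as np

# Load the registered songbook page and a live camera frame (placeholders).
model = cv2.imread("page_model.png", cv2.IMREAD_GRAYSCALE)
frame = cv2.imread("camera_frame.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(1000)                 # one possible SIFT alternative
kp1, des1 = orb.detectAndCompute(model, None)
kp2, des2 = orb.detectAndCompute(frame, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:100]

# Perspective transform estimation: homography from page to frame,
# used to anchor the virtual overlays on the physical document.
src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
print("inlier ratio:", inliers.mean())
```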
Author | Adrian Galdran; Aitor Alvarez-Gila; Alessandro Bria; Javier Vazquez; Marcelo Bertalmio | ||||
Title | On the Duality Between Retinex and Image Dehazing | Type | Conference Article | ||
Year | 2018 | Publication | 31st IEEE Conference on Computer Vision and Pattern Recognition | Abbreviated Journal | |
Volume | Issue | Pages | 8212–8221 | ||
Keywords | Image color analysis; Task analysis; Atmospheric modeling; Computer vision; Computational modeling; Lighting | ||||
Abstract | Image dehazing deals with the removal of the undesired loss of visibility in outdoor images due to the presence of fog. Retinex is a color vision model mimicking the ability of the Human Visual System to robustly discount varying illuminations when observing a scene under different spectral lighting conditions. Retinex has been widely explored in the computer vision literature for image enhancement and other related tasks. While these two problems are apparently unrelated, the goal of this work is to show that they can be connected by a simple linear relationship. Specifically, most Retinex-based algorithms have the characteristic feature of always increasing image brightness, which turns them into ideal candidates for effective image dehazing by directly applying Retinex to a hazy image whose intensities have been inverted. In this paper, we provide a theoretical proof that Retinex on inverted intensities is a solution to the image dehazing problem. Comprehensive qualitative and quantitative results indicate that several classical and modern implementations of Retinex can be transformed into competitive image dehazing algorithms performing on par with more complex fog removal methods, and can overcome some of the main challenges associated with this problem. | ||||
Address | Salt Lake City; USA; June 2018 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | CVPR | ||
Notes | LAMP; 600.120 | Approved | no | ||
Call Number | Admin @ si @ GAB2018 | Serial | 3146 | ||
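The duality stated in the abstract above is directly executable: invert the hazy image, run any Retinex implementation, invert back. Below is a sketch with a basic single-scale Retinex (Gaussian surround in the log domain) standing in for the classical and modern variants the paper evaluates; the sigma and normalization are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def single_scale_retinex(img, sigma=40.0, eps=1e-6):
    """Basic single-scale Retinex: log(image) - log(Gaussian surround)."""
    out = np.log(img + eps) - np.log(gaussian_filter(img, sigma) + eps)
    out -= out.min()
    return out / (out.max() + eps)   # rescale to [0, 1]

def dehaze_via_retinex(hazy):
    """Duality from the paper: Retinex applied to inverted intensities
    acts as a dehazer; inverting back recovers the dehazed image."""
    inverted = 1.0 - hazy
    return 1.0 - single_scale_retinex(inverted)

hazy = np.random.rand(128, 128)      # stand-in for a normalized hazy image
print(dehaze_via_retinex(hazy).shape)
```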
Author | Xialei Liu; Joost Van de Weijer; Andrew Bagdanov | ||||
Title | Leveraging Unlabeled Data for Crowd Counting by Learning to Rank | Type | Conference Article | ||
Year | 2018 | Publication | 31st IEEE Conference on Computer Vision and Pattern Recognition | Abbreviated Journal | |
Volume | Issue | Pages | 7661 - 7669 | ||
Keywords | Task analysis; Training; Computer vision; Visualization; Estimation; Head; Context modeling | ||||
Abstract | We propose a novel crowd counting approach that leverages abundantly available unlabeled crowd imagery in a learning-to-rank framework. To induce a ranking of cropped images, we use the observation that any sub-image of a crowded scene image is guaranteed to contain the same number of persons or fewer than the super-image. This allows us to address the problem of the limited size of existing crowd counting datasets. We collect two crowd scene datasets from Google using keyword searches and query-by-example image retrieval, respectively. We demonstrate how to efficiently learn from these unlabeled datasets by incorporating learning-to-rank in a multi-task network that simultaneously ranks images and estimates crowd density maps. Experiments on two of the most challenging crowd counting datasets show that our approach obtains state-of-the-art results. | ||||
Address | Salt Lake City; USA; June 2018 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | CVPR | ||
Notes | LAMP; 600.109; 600.106; 600.120 | Approved | no | ||
Call Number | Admin @ si @ LWB2018 | Serial | 3159 | ||
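The self-supervised constraint in the abstract above (a sub-image contains at most as many people as its super-image) reduces to a standard margin ranking loss on estimated counts. A minimal PyTorch sketch with a dummy counter network standing in for the paper's multi-task model:

```python
import torch
import torch.nn as nn

# Dummy count estimator standing in for the density-map network.
counter = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 1))

def crop_center(img, size):
    """Center sub-image; guaranteed to contain <= the people of img."""
    h, w = img.shape[-2:]
    top, left = (h - size) // 2, (w - size) // 2
    return img[..., top:top + size, left:left + size]

full = torch.rand(4, 1, 64, 64)                # unlabeled crowd images
sub = nn.functional.interpolate(crop_center(full, 32), size=(64, 64))

rank_loss = nn.MarginRankingLoss(margin=0.0)
# target=1 enforces count(full) >= count(sub): the learning-to-rank signal.
loss = rank_loss(counter(full), counter(sub), torch.ones(4, 1))
loss.backward()
print(float(loss))
```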
Author | Abel Gonzalez-Garcia; Davide Modolo; Vittorio Ferrari | ||||
Title | Objects as context for detecting their semantic parts | Type | Conference Article | ||
Year | 2018 | Publication | 31st IEEE Conference on Computer Vision and Pattern Recognition | Abbreviated Journal | |
Volume | Issue | Pages | 6907 - 6916 | ||
Keywords | Proposals; Semantics; Wheels; Automobiles; Context modeling; Task analysis; Object detection | ||||
Abstract | We present a semantic part detection approach that effectively leverages object information. We use the object appearance and its class as indicators of what parts to expect. We also model the expected relative location of parts inside the objects based on their appearance. We achieve this with a new network module, called OffsetNet, that efficiently predicts a variable number of part locations within a given object. Our model incorporates all these cues to detect parts in the context of their objects. This leads to considerably higher performance for the challenging task of part detection compared to using part appearance alone (+5 mAP on the PASCAL-Part dataset). We also compare to other part detection methods on both PASCAL-Part and CUB200-2011 datasets. | ||||
Address | Salt Lake City; USA; June 2018 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | CVPR | ||
Notes | LAMP; 600.109; 600.120 | Approved | no | ||
Call Number | Admin @ si @ GMF2018 | Serial | 3229 | ||
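A small sketch of the idea summarized above: predicting part locations as offsets relative to an object box, so the object acts as context. The real OffsetNet predicts a variable number of parts per object class; here a fixed dummy head and box illustrate only the coordinate arithmetic.

```python
import torch
import torch.nn as nn

N_PARTS = 4  # e.g. wheels for "car"; the real module is class-aware

# Dummy head mapping pooled object features to normalized (dx, dy) per part.
offset_head = nn.Linear(128, N_PARTS * 2)

obj_feat = torch.randn(1, 128)              # pooled object appearance
x1, y1, x2, y2 = 40.0, 60.0, 200.0, 180.0   # object box in image coords

offsets = torch.tanh(offset_head(obj_feat)).view(N_PARTS, 2)  # in [-1, 1]
cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
w, h = x2 - x1, y2 - y1

# Part centers = object center + offset scaled by object size, so parts
# are detected in the context of their object.
parts = torch.stack([cx + offsets[:, 0] * w / 2,
                     cy + offsets[:, 1] * h / 2], dim=1)
print(parts)
```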
Author | Chenshen Wu; Luis Herranz; Xialei Liu; Joost Van de Weijer; Bogdan Raducanu | ||||
Title | Memory Replay GANs: Learning to Generate New Categories without Forgetting | Type | Conference Article | ||
Year | 2018 | Publication | 32nd Annual Conference on Neural Information Processing Systems | Abbreviated Journal | |
Volume | Issue | Pages | 5966-5976 | ||
Keywords | |||||
Abstract | Previous works on sequential learning address the problem of forgetting in discriminative models. In this paper we consider the case of generative models. In particular, we investigate generative adversarial networks (GANs) in the task of learning new categories in a sequential fashion. We first show that sequential fine-tuning renders the network unable to properly generate images from previous categories (i.e., forgetting). To address this problem, we propose Memory Replay GANs (MeRGANs), a conditional GAN framework that integrates a memory replay generator. We study two methods to prevent forgetting by leveraging these replays, namely joint training with replay and replay alignment. Qualitative and quantitative experimental results on the MNIST, SVHN and LSUN datasets show that our memory replay approach can generate competitive images while significantly mitigating the forgetting of previous categories. | ||||
Address | Montreal; Canada; December 2018 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | NIPS | ||
Notes | LAMP; 600.106; 600.109; 602.200; 600.120 | Approved | no | ||
Call Number | Admin @ si @ WHL2018 | Serial | 3249 | ||
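A compressed sketch of the memory-replay idea from the abstract above: before learning category t, a frozen copy of the conditional generator "replays" samples of earlier categories, which are mixed into the training data so the updated generator does not forget them. All module definitions are toy placeholders; the actual MeRGAN training (joint replay and replay alignment losses) is more involved.

```python
import copy
import torch
import torch.nn as nn

class CondGenerator(nn.Module):
    """Toy conditional generator: noise + one-hot category -> image."""
    def __init__(self, n_classes, z_dim=16):
        super().__init__()
        self.net = nn.Linear(z_dim + n_classes, 28 * 28)
        self.n_classes, self.z_dim = n_classes, z_dim

    def forward(self, z, y):
        onehot = nn.functional.one_hot(y, self.n_classes).float()
        return torch.tanh(self.net(torch.cat([z, onehot], dim=1)))

G = CondGenerator(n_classes=10)
replay_G = copy.deepcopy(G).eval()    # frozen snapshot before task t
for p in replay_G.parameters():
    p.requires_grad_(False)

# Replay batch for previously learned categories 0..t-1.
t = 3
y_prev = torch.randint(0, t, (32,))
z = torch.randn(32, 16)
replayed = replay_G(z, y_prev)   # mixed with real data of category t
print(replayed.shape)
```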
Author | Yaxing Wang; Joost Van de Weijer; Luis Herranz | ||||
Title | Mix and match networks: encoder-decoder alignment for zero-pair image translation | Type | Conference Article | ||
Year | 2018 | Publication | 31st IEEE Conference on Computer Vision and Pattern Recognition | Abbreviated Journal | |
Volume | Issue | Pages | 5467 - 5476 | ||
Keywords | |||||
Abstract | We address the problem of image translation between domains or modalities for which no direct paired data is available (i.e. zero-pair translation). We propose mix and match networks, based on multiple encoders and decoders aligned in such a way that other encoder-decoder pairs can be composed at test time to perform unseen image translation tasks between domains or modalities for which explicit paired samples were not seen during training. We study the impact of autoencoders, side information and losses in improving the alignment and transferability of trained pairwise translation models to unseen translations. We show our approach is scalable and can perform colorization and style transfer between unseen combinations of domains. We evaluate our system in a challenging cross-modal setting where semantic segmentation is estimated from depth images, without explicit access to any depth-semantic segmentation training pairs. Our model outperforms baselines based on pix2pix and CycleGAN models. | ||||
Address | Salt Lake City; USA; June 2018 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | CVPR | ||
Notes | LAMP; 600.109; 600.106; 600.120 | Approved | no | ||
Call Number | Admin @ si @ WWH2018b | Serial | 3131 | ||
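The core trick in the abstract above is architectural: encoders and decoders trained on seen pairs are aligned on a shared latent space, so unseen translations are obtained by recombination at test time. A skeletal PyTorch illustration with toy linear modules (not the paper's networks, and without the alignment losses that make the composition work):

```python
import torch
import torch.nn as nn

latent = 32

# One encoder/decoder per modality, all aligned on a shared latent space.
enc = {"rgb":   nn.Linear(300, latent),
       "depth": nn.Linear(100, latent)}
dec = {"rgb":   nn.Linear(latent, 300),
       "segm":  nn.Linear(latent, 21)}   # 21 = #semantic classes (toy)

def translate(x, src, dst):
    """Zero-pair translation: compose any encoder with any decoder,
    even if the (src, dst) pair was never seen during training."""
    return dec[dst](enc[src](x))

depth_img = torch.randn(1, 100)
# depth -> segmentation without any depth-segmentation training pairs,
# provided the latent spaces were aligned via the seen translations.
print(translate(depth_img, "depth", "segm").shape)
```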
Author | Felipe Codevilla; Matthias Muller; Antonio Lopez; Vladlen Koltun; Alexey Dosovitskiy | ||||
Title | End-to-end Driving via Conditional Imitation Learning | Type | Conference Article | ||
Year | 2018 | Publication | IEEE International Conference on Robotics and Automation | Abbreviated Journal | |
Volume | Issue | Pages | 4693 - 4700 | ||
Keywords | |||||
Abstract | Deep networks trained on demonstrations of human driving have learned to follow roads and avoid obstacles. However, driving policies trained via imitation learning cannot be controlled at test time. A vehicle trained end-to-end to imitate an expert cannot be guided to take a specific turn at an upcoming intersection. This limits the utility of such systems. We propose to condition imitation learning on high-level command input. At test time, the learned driving policy functions as a chauffeur that handles sensorimotor coordination but continues to respond to navigational commands. We evaluate different architectures for conditional imitation learning in vision-based driving. We conduct experiments in realistic three-dimensional simulations of urban driving and on a 1/5 scale robotic truck that is trained to drive in a residential area. Both systems drive based on visual input yet remain responsive to high-level navigational commands. A supplementary video is available online. | ||||
Address | Brisbane; Australia; May 2018 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICRA | ||
Notes | ADAS; 600.116; 600.124; 600.118 | Approved | no | ||
Call Number | Admin @ si @ CML2018 | Serial | 3108 | ||
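A minimal sketch of the command-conditioned policy described above: a shared perception backbone and one output head per high-level navigational command, with the command selecting which head drives the vehicle. The modules, dimensions, and action format are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

COMMANDS = ["follow_lane", "left", "right", "straight"]

class ConditionalPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(           # shared perception
            nn.Conv2d(3, 8, 5, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(), nn.Linear(8 * 16, 64))
        # One action head per command: (steer, throttle, brake).
        self.branches = nn.ModuleList(nn.Linear(64, 3) for _ in COMMANDS)

    def forward(self, image, command):
        feat = self.backbone(image)
        # The high-level command gates which branch produces the action.
        return self.branches[COMMANDS.index(command)](feat)

policy = ConditionalPolicy()
frame = torch.rand(1, 3, 88, 200)
print(policy(frame, "left"))    # action when commanded to turn left
```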
Author | Eduardo Aguilar; Beatriz Remeseiro; Marc Bolaños; Petia Radeva | ||||
Title | Grab, Pay, and Eat: Semantic Food Detection for Smart Restaurants | Type | Journal Article | ||
Year | 2018 | Publication | IEEE Transactions on Multimedia | Abbreviated Journal | |
Volume | 20 | Issue | 12 | Pages | 3266 - 3275 |
Keywords | |||||
Abstract | The increase in people's awareness of their nutritional habits has drawn considerable attention to the field of automatic food analysis. Focusing on the self-service restaurant environment, automatic food analysis is not only useful for extracting nutritional information from the foods selected by customers; it is also of high interest for speeding up service by solving the bottleneck produced at the cashiers in times of high demand. In this paper, we address the problem of automatic food tray analysis in canteen and restaurant environments, which consists of predicting the multiple foods placed on a tray image. We propose a new approach to food analysis based on convolutional neural networks, which we name Semantic Food Detection, that integrates food localization, recognition and segmentation in the same framework. We demonstrate that our method improves state-of-the-art food detection by a considerable margin on the public UNIMIB2016 dataset, achieving about 90% in terms of F-measure, and thus provides a significant technological advance towards automatic billing in restaurant environments. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | MILAB; no proj | Approved | no | ||
Call Number | Admin @ si @ ARB2018 | Serial | 3236 | ||
Author | Marco Buzzelli; Joost Van de Weijer; Raimondo Schettini | ||||
Title | Learning Illuminant Estimation from Object Recognition | Type | Conference Article | ||
Year | 2018 | Publication | 25th International Conference on Image Processing | Abbreviated Journal | |
Volume | Issue | Pages | 3234 - 3238 | ||
Keywords | Illuminant estimation; computational color constancy; semi-supervised learning; deep learning; convolutional neural networks | ||||
Abstract | In this paper we present a deep learning method to estimate the illuminant of an image. Our model is not trained with illuminant annotations, but with the objective of improving performance on an auxiliary task such as object recognition. To the best of our knowledge, this is the first example of a deep learning architecture for illuminant estimation that is trained without ground-truth illuminants. We evaluate our solution on standard datasets for color constancy and compare it with state-of-the-art methods. Our proposal is shown to outperform most deep learning methods in a cross-dataset evaluation setup, and to present competitive results in comparison with parametric solutions. | ||||
Address | Athens; Greece; October 2018 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICIP | ||
Notes | LAMP; 600.109; 600.120 | Approved | no | ||
Call Number | Admin @ si @ BWS2018 | Serial | 3157 | ||
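A compact sketch of the supervision scheme in the abstract above: an illuminant-estimation head predicts a per-channel gain, the image is corrected by it, and only the downstream object-recognition loss is backpropagated, so no illuminant labels are needed. Both networks here are toy stand-ins for the paper's architecture.

```python
import torch
import torch.nn as nn

# Toy illuminant estimator: image -> per-channel illuminant (R, G, B).
illum_net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 3), nn.Softplus())
# Toy classifier standing in for the object-recognition network.
classifier = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

img = torch.rand(4, 3, 32, 32)
labels = torch.randint(0, 10, (4,))

illum = illum_net(img) + 1e-4                 # predicted illuminant (positive)
corrected = img / illum.view(4, 3, 1, 1)      # von Kries-style correction

# Only the recognition loss is used; the illuminant head is trained
# implicitly, because a good color correction helps classification.
loss = nn.functional.cross_entropy(classifier(corrected), labels)
loss.backward()
print(float(loss))
```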
Author | Lu Yu; Yongmei Cheng; Joost Van de Weijer | ||||
Title | Weakly Supervised Domain-Specific Color Naming Based on Attention | Type | Conference Article | ||
Year | 2018 | Publication | 24th International Conference on Pattern Recognition | Abbreviated Journal | |
Volume | Issue | Pages | 3019 - 3024 | ||
Keywords | |||||
Abstract | The majority of existing color naming methods focuses on the eleven basic color terms of the English language. However, in many applications, different sets of color names are used for the accurate description of objects. Labeling data to learn these domain-specific color names is an expensive and laborious task. Therefore, in this article we aim to learn color names from weakly labeled data. For this purpose, we add an attention branch to the color naming network. The attention branch is used to modulate the pixel-wise color naming predictions of the network. In experiments, we illustrate that the attention branch correctly identifies the relevant regions. Furthermore, we show that our method obtains state-of-the-art results for pixel-wise and image-wise classification on the EBAY dataset and is able to learn color names for various domains. | ||||
Address | Beijing; August 2018 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICPR | ||
Notes | LAMP; 600.109; 602.200; 600.120 | Approved | no | ||
Call Number | Admin @ si @ YCW2018 | Serial | 3243 | ||
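A toy sketch of the mechanism described above: a pixel-wise color-naming head is modulated by an attention branch, and the image-level prediction used for weak supervision is the attention-weighted pooling of the pixel predictions. Shapes, modules, and the pooling scheme are illustrative assumptions.

```python
import torch
import torch.nn as nn

N_NAMES = 11                       # e.g. domain-specific color terms
feat = torch.randn(1, 16, 32, 32)  # backbone features (dummy)

naming_head = nn.Conv2d(16, N_NAMES, 1)   # per-pixel color-name logits
attn_head = nn.Conv2d(16, 1, 1)           # attention branch

logits = naming_head(feat)                 # (1, 11, 32, 32)
attn = torch.sigmoid(attn_head(feat))      # (1, 1, 32, 32)

# Attention modulates the pixel-wise predictions, and weighted pooling
# yields the image-level scores trained from weak (image-level) labels.
weighted = logits * attn
image_scores = weighted.sum(dim=(2, 3)) / attn.sum(dim=(2, 3))
print(image_scores.softmax(dim=1).shape)   # (1, 11)
```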
Author | Shanxin Yuan; Guillermo Garcia-Hernando; Bjorn Stenger; Gyeongsik Moon; Ju Yong Chang; Kyoung Mu Lee; Pavlo Molchanov; Jan Kautz; Sina Honari; Liuhao Ge; Junsong Yuan; Xinghao Chen; Guijin Wang; Fan Yang; Kai Akiyama; Yang Wu; Qingfu Wan; Meysam Madadi; Sergio Escalera; Shile Li; Dongheui Lee; Iason Oikonomidis; Antonis Argyros; Tae-Kyun Kim | ||||
Title | Depth-Based 3D Hand Pose Estimation: From Current Achievements to Future Goals | Type | Conference Article | ||
Year | 2018 | Publication | 31st IEEE Conference on Computer Vision and Pattern Recognition | Abbreviated Journal | |
Volume | Issue | Pages | 2636 - 2645 | ||
Keywords | Three-dimensional displays; Task analysis; Pose estimation; Two dimensional displays; Joints; Training; Solid modeling | ||||
Abstract | In this paper, we strive to answer two questions: What is the current state of 3D hand pose estimation from depth images? And, what are the next challenges that need to be tackled? Following the successful Hands In the Million Challenge (HIM2017), we investigate the top 10 state-of-the-art methods on three tasks: single-frame 3D pose estimation, 3D hand tracking, and hand pose estimation during object interaction. We analyze the performance of different CNN structures with regard to hand shape, joint visibility, viewpoint and articulation distributions. Our findings include: (1) isolated 3D hand pose estimation achieves low mean errors (10 mm) in the viewpoint range of [70, 120] degrees, but it is far from being solved for extreme viewpoints; (2) 3D volumetric representations outperform 2D CNNs, better capturing the spatial structure of the depth data; (3) discriminative methods still generalize poorly to unseen hand shapes; (4) while joint occlusions pose a challenge for most methods, explicit modeling of structure constraints can significantly narrow the gap between errors on visible and occluded joints. | ||||
Address | Salt Lake City; USA; June 2018 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | CVPR | ||
Notes | HUPBA; no proj | Approved | no | ||
Call Number | Admin @ si @ YGS2018 | Serial | 3115 | ||