Records | |||||
---|---|---|---|---|---|
Author | Sergi Garcia Bordils; Andres Mafla; Ali Furkan Biten; Oren Nuriel; Aviad Aberdam; Shai Mazor; Ron Litman; Dimosthenis Karatzas | ||||
Title | Out-of-Vocabulary Challenge Report | Type | Conference Article | ||
Year | 2022 | Publication | Proceedings European Conference on Computer Vision Workshops | Abbreviated Journal | |
Volume | 13804 | Issue | Pages | 359–375 | |
Keywords | |||||
Abstract |
This paper presents final results of the Out-Of-Vocabulary 2022 (OOV) challenge. The OOV contest introduces an important aspect that is not commonly studied by Optical Character Recognition (OCR) models, namely, the recognition of scene text instances unseen at training time. The competition compiles a collection of public scene text datasets comprising 326,385 images with 4,864,405 scene text instances, thus covering a wide range of data distributions. A new and independent validation and test set is formed with scene text instances that are out of vocabulary at training time. The competition was structured in two tasks, end-to-end and cropped scene text recognition respectively. A thorough analysis of results from baselines and different participants is presented. Interestingly, current state-of-the-art models show a significant performance gap under the newly studied setting. We conclude that the OOV dataset proposed in this challenge will be an essential area to be explored in order to develop scene text models that achieve more robust and generalized predictions. | ||||
Address | Tel-Aviv; Israel; October 2022 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | LNCS | ||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ECCVW | ||
Notes | DAG; 600.155; 302.105; 611.002 | Approved | no | ||
Call Number | Admin @ si @ GMB2022 | Serial | 3771 | ||
Permanent link to this record | |||||
Author | Ali Furkan Biten; Ruben Tito; Andres Mafla; Lluis Gomez; Marçal Rusiñol; M. Mathew; C.V. Jawahar; Ernest Valveny; Dimosthenis Karatzas | ||||
Title | ICDAR 2019 Competition on Scene Text Visual Question Answering | Type | Conference Article | ||
Year | 2019 | Publication | 15th International Conference on Document Analysis and Recognition | Abbreviated Journal | |
Volume | Issue | Pages | 1563-1570 | ||
Keywords | |||||
Abstract |
This paper presents final results of ICDAR 2019 Scene Text Visual Question Answering competition (ST-VQA). ST-VQA introduces an important aspect that is not addressed by any Visual Question Answering system up to date, namely the incorporation of scene text to answer questions asked about an image. The competition introduces a new dataset comprising 23,038 images annotated with 31,791 question / answer pairs where the answer is always grounded on text instances present in the image. The images are taken from 7 different public computer vision datasets, covering a wide range of scenarios. The competition was structured in three tasks of increasing difficulty, that require reading the text in a scene and understanding it in the context of the scene, to correctly answer a given question. A novel evaluation metric is presented, which elegantly assesses both key capabilities expected from an optimal model: text recognition and image understanding. A detailed analysis of results from different participants is showcased, which provides insight into the current capabilities of VQA systems that can read. We firmly believe the dataset proposed in this challenge will be an important milestone to consider towards a path of more robust and general models that can exploit scene text to achieve holistic image understanding. | ||||
Address | Sydney; Australia; September 2019 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICDAR | ||
Notes | DAG; 600.129; 601.338; 600.121 | Approved | no | ||
Call Number | Admin @ si @ BTM2019c | Serial | 3286 | ||
Permanent link to this record | |||||
Author | Ali Furkan Biten; Ruben Tito; Andres Mafla; Lluis Gomez; Marçal Rusiñol; M. Mathew; C.V. Jawahar; Ernest Valveny; Dimosthenis Karatzas | ||||
Title | ICDAR 2019 Competition on Scene Text Visual Question Answering | Type | Conference Article | ||
Year | 2019 | Publication | 3rd Workshop on Closing the Loop Between Vision and Language, in conjunction with ICCV2019 | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract |
This paper presents final results of ICDAR 2019 Scene Text Visual Question Answering competition (ST-VQA). ST-VQA introduces an important aspect that is not addressed by any Visual Question Answering system up to date, namely the incorporation of scene text to answer questions asked about an image. The competition introduces a new dataset comprising 23,038 images annotated with 31,791 question / answer pairs where the answer is always grounded on text instances present in the image. The images are taken from 7 different public computer vision datasets, covering a wide range of scenarios. The competition was structured in three tasks of increasing difficulty, that require reading the text in a scene and understanding it in the context of the scene, to correctly answer a given question. A novel evaluation metric is presented, which elegantly assesses both key capabilities expected from an optimal model: text recognition and image understanding. A detailed analysis of results from different participants is showcased, which provides insight into the current capabilities of VQA systems that can read. We firmly believe the dataset proposed in this challenge will be an important milestone to consider towards a path of more robust and general models that can exploit scene text to achieve holistic image understanding. | ||||
Address | Sydney; Australia; September 2019 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | CLVL | ||
Notes | DAG; 600.129; 601.338; 600.135; 600.121 | Approved | no | ||
Call Number | Admin @ si @ BTM2019a | Serial | 3284 | ||
Permanent link to this record | |||||
Author | Ariel Amato; Felipe Lumbreras; Angel Sappa | ||||
Title | A General-purpose Crowdsourcing Platform for Mobile Devices | Type | Conference Article | ||
Year | 2014 | Publication | 9th International Conference on Computer Vision Theory and Applications | Abbreviated Journal | |
Volume | 3 | Issue | Pages | 211-215 | |
Keywords | Crowdsourcing Platform; Mobile Crowdsourcing | ||||
Abstract |
This paper presents details of a general-purpose, on-demand micro-task platform based on the crowdsourcing philosophy. This platform was specifically developed for mobile devices in order to exploit the strengths of such devices, namely: i) massivity, ii) ubiquity and iii) embedded sensors. The combined use of mobile platforms and the crowdsourcing model makes it possible to tackle tasks ranging from the simplest to the most complex. User experience is the highlighted feature of this platform (for both the task-proposer and the task-solver). Tools suited to a specific task are provided to the task-solver so that the job can be performed in a simpler, faster and more appealing way. Moreover, a task can be easily submitted by just selecting predefined templates, which cover a wide range of possible applications. Examples of its usage in computer vision and computer games are provided, illustrating the potential of the platform. | ||||
Address | Lisboa; Portugal; January 2014 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | VISAPP | ||
Notes | ISE; ADAS; 600.054; 600.055; 600.076; 600.078 | Approved | no | ||
Call Number | Admin @ si @ ALS2014 | Serial | 2478 | ||
Permanent link to this record | |||||
Author | Monica Piñol; Angel Sappa; Ricardo Toledo | ||||
Title | Adaptive Feature Descriptor Selection based on a Multi-Table Reinforcement Learning Strategy | Type | Journal Article | ||
Year | 2015 | Publication | Neurocomputing | Abbreviated Journal | NEUCOM |
Volume | 150 | Issue | A | Pages | 106–115 |
Keywords | Reinforcement learning; Q-learning; Bag of features; Descriptors | ||||
Abstract |
This paper presents and evaluates a framework to improve the performance of visual object classification methods, which are based on the usage of image feature descriptors as inputs. The goal of the proposed framework is to learn the best descriptor for each image in a given database. This goal is reached by means of a reinforcement learning process using the minimum information. The visual classification system used to demonstrate the proposed framework is based on a bag of features scheme, and the reinforcement learning technique is implemented through the Q-learning approach. The behavior of the reinforcement learning with different state definitions is evaluated. Additionally, a method that combines all these states is formulated in order to select the optimal state. Finally, the chosen actions are obtained from the best set of image descriptors in the literature: PHOW, SIFT, C-SIFT, SURF and Spin. Experimental results using two public databases (ETH and COIL) are provided showing both the validity of the proposed approach and comparisons with state of the art. In all the cases the best results are obtained with the proposed approach. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | ADAS; 600.055; 600.076 | Approved | no | ||
Call Number | Admin @ si @ PST2015 | Serial | 2473 | ||
Permanent link to this record | |||||
Author | Patricia Suarez; Dario Carpio; Angel Sappa | ||||
Title | Depth Map Estimation from a Single 2D Image | Type | Conference Article | ||
Year | 2023 | Publication | 17th International Conference on Signal-Image Technology & Internet-Based Systems | Abbreviated Journal | |
Volume | Issue | Pages | 347-353 | ||
Keywords | |||||
Abstract |
This paper presents an innovative architecture based on a Cycle Generative Adversarial Network (CycleGAN) for the synthesis of high-quality depth maps from monocular images. The proposed architecture leverages a diverse set of loss functions, including cycle consistency, contrastive, identity, and least square losses, to facilitate the generation of depth maps that exhibit realism and high fidelity. A notable feature of the approach is its ability to synthesize depth maps from grayscale images without the need for paired training data. Extensive comparisons with different state-of-the-art methods show the superiority of the proposed approach in both quantitative metrics and visual quality. This work addresses the challenge of depth map synthesis and offers significant advancements in the field. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | SITIS | ||
Notes | MSIAU | Approved | no | ||
Call Number | Admin @ si @ SCS2023b | Serial | 4009 | ||
Permanent link to this record | |||||
Author | Agnes Borras; Josep Llados | ||||
Title | Object Image Retrieval by Shape Content in Complex Scenes Using Geometric Constraints | Type | Book Chapter | ||
Year | 2005 | Publication | Pattern Recognition And Image Analysis | Abbreviated Journal | LNCS |
Volume | 3522 | Issue | Pages | 325–332 | |
Keywords | |||||
Abstract |
This paper presents an image retrieval system based on 2D shape information. Query shape objects and database images are represented by polygonal approximations of their contours. Afterwards they are encoded, using geometric features, in terms of predefined structures. Shapes are then located in database images by a voting procedure on the spatial domain. Then an alignment matching provides a probability value to rank the database image in the retrieval result. The method allows a query object to be detected in database images even when they contain complex scenes. The shape matching also tolerates partial occlusions and affine transformations such as translation, rotation or scaling. | ||||
Address | Estoril (Portugal) | ||||
Corporate Author | Thesis | ||||
Publisher | Springer Link | Place of Publication | Editor | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | DAG; | Approved | no | ||
Call Number | DAG @ dag @ BoL2005; IAM @ iam @ BoL2005 | Serial | 556 | ||
Permanent link to this record | |||||
Author | David Fernandez; Jon Almazan; Nuria Cirera; Alicia Fornes; Josep Llados | ||||
Title | BH2M: the Barcelona Historical Handwritten Marriages database | Type | Conference Article | ||
Year | 2014 | Publication | 22nd International Conference on Pattern Recognition | Abbreviated Journal | |
Volume | Issue | Pages | 256-261 | ||
Keywords | |||||
Abstract |
This paper presents an image database of historical handwritten marriage records stored in the archives of Barcelona cathedral, and the corresponding meta-data intended for evaluating the performance of document analysis algorithms. The contribution of this paper is twofold. First, it presents a complete ground truth which covers the whole pipeline of handwriting recognition research, from layout analysis to recognition and understanding. Second, it is the first dataset in the emerging area of genealogical document analysis, where documents are pseudo-structured manuscripts with specific lexicons and the interest goes beyond pure transcription to context-dependent understanding. | ||||
Address | Crete Island; Greece; September 2014 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 1051-4651 | ISBN | Medium | ||
Area | Expedition | Conference | ICPR | ||
Notes | DAG; 600.056; 600.061; 602.006; 600.077 | Approved | no | ||
Call Number | Admin @ si @ FAC2014 | Serial | 2461 | ||
Permanent link to this record | |||||
Author | Lluis Pere de las Heras; Ernest Valveny; Gemma Sanchez | ||||
Title | Combining structural and statistical strategies for unsupervised wall detection in floor plans | Type | Conference Article | ||
Year | 2013 | Publication | 10th IAPR International Workshop on Graphics Recognition | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract |
This paper presents an evolution of the first unsupervised wall segmentation method in floor plans, which was presented by the authors in [1]. This first approach, contrary to the existing ones, is able to segment walls independently of their notation and without the need for any pre-annotated data to learn their visual appearance. Despite the good performance of the first approach, some specific cases, such as curved walls, were not correctly segmented since they do not satisfy the strict structural assumptions that guide the whole methodology in order to learn, in an unsupervised way, the structure of a wall. In this paper, we refine this strategy by dividing the process into two steps. In the first step, potential wall segments are extracted in an unsupervised way using a modification of [1], which initially restricts even further the areas considered as walls. In the second step, these segments are used to learn and spot lost instances based on a modified version of [2], also presented by the authors. The combined method has been tested on 4 datasets with different notations and compared with the state of the art applied to the same datasets. The results show its adaptability to different wall notations and shapes, significantly outperforming the original approach. | ||||
Address | Bethlehem; PA; USA; August 2013 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | GREC | ||
Notes | DAG; 600.045 | Approved | no | ||
Call Number | Admin @ si @ HVS2013a | Serial | 2321 | ||
Permanent link to this record | |||||
Author | Fernando Barrera; Felipe Lumbreras; Angel Sappa | ||||
Title | Evaluation of Similarity Functions in Multimodal Stereo | Type | Conference Article | ||
Year | 2012 | Publication | 9th International Conference on Image Analysis and Recognition | Abbreviated Journal | |
Volume | 7324 | Issue | I | Pages | 320-329 |
Keywords | Aveiro, Portugal | ||||
Abstract |
This paper presents an evaluation framework for multimodal stereo matching, which allows the performance of four similarity functions to be compared. Additionally, it presents details of a multimodal stereo head that supplies thermal infrared and color images, as well as aspects of its calibration and rectification. The pipeline includes a novel method for the disparity selection, which is suitable for evaluating the similarity functions. Finally, a benchmark for comparing different initializations of the proposed framework is presented. Similarity functions are based on mutual information, gradient orientation and scale space representations. Their evaluation is performed using two metrics: i) disparity error, and ii) number of correct matches on planar regions. In addition to the proposed evaluation, the current paper also shows that 3D sparse representations can be recovered from such a multimodal stereo head. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Springer Berlin Heidelberg | Place of Publication | Editor | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | LNCS | ||
Series Volume | Series Issue | Edition | |||
ISSN | 0302-9743 | ISBN | 978-3-642-31294-6 | Medium | |
Area | Expedition | Conference | ICIAR | ||
Notes | ADAS | Approved | no | ||
Call Number | BLS2012a | Serial | 2014 | ||
Permanent link to this record | |||||
Author | Angel Sappa; Fadi Dornaika; David Geronimo; Antonio Lopez | ||||
Title | Efficient On-Board Stereo Vision Pose Estimation | Type | Conference Article | ||
Year | 2007 | Publication | Computer Aided Systems Theory, Selected paper from | Abbreviated Journal | |
Volume | 4739 | Issue | Pages | 1183–1190 | |
Keywords | |||||
Abstract |
This paper presents an efficient technique for real-time estimation of on-board stereo vision system pose. The whole process is performed in the Euclidean space and consists of two stages. Initially, a compact representation of the original 3D data points is computed. Then, a RANSAC based least squares approach is used for fitting a plane to the 3D road points. Fast RANSAC fitting is obtained by selecting points according to a probability distribution function that takes into account the density of points at a given depth. Finally, stereo camera position and orientation (pose) is computed relative to the road plane. The proposed technique is intended to be used on driver assistance systems for applications such as obstacle or pedestrian detection. Real-time performance is reached. Experimental results on several environments and comparisons with a previous work are presented. | ||||
Address | Las Palmas de Gran Canaria (Spain) | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | LNCS | ||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | EUROCAST | ||
Notes | ADAS | Approved | no | ||
Call Number | ADAS @ adas @ SDG2007b | Serial | 916 | ||
Permanent link to this record | |||||
Author | Angel Sappa; Fadi Dornaika; Daniel Ponsa; David Geronimo; Antonio Lopez | ||||
Title | An Efficient Approach to Onboard Stereo Vision System Pose Estimation | Type | Journal Article | ||
Year | 2008 | Publication | IEEE Transactions on Intelligent Transportation Systems | Abbreviated Journal | TITS |
Volume | 9 | Issue | 3 | Pages | 476–490 |
Keywords | Camera extrinsic parameter estimation, ground plane estimation, onboard stereo vision system | ||||
Abstract |
This paper presents an efficient technique for estimating the pose of an onboard stereo vision system relative to the environment’s dominant surface area, which is assumed to be the road surface. Unlike previous approaches, it can be used either for urban or highway scenarios since it is not based on a specific visual traffic feature extraction but on 3-D raw data points. The whole process is performed in the Euclidean space and consists of two stages. Initially, a compact 2-D representation of the original 3-D data points is computed. Then, a RANdom SAmple Consensus (RANSAC) based least-squares approach is used to fit a plane to the road. Fast RANSAC fitting is obtained by selecting points according to a probability function that takes into account the density of points at a given depth. Finally, stereo camera height and pitch angle are computed relative to the fitted road plane. The proposed technique is intended to be used in driver-assistance systems for applications such as vehicle or pedestrian detection. Experimental results on urban environments, which are the most challenging scenarios (i.e., flat/uphill/downhill driving, speed bumps, and car’s accelerations), are presented. These results are validated with manually annotated ground truth. Additionally, comparisons with previous works are presented to show the improvements in the central processing unit processing time, as well as in the accuracy of the obtained results. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | IEEE | Place of Publication | Editor | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | ADAS | Approved | no | ||
Call Number | ADAS @ adas @ SDP2008 | Serial | 1000 | ||
Permanent link to this record | |||||
Author | Adriana Romero; Carlo Gatta; Gustavo Camps-Valls | ||||
Title | Unsupervised Deep Feature Extraction Of Hyperspectral Images | Type | Conference Article | ||
Year | 2014 | Publication | 6th Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | Convolutional networks; deep learning; sparse learning; feature extraction; hyperspectral image classification | ||||
Abstract |
This paper presents an effective unsupervised sparse feature learning algorithm to train deep convolutional networks on hyperspectral images. Deep convolutional hierarchical representations are learned and then used for pixel classification. Features in lower layers present less abstract representations of data, while higher layers represent more abstract and complex characteristics. We successfully illustrate the performance of the extracted representations in a challenging AVIRIS hyperspectral image classification problem, compared to standard dimensionality reduction methods like principal component analysis (PCA) and its kernel counterpart (kPCA). The proposed method largely outperforms the previous state-of-the-art results on the same experimental setting. Results show that single layer networks can extract powerful discriminative features only when the receptive field accounts for neighboring pixels. Regarding the deep architecture, we can conclude that: (1) additional layers in a deep architecture significantly improve the performance w.r.t. single layer variants; (2) the max-pooling step in each layer is mandatory to achieve satisfactory results; and (3) the performance gain w.r.t. the number of layers is upper bounded, since the spatial resolution is reduced at each pooling, resulting in too spatially coarse output features. | ||||
Address | Lausanne; Switzerland; June 2014 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | WHISPERS | ||
Notes | MILAB; LAMP; 600.079 | Approved | no | ||
Call Number | Admin @ si @ RGC2014 | Serial | 2513 | ||
Permanent link to this record | |||||
Author | Laura Igual; Joan Carles Soliva; Antonio Hernandez; Sergio Escalera; Oscar Vilarroya; Petia Radeva | ||||
Title | Supervised Brain Segmentation and Classification in Diagnostic of Attention-Deficit/Hyperactivity Disorder | Type | Conference Article | ||
Year | 2012 | Publication | High Performance Computing and Simulation, International Conference on | Abbreviated Journal | |
Volume | Issue | Pages | 182-187 | ||
Keywords | |||||
Abstract |
This paper presents an automatic method for external and internal segmentation of the caudate nucleus in Magnetic Resonance Images (MRI) based on statistical and structural machine learning approaches. This method is applied in Attention-Deficit/Hyperactivity Disorder (ADHD) diagnosis. The external segmentation method adapts the Graph Cut energy-minimization model to make it suitable for segmenting small, low-contrast structures, such as the caudate nucleus. In particular, new energy function data and boundary potentials are defined and a supervised energy term based on contextual brain structures is added. Furthermore, the internal segmentation method learns a classifier based on shape features of the Region of Interest (ROI) in MRI slices. The results show accurate external and internal caudate segmentation in a real data set and similar performance of ADHD diagnostic test to manual annotation. | ||||
Address | Madrid | ||||
Corporate Author | Thesis | ||||
Publisher | IEEE Xplore | Place of Publication | Editor | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | 978-1-4673-2359-8 | Medium | ||
Area | Expedition | Conference | HPCS | ||
Notes | MILAB;HuPBA | Approved | no | ||
Call Number | Admin @ si @ ISH2012a | Serial | 2038 | ||
Permanent link to this record | |||||
Author | Fadi Dornaika; Bogdan Raducanu | ||||
Title | Single Snapshot 3D Head Pose Initialization for Tracking in Human Robot Interaction Scenario | Type | Conference Article | ||
Year | 2010 | Publication | 1st International Workshop on Computer Vision for Human-Robot Interaction | Abbreviated Journal | |
Volume | Issue | Pages | 32–39 | ||
Keywords | 1st International Workshop on Computer Vision for Human-Robot Interaction, in conjunction with IEEE CVPR 2010 | ||||
Abstract |
This paper presents an automatic 3D head pose initialization scheme for a real-time face tracker with application to human-robot interaction. It has two main contributions. First, we propose an automatic 3D head pose and person-specific face shape estimation, based on a 3D deformable model. The proposed approach serves to initialize our real-time 3D face tracker. What makes this contribution very attractive is that the initialization step can cope with faces under arbitrary pose, so it is not limited only to near-frontal views. Second, the previous framework is used to develop an application in which the orientation of an AIBO’s camera can be controlled through the imitation of the user’s head pose. In our scenario, this application is used to build panoramic images from overlapping snapshots. Experiments on real videos confirm the robustness and usefulness of the proposed methods. | ||||
Address | San Francisco; CA; USA; June 2010 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 2160-7508 | ISBN | 978-1-4244-7029-7 | Medium | |
Area | Expedition | Conference | CVPRW | ||
Notes | OR;MV | Approved | no | ||
Call Number | BCNPCL @ bcnpcl @ DoR2010a | Serial | 1309 | ||
Permanent link to this record |