toggle visibility Search & Display Options

Select All    Deselect All
 |   | 
Details
   print
  Records Links
Author Ruben Tito; Dimosthenis Karatzas; Ernest Valveny edit   pdf
doi  openurl
  Title Hierarchical multimodal transformers for Multi-Page DocVQA Type Journal Article
  Year 2023 Publication Pattern Recognition Abbreviated Journal PR  
  Volume 144 Issue Pages 109834  
  Keywords  
  Abstract Document Visual Question Answering (DocVQA) refers to the task of answering questions from document images. Existing work on DocVQA only considers single-page documents. However, in real scenarios documents are mostly composed of multiple pages that should be processed altogether. In this work we extend DocVQA to the multi-page scenario. For that, we first create a new dataset, MP-DocVQA, where questions are posed over multi-page documents instead of single pages. Second, we propose a new hierarchical method, Hi-VT5, based on the T5 architecture, that overcomes the limitations of current methods to process long multi-page documents. The proposed method is based on a hierarchical transformer architecture where the encoder summarizes the most relevant information of every page and then, the decoder takes this summarized information to generate the final answer. Through extensive experimentation, we demonstrate that our method is able, in a single stage, to answer the questions and provide the page that contains the relevant information to find the answer, which can be used as a kind of explainability measure.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN (down) ISSN 0031-3203 ISBN Medium  
  Area Expedition Conference  
  Notes DAG; 600.155; 600.121 Approved no  
  Call Number Admin @ si @ TKV2023 Serial 3825  
Permanent link to this record
 

 
Author Souhail Bakkali; Zuheng Ming; Mickael Coustaty; Marçal Rusiñol; Oriol Ramos Terrades edit   pdf
doi  openurl
  Title VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification Type Journal Article
  Year 2023 Publication Pattern Recognition Abbreviated Journal PR  
  Volume 139 Issue Pages 109419  
  Keywords  
  Abstract Multimodal learning from document data has achieved great success lately as it allows to pre-train semantically meaningful features as a prior into a learnable downstream approach. In this paper, we approach the document classification problem by learning cross-modal representations through language and vision cues, considering intra- and inter-modality relationships. Instead of merging features from different modalities into a common representation space, the proposed method exploits high-level interactions and learns relevant semantic information from effective attention flows within and across modalities. The proposed learning objective is devised between intra- and inter-modality alignment tasks, where the similarity distribution per task is computed by contracting positive sample pairs while simultaneously contrasting negative ones in the common feature representation space}. Extensive experiments on public document classification datasets demonstrate the effectiveness and the generalization capacity of our model on both low-scale and large-scale datasets.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN (down) ISSN 0031-3203 ISBN Medium  
  Area Expedition Conference  
  Notes DAG; 600.140; 600.121 Approved no  
  Call Number Admin @ si @ BMC2023 Serial 3826  
Permanent link to this record
 

 
Author Debora Gil; Jaume Garcia; Manuel Vazquez; Ruth Aris; Guillaume Houzeaux edit   pdf
url  openurl
  Title Patient-Sensitive Anatomic and Functional 3D Model of the Left Ventricle Function Type Conference Article
  Year 2008 Publication 8th World Congress on Computational Mechanichs (WCCM8)/5th European Congress on Computational Methods in Applied Sciences and Engineering (ECCOMAS 2008) Abbreviated Journal  
  Volume Issue Pages  
  Keywords Left Ventricle; Electromechanical Models; Image Processing; Magnetic Resonance.  
  Abstract Early diagnosis and accurate treatment of Left Ventricle (LV) dysfunction significantly increases the patient survival. Impairment of LV contractility due to cardiovascular diseases is reflected in its motion patterns. Recent advances in medical imaging, such as Magnetic Resonance (MR), have encouraged research on 3D simulation and modelling of the LV dynamics. Most of the existing 3D models consider just the gross anatomy of the LV and restore a truncated ellipse which deforms along the cardiac cycle. The contraction mechanics of any muscle strongly depends on the spatial orientation of its muscular fibers since the motion that the muscle undergoes mainly takes place along the fibers. It follows that such simplified models do not allow evaluation of the heart electro-mechanical function and coupling, which has recently risen as the key point for understanding the LV functionality . In order to thoroughly understand the LV mechanics it is necessary to consider the complete anatomy of the LV given by the orientation of the myocardial fibres in 3D space as described by Torrent Guasp. We propose developing a 3D patient-sensitive model of the LV integrating, for the first time, the ven- tricular band anatomy (fibers orientation), the LV gross anatomy and its functionality. Such model will represent the LV function as a natural consequence of its own ventricular band anatomy. This might be decisive in restoring a proper LV contraction in patients undergoing pace marker treatment. The LV function is defined as soon as the propagation of the contractile electromechanical pulse has been modelled. In our experiments we have used the wave equation for the propagation of the electric pulse. The electromechanical wave moves on the myocardial surface and should have a conductivity tensor oriented along the muscular fibers. Thus, whatever mathematical model for electric pulse propa- gation [4] we consider, the complete anatomy of the LV should be extracted. The LV gross anatomy is obtained by processing multi slice MR images recorded for each patient. Information about the myocardial fibers distribution can only be extracted by Diffusion Tensor Imag- ing (DTI), which can not provide in vivo information for each patient. As a first approach, we have computed an average model of fibers from several DTI studies of canine hearts. This rough anatomy is the input for our electro-mechanical propagation model simulating LV dynamics. The average fiber orientation is updated until the simulated LV motion agrees with the experimental evidence provided by the LV motion observed in tagged MR (TMR) sequences. Experimental LV motion is recovered by applying image processing, differential geometry and interpolation techniques to 2D TMR slices [5]. The pipeline in figure 1 outlines the interaction between simulations and experimental data leading to our patient-tailored model.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Venezia (Italia) Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN (down) B-31470-08 ISBN Medium  
  Area Expedition Conference  
  Notes IAM Approved no  
  Call Number IAM @ iam @ GGV2008c Serial 1521  
Permanent link to this record
 

 
Author Enric Marti; Debora Gil; Marc Vivet; Carme Julia edit  openurl
  Title Aprendizaje Basado en Proyectos en la asignatura de Gráficos por Computador en Ingeniería Informática. Balance de cuatro años de experiencia Type Miscellaneous
  Year 2009 Publication 15th Jornadas de Enseñanza Universitaria de la Informatica Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Barcelona, Spain Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume 1 Series Issue Edition  
  ISSN (down) 978-84-692-2758-9 ISBN Medium  
  Area Expedition Conference JENUI  
  Notes IAM;ADAS Approved no  
  Call Number IAM @ iam @ MGV2009 Serial 1596  
Permanent link to this record
 

 
Author Sergio Escalera; Ralf Herbrich edit  url
doi  isbn
openurl 
  Title The NeurIPS’18 Competition: From Machine Learning to Intelligent Conversations Type Book Whole
  Year 2020 Publication The Springer Series on Challenges in Machine Learning Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract This volume presents the results of the Neural Information Processing Systems Competition track at the 2018 NeurIPS conference. The competition follows the same format as the 2017 competition track for NIPS. Out of 21 submitted proposals, eight competition proposals were selected, spanning the area of Robotics, Health, Computer Vision, Natural Language Processing, Systems and Physics. Competitions have become an integral part of advancing state-of-the-art in artificial intelligence (AI). They exhibit one important difference to benchmarks: Competitions test a system end-to-end rather than evaluating only a single component; they assess the practicability of an algorithmic solution in addition to assessing feasibility.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor Sergio Escalera; Ralf Hebrick  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN (down) 2520-1328 ISBN 978-3-030-29134-1 Medium  
  Area Expedition Conference  
  Notes HuPBA; no menciona Approved no  
  Call Number Admin @ si @ HeE2020 Serial 3328  
Permanent link to this record
 

 
Author Albert Berenguel; Oriol Ramos Terrades; Josep Llados; Cristina Cañero edit  doi
openurl 
  Title Evaluation of Texture Descriptors for Validation of Counterfeit Documents Type Conference Article
  Year 2017 Publication 14th International Conference on Document Analysis and Recognition Abbreviated Journal  
  Volume Issue Pages 1237-1242  
  Keywords  
  Abstract This paper describes an exhaustive comparative analysis and evaluation of different existing texture descriptor algorithms to differentiate between genuine and counterfeit documents. We include in our experiments different categories of algorithms and compare them in different scenarios with several counterfeit datasets, comprising banknotes and identity documents. Computational time in the extraction of each descriptor is important because the final objective is to use it in a real industrial scenario. HoG and CNN based descriptors stands out statistically over the rest in terms of the F1-score/time ratio performance.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN (down) 2379-2140 ISBN Medium  
  Area Expedition Conference ICDAR  
  Notes DAG; 600.061; 601.269; 600.097; 600.121 Approved no  
  Call Number Admin @ si @ BRL2017 Serial 3092  
Permanent link to this record
 

 
Author M. Altillawi; S. Li; S.M. Prakhya; Z. Liu; Joan Serrat edit  doi
openurl 
  Title Implicit Learning of Scene Geometry From Poses for Global Localization Type Journal Article
  Year 2024 Publication IEEE Robotics and Automation Letters Abbreviated Journal ROBOTAUTOMLET  
  Volume 9 Issue 2 Pages 955-962  
  Keywords Localization; Localization and mapping; Deep learning for visual perception; Visual learning  
  Abstract Global visual localization estimates the absolute pose of a camera using a single image, in a previously mapped area. Obtaining the pose from a single image enables many robotics and augmented/virtual reality applications. Inspired by latest advances in deep learning, many existing approaches directly learn and regress 6 DoF pose from an input image. However, these methods do not fully utilize the underlying scene geometry for pose regression. The challenge in monocular relocalization is the minimal availability of supervised training data, which is just the corresponding 6 DoF poses of the images. In this letter, we propose to utilize these minimal available labels (i.e., poses) to learn the underlying 3D geometry of the scene and use the geometry to estimate the 6 DoF camera pose. We present a learning method that uses these pose labels and rigid alignment to learn two 3D geometric representations ( X, Y, Z coordinates ) of the scene, one in camera coordinate frame and the other in global coordinate frame. Given a single image, it estimates these two 3D scene representations, which are then aligned to estimate a pose that matches the pose label. This formulation allows for the active inclusion of additional learning constraints to minimize 3D alignment errors between the two 3D scene representations, and 2D re-projection errors between the 3D global scene representation and 2D image pixels, resulting in improved localization accuracy. During inference, our model estimates the 3D scene geometry in camera and global frames and aligns them rigidly to obtain pose in real-time. We evaluate our work on three common visual localization datasets, conduct ablation studies, and show that our method exceeds state-of-the-art regression methods' pose accuracy on all datasets.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN (down) 2377-3766 ISBN Medium  
  Area Expedition Conference  
  Notes ADAS Approved no  
  Call Number Admin @ si @ Serial 3857  
Permanent link to this record
 

 
Author G.Thorvaldsen; Joana Maria Pujadas-Mora; T.Andersen ; L.Eikvil; Josep Llados; Alicia Fornes; Anna Cabre edit  url
openurl 
  Title A Tale of two Transcriptions Type Journal
  Year 2015 Publication Historical Life Course Studies Abbreviated Journal  
  Volume 2 Issue Pages 1-19  
  Keywords Nominative Sources; Census; Vital Records; Computer Vision; Optical Character Recognition; Word Spotting  
  Abstract non-indexed
This article explains how two projects implement semi-automated transcription routines: for census sheets in Norway and marriage protocols from Barcelona. The Spanish system was created to transcribe the marriage license books from 1451 to 1905 for the Barcelona area; one of the world’s longest series of preserved vital records. Thus, in the Project “Five Centuries of Marriages” (5CofM) at the Autonomous University of Barcelona’s Center for Demographic Studies, the Barcelona Historical Marriage Database has been built. More than 600,000 records were transcribed by 150 transcribers working online. The Norwegian material is cross-sectional as it is the 1891 census, recorded on one sheet per person. This format and the underlining of keywords for several variables made it more feasible to semi-automate data entry than when many persons are listed on the same page. While Optical Character Recognition (OCR) for printed text is scientifically mature, computer vision research is now focused on more difficult problems such as handwriting recognition. In the marriage project, document analysis methods have been proposed to automatically recognize the marriage licenses. Fully automatic recognition is still a challenge, but some promising results have been obtained. In Spain, Norway and elsewhere the source material is available as scanned pictures on the Internet, opening up the possibility for further international cooperation concerning automating the transcription of historic source materials. Like what is being done in projects to digitize printed materials, the optimal solution is likely to be a combination of manual transcription and machine-assisted recognition also for hand-written sources.
 
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN (down) 2352-6343 ISBN Medium  
  Area Expedition Conference  
  Notes DAG; 600.077; 602.006 Approved no  
  Call Number Admin @ si @ TPA2015 Serial 2582  
Permanent link to this record
 

 
Author Carles Fernandez; Jordi Gonzalez; Joao Manuel R. S. Taveres; Xavier Roca edit   pdf
doi  isbn
openurl 
  Title Towards Ontological Cognitive System Type Book Chapter
  Year 2013 Publication Topics in Medical Image Processing and Computational Vision Abbreviated Journal  
  Volume 8 Issue Pages 87-99  
  Keywords  
  Abstract The increasing ubiquitousness of digital information in our daily lives has positioned video as a favored information vehicle, and given rise to an astonishing generation of social media and surveillance footage. This raises a series of technological demands for automatic video understanding and management, which together with the compromising attentional limitations of human operators, have motivated the research community to guide its steps towards a better attainment of such capabilities. As a result, current trends on cognitive vision promise to recognize complex events and self-adapt to different environments, while managing and integrating several types of knowledge. Future directions suggest to reinforce the multi-modal fusion of information sources and the communication with end-users.  
  Address  
  Corporate Author Thesis  
  Publisher Springer Netherlands Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN (down) 2212-9391 ISBN 978-94-007-0725-2 Medium  
  Area Expedition Conference  
  Notes ISE; 605.203; 302.018; 600.049 Approved no  
  Call Number Admin @ si @ FGT2013 Serial 2287  
Permanent link to this record
 

 
Author J.Poujol; Cristhian A. Aguilera-Carrasco; E.Danos; Boris X. Vintimilla; Ricardo Toledo; Angel Sappa edit   pdf
url  doi
isbn  openurl
  Title Visible-Thermal Fusion based Monocular Visual Odometry Type Conference Article
  Year 2015 Publication 2nd Iberian Robotics Conference ROBOT2015 Abbreviated Journal  
  Volume 417 Issue Pages 517-528  
  Keywords Monocular Visual Odometry; LWIR-RGB cross-spectral Imaging; Image Fusion.  
  Abstract The manuscript evaluates the performance of a monocular visual odometry approach when images from different spectra are considered, both independently and fused. The objective behind this evaluation is to analyze if classical approaches can be improved when the given images, which are from different spectra, are fused and represented in new domains. The images in these new domains should have some of the following properties: i) more robust to noisy data; ii) less sensitive to changes (e.g., lighting); iii) more rich in descriptive information, among other. In particular in the current work two different image fusion strategies are considered. Firstly, images from the visible and thermal spectrum are fused using a Discrete Wavelet Transform (DWT) approach. Secondly, a monochrome threshold strategy is considered. The obtained
representations are evaluated under a visual odometry framework, highlighting
their advantages and disadvantages, using different urban and semi-urban scenarios. Comparisons with both monocular-visible spectrum and monocular-infrared spectrum, are also provided showing the validity of the proposed approach.
 
  Address Lisboa; Portugal; November 2015  
  Corporate Author Thesis  
  Publisher Springer International Publishing Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN (down) 2194-5357 ISBN 978-3-319-27145-3 Medium  
  Area Expedition Conference ROBOT  
  Notes ADAS; 600.076; 600.086 Approved no  
  Call Number Admin @ si @ PAD2015 Serial 2663  
Permanent link to this record
 

 
Author Svebor Karaman; Giuseppe Lisanti; Andrew Bagdanov; Alberto del Bimbo edit  doi
isbn  openurl
  Title From re-identification to identity inference: Labeling consistency by local similarity constraints Type Book Chapter
  Year 2014 Publication Person Re-Identification Abbreviated Journal  
  Volume 2 Issue Pages 287-307  
  Keywords re-identification; Identity inference; Conditional random fields; Video surveillance  
  Abstract In this chapter, we introduce the problem of identity inference as a generalization of person re-identification. It is most appropriate to distinguish identity inference from re-identification in situations where a large number of observations must be identified without knowing a priori that groups of test images represent the same individual. The standard single- and multishot person re-identification common in the literature are special cases of our formulation. We present an approach to solving identity inference by modeling it as a labeling problem in a Conditional Random Field (CRF). The CRF model ensures that the final labeling gives similar labels to detections that are similar in feature space. Experimental results are given on the ETHZ, i-LIDS and CAVIAR datasets. Our approach yields state-of-the-art performance for multishot re-identification, and our results on the more general identity inference problem demonstrate that we are able to infer the identity of very many examples even with very few labeled images in the gallery.  
  Address  
  Corporate Author Thesis  
  Publisher Springer London Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN (down) 2191-6586 ISBN 978-1-4471-6295-7 Medium  
  Area Expedition Conference  
  Notes LAMP; 600.079 Approved no  
  Call Number Admin @ si @KLB2014b Serial 2521  
Permanent link to this record
 

 
Author Ariel Amato; Ivan Huerta; Mikhail Mozerov; Xavier Roca; Jordi Gonzalez edit   pdf
doi  isbn
openurl 
  Title Moving Cast Shadows Detection Methods for Video Surveillance Applications Type Book Chapter
  Year 2014 Publication Augmented Vision and Reality Abbreviated Journal  
  Volume 6 Issue Pages 23-47  
  Keywords  
  Abstract Moving cast shadows are a major concern in today’s performance from broad range of many vision-based surveillance applications because they highly difficult the object classification task. Several shadow detection methods have been reported in the literature during the last years. They are mainly divided into two domains. One usually works with static images, whereas the second one uses image sequences, namely video content. In spite of the fact that both cases can be analogously analyzed, there is a difference in the application field. The first case, shadow detection methods can be exploited in order to obtain additional geometric and semantic cues about shape and position of its casting object (‘shape from shadows’) as well as the localization of the light source. While in the second one, the main purpose is usually change detection, scene matching or surveillance (usually in a background subtraction context). Shadows can in fact modify in a negative way the shape and color of the target object and therefore affect the performance of scene analysis and interpretation in many applications. This chapter wills mainly reviews shadow detection methods as well as their taxonomies related with the second case, thus aiming at those shadows which are associated with moving objects (moving shadows).  
  Address  
  Corporate Author Thesis  
  Publisher Springer Berlin Heidelberg Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN (down) 2190-5916 ISBN 978-3-642-37840-9 Medium  
  Area Expedition Conference  
  Notes ISE; 605.203; 600.049; 302.018; 302.012; 600.078 Approved no  
  Call Number Admin @ si @ AHM2014 Serial 2223  
Permanent link to this record
 

 
Author Jorge Charco; Angel Sappa; Boris X. Vintimilla edit   pdf
url  isbn
openurl 
  Title Human Pose Estimation through a Novel Multi-view Scheme Type Conference Article
  Year 2022 Publication 17th International Conference on Computer Vision Theory and Applications (VISAPP 2022) Abbreviated Journal  
  Volume 5 Issue Pages 855-862  
  Keywords Multi-view Scheme; Human Pose Estimation; Relative Camera Pose; Monocular Approach  
  Abstract This paper presents a multi-view scheme to tackle the challenging problem of the self-occlusion in human pose estimation problem. The proposed approach first obtains the human body joints of a set of images, which are captured from different views at the same time. Then, it enhances the obtained joints by using a
multi-view scheme. Basically, the joints from a given view are used to enhance poorly estimated joints from another view, especially intended to tackle the self occlusions cases. A network architecture initially proposed for the monocular case is adapted to be used in the proposed multi-view scheme. Experimental results and
comparisons with the state-of-the-art approaches on Human3.6m dataset are presented showing improvements in the accuracy of body joints estimations.
 
  Address On line; Feb 6, 2022 – Feb 8, 2022  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN (down) 2184-4321 ISBN 978-989-758-555-5 Medium  
  Area Expedition Conference VISAPP  
  Notes MSIAU; 600.160 Approved no  
  Call Number Admin @ si @ CSV2022 Serial 3689  
Permanent link to this record
 

 
Author Katerine Diaz; Jesus Martinez del Rincon; Aura Hernandez-Sabate; Debora Gil edit   pdf
doi  openurl
  Title Continuous head pose estimation using manifold subspace embedding and multivariate regression Type Journal Article
  Year 2018 Publication IEEE Access Abbreviated Journal ACCESS  
  Volume 6 Issue Pages 18325 - 18334  
  Keywords Head Pose estimation; HOG features; Generalized Discriminative Common Vectors; B-splines; Multiple linear regression  
  Abstract In this paper, a continuous head pose estimation system is proposed to estimate yaw and pitch head angles from raw facial images. Our approach is based on manifold learningbased methods, due to their promising generalization properties shown for face modelling from images. The method combines histograms of oriented gradients, generalized discriminative common vectors and continuous local regression to achieve successful performance. Our proposal was tested on multiple standard face datasets, as well as in a realistic scenario. Results show a considerable performance improvement and a higher consistence of our model in comparison with other state-of-art methods, with angular errors varying between 9 and 17 degrees.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN (down) 2169-3536 ISBN Medium  
  Area Expedition Conference  
  Notes ADAS; 600.118 Approved no  
  Call Number Admin @ si @ DMH2018b Serial 3091  
Permanent link to this record
 

 
Author Zeynep Yucel; Albert Ali Salah; Çetin Meriçli; Tekin Meriçli; Roberto Valenti; Theo Gevers edit  doi
openurl 
  Title Joint Attention by Gaze Interpolation and Saliency Type Journal
  Year 2013 Publication IEEE Transactions on cybernetics Abbreviated Journal T-CIBER  
  Volume 43 Issue 3 Pages 829-842  
  Keywords  
  Abstract Joint attention, which is the ability of coordination of a common point of reference with the communicating party, emerges as a key factor in various interaction scenarios. This paper presents an image-based method for establishing joint attention between an experimenter and a robot. The precise analysis of the experimenter's eye region requires stability and high-resolution image acquisition, which is not always available. We investigate regression-based interpolation of the gaze direction from the head pose of the experimenter, which is easier to track. Gaussian process regression and neural networks are contrasted to interpolate the gaze direction. Then, we combine gaze interpolation with image-based saliency to improve the target point estimates and test three different saliency schemes. We demonstrate the proposed method on a human-robot interaction scenario. Cross-subject evaluations, as well as experiments under adverse conditions (such as dimmed or artificial illumination or motion blur), show that our method generalizes well and achieves rapid gaze estimation for establishing joint attention.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN (down) 2168-2267 ISBN Medium  
  Area Expedition Conference  
  Notes ALTRES;ISE Approved no  
  Call Number Admin @ si @ YSM2013 Serial 2363  
Permanent link to this record
Select All    Deselect All
 |   | 
Details
   print

Save Citations:
Export Records: