|   | 
Details
   web
Records
Author Ruben Perez Tito
Title (up) Exploring the role of Text in Visual Question Answering on Natural Scenes and Documents Type Book Whole
Year 2023 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Visual Question Answering (VQA) is the task where given an image and a natural language question, the objective is to generate a natural language answer. At the intersection between computer vision and natural language processing, this task can be seen as a measure of image understanding capabilities, as it requires to reason about objects, actions, colors, positions, the relations between the different elements as well as commonsense reasoning, world knowledge, arithmetic skills and natural language understanding. However, even though the text present in the images conveys important semantically rich information that is explicit and not available in any other form, most VQA methods remained illiterate, largely
ignoring the text despite its potential significance. In this thesis, we set out on a journey to bring reading capabilities to computer vision models applied to the VQA task, creating new datasets and methods that can read, reason and integrate the text with other visual cues in natural scene images and documents.
In Chapter 3, we address the combination of scene text with visual information to fully understand all the nuances of natural scene images. To achieve this objective, we define a new sub-task of VQA that requires reading the text in the image, and highlight the limitations of the current methods. In addition, we propose a new architecture that integrates both modalities and jointly reasons about textual and visual features. In Chapter 5, we shift the domain of VQA with reading capabilities and apply it on scanned industry document images, providing a high-level end-purpose perspective to Document Understanding, which has been
primarily focused on digitizing the document’s contents and extracting key values without considering the ultimate purpose of the extracted information. For this, we create a dataset which requires methods to reason about the unique and challenging elements of documents, such as text, images, tables, graphs and complex layouts, to provide accurate answers in natural language. However, we observed that explicit visual features provide a slight contribution in the overall performance, since the main information is usually conveyed within the text and its position. In consequence, in Chapter 6, we propose VQA on infographic images, seeking for document images with more visually rich elements that require to fully exploit visual information in order to answer the questions. We show the performance gap of
different methods when used over industry scanned and infographic images, and propose a new method that integrates the visual features in early stages, which allows the transformer architecture to exploit the visual features during the self-attention operation. Instead, in Chapter 7, we apply VQA on a big collection of single-page documents, where the methods must find which documents are relevant to answer the question, and provide the answer itself. Finally, in Chapter 8, mimicking real-world application problems where systems must process documents with multiple pages, we address the multipage document visual question answering task. We demonstrate the limitations of existing methods, including models specifically designed to process long sequences. To overcome these limitations, we propose
a hierarchical architecture that can process long documents, answer questions, and provide the index of the page where the information to answer the question is located as an explainability measure.
Address
Corporate Author Thesis Ph.D. thesis
Publisher IMPRIMA Place of Publication Editor Ernest Valveny
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-124793-5-5 Medium
Area Expedition Conference
Notes DAG Approved no
Call Number Admin @ si @ Per2023 Serial 3967
Permanent link to this record
 

 
Author Maria Vanrell
Title (up) Exploring the space of behaviour of a texture perception algorithm Type Report
Year 1997 Publication CVC Technical Report #12 Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address CVC (UAB)
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes CIC Approved no
Call Number CAT @ cat @ Van1997 Serial 523
Permanent link to this record
 

 
Author Debora Gil; Petia Radeva
Title (up) Extending anisotropic operators to recover smooth shapes Type Journal Article
Year 2005 Publication Computer Vision and Image Understanding Abbreviated Journal
Volume 99 Issue 1 Pages 110-125
Keywords Contour completion; Functional extension; Differential operators; Riemmanian manifolds; Snake segmentation
Abstract Anisotropic differential operators are widely used in image enhancement processes. Recently, their property of smoothly extending functions to the whole image domain has begun to be exploited. Strong ellipticity of differential operators is a requirement that ensures existence of a unique solution. This condition is too restrictive for operators designed to extend image level sets: their own functionality implies that they should restrict to some vector field. The diffusion tensor that defines the diffusion operator links anisotropic processes with Riemmanian manifolds. In this context, degeneracy implies restricting diffusion to the varieties generated by the vector fields of positive eigenvalues, provided that an integrability condition is satisfied. We will use that any smooth vector field fulfills this integrability requirement to design line connection algorithms for contour completion. As application we present a segmenting strategy that assures convergent snakes whatever the geometry of the object to be modelled is.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 1077-3142 ISBN Medium
Area Expedition Conference
Notes IAM;MILAB Approved no
Call Number IAM @ iam @ GIR2005 Serial 1530
Permanent link to this record
 

 
Author Katerine Diaz; Francesc J. Ferri
Title (up) Extensiones del método de vectores comunes discriminantes Aplicadas a la clasificación de imágenes Type Book Whole
Year 2013 Publication Extensiones del método de vectores comunes discriminantes Aplicadas a la clasificación de imágenes Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Los métodos basados en subespacios son una herramienta muy utilizada en aplicaciones de visión por computador. Aquí se presentan y validan algunos algoritmos que hemos propuesto en este campo de investigación. El primer algoritmo está relacionado con una extensión del método de vectores comunes discriminantes con kernel, que reinterpreta el espacio nulo de la matriz de dispersión intra-clase del conjunto de entrenamiento para obtener las características discriminantes. Dentro de los métodos basados en subespacios existen diferentes tipos de entrenamiento. Uno de los más populares, pero no por ello uno de los más eficientes, es el aprendizaje por lotes. En este tipo de aprendizaje, todas las muestras del conjunto de entrenamiento tienen que estar disponibles desde el inicio. De este modo, cuando nuevas muestras se ponen a disposición del algoritmo, el sistema tiene que ser reentrenado de nuevo desde cero. Una alternativa a este tipo de entrenamiento es el aprendizaje incremental. Aquí­ se proponen diferentes algoritmos incrementales del método de vectores comunes discriminantes.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-3-639-55339-0 Medium
Area Expedition Conference
Notes ADAS Approved no
Call Number Admin @ si @ DiF2013 Serial 2440
Permanent link to this record
 

 
Author G. Tortajada
Title (up) Extraccio interactiva de patrocinadors d’anuncis en revistes utilitzant tecniques de Visio per Computador Type Report
Year 2000 Publication CVC Technical Report #42 Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address CVC (UAB)
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes Approved no
Call Number Admin @ si @ Tor2000b Serial 343
Permanent link to this record
 

 
Author J.R. Serra; J.B. Subirana
Title (up) Extraccion de estructuras interesantes en imagenes Type Report
Year 1996 Publication CVC Tecnical Report #14 Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address CVC (UAB)
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes Approved no
Call Number Admin @ si @ SeS1996c Serial 216
Permanent link to this record
 

 
Author Herve Locteau; Sebastien Mace; Ernest Valveny; Salvatore Tabbone
Title (up) Extraction des pieces de un plan de habitation Type Conference Article
Year 2010 Publication Colloque Internacional Francophone de l´Ecrit et le Document Abbreviated Journal
Volume Issue Pages 1–12
Keywords
Abstract In this article, a method to extract the rooms of an architectural floor plan image is described. We first present a line detection algorithm to extract long lines in the image. Those lines are analyzed to identify the existing walls. From this point, room extraction can be seen as a classical segmentation task for which each region corresponds to a room. The chosen resolution strategy consists in recursively decomposing the image until getting nearly convex regions. The notion of convexity is difficult to quantify, and the selection of separation lines can also be rough. Thus, we take advantage of knowledge associated to architectural floor plans in order to obtain mainly rectangular rooms. Preliminary tests on a set of real documents show promising results.
Address Sousse, Tunisia
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference CIFED
Notes DAG Approved no
Call Number DAG @ dag @ LMV2010 Serial 1440
Permanent link to this record
 

 
Author Antonio Lopez; Ricardo Toledo; Joan Serrat; Juan J. Villanueva
Title (up) Extraction of vessel centerlines from 2D coronary angiographies Type Miscellaneous
Year 1999 Publication Proceedings of the VIII Symposium Nacional de Reconocimiento de Formas y Analisis de Imagenes. pgs. 489–496, volume I Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address Bilbao
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ADAS Approved no
Call Number ADAS @ adas @ LTS1999 Serial 14
Permanent link to this record
 

 
Author Fernando Vilariño; Stephan Ameling; Gerard Lacey; Stephen Patchett; Hugh Mulcahy
Title (up) Eye Tracking Search Patterns in Expert and Trainee Colonoscopists: A Novel Method of Assessing Endoscopic Competency? Type Journal Article
Year 2009 Publication Gastrointestinal Endoscopy Abbreviated Journal GI
Volume 69 Issue 5 Pages 370
Keywords
Abstract
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area 800 Expedition Conference
Notes MV;SIAI Approved no
Call Number fernando @ fernando @ Serial 2420
Permanent link to this record
 

 
Author Helena Muñoz; Fernando Vilariño; Dimosthenis Karatzas
Title (up) Eye-Movements During Information Extraction from Administrative Documents Type Conference Article
Year 2019 Publication International Conference on Document Analysis and Recognition Workshops Abbreviated Journal
Volume Issue Pages 6-9
Keywords
Abstract A key aspect of digital mailroom processes is the extraction of relevant information from administrative documents. More often than not, the extraction process cannot be fully automated, and there is instead an important amount of manual intervention. In this work we study the human process of information extraction from invoice document images. We explore whether the gaze of human annotators during an manual information extraction process could be exploited towards reducing the manual effort and automating the process. To this end, we perform an eye-tracking experiment replicating real-life interfaces for information extraction. Through this pilot study we demonstrate that relevant areas in the document can be identified reliably through automatic fixation classification, and the obtained models generalize well to new subjects. Our findings indicate that it is in principle possible to integrate the human in the document image analysis loop, making use of the scanpath to automate the extraction process or verify extracted information.
Address Sydney; Australia; September 2019
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICDARW
Notes DAG; 600.140; 600.121; 600.129;SIAI Approved no
Call Number Admin @ si @ MVK2019 Serial 3336
Permanent link to this record
 

 
Author Onur Ferhat
Title (up) Eye-Tracking with Webcam-Based Setups: Implementation of a Real-Time System and an Analysis of Factors Affecting Performance Type Report
Year 2012 Publication CVC Technical Report Abbreviated Journal
Volume 172 Issue Pages
Keywords Computer vision, eye-tracking, gaussian process, feature selection, optical flow
Abstract In the recent years commercial eye-tracking hardware has become more common, with the introduction of new models from several brands that have better performance and easier setup procedures. A cause and at the same time a result of this phenomenon is the popularity of eye-tracking research directed at marketing, accessibility and usability, among others.
One problem with these hardware components is scalability, because both the price and the necessary expertise to operate them makes it practically impossible in the large scale. In this work, we analyze the feasibility of a software eye-tracking system based on a single, ordinary webcam. Our aim is to discover the limits of such a system and to see whether it provides acceptable performances.
The significance of this setup is that it is the most common setup found in consumer environments, off-the-shelf electronic devices such as laptops, mobile phones and tablet computers. As no special equipment such as infrared lights, mirrors or zoom lenses are used; setting up and calibrating the system is easier compared to other approaches using these components.
Our work is based on the open source application Opengazer, which provides a good starting point for our contributions. We propose several improvements in order to push the system's performance further and make it feasible as a robust, real-time device. Then we carry out an elaborate experiment involving 18 human subjects and 4 different system setups. Finally, we give an analysis of the results and discuss the effects of setup changes, subject differences and modifications in the software.
Address Bellaterra
Corporate Author Computer Vision Center Thesis Master's thesis
Publisher Place of Publication Editor Fernando Vilariño
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MV Approved no
Call Number Admin @ si @ Fer2012; IAM @ iam @ Fer2012 Serial 2165
Permanent link to this record
 

 
Author Francisco Javier Orozco; Pau Baiget; Jordi Gonzalez; Xavier Roca
Title (up) Eyelids and Face Tracking in Real-Time Type Miscellaneous
Year 2006 Publication 6th IASTED International Conference Visualization, Imaging, and Image Processing (VIIP´06), 165–170 Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address Palma de Mallorca (Spain)
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ISE Approved no
Call Number ISE @ ise @ OBG2006 Serial 730
Permanent link to this record
 

 
Author Aura Hernandez-Sabate; Lluis Albarracin; Daniel Calvo; Nuria Gorgorio
Title (up) EyeMath: Identifying Mathematics Problem Solving Processes in a RTS Video Game Type Conference Article
Year 2016 Publication 5th International Conference Games and Learning Alliance Abbreviated Journal
Volume 10056 Issue Pages 50-59
Keywords Simulation environment; Automated Driving; Driver-Vehicle interaction
Abstract Photorealistic virtual environments are crucial for developing and testing automated driving systems in a safe way during trials. As commercially available simulators are expensive and bulky, this paper presents a low-cost, extendable, and easy-to-use (LEE) virtual environment with the aim to highlight its utility for level 3 driving automation. In particular, an experiment is performed using the presented simulator to explore the influence of different variables regarding control transfer of the car after the system was driving autonomously in a highway scenario. The results show that the speed of the car at the time when the system needs to transfer the control to the human driver is critical.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference GALA
Notes ADAS;IAM; Approved no
Call Number HAC2016 Serial 2864
Permanent link to this record
 

 
Author Zhong Jin; Zhen Lou; Jing-Yu Yang; Quan-sen Sun
Title (up) Face Detection using Template Matching and Skin-color Information Type Journal
Year 2007 Publication Neurocomputing, 70(4–6): 794–800 Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes Approved no
Call Number Admin @ si @ JLY2007 Serial 878
Permanent link to this record
 

 
Author Jun Wan; Guodong Guo; Sergio Escalera; Hugo Jair Escalante; Stan Z Li
Title (up) Face Anti-spoofing Progress Driven by Academic Challenges Type Book Chapter
Year 2023 Publication Advances in Face Presentation Attack Detection Abbreviated Journal
Volume Issue Pages 1–15
Keywords
Abstract With the ubiquity of facial authentication systems and the prevalence of security cameras around the world, the impact that facial presentation attack techniques may have is huge. However, research progress in this field has been slowed by a number of factors, including the lack of appropriate and realistic datasets, ethical and privacy issues that prevent the recording and distribution of facial images, the little attention that the community has given to potential ethnic biases among others. This chapter provides an overview of contributions derived from the organization of academic challenges in the context of face anti-spoofing detection. Specifically, we discuss the limitations of benchmarks and summarize our efforts in trying to boost research by the community via the participation in academic challenges
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title SLCV
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HUPBA Approved no
Call Number Admin @ si @ WGE2023c Serial 3957
Permanent link to this record