Records | |||||
---|---|---|---|---|---|
Author | Ruben Perez Tito | ||||
Title | Exploring the role of Text in Visual Question Answering on Natural Scenes and Documents | Type | Book Whole | ||
Year | 2023 | Publication | PhD Thesis, Universitat Autonoma de Barcelona-CVC | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | Visual Question Answering (VQA) is the task of generating a natural language answer given an image and a natural language question. At the intersection of computer vision and natural language processing, this task can be seen as a measure of image understanding capabilities, as it requires reasoning about objects, actions, colors, positions and the relations between the different elements, as well as commonsense reasoning, world knowledge, arithmetic skills and natural language understanding. However, even though the text present in images conveys important, semantically rich information that is explicit and not available in any other form, most VQA methods have remained illiterate, largely ignoring the text despite its potential significance. In this thesis, we set out to bring reading capabilities to computer vision models applied to the VQA task, creating new datasets and methods that can read, reason about and integrate the text with other visual cues in natural scene images and documents. In Chapter 3, we address the combination of scene text with visual information to fully understand all the nuances of natural scene images. To achieve this objective, we define a new sub-task of VQA that requires reading the text in the image, and we highlight the limitations of current methods. In addition, we propose a new architecture that integrates both modalities and jointly reasons about textual and visual features. In Chapter 5, we move VQA with reading capabilities to a new domain, scanned industry document images, providing a high-level, end-purpose perspective on Document Understanding, which has primarily focused on digitizing a document's contents and extracting key values without considering the ultimate purpose of the extracted information. For this, we create a dataset that requires methods to reason about the unique and challenging elements of documents, such as text, images, tables, graphs and complex layouts, to provide accurate answers in natural language. However, we observe that explicit visual features contribute only slightly to overall performance, since the main information is usually conveyed by the text and its position. Consequently, in Chapter 6, we propose VQA on infographic images, seeking document images with more visually rich elements that require fully exploiting visual information to answer the questions. We show the performance gap of different methods when applied to industry scanned and infographic images, and propose a new method that integrates the visual features at early stages, which allows the transformer architecture to exploit them during the self-attention operation. In Chapter 7, we apply VQA to a large collection of single-page documents, where methods must find which documents are relevant to answer the question and provide the answer itself. Finally, in Chapter 8, mimicking real-world applications where systems must process documents with multiple pages, we address the multipage document visual question answering task. We demonstrate the limitations of existing methods, including models specifically designed to process long sequences. To overcome these limitations, we propose a hierarchical architecture that can process long documents, answer questions, and provide, as an explainability measure, the index of the page containing the information needed to answer the question. | ||||
Address | |||||
Corporate Author | Thesis | Ph.D. thesis | |||
Publisher | IMPRIMA | Place of Publication | Editor | Ernest Valveny | |
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | 978-84-124793-5-5 | Medium | ||
Area | Expedition | Conference | |||
Notes | DAG | Approved | no | ||
Call Number | Admin @ si @ Per2023 | Serial | 3967 | ||
Author | Maria Vanrell | ||||
Title | Exploring the space of behaviour of a texture perception algorithm | Type | Report | ||
Year | 1997 | Publication | CVC Technical Report #12 | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | |||||
Address | CVC (UAB) | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | CIC | Approved | no | ||
Call Number | CAT @ cat @ Van1997 | Serial | 523 | ||
Author | Debora Gil; Petia Radeva | ||||
Title | Extending anisotropic operators to recover smooth shapes | Type | Journal Article | ||
Year | 2005 | Publication | Computer Vision and Image Understanding | Abbreviated Journal | |
Volume | 99 | Issue | 1 | Pages | 110-125 |
Keywords | Contour completion; Functional extension; Differential operators; Riemannian manifolds; Snake segmentation | ||||
Abstract | Anisotropic differential operators are widely used in image enhancement processes. Recently, their property of smoothly extending functions to the whole image domain has begun to be exploited. Strong ellipticity of differential operators is a requirement that ensures the existence of a unique solution. This condition is too restrictive for operators designed to extend image level sets: their own functionality implies that they should restrict to some vector field. The diffusion tensor that defines the diffusion operator links anisotropic processes with Riemannian manifolds. In this context, degeneracy implies restricting diffusion to the varieties generated by the vector fields of positive eigenvalues, provided that an integrability condition is satisfied. We use the fact that any smooth vector field fulfills this integrability requirement to design line connection algorithms for contour completion. As an application, we present a segmentation strategy that ensures snake convergence regardless of the geometry of the object to be modelled. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 1077-3142 | ISBN | Medium | ||
Area | Expedition | Conference | |||
Notes | IAM;MILAB | Approved | no | ||
Call Number | IAM @ iam @ GIR2005 | Serial | 1530 | ||
Author | Katerine Diaz; Francesc J. Ferri | ||||
Title | Extensiones del método de vectores comunes discriminantes Aplicadas a la clasificación de imágenes | Type | Book Whole | ||
Year | 2013 | Publication | Extensiones del método de vectores comunes discriminantes Aplicadas a la clasificación de imágenes | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | Subspace-based methods are a widely used tool in computer vision applications. Here we present and validate several algorithms that we have proposed in this research field. The first algorithm is a kernel extension of the discriminant common vectors method, which reinterprets the null space of the within-class scatter matrix of the training set to obtain the discriminant features. Subspace-based methods can be trained in different ways. One of the most popular, though not one of the most efficient, is batch learning. In this type of learning, all samples of the training set must be available from the start. Thus, when new samples become available to the algorithm, the system has to be retrained from scratch. An alternative to this type of training is incremental learning. Here we propose several incremental algorithms for the discriminant common vectors method. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | 978-3-639-55339-0 | Medium | ||
Area | Expedition | Conference | |||
Notes | ADAS | Approved | no | ||
Call Number | Admin @ si @ DiF2013 | Serial | 2440 | ||
Author | G. Tortajada | ||||
Title | Extraccio interactiva de patrocinadors d’anuncis en revistes utilitzant tecniques de Visio per Computador | Type | Report | ||
Year | 2000 | Publication | CVC Technical Report #42 | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | |||||
Address | CVC (UAB) | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | Approved | no | |||
Call Number | Admin @ si @ Tor2000b | Serial | 343 | ||
Author | J.R. Serra; J.B. Subirana | ||||
Title | Extraccion de estructuras interesantes en imagenes | Type | Report | ||
Year | 1996 | Publication | CVC Technical Report #14 | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | |||||
Address | CVC (UAB) | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | Approved | no | |||
Call Number | Admin @ si @ SeS1996c | Serial | 216 | ||
Author | Herve Locteau; Sebastien Mace; Ernest Valveny; Salvatore Tabbone | ||||
Title | Extraction des pièces d'un plan d'habitation | Type | Conference Article | ||
Year | 2010 | Publication | Colloque International Francophone sur l'Écrit et le Document | Abbreviated Journal | |
Volume | Issue | Pages | 1–12 | ||
Keywords | |||||
Abstract | In this article, a method to extract the rooms of an architectural floor plan image is described. We first present a line detection algorithm to extract long lines in the image. These lines are analyzed to identify the existing walls. From this point, room extraction can be seen as a classical segmentation task in which each region corresponds to a room. The chosen resolution strategy consists of recursively decomposing the image until nearly convex regions are obtained. The notion of convexity is difficult to quantify, and the selection of separation lines can also be rough. Thus, we take advantage of knowledge associated with architectural floor plans in order to obtain mainly rectangular rooms. Preliminary tests on a set of real documents show promising results. | ||||
Address | Sousse, Tunisia | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | CIFED | ||
Notes | DAG | Approved | no | ||
Call Number | DAG @ dag @ LMV2010 | Serial | 1440 | ||
Author | Antonio Lopez; Ricardo Toledo; Joan Serrat; Juan J. Villanueva | ||||
Title | Extraction of vessel centerlines from 2D coronary angiographies | Type | Miscellaneous | ||
Year | 1999 | Publication | Proceedings of the VIII Symposium Nacional de Reconocimiento de Formas y Analisis de Imagenes | Abbreviated Journal | |
Volume | I | Issue | Pages | 489–496 | |
Keywords | |||||
Abstract | |||||
Address | Bilbao | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | ADAS | Approved | no | ||
Call Number | ADAS @ adas @ LTS1999 | Serial | 14 | ||
Author | Fernando Vilariño; Stephan Ameling; Gerard Lacey; Stephen Patchett; Hugh Mulcahy | ||||
Title | Eye Tracking Search Patterns in Expert and Trainee Colonoscopists: A Novel Method of Assessing Endoscopic Competency? | Type | Journal Article | ||
Year | 2009 | Publication | Gastrointestinal Endoscopy | Abbreviated Journal | GI |
Volume | 69 | Issue | 5 | Pages | 370 |
Keywords | |||||
Abstract | |||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | 800 | Expedition | Conference | ||
Notes | MV;SIAI | Approved | no | ||
Call Number | fernando @ fernando @ | Serial | 2420 | ||
Author | Helena Muñoz; Fernando Vilariño; Dimosthenis Karatzas | ||||
Title | Eye-Movements During Information Extraction from Administrative Documents | Type | Conference Article | ||
Year | 2019 | Publication | International Conference on Document Analysis and Recognition Workshops | Abbreviated Journal | |
Volume | Issue | Pages | 6-9 | ||
Keywords | |||||
Abstract | A key aspect of digital mailroom processes is the extraction of relevant information from administrative documents. More often than not, the extraction process cannot be fully automated and instead requires a significant amount of manual intervention. In this work we study the human process of information extraction from invoice document images. We explore whether the gaze of human annotators during a manual information extraction process could be exploited to reduce the manual effort and automate the process. To this end, we perform an eye-tracking experiment replicating real-life interfaces for information extraction. Through this pilot study we demonstrate that relevant areas in the document can be identified reliably through automatic fixation classification, and that the obtained models generalize well to new subjects. Our findings indicate that it is in principle possible to integrate the human into the document image analysis loop, making use of the scanpath to automate the extraction process or verify extracted information. | ||||
Address | Sydney; Australia; September 2019 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICDARW | ||
Notes | DAG; 600.140; 600.121; 600.129;SIAI | Approved | no | ||
Call Number | Admin @ si @ MVK2019 | Serial | 3336 | ||
Author | Onur Ferhat | ||||
Title | Eye-Tracking with Webcam-Based Setups: Implementation of a Real-Time System and an Analysis of Factors Affecting Performance | Type | Report | ||
Year | 2012 | Publication | CVC Technical Report | Abbreviated Journal | |
Volume | 172 | Issue | Pages | ||
Keywords | Computer vision, eye-tracking, gaussian process, feature selection, optical flow | ||||
Abstract | In recent years, commercial eye-tracking hardware has become more common, with the introduction of new models from several brands that offer better performance and easier setup procedures. The popularity of eye-tracking research directed at marketing, accessibility and usability, among others, is both a cause and a result of this phenomenon. One problem with these hardware components is scalability, because both their price and the expertise needed to operate them make large-scale use practically impossible. In this work, we analyze the feasibility of a software eye-tracking system based on a single, ordinary webcam. Our aim is to discover the limits of such a system and to see whether it provides acceptable performance. The significance of this setup is that it is the most common one found in consumer environments and in off-the-shelf electronic devices such as laptops, mobile phones and tablet computers. As no special equipment such as infrared lights, mirrors or zoom lenses is used, setting up and calibrating the system is easier than with approaches that rely on these components. Our work is based on the open source application Opengazer, which provides a good starting point for our contributions. We propose several improvements to push the system's performance further and make it feasible as a robust, real-time device. We then carry out an elaborate experiment involving 18 human subjects and 4 different system setups. Finally, we analyze the results and discuss the effects of setup changes, subject differences and modifications in the software. | ||||
Address | Bellaterra | ||||
Corporate Author | Computer Vision Center | Thesis | Master's thesis | ||
Publisher | Place of Publication | Editor | Fernando Vilariño | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | MV | Approved | no | ||
Call Number | Admin @ si @ Fer2012; IAM @ iam @ Fer2012 | Serial | 2165 | ||
Author | Francisco Javier Orozco; Pau Baiget; Jordi Gonzalez; Xavier Roca | ||||
Title | Eyelids and Face Tracking in Real-Time | Type | Miscellaneous | ||
Year | 2006 | Publication | 6th IASTED International Conference Visualization, Imaging, and Image Processing (VIIP'06) | Abbreviated Journal | |
Volume | Issue | Pages | 165–170 | ||
Keywords | |||||
Abstract | |||||
Address | Palma de Mallorca (Spain) | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | ISE | Approved | no | ||
Call Number | ISE @ ise @ OBG2006 | Serial | 730 | ||
Author | Aura Hernandez-Sabate; Lluis Albarracin; Daniel Calvo; Nuria Gorgorio | ||||
Title | EyeMath: Identifying Mathematics Problem Solving Processes in a RTS Video Game | Type | Conference Article | ||
Year | 2016 | Publication | 5th International Conference Games and Learning Alliance | Abbreviated Journal | |
Volume | 10056 | Issue | Pages | 50-59 | |
Keywords | Simulation environment; Automated Driving; Driver-Vehicle interaction | ||||
Abstract | Photorealistic virtual environments are crucial for developing and testing automated driving systems safely during trials. As commercially available simulators are expensive and bulky, this paper presents a low-cost, extendable, and easy-to-use (LEE) virtual environment, with the aim of highlighting its utility for level 3 driving automation. In particular, an experiment is performed using the presented simulator to explore the influence of different variables on the transfer of control of the car after the system has been driving autonomously in a highway scenario. The results show that the speed of the car at the moment the system needs to transfer control to the human driver is critical. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | LNCS | ||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | GALA | ||
Notes | ADAS;IAM; | Approved | no | ||
Call Number | HAC2016 | Serial | 2864 | ||
Author | Zhong Jin; Zhen Lou; Jing-Yu Yang; Quan-sen Sun | ||||
Title | Face Detection using Template Matching and Skin-color Information | Type | Journal Article | ||
Year | 2007 | Publication | Neurocomputing | Abbreviated Journal | |
Volume | 70 | Issue | 4–6 | Pages | 794–800 |
Keywords | |||||
Abstract | |||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | Approved | no | |||
Call Number | Admin @ si @ JLY2007 | Serial | 878 | ||
Author | Jun Wan; Guodong Guo; Sergio Escalera; Hugo Jair Escalante; Stan Z Li | ||||
Title | Face Anti-spoofing Progress Driven by Academic Challenges | Type | Book Chapter | ||
Year | 2023 | Publication | Advances in Face Presentation Attack Detection | Abbreviated Journal | |
Volume | Issue | Pages | 1–15 | ||
Keywords | |||||
Abstract | With the ubiquity of facial authentication systems and the prevalence of security cameras around the world, the potential impact of facial presentation attack techniques is huge. However, research progress in this field has been slowed by a number of factors, including the lack of appropriate and realistic datasets, ethical and privacy issues that prevent the recording and distribution of facial images, and the little attention the community has given to potential ethnic biases, among others. This chapter provides an overview of contributions derived from the organization of academic challenges in the context of face anti-spoofing detection. Specifically, we discuss the limitations of benchmarks and summarize our efforts to boost research by the community through participation in academic challenges. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | SLCV | ||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | HUPBA | Approved | no | ||
Call Number | Admin @ si @ WGE2023c | Serial | 3957 | ||