Records
Author Pau Riba; Josep Llados; Alicia Fornes; Anjan Dutta
Title Large-scale graph indexing using binary embeddings of node contexts for information spotting in document image databases Type Journal Article
Year 2017 Publication Pattern Recognition Letters Abbreviated Journal PRL
Volume 87 Issue Pages 203-211
Keywords
Abstract Graph-based representations are seeing growing use in visual recognition and retrieval because of their representational power compared with classical appearance-based representations. However, retrieving a query graph from a large dataset of graphs implies a high computational complexity. The most important property for large-scale retrieval is that the search time complexity be sub-linear in the number of database examples. With this aim, in this paper we propose a graph indexation formalism applied to visual retrieval. A binary embedding is defined as hashing keys for graph nodes. Given a database of labeled graphs, graph nodes are complemented with vectors of attributes representing their local context. Then, each attribute vector is converted to a binary code by applying a binary-valued hash function. Graph retrieval is thus formulated in terms of finding target graphs in the database whose nodes have a small Hamming distance from the query nodes, easily computed with bitwise logical operators. As an application example, we validate the performance of the proposed methods in different real scenarios such as handwritten word spotting in images of historical documents or symbol spotting in architectural floor plans.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG; 600.097; 602.006; 603.053; 600.121 Approved no
Call Number (down) RLF2017b Serial 2873
Permanent link to this record
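
The abstract of the record above notes that Hamming distances between binary node codes are "easily computed with bitwise logical operators". The following is a minimal illustrative sketch of that idea only, not the authors' implementation. The function names, the integer encoding of codes, the vote-based ranking, and the brute-force scan are all assumptions made for illustration; a linear scan like this one does not achieve the sub-linear search time the paper aims for.

```python
from typing import Dict, List, Tuple


def hamming_distance(a: int, b: int) -> int:
    """Count differing bits between two binary codes stored as ints."""
    # XOR leaves a 1 wherever the codes disagree; count those bits.
    return bin(a ^ b).count("1")


def rank_graphs(
    query_codes: List[int],
    database: Dict[str, List[int]],
    max_distance: int,
) -> List[Tuple[str, int]]:
    """Rank graphs by how many query nodes find a close node code.

    Hypothetical helper: `database` maps a graph id to the binary
    codes of its nodes. A query node "votes" for a graph when at
    least one node code lies within `max_distance` bits of it.
    """
    scores = []
    for graph_id, node_codes in database.items():
        votes = sum(
            1
            for q in query_codes
            if any(hamming_distance(q, c) <= max_distance for c in node_codes)
        )
        scores.append((graph_id, votes))
    # Highest vote count first; sorted() is stable, so ties keep insertion order.
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

With a toy database `{"g1": [0b1111, 0b0000], "g2": [0b1010]}`, the query codes `[0b1110, 0b0001]`, and `max_distance=1`, graph `g1` collects two votes and `g2` one.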
 

 
Author Christophe Rigaud; Clement Guerin; Dimosthenis Karatzas; Jean-Christophe Burie; Jean-Marc Ogier
Title Knowledge-driven understanding of images in comic books Type Journal Article
Year 2015 Publication International Journal on Document Analysis and Recognition Abbreviated Journal IJDAR
Volume 18 Issue 3 Pages 199-221
Keywords Document Understanding; comics analysis; expert system
Abstract Document analysis is an active field of research, which can attain a complete understanding of the semantics of a given document. One example of the document understanding process is enabling a computer to identify the key elements of a comic book story and arrange them according to predefined domain knowledge. In this study, we propose a knowledge-driven system that can interact with bottom-up and top-down information to progressively understand the content of a document. We model the knowledge of both the comic book and the image processing domains for information consistency analysis. In addition, different image processing methods are improved or developed to extract panels, balloons, tails, texts, comic characters and their semantic relations in an unsupervised way.
Address
Corporate Author Thesis
Publisher Springer Berlin Heidelberg Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 1433-2833 ISBN Medium
Area Expedition Conference
Notes DAG; 600.056; 600.077 Approved no
Call Number (down) RGK2015 Serial 2595
Permanent link to this record
 

 
Author A. Pujol; Juan J. Villanueva
Title A supervised Modification of the Hausdorff distance for visual shape classification Type Journal
Year 2002 Publication International Journal of Pattern Recognition and Artificial Intelligence Abbreviated Journal
Volume 16 Issue 3 Pages 349-359
Keywords
Abstract (IF: 0.359)
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ISE Approved no
Call Number (down) PuV2002 Serial 273
Permanent link to this record
 

 
Author Victor Ponce
Title Evolutionary Bags of Space-Time Features for Human Analysis Type Book Whole
Year 2016 Publication PhD Thesis Universitat de Barcelona, UOC and CVC Abbreviated Journal
Volume Issue Pages
Keywords Computer algorithms; Digital image processing; Digital video; Analysis of variance; Dynamic programming; Evolutionary computation; Gesture
Abstract The representation (or feature) learning has been an emerging concept in the last years, since it collects a set of techniques that are present in any theoretical or practical methodology referring to artificial intelligence. In computer vision, a very common representation has adopted the form of the well-known Bag of Visual Words. This representation appears implicitly in most approaches where images are described, and is also present in a huge number of areas and domains: image content retrieval, pedestrian detection, human-computer interaction, surveillance, e-health, and social computing, amongst others. The early stages of this dissertation provide an approach for learning visual representations inside evolutionary algorithms, which consists of evolving weighting schemes to improve the BoVW representations for the task of recognizing categories of videos and images. Thus, we demonstrate the applicability of the most common weighting schemes, which are often used in text mining but are less frequently found in computer vision tasks. Beyond learning these visual representations, we provide an approach based on fusion strategies for learning spatiotemporal representations, from multimodal data obtained by depth sensors. Besides, we specially aim at the evolutionary and dynamic modelling, where the temporal factor is present in the nature of the data, such as video sequences of gestures and actions. Indeed, we explore the effects of probabilistic modelling for those approaches based on dynamic programming, so as to handle the temporal deformation and variance amongst video sequences of different categories. Finally, we integrate dynamic programming and generative models into an evolutionary computation framework, with the aim of learning Bags of SubGestures (BoSG) representations and hence to improve the generalization capability of standard gesture recognition approaches. 
The results obtained in the experimentation demonstrate, first, that evolutionary algorithms are useful for improving the representation of BoVW approaches in several datasets for recognizing categories in still images and video sequences. Our experimentation also reveals that both the use of dynamic programming and generative models to align video sequences and the representations obtained by applying fusion strategies to multimodal data improve performance when recognizing some gesture categories. Furthermore, when classifying video categories on large video datasets, the combination of evolutionary algorithms with models based on dynamic programming and generative approaches yields a considerable improvement over standard gesture and action recognition approaches. Finally, we demonstrate the applications of these representations in several domains for human analysis: classification of images where humans may be present, action and gesture recognition for general applications, and in particular for conversational settings within the field of restorative justice.
Address June 2016
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Sergio Escalera;Xavier Baro;Hugo Jair Escalante
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HuPBA Approved no
Call Number (down) Pon2016 Serial 2814
Permanent link to this record
 

 
Author Marco Pedersoli; Jordi Gonzalez; Xu Hu; Xavier Roca
Title Toward Real-Time Pedestrian Detection Based on a Deformable Template Model Type Journal Article
Year 2014 Publication IEEE Transactions on Intelligent Transportation Systems Abbreviated Journal TITS
Volume 15 Issue 1 Pages 355-364
Keywords
Abstract Most advanced driving assistance systems already include pedestrian detection systems. Unfortunately, there is still a tradeoff between precision and real time. For reliable detection, an excellent precision-recall tradeoff is needed to detect as many pedestrians as possible while, at the same time, avoiding too many false alarms; in addition, a very fast computation is needed for fast reactions to dangerous situations. Recently, novel approaches based on deformable templates have been proposed since these show a reasonable detection performance although they are computationally too expensive for real-time performance. In this paper, we present a system for pedestrian detection based on a hierarchical multiresolution part-based model. The proposed system is able to achieve state-of-the-art detection accuracy due to the local deformations of the parts while exhibiting a speedup of more than one order of magnitude due to a fast coarse-to-fine inference technique. Moreover, our system explicitly infers the level of resolution available so that the detection of small examples is feasible with a very reduced computational cost. We conclude this contribution by presenting how a graphics processing unit-optimized implementation of our proposed system is suitable for real-time pedestrian detection in terms of both accuracy and speed.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 1524-9050 ISBN Medium
Area Expedition Conference
Notes ISE; 601.213; 600.078 Approved no
Call Number (down) PGH2014 Serial 2350
Permanent link to this record
 

 
Author C. Alejandro Parraga
Title Colours and Colour Vision: An Introductory Survey Type Journal Article
Year 2017 Publication Perception Abbreviated Journal PER
Volume 46 Issue 5 Pages 640-641
Keywords
Abstract
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes NEUROBIT; does not mention Approved no
Call Number (down) Par2017 Serial 3101
Permanent link to this record
 

 
Author Francisco Javier Orozco; Ognjen Rudovic; Jordi Gonzalez; Maja Pantic
Title Hierarchical On-line Appearance-Based Tracking for 3D Head Pose, Eyebrows, Lips, Eyelids and Irises Type Journal Article
Year 2013 Publication Image and Vision Computing Abbreviated Journal IMAVIS
Volume 31 Issue 4 Pages 322-340
Keywords On-line appearance models; Levenberg–Marquardt algorithm; Line-search optimization; 3D face tracking; Facial action tracking; Eyelid tracking; Iris tracking
Abstract In this paper, we propose an On-line Appearance-Based Tracker (OABT) for simultaneous tracking of 3D head pose, lips, eyebrows, eyelids and irises in monocular video sequences. In contrast to previously proposed tracking approaches, which deal with face and gaze tracking separately, our OABT can also be used for eyelid and iris tracking, as well as 3D head pose, lips and eyebrows facial actions tracking. Furthermore, our approach applies an on-line learning of changes in the appearance of the tracked target. Hence, the prior training of appearance models, which usually requires a large amount of labeled facial images, is avoided. Moreover, the proposed method is built upon a hierarchical combination of three OABTs, which are optimized using a Levenberg–Marquardt Algorithm (LMA) enhanced with line-search procedures. This, in turn, makes the proposed method robust to changes in lighting conditions, occlusions and translucent textures, as evidenced by our experiments. Finally, the proposed method achieves head and facial actions tracking in real-time.
Address
Corporate Author Thesis
Publisher Elsevier Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ISE; 605.203; 302.012; 302.018; 600.049 Approved no
Call Number (down) ORG2013 Serial 2221
Permanent link to this record
 

 
Author Naveen Onkarappa; Angel Sappa
Title Space Variant Representations for Mobile Platform Vision Applications Type Conference Article
Year 2011 Publication 14th International Conference on Computer Analysis of Images and Patterns Abbreviated Journal
Volume 6855 Issue II Pages 146-154
Keywords
Abstract The log-polar space variant representation, motivated by biological vision, has been widely studied in the literature. Its data reduction and invariance properties made it useful in many vision applications. However, due to its nature, it fails in preserving features in the periphery. In the current work, as an attempt to overcome this problem, we propose a novel space-variant representation. It is evaluated and proved to be better than the log-polar representation in preserving the peripheral information, crucial for on-board mobile vision applications. The evaluation is performed by comparing log-polar and the proposed representation once they are used for estimating dense optical flow.
Address Seville, Spain
Corporate Author Thesis
Publisher Springer Berlin Heidelberg Place of Publication Editor P. Real, D. Diaz, H. Molina, A. Berciano, W. Kropatsch
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 0302-9743 ISBN 978-3-642-23677-8 Medium
Area Expedition Conference CAIP
Notes ADAS Approved no
Call Number (down) NaS2011 Serial 1686
Permanent link to this record
 

 
Author Joan Mas; Gemma Sanchez; Josep Llados
Title SSP: Sketching slide Presentations, a Syntactic Approach Type Book Chapter
Year 2010 Publication Graphics Recognition. Achievements, Challenges, and Evolution. 8th International Workshop, GREC 2009. Selected Papers Abbreviated Journal
Volume 6020 Issue Pages 118-129
Keywords
Abstract The design of a slide presentation is a creative process. In this process, humans first visualize in their minds what they want to explain. Then, they have to be able to represent this knowledge in an understandable way. A lot of commercial software exists that allows users to create their own slide presentations, but the creativity of the user is rather limited. In this article we present an application that allows the user to create and visualize a slide presentation from a sketch. A slide may be seen as a graphical document or a diagram where its elements are placed in a particular spatial arrangement. To describe and recognize slides, a syntactic approach is proposed. This approach is based on an Adjacency Grammar and a parsing methodology to cope with this kind of grammar. The experimental evaluation shows the performance of our methodology from a qualitative and a quantitative point of view. Six different slides containing different numbers of symbols, from 4 to 7, have been given to the users, and they have drawn them without restrictions on the order of the elements. The quantitative results give an idea of how suitable our methodology is for describing and recognizing the different elements in a slide.
Address
Corporate Author Thesis
Publisher Springer Berlin Heidelberg Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN 0302-9743 ISBN 978-3-642-13727-3 Medium
Area Expedition Conference GREC
Notes DAG Approved no
Call Number (down) MSL2010 Serial 2405
Permanent link to this record
 

 
Author Saad Minhas; Aura Hernandez-Sabate; Shoaib Ehsan; Katerine Diaz; Ales Leonardis; Antonio Lopez; Klaus McDonald Maier
Title LEE: A photorealistic Virtual Environment for Assessing Driver-Vehicle Interactions in Self-Driving Mode Type Conference Article
Year 2016 Publication 14th European Conference on Computer Vision Workshops Abbreviated Journal
Volume 9915 Issue Pages 894-900
Keywords Simulation environment; Automated Driving; Driver-Vehicle interaction
Abstract Photorealistic virtual environments are crucial for developing and testing automated driving systems in a safe way during trials. As commercially available simulators are expensive and bulky, this paper presents a low-cost, extendable, and easy-to-use (LEE) virtual environment with the aim to highlight its utility for level 3 driving automation. In particular, an experiment is performed using the presented simulator to explore the influence of different variables regarding control transfer of the car after the system was driving autonomously in a highway scenario. The results show that the speed of the car at the time when the system needs to transfer the control to the human driver is critical.
Address Amsterdam; The Netherlands; October 2016
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ECCVW
Notes ADAS;IAM; 600.085; 600.076 Approved no
Call Number (down) MHE2016 Serial 2865
Permanent link to this record
 

 
Author Minesh Mathew; Viraj Bagal; Ruben Tito; Dimosthenis Karatzas; Ernest Valveny; C.V. Jawahar
Title InfographicVQA Type Conference Article
Year 2022 Publication Winter Conference on Applications of Computer Vision Abbreviated Journal
Volume Issue Pages 1697-1706
Keywords Document Analysis Datasets; Evaluation and Comparison of Vision Algorithms; Vision and Languages
Abstract Infographics communicate information using a combination of textual, graphical and visual elements. This work explores the automatic understanding of infographic images by using a Visual Question Answering technique. To this end, we present InfographicVQA, a new dataset comprising a diverse collection of infographics and question-answer annotations. The questions require methods that jointly reason over the document layout, textual content, graphical elements, and data visualizations. We curate the dataset with an emphasis on questions that require elementary reasoning and basic arithmetic skills. For VQA on the dataset, we evaluate two strong Transformer-based baselines. Both baselines yield unsatisfactory results compared to near-perfect human performance on the dataset. The results suggest that VQA on infographics (images that are designed to communicate information quickly and clearly to the human brain) is ideal for benchmarking machine understanding of complex document images. The dataset is available for download at docvqa.org
Address Virtual; Waikoloa; Hawaii; USA; January 2022
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference WACV
Notes DAG; 600.155 Approved no
Call Number (down) MBT2022 Serial 3625
Permanent link to this record
 

 
Author Hector Laria Mantecon; Yaxing Wang; Joost Van de Weijer; Bogdan Raducanu
Title Transferring Unconditional to Conditional GANs With Hyper-Modulation Type Conference Article
Year 2022 Publication IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) Abbreviated Journal
Volume Issue Pages
Keywords
Abstract GANs have matured in recent years and are able to generate high-resolution, realistic images. However, the computational resources and the data required for the training of high-quality GANs are enormous, and the study of transfer learning of these models is therefore an urgent topic. Many of the available high-quality pretrained GANs are unconditional (like StyleGAN). For many applications, however, conditional GANs are preferable, because they provide more control over the generation process, despite often suffering more training difficulties. Therefore, in this paper, we focus on transferring from high-quality pretrained unconditional GANs to conditional GANs. This requires architectural adaptation of the pretrained GAN to perform the conditioning. To this end, we propose hyper-modulated generative networks that allow for shared and complementary supervision. To prevent the additional weights of the hypernetwork from overfitting, with subsequent mode collapse on small target domains, we introduce a self-initialization procedure that does not require any real data to initialize the hypernetwork parameters. To further improve the sample efficiency of the transfer, we apply contrastive learning in the discriminator, which effectively works on very limited batch sizes. In extensive experiments, we validate the efficiency of the hypernetworks, self-initialization and contrastive loss for knowledge transfer on standard benchmarks.
Address New Orleans; USA; June 2022
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference CVPRW
Notes LAMP; 600.147; 602.200 Approved no
Call Number (down) LWW2022a Serial 3785
Permanent link to this record
 

 
Author Xialei Liu; Joost Van de Weijer; Andrew Bagdanov
Title Exploiting Unlabeled Data in CNNs by Self-Supervised Learning to Rank Type Journal Article
Year 2019 Publication IEEE Transactions on Pattern Analysis and Machine Intelligence Abbreviated Journal TPAMI
Volume 41 Issue 8 Pages 1862-1878
Keywords Task analysis;Training;Image quality;Visualization;Uncertainty;Labeling;Neural networks;Learning from rankings;image quality assessment;crowd counting;active learning
Abstract For many applications the collection of labeled data is expensive and laborious. Exploitation of unlabeled data during training is thus a long-pursued objective of machine learning. Self-supervised learning addresses this by positing an auxiliary task (different, but related to the supervised task) for which data is abundantly available. In this paper, we show how ranking can be used as a proxy task for some regression problems. As another contribution, we propose an efficient backpropagation technique for Siamese networks which prevents the redundant computation introduced by the multi-branch network architecture. We apply our framework to two regression problems: Image Quality Assessment (IQA) and Crowd Counting. For both we show how to automatically generate ranked image sets from unlabeled data. Our results show that networks trained to regress to the ground truth targets for labeled data and to simultaneously learn to rank unlabeled data obtain significantly better, state-of-the-art results for both IQA and crowd counting. In addition, we show that measuring network uncertainty on the self-supervised proxy task is a good measure of informativeness of unlabeled data. This can be used to drive an algorithm for active learning and we show that this reduces labeling effort by up to 50 percent.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes LAMP; 600.109; 600.106; 600.120 Approved no
Call Number (down) LWB2019 Serial 3267
Permanent link to this record
 

 
Author Dimosthenis Karatzas; V. Poulain d'Andecy; Marçal Rusiñol
Title Human-Document Interaction – a new frontier for document image analysis Type Conference Article
Year 2016 Publication 12th IAPR Workshop on Document Analysis Systems Abbreviated Journal
Volume Issue Pages 369-374
Keywords
Abstract All indications show that paper documents will not cede in favour of their digital counterparts, but will instead be used increasingly in conjunction with digital information. An open challenge is how to seamlessly link the physical with the digital – how to continue taking advantage of the important affordances of paper, without missing out on digital functionality. This paper presents the authors’ experience with developing systems for Human-Document Interaction based on augmented document interfaces and examines new challenges and opportunities arising for the document image analysis field in this area. The system presented combines state of the art camera-based document image analysis techniques with a range of complementary technologies to offer fluid Human-Document Interaction. Both fixed and nomadic setups are discussed that have gone through user testing in real-life environments, and use cases are presented that span the spectrum from business to educational application.
Address Santorini; Greece; April 2016
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference DAS
Notes DAG; 600.084; 600.077 Approved no
Call Number (down) KPR2016 Serial 2756
Permanent link to this record
 

 
Author Dimosthenis Karatzas; Lluis Gomez; Marçal Rusiñol; Anguelos Nicolaou
Title The Robust Reading Competition Annotation and Evaluation Platform Type Conference Article
Year 2018 Publication 13th IAPR International Workshop on Document Analysis Systems Abbreviated Journal
Volume Issue Pages 61-66
Keywords
Abstract The ICDAR Robust Reading Competition (RRC), initiated in 2003 and reestablished in 2011, has become the de facto evaluation standard for the international community. Concurrent with its second incarnation in 2011, a continuous effort started to develop an online framework to facilitate the hosting and management of competitions. This short paper briefly outlines the Robust Reading Competition Annotation and Evaluation Platform, the backbone of the Robust Reading Competition, comprising a collection of tools and processes that aim to simplify the management and annotation of data, and to provide online and offline performance evaluation and analysis services.
Address Vienna; Austria; April 2018
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference DAS
Notes DAG; 600.084; 600.121 Approved no
Call Number (down) KGR2018 Serial 3103
Permanent link to this record