Author Albert Gordo; Ernest Valveny
Title A rotation invariant page layout descriptor for document classification and retrieval Type Conference Article
Year 2009 Publication 10th International Conference on Document Analysis and Recognition Abbreviated Journal
Volume Issue Pages 481–485
Keywords
Abstract Document classification usually requires structural features such as the physical layout to obtain good accuracy rates on complex documents. This paper introduces a descriptor of the layout and a distance measure based on cyclic dynamic time warping, which can be computed in O(n^2). This descriptor is translation invariant and can be easily modified to be scale and rotation invariant. Experiments with this descriptor and its rotation invariant modification are performed on the Girona archives database and compared against another common layout distance, the minimum weight edge cover (MWEC). The experiments show that these methods outperform the MWEC both in accuracy and speed, particularly on rotated documents.
Address Barcelona, Spain
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 1520-5363 ISBN 978-1-4244-4500-4 Medium
Area Expedition Conference ICDAR
Notes DAG Approved no
Call Number DAG @ dag @ GoV2009a Serial 1175
Permanent link to this record
 

 
Author Alicia Fornes; Josep Llados; Gemma Sanchez; Horst Bunke
Title On the use of textural features for writer identification in old handwritten music scores Type Conference Article
Year 2009 Publication 10th International Conference on Document Analysis and Recognition Abbreviated Journal
Volume Issue Pages 996-1000
Keywords
Abstract Writer identification consists in determining the writer of a piece of handwriting from a set of writers. In this paper we present a system for writer identification in old handwritten music scores which uses only music notation to determine the author. The steps of the proposed system are the following. First, the music sheet is preprocessed to obtain a music score without the staff lines. Afterwards, four different methods for generating texture images from music symbols are applied. Each approach uses a different spatial variation when combining the music symbols to generate the textures. Finally, Gabor filters and Grey-scale Co-occurrence matrices are used to obtain the features. The classification is performed using a k-NN classifier based on Euclidean distance. The proposed method has been tested on a database of old music scores from the 17th to 19th centuries, achieving encouraging identification rates.
Address Barcelona
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 1520-5363 ISBN 978-1-4244-4500-4 Medium
Area Expedition Conference ICDAR
Notes DAG Approved no
Call Number DAG @ dag @ FLS2009b Serial 1223
Permanent link to this record
 

 
Author D. Perez; L. Tarazon; N. Serrano; F.M. Castro; Oriol Ramos Terrades; A. Juan
Title The GERMANA Database Type Conference Article
Year 2009 Publication 10th International Conference on Document Analysis and Recognition Abbreviated Journal
Volume Issue Pages 301-305
Keywords
Abstract A new handwritten text database, GERMANA, is presented to facilitate empirical comparison of different approaches to text line extraction and off-line handwriting recognition. GERMANA is the result of digitising and annotating a 764-page Spanish manuscript from 1891, in which most pages only contain nearly calligraphed text written on ruled sheets of well-separated lines. To our knowledge, it is the first publicly available database for handwriting research, mostly written in Spanish and comparable in size to standard databases. Due to its sequential book structure, it is also well-suited for realistic assessment of interactive handwriting recognition systems. To provide baseline results for reference in future studies, empirical results are also reported, using standard techniques and tools for preprocessing, feature extraction, HMM-based image modelling, and language modelling.
Address Barcelona; Spain
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 1520-5363 ISBN 978-1-4244-4500-4 Medium
Area Expedition Conference ICDAR
Notes DAG Approved no
Call Number Admin @ si @ PTS2009 Serial 1870
Permanent link to this record
 

 
Author Ernest Valveny; Enric Marti
Title Learning of structural descriptions of graphic symbols using deformable template matching Type Conference Article
Year 2001 Publication Proc. Sixth Int Document Analysis and Recognition Conf Abbreviated Journal
Volume Issue Pages 455-459
Keywords
Abstract Accurate symbol recognition in graphic documents needs an accurate representation of the symbols to be recognized. If structural approaches are used for recognition, symbols have to be described in terms of their shape, using structural relationships among extracted features. Unlike statistical pattern recognition, in structural methods, symbols are usually manually defined from expert knowledge, and not automatically inferred from sample images. In this work we explain one approach to learn from examples a representative structural description of a symbol, thus providing better information about shape variability. The description of a symbol is based on a probabilistic model. It consists of a set of lines described by the mean and the variance of line parameters, respectively providing information about the model of the symbol, and its shape variability. The representation of each image in the sample set as a set of lines is achieved using deformable template matching.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG;IAM; Approved no
Call Number IAM @ iam @ VMA2001 Serial 1654
Permanent link to this record
 

 
Author Josep Llados; Enric Marti; Jaime Lopez-Krahe
Title A Hough-based method for hatched pattern detection in maps and diagrams Type Conference Article
Year 1999 Publication Proceedings of the Fifth Int. Conf. Document Analysis and Recognition ICDAR ’99 Abbreviated Journal
Volume Issue Pages 479-482
Keywords
Abstract A hatched area is characterized by a set of parallel straight lines placed at regular intervals. In this paper, a Hough-based schema is introduced to recognize hatched areas in technical documents from attributed graph structures representing the document once it has been vectorized. Defining a Hough-based transform from a graph instead of the raster image makes it possible, first, to drastically reduce the processing time and, second, to obtain more reliable results because straight lines have already been detected in the vectorization step. A further advantage of the proposed method is that no assumptions must be made a priori about the slope and frequency of hatching patterns; they are computed at run time for each hatched area.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG;IAM; Approved no
Call Number IAM @ iam @ LIM1999b Serial 1580
Permanent link to this record
 

 
Author David Curto; Albert Clapes; Javier Selva; Sorina Smeureanu; Julio C. S. Jacques Junior; David Gallardo-Pujol; Georgina Guilera; David Leiva; Thomas B. Moeslund; Sergio Escalera; Cristina Palmero
Title Dyadformer: A Multi-Modal Transformer for Long-Range Modeling of Dyadic Interactions Type Conference Article
Year 2021 Publication IEEE/CVF International Conference on Computer Vision Workshops Abbreviated Journal
Volume Issue Pages 2177-2188
Keywords
Abstract Personality computing has become an emerging topic in computer vision, due to the wide range of applications it can be used for. However, most works on the topic have focused on analyzing the individual, even when applied to interaction scenarios, and for short periods of time. To address these limitations, we present the Dyadformer, a novel multi-modal multi-subject Transformer architecture to model individual and interpersonal features in dyadic interactions using variable time windows, thus allowing the capture of long-term interdependencies. Our proposed cross-subject layer allows the network to explicitly model interactions among subjects through attentional operations. This proof-of-concept approach shows how multi-modality and joint modeling of both interactants for longer periods of time help to predict individual attributes. With Dyadformer, we improve state-of-the-art self-reported personality inference results on individual subjects on the UDIVA v0.5 dataset.
Address Virtual; October 2021
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICCVW
Notes HUPBA; no proj Approved no
Call Number Admin @ si @ CCS2021 Serial 3648
Permanent link to this record
 

 
Author Claudia Greco; Carmela Buono; Pau Buch-Cardona; Gennaro Cordasco; Sergio Escalera; Anna Esposito; Anais Fernandez; Daria Kyslitska; Maria Stylianou Kornes; Cristina Palmero; Jofre Tenorio Laranga; Anna Torp Johansen; Maria Ines Torres
Title Emotional Features of Interactions With Empathic Agents Type Conference Article
Year 2021 Publication IEEE/CVF International Conference on Computer Vision Workshops Abbreviated Journal
Volume Issue Pages 2168-2176
Keywords
Abstract The current study is part of the EMPATHIC project, whose aim is to develop an Empathic Virtual Coach (VC) capable of promoting healthy and independent aging. To this end, the VC needs to be capable of perceiving the emotional states of users and adjusting its behaviour during the interactions according to what the users are experiencing in terms of emotions and comfort. Thus, the present work focuses on some sessions where elderly users from three different countries interact with a simulated system. Audio and video information extracted from these sessions was examined by external observers to assess participants' emotional experience with the EMPATHIC-VC in terms of categorical and dimensional assessment of emotions. Analyses were conducted on the emotional labels assigned by the external observers while participants were engaged in two different scenarios: a generic one, where the interaction was carried out with no intention to discuss a specific topic, and a nutrition one, aimed at holding a conversation on users' nutritional habits. Results of analyses performed on both audio and video data revealed that the EMPATHIC coach did not elicit negative feelings in the users. Indeed, users from all countries showed relaxed and positive behavior when interacting with the simulated VC during both scenarios. Overall, the EMPATHIC-VC was capable of offering an enjoyable experience without eliciting negative feelings in the users. This supports the hypothesis that an Empathic Virtual Coach capable of considering users' expectations and emotional states could support elderly people in daily life activities and help them to remain independent.
Address Virtual; October 2021
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICCVW
Notes HUPBA; no proj Approved no
Call Number Admin @ si @ GBB2021 Serial 3647
Permanent link to this record
 

 
Author Aitor Alvarez-Gila; Joost Van de Weijer; Estibaliz Garrote
Title Adversarial Networks for Spatial Context-Aware Spectral Image Reconstruction from RGB Type Conference Article
Year 2017 Publication 1st International Workshop on Physics Based Vision meets Deep Learning Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Hyperspectral signal reconstruction aims at recovering the original spectral input that produced a certain trichromatic (RGB) response from a capturing device or observer. Given the heavily underconstrained, non-linear nature of the problem, traditional techniques leverage different statistical properties of the spectral signal in order to build informative priors from real world object reflectances for constructing such an RGB to spectral signal mapping. However, most of them treat each sample independently, and thus do not benefit from the contextual information that the spatial dimensions can provide. We pose hyperspectral natural image reconstruction as an image to image mapping learning problem, and apply a conditional generative adversarial framework to help capture spatial semantics. This is the first time Convolutional Neural Networks (and, particularly, Generative Adversarial Networks) are used to solve this task. Quantitative evaluation shows a Root Mean Squared Error (RMSE) drop of 44.7% and a Relative RMSE drop of 47.0% on the ICVL natural hyperspectral image dataset.
Address Venice; Italy; October 2017
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICCV-PBDL
Notes LAMP; 600.109; 600.106; 600.120 Approved no
Call Number Admin @ si @ AWG2017 Serial 2969
Permanent link to this record
 

 
Author Jun Wan; Sergio Escalera; Gholamreza Anbarjafari; Hugo Jair Escalante; Xavier Baro; Isabelle Guyon; Meysam Madadi; Juri Allik; Jelena Gorbova; Chi Lin; Yiliang Xie
Title Results and Analysis of ChaLearn LAP Multi-modal Isolated and Continuous Gesture Recognition, and Real versus Fake Expressed Emotions Challenges Type Conference Article
Year 2017 Publication Chalearn Workshop on Action, Gesture, and Emotion Recognition: Large Scale Multimodal Gesture Recognition and Real versus Fake expressed emotions at ICCV Abbreviated Journal
Volume Issue Pages
Keywords
Abstract We analyze the results of the 2017 ChaLearn Looking at People Challenge at ICCV. The challenge comprised three tracks: (1) large-scale isolated gesture recognition, (2) continuous gesture recognition, and (3) real versus fake expressed emotions. It is the second round for both gesture recognition challenges, which were first held in the context of the ICPR 2016 workshop on “multimedia challenges beyond visual analysis”. In this second round, more participants joined the competitions, and the performances considerably improved compared to the first round. In particular, the best recognition accuracy of isolated gesture recognition has improved from 56.90% to 67.71% on the IsoGD test set, and the Mean Jaccard Index (MJI) of continuous gesture recognition has improved from 0.2869 to 0.6103 on the ConGD test set. The third track is the first challenge on real versus fake expressed emotion classification, including six emotion categories, for which a novel database was introduced. The first place was shared between two teams who achieved a 67.70% averaged recognition rate on the test set. The data of the three tracks, the participants' code and method descriptions are publicly available to allow researchers to keep making progress in the field.
Address Venice; Italy; October 2017
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICCVW
Notes HUPBA; no menciona Approved no
Call Number Admin @ si @ WEA2017 Serial 3066
Permanent link to this record
 

 
Author Albert Clapes; Tinne Tuytelaars; Sergio Escalera
Title Darwintrees for action recognition Type Conference Article
Year 2017 Publication Chalearn Workshop on Action, Gesture, and Emotion Recognition: Large Scale Multimodal Gesture Recognition and Real versus Fake expressed emotions at ICCV Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICCVW
Notes HUPBA; no menciona Approved no
Call Number Admin @ si @ CTE2017 Serial 3069
Permanent link to this record
 

 
Author Ivet Rafegas; Maria Vanrell
Title Color representation in CNNs: parallelisms with biological vision Type Conference Article
Year 2017 Publication ICCV Workshop on Mutual Benefits of Cognitive and Computer Vision Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Convolutional Neural Networks (CNNs) trained for object recognition tasks present representational capabilities approaching those of primate visual systems [1]. This provides a computational framework to explore how image features are efficiently represented. Here, we dissect a trained CNN [2] to study how color is represented. We use a classical methodology from physiology, namely measuring the selectivity index of individual neurons to specific features. We use ImageNet Dataset [20] images and synthetic versions of them to quantify color tuning properties of artificial neurons and provide a classification of the network population. We identify three main levels of color representation showing some parallelisms with biological visual systems: (a) a decomposition in a circular hue space to represent single color regions with a wider hue sampling beyond the first layer (V2); (b) the emergence of opponent low-dimensional spaces in early stages to represent color edges (V1); and (c) a strong entanglement between color and shape patterns representing object-parts (e.g. wheel of a car), object-shapes (e.g. faces) or object-surrounds configurations (e.g. blue sky surrounding an object) in deeper layers (V4 or IT).
Address Venice; Italy; October 2017
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICCV-MBCC
Notes CIC; 600.087; 600.051 Approved no
Call Number Admin @ si @ RaV2017 Serial 2984
Permanent link to this record
 

 
Author Leonardo Galteri; Dena Bazazian; Lorenzo Seidenari; Marco Bertini; Andrew Bagdanov; Anguelos Nicolaou; Dimosthenis Karatzas; Alberto del Bimbo
Title Reading Text in the Wild from Compressed Images Type Conference Article
Year 2017 Publication 1st International workshop on Egocentric Perception, Interaction and Computing Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Reading text in the wild is gaining attention in the computer vision community. Images captured in the wild are almost always compressed to varying degrees, depending on application context, and this compression introduces artifacts that distort the content of the captured images. In this paper we investigate the impact these compression artifacts have on text localization and recognition in the wild. We also propose a deep Convolutional Neural Network (CNN) that can eliminate text-specific compression artifacts and which leads to an improvement in text recognition. Experimental results on the ICDAR-Challenge4 dataset demonstrate that compression artifacts have a significant impact on text localization and recognition and that our approach yields an improvement in both, especially at high compression rates.
Address Venice; Italy; October 2017
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICCV - EPIC
Notes DAG; 600.084; 600.121 Approved no
Call Number Admin @ si @ GBS2017 Serial 3006
Permanent link to this record
 

 
Author Antonio Hernandez; Carlos Primo; Sergio Escalera
Title Automatic user interaction correction via Multi-label Graph cuts Type Conference Article
Year 2011 Publication In ICCV 2011 1st IEEE International Workshop on Human Interaction in Computer Vision HICV Abbreviated Journal
Volume Issue Pages 1276-1281
Keywords
Abstract Most applications in image segmentation require user interaction in order to achieve accurate results. However, users want to achieve the desired segmentation accuracy while reducing the effort of manual labelling. In this work, we extend the standard multi-label α-expansion Graph Cut algorithm so that it analyzes the interaction of the user in order to modify the object model and improve the final segmentation of objects. The approach is inspired by the fact that fast user interactions may introduce some pixel errors that confuse object and background. Our results with different degrees of user interaction and input errors show high performance of the proposed approach on a multi-label human limb segmentation problem compared with the classical α-expansion algorithm.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-1-4673-0062-9 Medium
Area Expedition Conference HICV
Notes MILAB; HuPBA Approved no
Call Number Admin @ si @ HPE2011 Serial 1892
Permanent link to this record
 

 
Author Jürgen Brauer; Wenjuan Gong; Jordi Gonzalez; Michael Arens
Title On the Effect of Temporal Information on Monocular 3D Human Pose Estimation Type Conference Article
Year 2011 Publication 2nd IEEE International Workshop on Analysis and Retrieval of Tracked Events and Motion in Imagery Streams Abbreviated Journal
Volume Issue Pages 906-913
Keywords
Abstract We address the task of estimating 3D human poses from monocular camera sequences. Many works make use of multiple consecutive frames for the estimation of a 3D pose in a frame. Although such an approach should ease the pose estimation task substantially, since multiple consecutive frames in principle allow 2D projection ambiguities to be resolved, it has not yet been investigated systematically how much we can improve the 3D pose estimates when using multiple consecutive frames as opposed to single frame information. In this paper we analyze the difference in quality of 3D pose estimates based on different numbers of consecutive frames from which 2D pose estimates are available. We validate the use of temporal information on two major, different approaches for human pose estimation – modeling and learning approaches. The results of our experiments show that both learning and modeling approaches benefit from using multiple frames as opposed to single frame input, but that the benefit is small when the 2D pose estimates show a high quality in terms of precision.
Address Barcelona
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-1-4673-0062-9 Medium
Area Expedition Conference ARTEMIS
Notes ISE Approved no
Call Number Admin @ si @ BGG2011 Serial 1860
Permanent link to this record
 

 
Author Yaxing Wang; Hector Laria Mantecon; Joost Van de Weijer; Laura Lopez-Fuentes; Bogdan Raducanu
Title TransferI2I: Transfer Learning for Image-to-Image Translation from Small Datasets Type Conference Article
Year 2021 Publication 19th IEEE International Conference on Computer Vision Abbreviated Journal
Volume Issue Pages 13990-13999
Keywords
Abstract Image-to-image (I2I) translation has matured in recent years and is able to generate high-quality realistic images. However, despite current success, it still faces important challenges when applied to small domains. Existing methods use transfer learning for I2I translation, but they still require the learning of millions of parameters from scratch. This drawback severely limits their application on small domains. In this paper, we propose a new transfer learning method for I2I translation (TransferI2I). We decouple our learning process into the image generation step and the I2I translation step. In the first step we propose two novel techniques: source-target initialization and self-initialization of the adaptor layer. The former finetunes the pretrained generative model (e.g., StyleGAN) on source and target data. The latter allows all non-pretrained network parameters to be initialized without the need for any data. These techniques provide a better initialization for the I2I translation step. In addition, we introduce an auxiliary GAN that further facilitates the training of deep I2I systems even from small datasets. In extensive experiments on three datasets (Animal faces, Birds, and Foods), we show that we outperform existing methods and that mFID improves on several datasets by over 25 points.
Address Virtual; October 2021
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICCV
Notes LAMP; 600.147; 602.200; 600.120 Approved no
Call Number Admin @ si @ WLW2021 Serial 3604
Permanent link to this record