Author Alicia Fornes; Volkmar Frinken; Andreas Fischer; Jon Almazan; G. Jackson; Horst Bunke
Title A Keyword Spotting Approach Using Blurred Shape Model-Based Descriptors Type Conference Article
Year 2011 Publication Proceedings of the 2011 Workshop on Historical Document Imaging and Processing Abbreviated Journal
Volume Issue Pages 83-90
Keywords
Abstract The automatic processing of handwritten historical documents is considered a hard problem in pattern recognition. In addition to the challenges posed by modern handwritten data, a lack of training data and the effects of document degradation can be observed. In this scenario, keyword spotting emerges as a viable solution for making documents amenable to searching and browsing. For this task, we propose adapting shape descriptors used in symbol recognition. By treating each word image as a shape, it can be represented using the Blurred Shape Model and the Deformable Blurred Shape Model. Experiments on the George Washington database demonstrate that this approach outperforms the commonly used Dynamic Time Warping approach.
Address
Corporate Author Thesis
Publisher ACM Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-1-4503-0916-5 Medium
Area Expedition Conference HIP
Notes DAG Approved no
Call Number Admin @ si @ FFF2011a Serial 1823
 

 
Author Andreas Fischer; Volkmar Frinken; Alicia Fornes; Horst Bunke
Title Transcription Alignment of Latin Manuscripts Using Hidden Markov Models Type Conference Article
Year 2011 Publication Proceedings of the 2011 Workshop on Historical Document Imaging and Processing Abbreviated Journal
Volume Issue Pages 29-36
Keywords
Abstract Transcriptions of historical documents are a valuable source for extracting labeled handwriting images that can be used for training recognition systems. In this paper, we introduce the Saint Gall database, which includes images as well as the transcription of a Latin manuscript from the 9th century written in Carolingian script. Although the available transcription is of high quality for a human reader, the spelling of the words is not accurate when compared with the handwriting image. Hence, the transcription poses several challenges for alignment regarding, e.g., line breaks, abbreviations, and capitalization. We propose an alignment system based on character Hidden Markov Models that can cope with these challenges and efficiently aligns complete document pages. On the Saint Gall database, we demonstrate that considerable alignment accuracy can be achieved, even with weakly trained character models.
Address
Corporate Author Thesis
Publisher ACM Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference HIP
Notes DAG Approved no
Call Number Admin @ si @ FFF2011b Serial 1824
 

 
Author Cristina Palmero; Oleg V Komogortsev; Sergio Escalera; Sachin S Talathi
Title Multi-Rate Sensor Fusion for Unconstrained Near-Eye Gaze Estimation Type Conference Article
Year 2023 Publication Proceedings of the 2023 Symposium on Eye Tracking Research and Applications Abbreviated Journal
Volume Issue Pages 1-8
Keywords
Abstract The power requirements of video-oculography systems can be prohibitive for high-speed operation on portable devices. Recently, low-power alternatives such as photosensors have been evaluated, providing gaze estimates at high frequency with a trade-off in accuracy and robustness. Potentially, an approach combining slow/high-fidelity and fast/low-fidelity sensors should be able to exploit their complementarity to track fast eye motion accurately and robustly. To foster research on this topic, we introduce OpenSFEDS, a near-eye gaze estimation dataset containing approximately 2M synthetic camera-photosensor image pairs sampled at 500 Hz under varied appearance and camera position. We also formulate the task of sensor fusion for gaze estimation, proposing a deep learning framework consisting of appearance-based encoding and temporal eye-state dynamics. We evaluate several single- and multi-rate fusion baselines on OpenSFEDS, achieving an 8.7% error decrease when tracking fast eye movements with a multi-rate approach compared with a gaze forecasting approach operating with a low-speed sensor alone.
Address Tübingen; Germany; May 2023
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ETRA
Notes HUPBA Approved no
Call Number Admin @ si @ PKE2023 Serial 3923
 

 
Author Debora Gil; Jordi Gonzalez; Gemma Sanchez (eds)
Title Computer Vision: Advances in Research and Development Type Book Whole
Year 2007 Publication Proceedings of the 2nd CVC International Workshop Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address
Corporate Author Thesis
Publisher UAB Place of Publication Bellaterra (Spain) Editor Debora Gil; Jordi Gonzalez; Gemma Sanchez
Language Summary Language Original Title
Series Editor Series Title 2 Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-935251-4-9 Medium
Area Expedition Conference
Notes IAM; ISE; DAG Approved no
Call Number IAM @ iam @ GGS2007 Serial 1493
 

 
Author J. Filipe; Juan Andrade; J.L. Ferrier
Title FAF 2005 Type Miscellaneous
Year 2005 Publication Proceedings of the 2nd International Conference on Informatics in Control, Automation and Robotics, INSTICC Press Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address Barcelona (Spain)
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes Approved no
Call Number Admin @ si @ FAF2005 Serial 609
 

 
Author A. Pujol; Jordi Vitria; Petia Radeva; Xavier Binefa; Robert Benavente; Ernest Valveny; Craig Von Land
Title Real time pharmaceutical product recognition using color and shape indexing Type Conference Article
Year 1999 Publication Proceedings of the 2nd International Workshop on European Scientific and Industrial Collaboration (WESIC'99), Promoting Advanced Technologies in Manufacturing Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address Wales
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes OR;MILAB;DAG;CIC;MV Approved no
Call Number BCNPCL @ bcnpcl @ PVR1999 Serial 24
 

 
Author Siyang Song; Micol Spitale; Cheng Luo; German Barquero; Cristina Palmero; Sergio Escalera; Michel Valstar; Tobias Baur; Fabien Ringeval; Elisabeth Andre; Hatice Gunes
Title REACT2023: The First Multiple Appropriate Facial Reaction Generation Challenge Type Conference Article
Year 2023 Publication Proceedings of the 31st ACM International Conference on Multimedia Abbreviated Journal
Volume Issue Pages 9620–9624
Keywords
Abstract The Multiple Appropriate Facial Reaction Generation Challenge (REACT2023) is the first competition event focused on evaluating multimedia processing and machine learning techniques for generating human-appropriate facial reactions in various dyadic interaction scenarios, with all participants competing strictly under the same conditions. The goal of the challenge is to provide the first benchmark test set for multi-modal information processing and to foster collaboration among the audio, visual, and audio-visual behaviour analysis and behaviour generation (a.k.a. generative AI) communities, in order to compare the relative merits of the approaches to automatic appropriate facial reaction generation under different spontaneous dyadic interaction conditions. This paper presents: (i) the novelties, contributions and guidelines of the REACT2023 challenge; (ii) the dataset utilized in the challenge; and (iii) the performance of the baseline systems on the two proposed sub-challenges: Offline Multiple Appropriate Facial Reaction Generation and Online Multiple Appropriate Facial Reaction Generation. The challenge baseline code is publicly available at https://github.com/reactmultimodalchallenge/baseline_react2023.
Address Ottawa; Canada; October 2023
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference MM
Notes HUPBA Approved no
Call Number Admin @ si @ SSL2023 Serial 3931
 

 
Author Mohamed Ali Souibgui; Sanket Biswas; Andres Mafla; Ali Furkan Biten; Alicia Fornes; Yousri Kessentini; Josep Llados; Lluis Gomez; Dimosthenis Karatzas
Title Text-DIAE: a self-supervised degradation invariant autoencoder for text recognition and document enhancement Type Conference Article
Year 2023 Publication Proceedings of the 37th AAAI Conference on Artificial Intelligence Abbreviated Journal
Volume 37 Issue 2 Pages
Keywords Representation Learning for Vision; CV Applications; CV Language and Vision; ML Unsupervised; Self-Supervised Learning
Abstract In this paper, we propose a Text-Degradation Invariant Auto Encoder (Text-DIAE), a self-supervised model designed to tackle two tasks: text recognition (handwritten or scene text) and document image enhancement. We start by employing a transformer-based architecture that incorporates three pretext tasks as learning objectives to be optimized during pre-training without the use of labelled data. Each of the pretext objectives is specifically tailored to the final downstream tasks. We conduct several ablation experiments that confirm the design choice of the selected pretext tasks. Importantly, the proposed model does not exhibit the limitations of previous state-of-the-art methods based on contrastive losses, while at the same time requiring substantially fewer data samples to converge. Finally, we demonstrate that our method surpasses the state of the art in existing supervised and self-supervised settings in handwritten and scene text recognition and document image enhancement. Our code and trained models will be made publicly available at https://github.com/dali92002/SSL-OCR
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference AAAI
Notes DAG Approved no
Call Number Admin @ si @ SBM2023 Serial 3848
 

 
Author Khanh Nguyen; Ali Furkan Biten; Andres Mafla; Lluis Gomez; Dimosthenis Karatzas
Title Show, Interpret and Tell: Entity-Aware Contextualised Image Captioning in Wikipedia Type Conference Article
Year 2023 Publication Proceedings of the 37th AAAI Conference on Artificial Intelligence Abbreviated Journal
Volume 37 Issue 2 Pages 1940-1948
Keywords
Abstract Humans exploit prior knowledge to describe images, and are able to adapt their explanation to the specific contextual information given, even to the extent of inventing plausible explanations when contextual information and images do not match. In this work, we propose the novel task of captioning Wikipedia images by integrating contextual knowledge. Specifically, we produce models that jointly reason over Wikipedia articles, Wikimedia images and their associated descriptions to produce contextualized captions. The same Wikimedia image can be used to illustrate different articles, and the produced caption needs to be adapted to the specific context, allowing us to explore the limits of the model in adjusting captions to different contextual information. Dealing with out-of-dictionary words and Named Entities is challenging in this domain. To address this, we propose a pre-training objective, Masked Named Entity Modeling (MNEM), and show that this pretext task results in significantly improved models. Furthermore, we verify that a model pre-trained on Wikipedia generalizes well to News Captioning datasets. We further define two different test splits according to the difficulty of the captioning task. We offer insights on the role and the importance of each modality and highlight the limitations of our model.
Address Washington; USA; February 2023
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference AAAI
Notes DAG Approved no
Call Number Admin @ si @ NBM2023 Serial 3860
 

 
Author Maya Dimitrova; N. Kushmerick; Petia Radeva; Juan J. Villanueva
Title User Assessment of a Visual Genre Classifier Type Miscellaneous
Year 2003 Publication Proceedings of the 3rd IASTED Int. Conference Visualization, Imaging and Image Processing Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MILAB Approved no
Call Number BCNPCL @ bcnpcl @ DKR2003 Serial 372
 

 
Author Francisco Javier Orozco; F.A. Garcia; J.L. Arcos; Jordi Gonzalez
Title Spatio-Temporal Reasoning for Reliable Facial Expression Interpretation Type Conference Article
Year 2007 Publication Proceedings of the 5th International Conference on Computer Vision Systems Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address Bielefeld University (Germany)
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICVS
Notes ISE Approved no
Call Number ISE @ ise @ OGA2007 Serial 772
 

 
Author David Geronimo; Angel Sappa; Antonio Lopez; Daniel Ponsa
Title Adaptive Image Sampling and Windows Classification for On-board Pedestrian Detection Type Conference Article
Year 2007 Publication Proceedings of the 5th International Conference on Computer Vision Systems Abbreviated Journal ICVS
Volume Issue Pages
Keywords Pedestrian Detection
Abstract On-board pedestrian detection is at the frontier of the state of the art, since it implies processing outdoor scenarios from a mobile platform and searching for aspect-changing objects in cluttered urban environments. The most promising approaches include the development of classifiers based on feature selection and machine learning. However, they use a large number of features, which compromises real-time performance. Thus, methods for running the classifiers on only a few image windows must be provided. In this paper we contribute in both aspects, proposing a camera pose estimation method for adaptive sparse image sampling, as well as a classifier for pedestrian detection based on Haar wavelets and edge orientation histograms as features and AdaBoost as the learning machine. Both proposals are compared with relevant approaches in the literature, showing comparable results while reducing processing time by a factor of four for the sampling task and by a factor of ten for the classification task.
Address Bielefeld (Germany)
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ADAS Approved no
Call Number ADAS @ adas @ gsl2007a Serial 786
 

 
Author Christian Keilstrup Ingwersen; Artur Xarles; Albert Clapes; Meysam Madadi; Janus Nortoft Jensen; Morten Rieger Hannemose; Anders Bjorholm Dahl; Sergio Escalera
Title Video-based Skill Assessment for Golf: Estimating Golf Handicap Type Conference Article
Year 2023 Publication Proceedings of the 6th International Workshop on Multimedia Content Analysis in Sports Abbreviated Journal
Volume Issue Pages 31-39
Keywords
Abstract Automated skill assessment in sports using video-based analysis holds great potential for revolutionizing coaching methodologies. This paper focuses on the problem of skill determination in golfers by leveraging deep learning models applied to a large database of video recordings of golf swings. We investigate different regression-, ranking- and classification-based methods and compare them with a simple baseline approach. The performance is evaluated using mean squared error (MSE) as well as the percentage of correctly ranked pairs based on the Kendall correlation. Our results demonstrate an improvement over the baseline, with a 35% lower mean squared error and 68% correctly ranked pairs. However, achieving fine-grained skill assessment remains challenging. This work contributes to the development of AI-driven coaching systems and advances the understanding of video-based skill determination in the context of golf.
Address Ottawa; Canada; October 2023
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference MMSports
Notes HUPBA Approved no
Call Number Admin @ si @ KXC2023 Serial 3929
 

 
Author Artur Xarles; Sergio Escalera; Thomas B. Moeslund; Albert Clapes
Title ASTRA: An Action Spotting TRAnsformer for Soccer Videos Type Conference Article
Year 2023 Publication Proceedings of the 6th International Workshop on Multimedia Content Analysis in Sports Abbreviated Journal
Volume Issue Pages 93–102
Keywords
Abstract In this paper, we introduce ASTRA, a Transformer-based model designed for the task of Action Spotting in soccer matches. ASTRA addresses several challenges inherent in the task and dataset, including the requirement for precise action localization, the presence of a long-tail data distribution, non-visibility in certain actions, and inherent label noise. To do so, ASTRA incorporates (a) a Transformer encoder-decoder architecture to achieve the desired output temporal resolution and to produce precise predictions, (b) a balanced mixup strategy to handle the long-tail distribution of the data, (c) an uncertainty-aware displacement head to capture the label variability, and (d) input audio signal to enhance detection of non-visible actions. Results demonstrate the effectiveness of ASTRA, achieving a tight Average-mAP of 66.82 on the test set. Moreover, in the SoccerNet 2023 Action Spotting challenge, we secure the 3rd position with an Average-mAP of 70.21 on the challenge set.
Address Ottawa; Canada; October 2023
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference MMSports
Notes HUPBA Approved no
Call Number Admin @ si @ XEM2023 Serial 3970
 

 
Author Partha Pratim Roy; Umapada Pal; Josep Llados
Title Multi-oriented English Text Line Extraction using Background and Foreground Information Type Conference Article
Year 2008 Publication Proceedings of the 8th IAPR International Workshop on Document Analysis Systems Abbreviated Journal
Volume Issue Pages 315–322
Keywords
Abstract
Address Nara (Japan)
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference DAS
Notes DAG Approved no
Call Number DAG @ dag @ RPL2008b Serial 1047