|   | 
Details
   web
Records
Author (down) Sergio Escalera; David M.J. Tax; Oriol Pujol; Petia Radeva; Robert P.W. Duin
Title Multi-Class Classification in Image Analysis Via Error-Correcting Output Codes Type Book Chapter
Year 2011 Publication Innovations in Intelligent Image Analysis Abbreviated Journal
Volume 339 Issue Pages 7-29
Keywords
Abstract A common way to model multi-class classification problems is by means of Error-Correcting Output Codes (ECOC). Given a multi-class problem, the ECOC technique designs a codeword for each class, where each position of the code identifies the membership of the class for a given binary problem.A classification decision is obtained by assigning the label of the class with the closest code. In this paper, we overview the state-of-the-art on ECOC designs and test them in real applications. Results on different multi-class data sets show the benefits of using the ensemble of classifiers when categorizing objects in images.
Address
Corporate Author Thesis
Publisher Springer Berlin Heidelberg Place of Publication Berlin Editor H. Kawasnicka; L.Jain
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 1860-949X ISBN 978-3-642-17933-4 Medium
Area Expedition Conference
Notes MILAB;HuPBA Approved no
Call Number Admin @ si @ ETP2011 Serial 1746
Permanent link to this record
 

 
Author (down) Sergio Escalera
Title Fast traffic model matching and recognition on gray-scale images Type Report
Year 2005 Publication CVC Technical Report #84 Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address CVC (UAB)
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MILAB; HuPBA Approved no
Call Number BCNPCL @ bcnpcl @ Esc2005 Serial 572
Permanent link to this record
 

 
Author (down) Sergio Escalera
Title Coding and Decoding Design of ECOCs for Multi-Class Pattern and Object Recognition Type Miscellaneous
Year 2008 Publication Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MILAB; HuPBA Approved no
Call Number BCNPCL @ bcnpcl @ Esc2008a Serial 1106
Permanent link to this record
 

 
Author (down) Sergio Escalera
Title Multi-Modal Human Behaviour Analysis from Visual Data Sources Type Journal
Year 2013 Publication ERCIM News journal Abbreviated Journal ERCIM
Volume 95 Issue Pages 21-22
Keywords
Abstract The Human Pose Recovery and Behaviour Analysis group (HuPBA), University of Barcelona, is developing a line of research on multi-modal analysis of humans in visual data. The novel technology is being applied in several scenarios with high social impact, including sign language recognition, assisted technology and supported diagnosis for the elderly and people with mental/physical disabilities, fitness conditioning, and Human Computer Interaction.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 0926-4981 ISBN Medium
Area Expedition Conference
Notes HuPBA;MILAB Approved no
Call Number Admin @ si @ Esc2013 Serial 2361
Permanent link to this record
 

 
Author (down) Sergio Escalera
Title Human Behavior Analysis From Depth Maps Type Conference Article
Year 2012 Publication 7th Conference on Articulated Motion and Deformable Objects Abbreviated Journal
Volume 7378 Issue Pages 282-292
Keywords
Abstract Pose Recovery (PR) and Human Behavior Analysis (HBA) have been a main focus of interest from the beginnings of Computer Vision and Machine Learning. PR and HBA were originally addressed by the analysis of still images and image sequences. More recent strategies consisted of Motion Capture technology (MOCAP), based on the synchronization of multiple cameras in controlled environments; and the analysis of depth maps from Time-of-Flight (ToF) technology, based on range image recording from distance sensor measurements. Recently, with the appearance of the multi-modal RGBD information provided by the low cost Kinect \textsfTM sensor (from RGB and Depth, respectively), classical methods for PR and HBA have been redefined, and new strategies have been proposed. In this paper, the recent contributions and future trends of multi-modal RGBD data analysis for PR and HBA are reviewed and discussed.
Address Mallorca
Corporate Author Thesis
Publisher Springer Heidelberg Place of Publication Editor F.J. Perales; R.B. Fisher; T.B. Moeslund
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 0302-9743 ISBN 978-3-642-31566-4 Medium
Area Expedition Conference AMDO
Notes MILAB; HuPBA Approved no
Call Number Admin @ si @ Esc2012 Serial 2040
Permanent link to this record
 

 
Author (down) Sergio Escalera
Title Coding and Decoding Design of ECOCs for Multi-class Pattern and Object Recognition A Type Book Whole
Year 2008 Publication PhD Thesis, Universitat de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Many real problems require multi-class decisions. In the Pattern Recognition field,
many techniques have been proposed to deal with the binary problem. However,
the extension of many 2-class classifiers to the multi-class case is a hard task. In
this sense, Error-Correcting Output Codes (ECOC) demonstrated to be a powerful
tool to combine any number of binary classifiers to model multi-class problems. But
there are still many open issues about the capabilities of the ECOC framework. In
this thesis, the two main stages of an ECOC design are analyzed: the coding and
the decoding steps. We present different problem-dependent designs. These designs
take advantage of the knowledge of the problem domain to minimize the number
of classifiers, obtaining a high classification performance. On the other hand, we
analyze the ECOC codification in order to define new decoding rules that take full
benefit from the information provided at the coding step. Moreover, as a successful
classification requires a rich feature set, new feature detection/extraction techniques
are presented and evaluated on the new ECOC designs. The evaluation of the new
methodology is performed on different real and synthetic data sets: UCI Machine
Learning Repository, handwriting symbols, traffic signs from a Mobile Mapping System, Intravascular Ultrasound images, Caltech Repository data set or Chaga’s disease
data set. The results of this thesis show that significant performance improvements
are obtained on both traditional coding and decoding ECOC designs when the new
coding and decoding rules are taken into account.
Address
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Petia Radeva;Oriol Pujol
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MILAB; HuPBA Approved no
Call Number Admin @ si @ Esc2008b Serial 2217
Permanent link to this record
 

 
Author (down) Sergio Alloza; Flavio Escribano; Sergi Delgado; Ciprian Corneanu; Sergio Escalera
Title XBadges. Identifying and training soft skills with commercial video games Improving persistence, risk taking & spatial reasoning with commercial video games and facial and emotional recognition system Type Conference Article
Year 2017 Publication 4th Congreso de la Sociedad Española para las Ciencias del Videojuego Abbreviated Journal
Volume 1957 Issue Pages 13-28
Keywords Video Games; Soft Skills; Training; Skilling Development; Emotions; Cognitive Abilities; Flappy Bird; Pacman; Tetris
Abstract XBadges is a research project based on the hypothesis that commercial video games (nonserious games) can train soft skills. We measure persistence, patial reasoning and risk taking before and after subjects paticipate in controlled game playing sessions.
In addition, we have developed an automatic facial expression recognition system capable of inferring their emotions while playing, allowing us to study the role of emotions in soft skills acquisition. We have used Flappy Bird, Pacman and Tetris for assessing changes in persistence, risk taking and spatial reasoning respectively.
Results show how playing Tetris significantly improves spatial reasoning and how playing Pacman significantly improves prudence in certain areas of behavior. As for emotions, they reveal that being concentrated helps to improve performance and skills acquisition. Frustration is also shown as a key element. With the results obtained we are able to glimpse multiple applications in areas which need soft skills development.
Address Barcelona; June 2017
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference COSECIVI; CEUR-WS
Notes HUPBA; no menciona Approved no
Call Number Admin @ si @ AED2017 Serial 3065
Permanent link to this record
 

 
Author (down) Sergi Garcia Bordils; George Tom; Sangeeth Reddy; Minesh Mathew; Marçal Rusiñol; C.V. Jawahar; Dimosthenis Karatzas
Title Read While You Drive-Multilingual Text Tracking on the Road Type Conference Article
Year 2022 Publication 15th IAPR International workshop on document analysis systems Abbreviated Journal
Volume 13237 Issue Pages 756–770
Keywords
Abstract Visual data obtained during driving scenarios usually contain large amounts of text that conveys semantic information necessary to analyse the urban environment and is integral to the traffic control plan. Yet, research on autonomous driving or driver assistance systems typically ignores this information. To advance research in this direction, we present RoadText-3K, a large driving video dataset with fully annotated text. RoadText-3K is three times bigger than its predecessor and contains data from varied geographical locations, unconstrained driving conditions and multiple languages and scripts. We offer a comprehensive analysis of tracking by detection and detection by tracking methods exploring the limits of state-of-the-art text detection. Finally, we propose a new end-to-end trainable tracking model that yields state-of-the-art results on this challenging dataset. Our experiments demonstrate the complexity and variability of RoadText-3K and establish a new, realistic benchmark for scene text tracking in the wild.
Address La Rochelle; France; May 2022
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN 978-3-031-06554-5 Medium
Area Expedition Conference DAS
Notes DAG; 600.155; 611.022; 611.004 Approved no
Call Number Admin @ si @ GTR2022 Serial 3783
Permanent link to this record
 

 
Author (down) Sergi Garcia Bordils; Dimosthenis Karatzas; Marçal Rusiñol
Title Accelerating Transformer-Based Scene Text Detection and Recognition via Token Pruning Type Conference Article
Year 2023 Publication 17th International Conference on Document Analysis and Recognition Abbreviated Journal
Volume 14192 Issue Pages 106-121
Keywords Scene Text Detection; Scene Text Recognition; Transformer Acceleration
Abstract Scene text detection and recognition is a crucial task in computer vision with numerous real-world applications. Transformer-based approaches are behind all current state-of-the-art models and have achieved excellent performance. However, the computational requirements of the transformer architecture makes training these methods slow and resource heavy. In this paper, we introduce a new token pruning strategy that significantly decreases training and inference times without sacrificing performance, striking a balance between accuracy and speed. We have applied this pruning technique to our own end-to-end transformer-based scene text understanding architecture. Our method uses a separate detection branch to guide the pruning of uninformative image features, which significantly reduces the number of tokens at the input of the transformer. Experimental results show how our network is able to obtain competitive results on multiple public benchmarks while running at significantly higher speeds.
Address San Jose; CA; USA; August 2023
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICDAR
Notes DAG Approved no
Call Number Admin @ si @ GKR2023a Serial 3907
Permanent link to this record
 

 
Author (down) Sergi Garcia Bordils; Dimosthenis Karatzas; Marçal Rusiñol
Title STEP – Towards Structured Scene-Text Spotting Type Conference Article
Year 2024 Publication Winter Conference on Applications of Computer Vision Abbreviated Journal
Volume Issue Pages 883-892
Keywords
Abstract We introduce the structured scene-text spotting task, which requires a scene-text OCR system to spot text in the wild according to a query regular expression. Contrary to generic scene text OCR, structured scene-text spotting seeks to dynamically condition both scene text detection and recognition on user-provided regular expressions. To tackle this task, we propose the Structured TExt sPotter (STEP), a model that exploits the provided text structure to guide the OCR process. STEP is able to deal with regular expressions that contain spaces and it is not bound to detection at the word-level granularity. Our approach enables accurate zero-shot structured text spotting in a wide variety of real-world reading scenarios and is solely trained on publicly available data. To demonstrate the effectiveness of our approach, we introduce a new challenging test dataset that contains several types of out-of-vocabulary structured text, reflecting important reading applications of fields such as prices, dates, serial numbers, license plates etc. We demonstrate that STEP can provide specialised OCR performance on demand in all tested scenarios.
Address Waikoloa; Hawai; USA; January 2024
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference WACV
Notes DAG Approved no
Call Number Admin @ si @ GKR2024 Serial 3992
Permanent link to this record
 

 
Author (down) Sergi Garcia Bordils; Andres Mafla; Ali Furkan Biten; Oren Nuriel; Aviad Aberdam; Shai Mazor; Ron Litman; Dimosthenis Karatzas
Title Out-of-Vocabulary Challenge Report Type Conference Article
Year 2022 Publication Proceedings European Conference on Computer Vision Workshops Abbreviated Journal
Volume 13804 Issue Pages 359–375
Keywords
Abstract This paper presents final results of the Out-Of-Vocabulary 2022 (OOV) challenge. The OOV contest introduces an important aspect that is not commonly studied by Optical Character Recognition (OCR) models, namely, the recognition of unseen scene text instances at training time. The competition compiles a collection of public scene text datasets comprising of 326,385 images with 4,864,405 scene text instances, thus covering a wide range of data distributions. A new and independent validation and test set is formed with scene text instances that are out of vocabulary at training time. The competition was structured in two tasks, end-to-end and cropped scene text recognition respectively. A thorough analysis of results from baselines and different participants is presented. Interestingly, current state-of-the-art models show a significant performance gap under the newly studied setting. We conclude that the OOV dataset proposed in this challenge will be an essential area to be explored in order to develop scene text models that achieve more robust and generalized predictions.
Address Tel-Aviv; Israel; October 2022
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ECCVW
Notes DAG; 600.155; 302.105; 611.002 Approved no
Call Number Admin @ si @ GMB2022 Serial 3771
Permanent link to this record
 

 
Author (down) Senmao Li; Joost Van de Weijer; Yaxing Wang; Fahad Shahbaz Khan; Meiqin Liu; Jian Yang
Title 3D-Aware Multi-Class Image-to-Image Translation with NeRFs Type Conference Article
Year 2023 Publication 36th IEEE Conference on Computer Vision and Pattern Recognition Abbreviated Journal
Volume Issue Pages 12652-12662
Keywords
Abstract Recent advances in 3D-aware generative models (3D-aware GANs) combined with Neural Radiance Fields (NeRF) have achieved impressive results. However no prior works investigate 3D-aware GANs for 3D consistent multiclass image-to-image (3D-aware 121) translation. Naively using 2D-121 translation methods suffers from unrealistic shape/identity change. To perform 3D-aware multiclass 121 translation, we decouple this learning process into a multiclass 3D-aware GAN step and a 3D-aware 121 translation step. In the first step, we propose two novel techniques: a new conditional architecture and an effective training strategy. In the second step, based on the well-trained multiclass 3D-aware GAN architecture, that preserves view-consistency, we construct a 3D-aware 121 translation system. To further reduce the view-consistency problems, we propose several new techniques, including a U-net-like adaptor network design, a hierarchical representation constrain and a relative regularization loss. In exten-sive experiments on two datasets, quantitative and qualitative results demonstrate that we successfully perform 3D-aware 121 translation with multi-view consistency. Code is available in 3DI2I.
Address Vancouver; Canada; June 2023
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference CVPR
Notes LAMP Approved no
Call Number Admin @ si @ LWW2023b Serial 3920
Permanent link to this record
 

 
Author (down) Senmao Li; Joost van de Weijer; Taihang Hu; Fahad Shahbaz Khan; Qibin Hou; Yaxing Wang; Jian Yang
Title StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing Type Miscellaneous
Year 2023 Publication Arxiv Abbreviated Journal
Volume Issue Pages
Keywords
Abstract A significant research effort is focused on exploiting the amazing capacities of pretrained diffusion models for the editing of images. They either finetune the model, or invert the image in the latent space of the pretrained model. However, they suffer from two problems: (1) Unsatisfying results for selected regions, and unexpected changes in nonselected regions. (2) They require careful text prompt editing where the prompt should include all visual objects in the input image. To address this, we propose two improvements: (1) Only optimizing the input of the value linear network in the cross-attention layers, is sufficiently powerful to reconstruct a real image. (2) We propose attention regularization to preserve the object-like attention maps after editing, enabling us to obtain accurate style editing without invoking significant structural changes. We further improve the editing technique which is used for the unconditional branch of classifier-free guidance, as well as the conditional one as used by P2P. Extensive experimental prompt-editing results on a variety of images, demonstrate qualitatively and quantitatively that our method has superior editing capabilities than existing and concurrent works.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes LAMP Approved no
Call Number Admin @ si @ LWH2023 Serial 3870
Permanent link to this record
 

 
Author (down) Sebastien Mace; Herve Locteau; Ernest Valveny; Salvatore Tabbone
Title A system to detect rooms in architectural floor plan images Type Conference Article
Year 2010 Publication 9th IAPR International Workshop on Document Analysis Systems Abbreviated Journal
Volume Issue Pages 167–174
Keywords
Abstract In this article, a system to detect rooms in architectural floor plan images is described. We first present a primitive extraction algorithm for line detection. It is based on an original coupling of classical Hough transform with image vectorization in order to perform robust and efficient line detection. We show how the lines that satisfy some graphical arrangements are combined into walls. We also present the way we detect some door hypothesis thanks to the extraction of arcs. Walls and door hypothesis are then used by our room segmentation strategy; it consists in recursively decomposing the image until getting nearly convex regions. The notion of convexity is difficult to quantify, and the selection of separation lines between regions can also be rough. We take advantage of knowledge associated to architectural floor plans in order to obtain mostly rectangular rooms. Qualitative and quantitative evaluations performed on a corpus of real documents show promising results.
Address Boston; USA
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-1-60558-773-8 Medium
Area Expedition Conference DAS
Notes DAG Approved no
Call Number DAG @ dag @ MLV2010 Serial 1437
Permanent link to this record
 

 
Author (down) Sebastian Ramos
Title Vision-based Detection of Road Hazards for Autonomous Driving Type Report
Year 2014 Publication CVC Technical Report Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address UAB; September 2014
Corporate Author Thesis Master's thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ADAS; 600.076 Approved no
Call Number Admin @ si @ Ram2014 Serial 2580
Permanent link to this record