|
Records |
Links |
|
Author |
Sergio Escalera; Alicia Fornes; Oriol Pujol; Josep Llados; Petia Radeva |
|
|
Title |
Multi-class Binary Object Categorization using Blurred Shape Models |
Type |
Conference Article |
|
Year |
2007 |
Publication |
Progress in Pattern Recognition, Image Analysis and Applications, 12th Iberoamerican Congress on Pattern |
Abbreviated Journal |
|
|
|
Volume |
4756 |
Issue |
|
Pages |
773–782 |
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LCNS |
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
978-3-540-76724-4 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
CIARP |
|
|
Notes |
MILAB; DAG;HuPBA |
Approved |
no |
|
|
Call Number |
BCNPCL @ bcnpcl @ EFP2007 |
Serial |
911 |
|
Permanent link to this record |
|
|
|
|
Author |
Sergio Escalera; Alicia Fornes; Oriol Pujol; Josep Llados; Petia Radeva |
|
|
Title |
Circular Blurred Shape Model for Multiclass Symbol Recognition |
Type |
Journal Article |
|
Year |
2011 |
Publication |
IEEE Transactions on Systems, Man and Cybernetics (Part B) (IEEE) |
Abbreviated Journal |
TSMCB |
|
|
Volume |
41 |
Issue |
2 |
Pages |
497-506 |
|
|
Keywords |
|
|
|
Abstract |
In this paper, we propose a circular blurred shape model descriptor to deal with the problem of symbol detection and classification as a particular case of object recognition. The feature extraction is performed by capturing the spatial arrangement of significant object characteristics in a correlogram structure. The shape information from objects is shared among correlogram regions, where a prior blurring degree defines the level of distortion allowed in the symbol, making the descriptor tolerant to irregular deformations. Moreover, the descriptor is rotation invariant by definition. We validate the effectiveness of the proposed descriptor in both the multiclass symbol recognition and symbol detection domains. In order to perform the symbol detection, the descriptors are learned using a cascade of classifiers. In the case of multiclass categorization, the new feature space is learned using a set of binary classifiers which are embedded in an error-correcting output code design. The results over four symbol data sets show the significant improvements of the proposed descriptor compared to the state-of-the-art descriptors. In particular, the results are even more significant in those cases where the symbols suffer from elastic deformations. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
1083-4419 |
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
MILAB; DAG;HuPBA |
Approved |
no |
|
|
Call Number |
Admin @ si @ EFP2011 |
Serial |
1784 |
|
Permanent link to this record |
|
|
|
|
Author |
Sergio Escalera; Alicia Fornes; Oriol Pujol; Alberto Escudero; Petia Radeva |
|
|
Title |
Circular Blurred Shape Model for Symbol Spotting in Documents |
Type |
Conference Article |
|
Year |
2009 |
Publication |
16th IEEE International Conference on Image Processing |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
1985-1988 |
|
|
Keywords |
|
|
|
Abstract |
Symbol spotting problem requires feature extraction strategies able to generalize from training samples and to localize the target object while discarding most part of the image. In the case of document analysis, symbol spotting techniques have to deal with a high variability of symbols' appearance. In this paper, we propose the Circular Blurred Shape Model descriptor. Feature extraction is performed capturing the spatial arrangement of significant object characteristics in a correlogram structure. Shape information from objects is shared among correlogram regions, being tolerant to the irregular deformations. Descriptors are learnt using a cascade of classifiers and Abadoost as the base classifier. Finally, symbol spotting is performed by means of a windowing strategy using the learnt cascade over plan and old musical score documents. Spotting and multi-class categorization results show better performance comparing with the state-of-the-art descriptors. |
|
|
Address |
Cairo, Egypt |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
978-1-4244-5653-6 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICIP |
|
|
Notes |
MILAB;HuPBA;DAG |
Approved |
no |
|
|
Call Number |
BCNPCL @ bcnpcl @ EFP2009b |
Serial |
1184 |
|
Permanent link to this record |
|
|
|
|
Author |
Sergio Escalera; Alicia Fornes; O. Pujol; Petia Radeva; Gemma Sanchez; Josep Llados |
|
|
Title |
Blurred Shape Model for Binary and Grey-level Symbol Recognition |
Type |
Journal Article |
|
Year |
2009 |
Publication |
Pattern Recognition Letters |
Abbreviated Journal |
PRL |
|
|
Volume |
30 |
Issue |
15 |
Pages |
1424–1433 |
|
|
Keywords |
|
|
|
Abstract |
Many symbol recognition problems require the use of robust descriptors in order to obtain rich information of the data. However, the research of a good descriptor is still an open issue due to the high variability of symbols appearance. Rotation, partial occlusions, elastic deformations, intra-class and inter-class variations, or high variability among symbols due to different writing styles, are just a few problems. In this paper, we introduce a symbol shape description to deal with the changes in appearance that these types of symbols suffer. The shape of the symbol is aligned based on principal components to make the recognition invariant to rotation and reflection. Then, we present the Blurred Shape Model descriptor (BSM), where new features encode the probability of appearance of each pixel that outlines the symbols shape. Moreover, we include the new descriptor in a system to deal with multi-class symbol categorization problems. Adaboost is used to train the binary classifiers, learning the BSM features that better split symbol classes. Then, the binary problems are embedded in an Error-Correcting Output Codes framework (ECOC) to deal with the multi-class case. The methodology is evaluated on different synthetic and real data sets. State-of-the-art descriptors and classifiers are compared, showing the robustness and better performance of the present scheme to classify symbols with high variability of appearance. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
HuPBA; DAG; MILAB |
Approved |
no |
|
|
Call Number |
BCNPCL @ bcnpcl @ EFP2009a |
Serial |
1180 |
|
Permanent link to this record |
|
|
|
|
Author |
Sergio Escalera; David M.J. Tax; Oriol Pujol; Petia Radeva; Robert P.W. Duin |
|
|
Title |
Multi-Class Classification in Image Analysis Via Error-Correcting Output Codes |
Type |
Book Chapter |
|
Year |
2011 |
Publication |
Innovations in Intelligent Image Analysis |
Abbreviated Journal |
|
|
|
Volume |
339 |
Issue |
|
Pages |
7-29 |
|
|
Keywords |
|
|
|
Abstract |
A common way to model multi-class classification problems is by means of Error-Correcting Output Codes (ECOC). Given a multi-class problem, the ECOC technique designs a codeword for each class, where each position of the code identifies the membership of the class for a given binary problem.A classification decision is obtained by assigning the label of the class with the closest code. In this paper, we overview the state-of-the-art on ECOC designs and test them in real applications. Results on different multi-class data sets show the benefits of using the ensemble of classifiers when categorizing objects in images. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
Springer Berlin Heidelberg |
Place of Publication |
Berlin |
Editor |
H. Kawasnicka; L.Jain |
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
1860-949X |
ISBN |
978-3-642-17933-4 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
MILAB;HuPBA |
Approved |
no |
|
|
Call Number |
Admin @ si @ ETP2011 |
Serial |
1746 |
|
Permanent link to this record |
|
|
|
|
Author |
Sergio Escalera |
|
|
Title |
Fast traffic model matching and recognition on gray-scale images |
Type |
Report |
|
Year |
2005 |
Publication |
CVC Technical Report #84 |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
CVC (UAB) |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
MILAB; HuPBA |
Approved |
no |
|
|
Call Number |
BCNPCL @ bcnpcl @ Esc2005 |
Serial |
572 |
|
Permanent link to this record |
|
|
|
|
Author |
Sergio Escalera |
|
|
Title |
Coding and Decoding Design of ECOCs for Multi-Class Pattern and Object Recognition |
Type |
Miscellaneous |
|
Year |
2008 |
Publication |
|
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
MILAB; HuPBA |
Approved |
no |
|
|
Call Number |
BCNPCL @ bcnpcl @ Esc2008a |
Serial |
1106 |
|
Permanent link to this record |
|
|
|
|
Author |
Sergio Escalera |
|
|
Title |
Multi-Modal Human Behaviour Analysis from Visual Data Sources |
Type |
Journal |
|
Year |
2013 |
Publication |
ERCIM News journal |
Abbreviated Journal |
ERCIM |
|
|
Volume |
95 |
Issue |
|
Pages |
21-22 |
|
|
Keywords |
|
|
|
Abstract |
The Human Pose Recovery and Behaviour Analysis group (HuPBA), University of Barcelona, is developing a line of research on multi-modal analysis of humans in visual data. The novel technology is being applied in several scenarios with high social impact, including sign language recognition, assisted technology and supported diagnosis for the elderly and people with mental/physical disabilities, fitness conditioning, and Human Computer Interaction. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
0926-4981 |
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
HuPBA;MILAB |
Approved |
no |
|
|
Call Number |
Admin @ si @ Esc2013 |
Serial |
2361 |
|
Permanent link to this record |
|
|
|
|
Author |
Sergio Escalera |
|
|
Title |
Human Behavior Analysis From Depth Maps |
Type |
Conference Article |
|
Year |
2012 |
Publication |
7th Conference on Articulated Motion and Deformable Objects |
Abbreviated Journal |
|
|
|
Volume |
7378 |
Issue |
|
Pages |
282-292 |
|
|
Keywords |
|
|
|
Abstract |
Pose Recovery (PR) and Human Behavior Analysis (HBA) have been a main focus of interest from the beginnings of Computer Vision and Machine Learning. PR and HBA were originally addressed by the analysis of still images and image sequences. More recent strategies consisted of Motion Capture technology (MOCAP), based on the synchronization of multiple cameras in controlled environments; and the analysis of depth maps from Time-of-Flight (ToF) technology, based on range image recording from distance sensor measurements. Recently, with the appearance of the multi-modal RGBD information provided by the low cost Kinect \textsfTM sensor (from RGB and Depth, respectively), classical methods for PR and HBA have been redefined, and new strategies have been proposed. In this paper, the recent contributions and future trends of multi-modal RGBD data analysis for PR and HBA are reviewed and discussed. |
|
|
Address |
Mallorca |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
Springer Heidelberg |
Place of Publication |
|
Editor |
F.J. Perales; R.B. Fisher; T.B. Moeslund |
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
0302-9743 |
ISBN |
978-3-642-31566-4 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
AMDO |
|
|
Notes |
MILAB; HuPBA |
Approved |
no |
|
|
Call Number |
Admin @ si @ Esc2012 |
Serial |
2040 |
|
Permanent link to this record |
|
|
|
|
Author |
Sergio Escalera |
|
|
Title |
Coding and Decoding Design of ECOCs for Multi-class Pattern and Object Recognition A |
Type |
Book Whole |
|
Year |
2008 |
Publication |
PhD Thesis, Universitat de Barcelona-CVC |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
Many real problems require multi-class decisions. In the Pattern Recognition field,
many techniques have been proposed to deal with the binary problem. However,
the extension of many 2-class classifiers to the multi-class case is a hard task. In
this sense, Error-Correcting Output Codes (ECOC) demonstrated to be a powerful
tool to combine any number of binary classifiers to model multi-class problems. But
there are still many open issues about the capabilities of the ECOC framework. In
this thesis, the two main stages of an ECOC design are analyzed: the coding and
the decoding steps. We present different problem-dependent designs. These designs
take advantage of the knowledge of the problem domain to minimize the number
of classifiers, obtaining a high classification performance. On the other hand, we
analyze the ECOC codification in order to define new decoding rules that take full
benefit from the information provided at the coding step. Moreover, as a successful
classification requires a rich feature set, new feature detection/extraction techniques
are presented and evaluated on the new ECOC designs. The evaluation of the new
methodology is performed on different real and synthetic data sets: UCI Machine
Learning Repository, handwriting symbols, traffic signs from a Mobile Mapping System, Intravascular Ultrasound images, Caltech Repository data set or Chaga’s disease
data set. The results of this thesis show that significant performance improvements
are obtained on both traditional coding and decoding ECOC designs when the new
coding and decoding rules are taken into account. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
Ph.D. thesis |
|
|
Publisher |
Ediciones Graficas Rey |
Place of Publication |
|
Editor |
Petia Radeva;Oriol Pujol |
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
MILAB; HuPBA |
Approved |
no |
|
|
Call Number |
Admin @ si @ Esc2008b |
Serial |
2217 |
|
Permanent link to this record |
|
|
|
|
Author |
Sergio Alloza; Flavio Escribano; Sergi Delgado; Ciprian Corneanu; Sergio Escalera |
|
|
Title |
XBadges. Identifying and training soft skills with commercial video games Improving persistence, risk taking & spatial reasoning with commercial video games and facial and emotional recognition system |
Type |
Conference Article |
|
Year |
2017 |
Publication |
4th Congreso de la Sociedad Española para las Ciencias del Videojuego |
Abbreviated Journal |
|
|
|
Volume |
1957 |
Issue |
|
Pages |
13-28 |
|
|
Keywords |
Video Games; Soft Skills; Training; Skilling Development; Emotions; Cognitive Abilities; Flappy Bird; Pacman; Tetris |
|
|
Abstract |
XBadges is a research project based on the hypothesis that commercial video games (nonserious games) can train soft skills. We measure persistence, patial reasoning and risk taking before and after subjects paticipate in controlled game playing sessions.
In addition, we have developed an automatic facial expression recognition system capable of inferring their emotions while playing, allowing us to study the role of emotions in soft skills acquisition. We have used Flappy Bird, Pacman and Tetris for assessing changes in persistence, risk taking and spatial reasoning respectively.
Results show how playing Tetris significantly improves spatial reasoning and how playing Pacman significantly improves prudence in certain areas of behavior. As for emotions, they reveal that being concentrated helps to improve performance and skills acquisition. Frustration is also shown as a key element. With the results obtained we are able to glimpse multiple applications in areas which need soft skills development. |
|
|
Address |
Barcelona; June 2017 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
COSECIVI; CEUR-WS |
|
|
Notes |
HUPBA; no menciona;MILAB |
Approved |
no |
|
|
Call Number |
Admin @ si @ AED2017 |
Serial |
3065 |
|
Permanent link to this record |
|
|
|
|
Author |
Sergi Garcia Bordils; George Tom; Sangeeth Reddy; Minesh Mathew; Marçal Rusiñol; C.V. Jawahar; Dimosthenis Karatzas |
|
|
Title |
Read While You Drive-Multilingual Text Tracking on the Road |
Type |
Conference Article |
|
Year |
2022 |
Publication |
15th IAPR International workshop on document analysis systems |
Abbreviated Journal |
|
|
|
Volume |
13237 |
Issue |
|
Pages |
756–770 |
|
|
Keywords |
|
|
|
Abstract |
Visual data obtained during driving scenarios usually contain large amounts of text that conveys semantic information necessary to analyse the urban environment and is integral to the traffic control plan. Yet, research on autonomous driving or driver assistance systems typically ignores this information. To advance research in this direction, we present RoadText-3K, a large driving video dataset with fully annotated text. RoadText-3K is three times bigger than its predecessor and contains data from varied geographical locations, unconstrained driving conditions and multiple languages and scripts. We offer a comprehensive analysis of tracking by detection and detection by tracking methods exploring the limits of state-of-the-art text detection. Finally, we propose a new end-to-end trainable tracking model that yields state-of-the-art results on this challenging dataset. Our experiments demonstrate the complexity and variability of RoadText-3K and establish a new, realistic benchmark for scene text tracking in the wild. |
|
|
Address |
La Rochelle; France; May 2022 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
978-3-031-06554-5 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
DAS |
|
|
Notes |
DAG; 600.155; 611.022; 611.004 |
Approved |
no |
|
|
Call Number |
Admin @ si @ GTR2022 |
Serial |
3783 |
|
Permanent link to this record |
|
|
|
|
Author |
Sergi Garcia Bordils; Dimosthenis Karatzas; Marçal Rusiñol |
|
|
Title |
Accelerating Transformer-Based Scene Text Detection and Recognition via Token Pruning |
Type |
Conference Article |
|
Year |
2023 |
Publication |
17th International Conference on Document Analysis and Recognition |
Abbreviated Journal |
|
|
|
Volume |
14192 |
Issue |
|
Pages |
106-121 |
|
|
Keywords |
Scene Text Detection; Scene Text Recognition; Transformer Acceleration |
|
|
Abstract |
Scene text detection and recognition is a crucial task in computer vision with numerous real-world applications. Transformer-based approaches are behind all current state-of-the-art models and have achieved excellent performance. However, the computational requirements of the transformer architecture makes training these methods slow and resource heavy. In this paper, we introduce a new token pruning strategy that significantly decreases training and inference times without sacrificing performance, striking a balance between accuracy and speed. We have applied this pruning technique to our own end-to-end transformer-based scene text understanding architecture. Our method uses a separate detection branch to guide the pruning of uninformative image features, which significantly reduces the number of tokens at the input of the transformer. Experimental results show how our network is able to obtain competitive results on multiple public benchmarks while running at significantly higher speeds. |
|
|
Address |
San Jose; CA; USA; August 2023 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICDAR |
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
Admin @ si @ GKR2023a |
Serial |
3907 |
|
Permanent link to this record |
|
|
|
|
Author |
Sergi Garcia Bordils; Dimosthenis Karatzas; Marçal Rusiñol |
|
|
Title |
STEP – Towards Structured Scene-Text Spotting |
Type |
Conference Article |
|
Year |
2024 |
Publication |
Winter Conference on Applications of Computer Vision |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
883-892 |
|
|
Keywords |
|
|
|
Abstract |
We introduce the structured scene-text spotting task, which requires a scene-text OCR system to spot text in the wild according to a query regular expression. Contrary to generic scene text OCR, structured scene-text spotting seeks to dynamically condition both scene text detection and recognition on user-provided regular expressions. To tackle this task, we propose the Structured TExt sPotter (STEP), a model that exploits the provided text structure to guide the OCR process. STEP is able to deal with regular expressions that contain spaces and it is not bound to detection at the word-level granularity. Our approach enables accurate zero-shot structured text spotting in a wide variety of real-world reading scenarios and is solely trained on publicly available data. To demonstrate the effectiveness of our approach, we introduce a new challenging test dataset that contains several types of out-of-vocabulary structured text, reflecting important reading applications of fields such as prices, dates, serial numbers, license plates etc. We demonstrate that STEP can provide specialised OCR performance on demand in all tested scenarios. |
|
|
Address |
Waikoloa; Hawai; USA; January 2024 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
WACV |
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
Admin @ si @ GKR2024 |
Serial |
3992 |
|
Permanent link to this record |
|
|
|
|
Author |
Sergi Garcia Bordils; Andres Mafla; Ali Furkan Biten; Oren Nuriel; Aviad Aberdam; Shai Mazor; Ron Litman; Dimosthenis Karatzas |
|
|
Title |
Out-of-Vocabulary Challenge Report |
Type |
Conference Article |
|
Year |
2022 |
Publication |
Proceedings European Conference on Computer Vision Workshops |
Abbreviated Journal |
|
|
|
Volume |
13804 |
Issue |
|
Pages |
359–375 |
|
|
Keywords |
|
|
|
Abstract |
This paper presents final results of the Out-Of-Vocabulary 2022 (OOV) challenge. The OOV contest introduces an important aspect that is not commonly studied by Optical Character Recognition (OCR) models, namely, the recognition of unseen scene text instances at training time. The competition compiles a collection of public scene text datasets comprising of 326,385 images with 4,864,405 scene text instances, thus covering a wide range of data distributions. A new and independent validation and test set is formed with scene text instances that are out of vocabulary at training time. The competition was structured in two tasks, end-to-end and cropped scene text recognition respectively. A thorough analysis of results from baselines and different participants is presented. Interestingly, current state-of-the-art models show a significant performance gap under the newly studied setting. We conclude that the OOV dataset proposed in this challenge will be an essential area to be explored in order to develop scene text models that achieve more robust and generalized predictions. |
|
|
Address |
Tel-Aviv; Israel; October 2022 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ECCVW |
|
|
Notes |
DAG; 600.155; 302.105; 611.002 |
Approved |
no |
|
|
Call Number |
Admin @ si @ GMB2022 |
Serial |
3771 |
|
Permanent link to this record |