|
Records |
Links |
|
Author |
Mohamed Ali Souibgui; Sanket Biswas; Andres Mafla; Ali Furkan Biten; Alicia Fornes; Yousri Kessentini; Josep Llados; Lluis Gomez; Dimosthenis Karatzas |

|
|
Title |
Text-DIAE: a self-supervised degradation invariant autoencoder for text recognition and document enhancement |
Type |
Conference Article |
|
Year |
2023 |
Publication |
Proceedings of the 37th AAAI Conference on Artificial Intelligence |
Abbreviated Journal |
|
|
|
Volume |
37 |
Issue  |
2 |
Pages |
|
|
|
Keywords |
Representation Learning for Vision; CV Applications; CV Language and Vision; ML Unsupervised; Self-Supervised Learning |
|
|
Abstract |
In this paper, we propose a Text-Degradation Invariant Auto Encoder (Text-DIAE), a self-supervised model designed to tackle two tasks, text recognition (handwritten or scene-text) and document image enhancement. We start by employing a transformer-based architecture that incorporates three pretext tasks as learning objectives to be optimized during pre-training without the usage of labelled data. Each of the pretext objectives is specifically tailored for the final downstream tasks. We conduct several ablation experiments that confirm the design choice of the selected pretext tasks. Importantly, the proposed model does not exhibit limitations of previous state-of-the-art methods based on contrastive losses, while at the same time requiring substantially fewer data samples to converge. Finally, we demonstrate that our method surpasses the state-of-the-art in existing supervised and self-supervised settings in handwritten and scene text recognition and document image enhancement. Our code and trained models will be made publicly available at https://github.com/dali92002/SSL-OCR |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
AAAI |
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
Admin @ si @ SBM2023 |
Serial |
3848 |
|
Permanent link to this record |
|
|
|
|
Author |
Mickael Coustaty; Alicia Fornes |

|
|
Title |
Document Analysis and Recognition – ICDAR 2023 Workshops |
Type |
Book Whole |
|
Year |
2023 |
Publication |
Document Analysis and Recognition – ICDAR 2023 Workshops |
Abbreviated Journal |
|
|
|
Volume |
14194 |
Issue  |
2 |
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
San Jose; USA; August 2023 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICDAR |
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
Admin @ si @ CoF2023 |
Serial |
3852 |
|
Permanent link to this record |
|
|
|
|
Author |
Khanh Nguyen; Ali Furkan Biten; Andres Mafla; Lluis Gomez; Dimosthenis Karatzas |

|
|
Title |
Show, Interpret and Tell: Entity-Aware Contextualised Image Captioning in Wikipedia |
Type |
Conference Article |
|
Year |
2023 |
Publication |
Proceedings of the 37th AAAI Conference on Artificial Intelligence |
Abbreviated Journal |
|
|
|
Volume |
37 |
Issue  |
2 |
Pages |
1940-1948 |
|
|
Keywords |
|
|
|
Abstract |
Humans exploit prior knowledge to describe images, and are able to adapt their explanation to specific contextual information given, even to the extent of inventing plausible explanations when contextual information and images do not match. In this work, we propose the novel task of captioning Wikipedia images by integrating contextual knowledge. Specifically, we produce models that jointly reason over Wikipedia articles, Wikimedia images and their associated descriptions to produce contextualized captions. The same Wikimedia image can be used to illustrate different articles, and the produced caption needs to be adapted to the specific context allowing us to explore the limits of the model to adjust captions to different contextual information. Dealing with out-of-dictionary words and Named Entities is a challenging task in this domain. To address this, we propose a pre-training objective, Masked Named Entity Modeling (MNEM), and show that this pretext task results to significantly improved models. Furthermore, we verify that a model pre-trained in Wikipedia generalizes well to News Captioning datasets. We further define two different test splits according to the difficulty of the captioning task. We offer insights on the role and the importance of each modality and highlight the limitations of our model. |
|
|
Address |
Washington; USA; February 2023 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
AAAI |
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
Admin @ si @ NBM2023 |
Serial |
3860 |
|
Permanent link to this record |
|
|
|
|
Author |
T.Chauhan; E.Perales; Kaida Xiao; E.Hird ; Dimosthenis Karatzas; Sophie Wuerger |

|
|
Title |
The achromatic locus: Effect of navigation direction in color space |
Type |
Journal Article |
|
Year |
2014 |
Publication |
Journal of Vision |
Abbreviated Journal |
VSS |
|
|
Volume |
14 (1) |
Issue  |
25 |
Pages |
1-11 |
|
|
Keywords |
achromatic; unique hues; color constancy; luminance; color space |
|
|
Abstract |
5Y Impact Factor: 2.99 / 1st (Ophthalmology)
An achromatic stimulus is defined as a patch of light that is devoid of any hue. This is usually achieved by asking observers to adjust the stimulus such that it looks neither red nor green and at the same time neither yellow nor blue. Despite the theoretical and practical importance of the achromatic locus, little is known about the variability in these settings. The main purpose of the current study was to evaluate whether achromatic settings were dependent on the task of the observers, namely the navigation direction in color space. Observers could either adjust the test patch along the two chromatic axes in the CIE u*v* diagram or, alternatively, navigate along the unique-hue lines. Our main result is that the navigation method affects the reliability of these achromatic settings. Observers are able to make more reliable achromatic settings when adjusting the test patch along the directions defined by the four unique hues as opposed to navigating along the main axes in the commonly used CIE u*v* chromaticity plane. This result holds across different ambient viewing conditions (Dark, Daylight, Cool White Fluorescent) and different test luminance levels (5, 20, and 50 cd/m2). The reduced variability in the achromatic settings is consistent with the idea that internal color representations are more aligned with the unique-hue lines than the u* and v* axes. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG; 600.077 |
Approved |
no |
|
|
Call Number |
Admin @ si @ CPX2014 |
Serial |
2418 |
|
Permanent link to this record |
|
|
|
|
Author |
Josep Llados; Gemma Sanchez |

|
|
Title |
Graph Matching vs. Graph Parsing in Graphics Recognition: A Combined Approach |
Type |
Journal |
|
Year |
2004 |
Publication |
International Journal of Pattern Recognition and Artificial Intelligence |
Abbreviated Journal |
IJPRAI |
|
|
Volume |
18 |
Issue  |
3 |
Pages |
455–473 |
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG; IF: 0.588 |
Approved |
no |
|
|
Call Number |
DAG @ dag @ LlS2004 |
Serial |
445 |
|
Permanent link to this record |
|
|
|
|
Author |
Antonio Lopez; Ernest Valveny; Juan J. Villanueva |

|
|
Title |
Real-time quality control of surgical material packaging by artificial vision |
Type |
Journal Article |
|
Year |
2005 |
Publication |
Assembly Automation |
Abbreviated Journal |
|
|
|
Volume |
25 |
Issue  |
3 |
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
IF: 0.061) |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
ADAS;DAG |
Approved |
no |
|
|
Call Number |
ADAS @ adas @ LVV2005 |
Serial |
552 |
|
Permanent link to this record |
|
|
|
|
Author |
Marçal Rusiñol; Josep Llados; Gemma Sanchez |

|
|
Title |
Symbol Spotting in Vectorized Technical Drawings Through a Lookup Table of Region Strings |
Type |
Journal Article |
|
Year |
2010 |
Publication |
Pattern Analysis and Applications |
Abbreviated Journal |
PAA |
|
|
Volume |
13 |
Issue  |
3 |
Pages |
321-331 |
|
|
Keywords |
|
|
|
Abstract |
In this paper, we address the problem of symbol spotting in technical document images applied to scanned and vectorized line drawings. Like any information spotting architecture, our approach has two components. First, symbols are decomposed in primitives which are compactly represented and second a primitive indexing structure aims to efficiently retrieve similar primitives. Primitives are encoded in terms of attributed strings representing closed regions. Similar strings are clustered in a lookup table so that the set median strings act as indexing keys. A voting scheme formulates hypothesis in certain locations of the line drawing image where there is a high presence of regions similar to the queried ones, and therefore, a high probability to find the queried graphical symbol. The proposed approach is illustrated in a framework consisting in spotting furniture symbols in architectural drawings. It has been proved to work even in the presence of noise and distortion introduced by the scanning and raster-to-vector processes. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
Springer-Verlag |
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
1433-7541 |
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
DAG @ dag @ RLS2010 |
Serial |
1165 |
|
Permanent link to this record |
|
|
|
|
Author |
Marçal Rusiñol; Agnes Borras; Josep Llados |

|
|
Title |
Relational Indexing of Vectorial Primitives for Symbol Spotting in Line-Drawing Images |
Type |
Journal Article |
|
Year |
2010 |
Publication |
Pattern Recognition Letters |
Abbreviated Journal |
PRL |
|
|
Volume |
31 |
Issue  |
3 |
Pages |
188–201 |
|
|
Keywords |
Document image analysis and recognition, Graphics recognition, Symbol spotting ,Vectorial representations, Line-drawings |
|
|
Abstract |
This paper presents a symbol spotting approach for indexing by content a database of line-drawing images. As line-drawings are digital-born documents designed by vectorial softwares, instead of using a pixel-based approach, we present a spotting method based on vector primitives. Graphical symbols are represented by a set of vectorial primitives which are described by an off-the-shelf shape descriptor. A relational indexing strategy aims to retrieve symbol locations into the target documents by using a combined numerical-relational description of 2D structures. The zones which are likely to contain the queried symbol are validated by a Hough-like voting scheme. In addition, a performance evaluation framework for symbol spotting in graphical documents is proposed. The presented methodology has been evaluated with a benchmarking set of architectural documents achieving good performance results. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
Elsevier |
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
DAG @ dag @ RBL2010 |
Serial |
1177 |
|
Permanent link to this record |
|
|
|
|
Author |
Alicia Fornes; Josep Llados; Gemma Sanchez; Dimosthenis Karatzas |

|
|
Title |
Rotation Invariant Hand-Drawn Symbol Recognition based on a Dynamic Time Warping Model |
Type |
Journal Article |
|
Year |
2010 |
Publication |
International Journal on Document Analysis and Recognition |
Abbreviated Journal |
IJDAR |
|
|
Volume |
13 |
Issue  |
3 |
Pages |
229–241 |
|
|
Keywords |
|
|
|
Abstract |
One of the major difficulties of handwriting symbol recognition is the high variability among symbols because of the different writer styles. In this paper, we introduce a robust approach for describing and recognizing hand-drawn symbols tolerant to these writer style differences. This method, which is invariant to scale and rotation, is based on the dynamic time warping (DTW) algorithm. The symbols are described by vector sequences, a variation of the DTW distance is used for computing the matching distance, and K-Nearest Neighbor is used to classify them. Our approach has been evaluated in two benchmarking scenarios consisting of hand-drawn symbols. Compared with state-of-the-art methods for symbol recognition, our method shows higher tolerance to the irregular deformations induced by hand-drawn strokes. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
Springer-Verlag |
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
1433-2833 |
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG; IF 2009: 1,213 |
Approved |
no |
|
|
Call Number |
DAG @ dag @ FLS2010a |
Serial |
1288 |
|
Permanent link to this record |
|
|
|
|
Author |
Mathieu Nicolas Delalandre; Ernest Valveny; Tony Pridmore; Dimosthenis Karatzas |

|
|
Title |
Generation of Synthetic Documents for Performance Evaluation of Symbol Recognition & Spotting Systems |
Type |
Journal Article |
|
Year |
2010 |
Publication |
International Journal on Document Analysis and Recognition |
Abbreviated Journal |
IJDAR |
|
|
Volume |
13 |
Issue  |
3 |
Pages |
187-207 |
|
|
Keywords |
|
|
|
Abstract |
This paper deals with the topic of performance evaluation of symbol recognition & spotting systems. We propose here a new approach to the generation of synthetic graphics documents containing non-isolated symbols in a real context. This approach is based on the definition of a set of constraints that permit us to place the symbols on a pre-defined background according to the properties of a particular domain (architecture, electronics, engineering, etc.). In this way, we can obtain a large amount of images resembling real documents by simply defining the set of constraints and providing a few pre-defined backgrounds. As documents are synthetically generated, the groundtruth (the location and the label of every symbol) becomes automatically available. We have applied this approach to the generation of a large database of architectural drawings and electronic diagrams, which shows the flexibility of the system. Performance evaluation experiments of a symbol localization system show that our approach permits to generate documents with different features that are reflected in variation of localization results. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
Springer-Verlag |
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
1433-2833 |
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
DAG @ dag @ DVP2010 |
Serial |
1289 |
|
Permanent link to this record |