|
Pau Riba, Anjan Dutta, Josep Llados and Alicia Fornes. 2017. Graph-based deep learning for graphics classification. 14th International Conference on Document Analysis and Recognition. 29–30.
Abstract: Graph-based representations are a common way to deal with graphics recognition problems. However, previous works were mainly focused on developing learning-free techniques. The success of deep learning frameworks has proved that learning is a powerful tool to solve many problems; however, it is not straightforward to extend these methodologies to non-Euclidean data such as graphs. On the other hand, graphs are a good representational structure for graphical entities. In this work, we present some deep learning techniques that have been proposed in the literature for graph-based representations and we show how they can be used in graphics recognition problems.
|
|
|
Raul Gomez and 7 others. 2017. ICDAR2017 Robust Reading Challenge on COCO-Text. 14th International Conference on Document Analysis and Recognition.
|
|
|
Masakazu Iwamura, Naoyuki Morimoto, Keishi Tainaka, Dena Bazazian, Lluis Gomez and Dimosthenis Karatzas. 2017. ICDAR2017 Robust Reading Challenge on Omnidirectional Video. 14th International Conference on Document Analysis and Recognition.
Abstract: Results of the ICDAR 2017 Robust Reading Challenge on Omnidirectional Video are presented. This competition uses the Downtown Osaka Scene Text (DOST) Dataset, which was captured in Osaka, Japan with an omnidirectional camera. Hence, it consists of sequential images (videos) of different view angles. Regarding the sequential images as videos (video mode), two tasks of localisation and end-to-end recognition are prepared. Regarding them as a set of still images (still image mode), three tasks of localisation, cropped word recognition and end-to-end recognition are prepared. As the dataset has been captured in Japan, it contains Japanese text but also includes text consisting of alphanumeric characters (Latin text). Hence, a submitted result for each task is evaluated in three ways: using Japanese-only ground truth (GT), using Latin-only GT and using combined GTs of both. Finally, by the submission deadline, we have received two submissions in the text localisation task of the still image mode. We intend to continue the competition in the open mode. Expecting further submissions, in this report we provide baseline results in all the tasks in addition to the submissions from the community.
|
|
|
Suman Ghosh and Ernest Valveny. 2017. R-PHOC: Segmentation-Free Word Spotting using CNN. 14th International Conference on Document Analysis and Recognition.
Abstract: arXiv:1707.01294
This paper proposes a region-based convolutional neural network for segmentation-free word spotting. Our network takes as input an image and a set of word candidate bounding boxes and embeds all bounding boxes into an embedding space, where word spotting can be cast as a simple nearest neighbour search between the query representation and each of the candidate bounding boxes. We make use of PHOC embedding as it has previously achieved significant success in segmentation-based word spotting. Word candidates are generated using a simple procedure based on grouping connected components using some spatial constraints. Experiments show that R-PHOC, which operates on images directly, can improve the current state-of-the-art in the standard GW dataset and performs as well as PHOCNET in some cases designed for segmentation-based word spotting.
Keywords: Convolutional neural network; Image segmentation; Artificial neural network; Nearest neighbor search
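The retrieval step described in this abstract (a nearest neighbour search between a query representation and the embeddings of candidate bounding boxes) can be sketched as follows. This is only an illustrative sketch: the cosine similarity measure, the function names and the toy 4-dimensional vectors are assumptions and are not taken from the paper, which obtains PHOC embeddings from a CNN.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(query_embedding, candidate_embeddings):
    """Return (candidate_index, similarity) pairs, most similar candidate first."""
    scores = [
        (index, cosine_similarity(query_embedding, embedding))
        for index, embedding in enumerate(candidate_embeddings)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy data (hypothetical): one query and three candidate box embeddings.
query = [1.0, 0.0, 1.0, 0.0]
candidates = [
    [0.0, 1.0, 0.0, 1.0],  # orthogonal to the query
    [1.0, 0.0, 1.0, 0.0],  # identical to the query
    [1.0, 0.0, 0.0, 0.0],  # partially overlapping
]
ranking = rank_candidates(query, candidates)
```

In this toy setup the identical candidate is ranked first and the orthogonal one last; a real system would replace the hand-written vectors with embeddings predicted by the network.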
|
|
|
Suman Ghosh and Ernest Valveny. 2017. Visual attention models for scene text recognition. 14th International Conference on Document Analysis and Recognition.
Abstract: arXiv:1706.01487
In this paper we propose an approach to lexicon-free recognition of text in scene images. Our approach relies on an LSTM-based soft visual attention model learned from convolutional features. A set of feature vectors is derived from an intermediate convolutional layer corresponding to different areas of the image. This permits encoding of spatial information into the image representation. In this way, the framework is able to learn how to selectively focus on different parts of the image. At every time step the recognizer emits one character using a weighted combination of the convolutional feature vectors according to the learned attention model. Training can be done end-to-end using only word level annotations. In addition, we show that modifying the beam search algorithm by integrating an explicit language model leads to significantly better recognition results. We validate the performance of our approach on standard SVT and ICDAR'03 scene text datasets, showing state-of-the-art performance in unconstrained text recognition.
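The "weighted combination of the convolutional feature vectors" mentioned in this abstract can be illustrated with a minimal soft attention step. This is a sketch under stated assumptions: the attention scores are passed in directly as plain numbers, whereas in the paper they are produced by the learned LSTM-based model; the function names and toy values are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into non-negative weights that sum to 1."""
    highest = max(scores)  # subtracted for numerical stability
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def attend(feature_vectors, scores):
    """Return the attention weights and the weighted sum of the feature vectors."""
    weights = softmax(scores)
    dim = len(feature_vectors[0])
    context = [
        sum(weight * vector[d] for weight, vector in zip(weights, feature_vectors))
        for d in range(dim)
    ]
    return weights, context


# Toy data (hypothetical): two image regions with 2-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0]]
weights, context = attend(features, [0.0, 0.0])
```

With equal scores, both regions receive the same weight and the context vector is the mean of the features; raising one score shifts the context towards that region.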
|
|
|
Albert Berenguel, Oriol Ramos Terrades, Josep Llados and Cristina Cañero. 2017. Evaluation of Texture Descriptors for Validation of Counterfeit Documents. 14th International Conference on Document Analysis and Recognition. 1237–1242.
Abstract: This paper describes an exhaustive comparative analysis and evaluation of different existing texture descriptor algorithms to differentiate between genuine and counterfeit documents. We include in our experiments different categories of algorithms and compare them in different scenarios with several counterfeit datasets, comprising banknotes and identity documents. Computational time in the extraction of each descriptor is important because the final objective is to use it in a real industrial scenario. HoG- and CNN-based descriptors stand out statistically over the rest in terms of the F1-score/time ratio performance.
|
|
|
Chun Yang, Xu Cheng Yin, Hong Yu, Dimosthenis Karatzas and Yu Cao. 2017. ICDAR2017 Robust Reading Challenge on Text Extraction from Biomedical Literature Figures (DeTEXT). 14th International Conference on Document Analysis and Recognition. 1444–1447.
Abstract: Hundreds of millions of figures are available in the biomedical literature, representing important biomedical experimental evidence. Since text is a rich source of information in figures, automatically extracting such text may assist in the task of mining figure information and understanding biomedical documents. Unlike images in the open domain, biomedical figures present a variety of unique challenges. For example, biomedical figures typically have complex layouts, small font sizes, short text, specific text, complex symbols and irregular text arrangements. This paper presents the final results of the ICDAR 2017 Competition on Text Extraction from Biomedical Literature Figures (ICDAR2017 DeTEXT Competition), which aims at extracting (detecting and recognizing) text from biomedical literature figures. Similar to text extraction from scene images and web pictures, ICDAR2017 DeTEXT Competition includes three major tasks, i.e., text detection, cropped word recognition and end-to-end text recognition. Here, we describe in detail the data set, tasks, evaluation protocols and participants of this competition, and report the performance of the participating methods.
|
|
|
David Fernandez, Pau Riba, Alicia Fornes and Josep Llados. 2014. On the Influence of Key Point Encoding for Handwritten Word Spotting. 14th International Conference on Frontiers in Handwriting Recognition. 476–481.
Abstract: In this paper we evaluate the influence of the selection of key points and the associated features on the performance of word spotting processes. In general, features can be extracted from a number of characteristic points like corners, contours, skeletons, maxima, minima, crossings, etc. A number of descriptors exist in the literature using different interest point detectors. However, the intrinsic variability of handwriting strongly affects performance if the interest points are not stable enough. In this paper, we analyze the performance of different descriptors for local interest points. As a benchmarking dataset we have used the Barcelona Marriage Database, which contains handwritten records of marriages over five centuries.
Keywords: Local descriptors; Interest points; Handwritten documents; Word spotting; Historical document analysis
|
|
|
Pau Riba, Jon Almazan, Alicia Fornes, David Fernandez, Ernest Valveny and Josep Llados. 2014. e-Crowds: a mobile platform for browsing and searching in historical demography-related manuscripts. 14th International Conference on Frontiers in Handwriting Recognition. 228–233.
Abstract: This paper presents a prototype system running on portable devices for browsing and word searching through historical handwritten document collections. The platform adapts the paradigm of eBook reading, where the narrative is not necessarily sequential, but centered on the user actions. The novelty is to replace digitally born books by digitized historical manuscripts of marriage licenses, so document analysis tasks are required in the browser. With an active reading paradigm, the user can cast queries of people's names, so he/she can implicitly follow genealogical links. In addition, the system allows combined searches: the user can refine a search by adding more words to search. As a second contribution, the retrieval functionality involves as a core technology a word spotting module with a unified approach, which allows combined query searches, and also two input modalities: query-by-example and query-by-string.
|
|
|
Robert Benavente, Gemma Sanchez, Ramon Baldrich, Maria Vanrell and Josep Llados. 2000. Normalized colour segmentation for human appearance description. 15th International Conference on Pattern Recognition. 637–641.
|
|