|
Francesco Brughi, Debora Gil, Llorenç Badiella, Eva Jove Casabella and Oriol Ramos Terrades. 2014. Exploring the impact of inter-query variability on the performance of retrieval systems. 11th International Conference on Image Analysis and Recognition. Springer International Publishing, 413–420. (LNCS.)
Abstract: This paper introduces a framework for evaluating the performance of information retrieval systems. Current evaluation metrics provide an average score that does not consider performance variability across the query set. As a result, conclusions lack any statistical significance, yielding poor inference to cases outside the query set and possibly unfair comparisons. We propose to apply statistical methods in order to obtain a more informative measure for problems in which different query classes can be identified. In this context, we assess the performance variability on two levels: overall variability across the whole query set and specific query class-related variability. To this end, we estimate confidence bands for precision-recall curves, and we apply ANOVA in order to assess the significance of the performance across different query classes.
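As a rough illustration of the ANOVA step mentioned in the abstract, the sketch below computes a one-way ANOVA F statistic over per-query scores grouped by query class. The function name, the class labels and the score values are hypothetical and are not taken from the paper; the paper's actual procedure may differ.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of score lists.

    Each inner list holds the per-query scores of one query class.
    """
    k = len(groups)                           # number of query classes
    n_total = sum(len(g) for g in groups)     # total number of queries
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variability of class means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of individual queries around their own class mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Hypothetical per-query average precision values, grouped by query class
scores_by_class = {
    "class_a": [0.10, 0.20, 0.30],
    "class_b": [0.20, 0.30, 0.40],
    "class_c": [0.50, 0.60, 0.70],
}
f_stat = one_way_anova_f(list(scores_by_class.values()))
print(f"F = {f_stat:.2f}")
```

A large F value suggests that the mean score differs between query classes by more than the within-class spread would explain; a p-value would additionally require the F distribution with (k - 1, N - k) degrees of freedom.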
|
|
|
Manuel Carbonell, Mauricio Villegas, Alicia Fornes and Josep Llados. 2018. Joint Recognition of Handwritten Text and Named Entities with a Neural End-to-end Model. 13th IAPR International Workshop on Document Analysis Systems. 399–404.
Abstract: When extracting information from handwritten documents, text transcription and named entity recognition are usually treated as separate, consecutive tasks. This has the disadvantage that errors in the first module heavily affect the performance of the second module. In this work we propose to do both tasks jointly, using a single neural network with a common architecture used for plain text recognition. Experimentally, the work has been tested on a collection of historical marriage records. Results of experiments are presented to show the effect on the performance for different configurations: different ways of encoding the information, with or without transfer learning, and processing at text line or multi-line region level. The results are comparable to the state of the art reported in the ICDAR 2017 Information Extraction competition, even though the proposed technique does not use any dictionaries, language modeling or post-processing.
Keywords: Named entity recognition; Handwritten Text Recognition; neural networks
|
|
|
Y. Patel, Lluis Gomez, Marçal Rusiñol and Dimosthenis Karatzas. 2016. Dynamic Lexicon Generation for Natural Scene Images. 14th European Conference on Computer Vision Workshops. 395–410.
Abstract: Many scene text understanding methods approach the end-to-end recognition problem from a word-spotting perspective and take huge benefit from using small per-image lexicons. Such customized lexicons are normally assumed as given and their source is rarely discussed. In this paper we propose a method that generates contextualized lexicons for scene images using only visual information. For this, we exploit the correlation between visual and textual information in a dataset consisting of images and textual content associated with them. Using the topic modeling framework to discover a set of latent topics in such a dataset allows us to re-rank a fixed dictionary in a way that prioritizes the words that are more likely to appear in a given image. Moreover, we train a CNN that is able to reproduce those word rankings but using only the image raw pixels as input. We demonstrate that the quality of the automatically obtained custom lexicons is superior to a generic frequency-based baseline.
Keywords: scene text; photo OCR; scene understanding; lexicon generation; topic modeling; CNN
|
|
|
Pau Riba, Adria Molina, Lluis Gomez, Oriol Ramos Terrades and Josep Llados. 2021. Learning to Rank Words: Optimizing Ranking Metrics for Word Spotting. 16th International Conference on Document Analysis and Recognition. 381–395.
Abstract: In this paper, we explore and evaluate the use of ranking-based objective functions for simultaneously learning a word string encoder and a word image encoder. We consider retrieval frameworks in which the user expects a retrieval list ranked according to a defined relevance score. In the context of a word spotting problem, the relevance score has been set according to the string edit distance from the query string. We experimentally demonstrate the competitive performance of the proposed model on query-by-string word spotting for both handwritten and real scene word images. We also provide the results for query-by-example word spotting, although it is not the main focus of this work.
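To make the relevance notion in this abstract concrete, the following minimal sketch ranks candidate words by their string edit (Levenshtein) distance from a query string. It only illustrates the kind of target ranking the abstract describes, not the learned encoders or the ranking objective of the paper; the function names and the example vocabulary are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (unit costs)."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def rank_by_relevance(query, candidates):
    """Sort candidates so that smaller edit distance to the query comes first."""
    return sorted(candidates, key=lambda word: edit_distance(query, word))


# Hypothetical vocabulary; ties keep their original order (stable sort)
print(rank_by_relevance("word", ["sword", "word", "ward", "lexicon"]))
```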
|
|
|
Albert Gordo, Jaume Gibert, Ernest Valveny and Marçal Rusiñol. 2010. A Kernel-based Approach to Document Retrieval. 9th IAPR International Workshop on Document Analysis Systems. 377–384.
Abstract: In this paper we tackle the problem of document image retrieval by combining a similarity measure between documents and the probability that a given document belongs to a certain class. The membership probability for a specific class is computed using Support Vector Machines in conjunction with a similarity-measure-based kernel applied to structural document representations. In the presented experiments, we use different document representations, both visual and structural, and we apply them to a database of historical documents. We show how our method based on similarity kernels outperforms the usual distance-based retrieval.
|
|
|
J. Kuhn and 10 others. 2015. Advancing Physics Learning Through Traversing a Multi-Modal Experimentation Space. Workshop Proceedings on the 11th International Conference on Intelligent Environments. 373–380.
Abstract: Translating conceptual knowledge into real world experiences presents a significant educational challenge. This position paper presents an approach that supports learners in moving seamlessly between conceptual learning and its application in the real world by bringing physical and virtual experiments into everyday settings. Learners are empowered to conduct these situated experiments in a variety of physical settings by leveraging state of the art mobile, augmented reality, and virtual reality technology. A blend of mobile-based multi-sensory physical experiments, augmented reality and enabling virtual environments can allow learners to bridge their conceptual learning with tangible experiences in a completely novel manner. This approach focuses on the learner by applying self-regulated personalised learning techniques, underpinned by innovative pedagogical approaches and adaptation techniques, to ensure that the needs and preferences of each learner are catered for individually.
|
|
|
Dimosthenis Karatzas, V. Poulain d'Andecy and Marçal Rusiñol. 2016. Human-Document Interaction – a new frontier for document image analysis. 12th IAPR Workshop on Document Analysis Systems. 369–374.
Abstract: All indications show that paper documents will not cede in favour of their digital counterparts, but will instead be used increasingly in conjunction with digital information. An open challenge is how to seamlessly link the physical with the digital – how to continue taking advantage of the important affordances of paper, without missing out on digital functionality. This paper presents the authors’ experience with developing systems for Human-Document Interaction based on augmented document interfaces and examines new challenges and opportunities arising for the document image analysis field in this area. The system presented combines state of the art camera-based document image analysis techniques with a range of complementary technologies to offer fluid Human-Document Interaction. Both fixed and nomadic setups are discussed that have gone through user testing in real-life environments, and use cases are presented that span the spectrum from business to educational application
|
|
|
Youssef El Rhabi, Simon Loic, Brun Luc, Josep Llados and Felipe Lumbreras. 2016. Information Theoretic Rotationwise Robust Binary Descriptor Learning. Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR). 368–378.
Abstract: In this paper, we propose a new data-driven approach for binary descriptor selection. In order to draw a clear analysis of common designs, we present a general information-theoretic selection paradigm. It encompasses several standard binary descriptor construction schemes, including a recent state-of-the-art one named BOLD. We pursue the same endeavor to increase the stability of the produced descriptors with respect to rotations. To achieve this goal, we have designed a novel offline selection criterion which is better adapted to the online matching procedure. The effectiveness of our approach is demonstrated on two standard datasets, where our descriptor is compared to BOLD and to several classical descriptors. In particular, it emerges that our approach can achieve performance equivalent to, if not better than, that of BOLD while relying on descriptors half as long. Such an improvement can be influential for real-time applications.
|
|
|
Juan Ignacio Toledo, Alicia Fornes, Jordi Cucurull and Josep Llados. 2016. Election Tally Sheets Processing System. 12th IAPR Workshop on Document Analysis Systems. 364–368.
Abstract: In paper-based elections, manual tallies at polling station level produce myriads of documents. These documents share a common form-like structure and a reduced vocabulary worldwide. On the other hand, each tally sheet is filled in by a different writer, and different scripts are used in different countries. We present a complete document analysis system for electoral tally sheet processing combining state of the art techniques with a new handwriting recognition subprocess based on unsupervised feature discovery with Variational Autoencoders and sequence classification with BLSTM neural networks. The whole system is designed to be script independent and allows a fast and reliable results consolidation process with reduced operational cost.
|
|
|
Josep Llados and Gemma Sanchez. 2007. Indexing Historical Documents by Word Shape Signatures. 9th International Conference on Document Analysis and Recognition. 362–366.
|
|