|
T.O. Nguyen, Salvatore Tabbone, Oriol Ramos Terrades and A.T. Thierry. 2008. Proposition d'un descripteur de formes et du modèle vectoriel pour la recherche de symboles. Colloque International Francophone sur l'Ecrit et le Document.79–84.
|
|
|
Anjan Dutta, Josep Llados, Horst Bunke and Umapada Pal. 2018. Product graph-based higher order contextual similarities for inexact subgraph matching. PR, 76, 596–611.
Abstract: Many algorithms formulate graph matching as an optimization of an objective function of pairwise quantification of nodes and edges of two graphs to be matched. Pairwise measurements usually consider local attributes but disregard contextual information involved in graph structures. We address this issue by proposing contextual similarities between pairs of nodes. This is done by considering the tensor product graph (TPG) of two graphs to be matched, where each node is an ordered pair of nodes of the operand graphs. Contextual similarities between a pair of nodes are computed by accumulating weighted walks (normalized pairwise similarities) terminating at the corresponding paired node in TPG. Once the contextual similarities are obtained, we formulate subgraph matching as a node and edge selection problem in TPG. We use contextual similarities to construct an objective function and optimize it with a linear programming approach. Since random walk formulation through TPG takes into account higher order information, it is not a surprise that we obtain more reliable similarities and better discrimination among the nodes and edges. Experimental results shown on synthetic as well as real benchmarks illustrate that higher order contextual similarities increase discriminating power and allow one to find approximate solutions to the subgraph matching problem.
|
|
|
Francisco Cruz. 2016. Probabilistic Graphical Models for Document Analysis. (Ph.D. thesis, Ediciones Graficas Rey.)
Abstract: Latest advances in digitization techniques have fostered the interest in creating digital copies of collections of documents. Digitized documents permit an easy maintenance, loss-less storage, and efficient ways for transmission and to perform information retrieval processes. This situation has opened a new market niche to develop systems able to automatically extract and analyze information contained in these collections, specially in the ambit of the business activity.
Due to the great variety of types of documents this is not a trivial task. For instance, the automatic extraction of numerical data from invoices differs substantially from a task of text recognition in historical documents. However, in order to extract the information of interest, is always necessary to identify the area of the document where it is located. In the area of Document Analysis we refer to this process as layout analysis, which aims at identifying and categorizing the different entities that compose the document, such as text regions, pictures, text lines, or tables, among others. To perform this task it is usually necessary to incorporate a prior knowledge about the task into the analysis process, which can be modeled by defining a set of contextual relations between the different entities of the document. The use of context has proven to be useful to reinforce the recognition process and improve the results on many computer vision tasks. It presents two fundamental questions: What kind of contextual information is appropriate for a given task, and how to incorporate this information into the models.
In this thesis we study several ways to incorporate contextual information to the task of document layout analysis, and to the particular case of handwritten text line segmentation. We focus on the study of Probabilistic Graphical Models and other mechanisms for this purpose, and propose several solutions to these problems. First, we present a method for layout analysis based on Conditional Random Fields. With this model we encode local contextual relations between variables, such as pair-wise constraints. Besides, we encode a set of structural relations between different classes of regions at feature level. Second, we present a method based on 2D-Probabilistic Context-free Grammars to encode structural and hierarchical relations. We perform a comparative study between Probabilistic Graphical Models and this syntactic approach. Third, we propose a method for structured documents based on Bayesian Networks to represent the document structure, and an algorithm based in the Expectation-Maximization to find the best configuration of the page. We perform a thorough evaluation of the proposed methods on two particular collections of documents: a historical collection composed of ancient structured documents, and a collection of contemporary documents. In addition, we present a general method for the task of handwritten text line segmentation. We define a probabilistic framework where we combine the EM algorithm with variational approaches for computing inference and parameter learning on a Markov Random Field. We evaluate our method on several collections of documents, including a general dataset of annotated administrative documents. Results demonstrate the applicability of our method to real problems, and the contribution of the use of contextual information to this kind of problems.
|
|
|
Ruben Tito and 10 others. 2023. Privacy-Aware Document Visual Question Answering.
Abstract: Document Visual Question Answering (DocVQA) is a fast growing branch of document understanding. Despite the fact that documents contain sensitive or copyrighted information, none of the current DocVQA methods offers strong privacy guarantees.
In this work, we explore privacy in the domain of DocVQA for the first time. We highlight privacy issues in state of the art multi-modal LLM models used for DocVQA, and explore possible solutions.
Specifically, we focus on the invoice processing use case as a realistic, widely used scenario for document understanding, and propose a large scale DocVQA dataset comprising invoice documents and associated questions and answers. We employ a federated learning scheme, that reflects the real-life distribution of documents in different businesses, and we explore the use case where the ID of the invoice issuer is the sensitive information to be protected.
We demonstrate that non-private models tend to memorise, behaviour that can lead to exposing private information. We then evaluate baseline training schemes employing federated learning and differential privacy in this multi-modal scenario, where the sensitive information might be exposed through any of the two input modalities: vision (document image) or language (OCR tokens).
Finally, we design an attack exploiting the memorisation effect of the model, and demonstrate its effectiveness in probing different DocVQA models.
|
|
|
Alfons Juan-Ciscar and Gemma Sanchez. 2008. PRIS 2008. Pattern Recognition in Information Systems. Proceedings of the 8th international Workshop on Pattern Recognition in Information systems – PRIS 2008, in conjunction with ICEIS 2008.
|
|
|
Alicia Fornes, Josep Llados and Gemma Sanchez. 2005. Primitive Segmentation in Old Handwritten Music Scores.
|
|
|
Alicia Fornes, Josep Llados and Gemma Sanchez. 2006. Primitive Segmentation in Old Handwritten Music Scores. Graphics Recognition: Ten Years Review and Future Perspectives, W. Liu, J. Llados (Eds.), LNCS 3926: 288–299.
|
|
|
Josep Llados and Enric Marti. 1997. Playing with error-tolerant subgraph isomorphism in line drawings. VII National Symposium on Pattern Recognition and image Analysis.
|
|
|
Klaus Broelemann, Anjan Dutta, Xiaoyi Jiang and Josep Llados. 2013. Plausibility-Graphs for Symbol Spotting in Graphical Documents. 10th IAPR International Workshop on Graphics Recognition.
Abstract: Graph representation of graphical documents often suffers from noise viz. spurious nodes and spurios edges of graph and their discontinuity etc. In general these errors occur during the low-level image processing viz. binarization, skeletonization, vectorization etc. Hierarchical graph representation is a nice and efficient way to solve this kind of problem by hierarchically merging node-node and node-edge depending on the distance.
But the creation of hierarchical graph representing the graphical information often uses hard thresholds on the distance to create the hierarchical nodes (next state) of the lower nodes (or states) of a graph. As a result the representation often loses useful information. This paper introduces plausibilities to the nodes of hierarchical graph as a function of distance and proposes a modified algorithm for matching subgraphs of the hierarchical
graphs. The plausibility-annotated nodes help to improve the performance of the matching algorithm on two hierarchical structures. To show the potential of this approach, we conduct an experiment with the SESYD dataset.
|
|
|
Josep Llados. 2006. Perspectives on the Analysis of Graphical Documents.
|
|