|
Ruben Tito and 10 others. 2023. Privacy-Aware Document Visual Question Answering.
Abstract: Document Visual Question Answering (DocVQA) is a fast growing branch of document understanding. Despite the fact that documents contain sensitive or copyrighted information, none of the current DocVQA methods offers strong privacy guarantees.
In this work, we explore privacy in the domain of DocVQA for the first time. We highlight privacy issues in state of the art multi-modal LLM models used for DocVQA, and explore possible solutions.
Specifically, we focus on the invoice processing use case as a realistic, widely used scenario for document understanding, and propose a large scale DocVQA dataset comprising invoice documents and associated questions and answers. We employ a federated learning scheme, that reflects the real-life distribution of documents in different businesses, and we explore the use case where the ID of the invoice issuer is the sensitive information to be protected.
We demonstrate that non-private models tend to memorise, behaviour that can lead to exposing private information. We then evaluate baseline training schemes employing federated learning and differential privacy in this multi-modal scenario, where the sensitive information might be exposed through any of the two input modalities: vision (document image) or language (OCR tokens).
Finally, we design an attack exploiting the memorisation effect of the model, and demonstrate its effectiveness in probing different DocVQA models.
|
|
|
Beata Megyesi, Alicia Fornes, Nils Kopal and Benedek Lang. 2024. Historical Cryptology. Learning and Experiencing Cryptography with CrypTool and SageMath.
Abstract: Historical cryptology studies (original) encrypted manuscripts, often handwritten sources, produced in our history. These historical sources can be found in archives, often hidden without any indexing and therefore hard to locate. Once found they need to be digitized and turned into a machine-readable text format before they can be deciphered with computational methods. The focus of historical cryptology is not primarily the development of sophisticated algorithms for decipherment, but rather the entire process of analysis of the encrypted source from collection and digitization to transcription and decryption. The process also includes the interpretation and contextualization of the message set in its historical context. There are many challenges on the way, such as mistakes made by the scribe, errors made by the transcriber, damaged pages, handwriting styles that are difficult to interpret, historical languages from various time periods, and hidden underlying language of the message. Ciphertexts vary greatly in terms of their code system and symbol sets used with more or less distinguishable symbols. Ciphertexts can be embedded in clearly written text, or shorter or longer sequences of cleartext can be embedded in the ciphertext. The ciphers used mostly in historical times are substitutions (simple, homophonic, or polyphonic), with or without nomenclatures, encoded as digits or symbol sequences, with or without spaces. So the circumstances are different from those in modern cryptography which focuses on methods (algorithms) and their strengths and assumes that the algorithm is applied correctly. For both historical and modern cryptology, attack vectors outside the algorithm are applied like implementation flaws and side-channel attacks. In this chapter, we give an introduction to the field of historical cryptology and present an overview of how researchers today process historical encrypted sources.
|
|
|
Ayan Banerjee, Sanket Biswas, Josep Llados and Umapada Pal. 2024. GraphKD: Exploring Knowledge Distillation Towards Document Object Detection with Structured Graph Creation.
Abstract: Object detection in documents is a key step to automate the structural elements identification process in a digital or scanned document through understanding the hierarchical structure and relationships between different elements. Large and complex models, while achieving high accuracy, can be computationally expensive and memory-intensive, making them impractical for deployment on resource constrained devices. Knowledge distillation allows us to create small and more efficient models that retain much of the performance of their larger counterparts. Here we present a graph-based knowledge distillation framework to correctly identify and localize the document objects in a document image. Here, we design a structured graph with nodes containing proposal-level features and edges representing the relationship between the different proposal regions. Also, to reduce text bias an adaptive node sampling strategy is designed to prune the weight distribution and put more weightage on non-text nodes. We encode the complete graph as a knowledge representation and transfer it from the teacher to the student through the proposed distillation loss by effectively capturing both local and global information concurrently. Extensive experimentation on competitive benchmarks demonstrates that the proposed framework outperforms the current state-of-the-art approaches. The code will be available at: this https URL.
|
|
|
Ernest Valveny and Enric Marti. 2000. Hand-drawn symbol recognition in graphic documents using deformable template matching and a Bayesian framework. Proc. 15th Int Pattern Recognition Conf.239–242.
Abstract: Hand-drawn symbols can take many different and distorted shapes from their ideal representation. Then, very flexible methods are needed to be able to handle unconstrained drawings. We propose here to extend our previous work in hand-drawn symbol recognition based on a Bayesian framework and deformable template matching. This approach gets flexibility enough to fit distorted shapes in the drawing while keeping fidelity to the ideal shape of the symbol. In this work, we define the similarity measure between an image and a symbol based on the distance from every pixel in the image to the lines in the symbol. Matching is carried out using an implementation of the EM algorithm. Thus, we can improve recognition rates and computation time with respect to our previous formulation based on a simulated annealing algorithm.
|
|
|
Jon Almazan, Albert Gordo, Alicia Fornes and Ernest Valveny. 2012. Efficient Exemplar Word Spotting. 23rd British Machine Vision Conference.67.1–67.11.
Abstract: In this paper we propose an unsupervised segmentation-free method for word spotting in document images.
Documents are represented with a grid of HOG descriptors, and a sliding window approach is used to locate the document regions that are most similar to the query. We use the exemplar SVM framework to produce a better representation of the query in an unsupervised way. Finally, the document descriptors are precomputed and compressed with Product Quantization. This offers two advantages: first, a large number of documents can be kept in RAM memory at the same time. Second, the sliding window becomes significantly faster since distances between quantized HOG descriptors can be precomputed. Our results significantly outperform other segmentation-free methods in the literature, both in accuracy and in speed and memory usage.
|
|
|
Ernest Valveny and Philippe Dosch. 2004. Performance Evaluation of Symbol Recognition. In S. Marinai, A.D.(E.),, ed. Document Analysis Systems.354–365.
|
|
|
Josep Llados, Ernest Valveny, Gemma Sanchez and Enric Marti. 2002. Symbol recognition: current advances and perspectives. In Dorothea Blostein and Young- Bin Kwon, ed. Graphics Recognition Algorithms And Applications. Springer-Verlag, 104–128. (LNCS.)
Abstract: The recognition of symbols in graphic documents is an intensive research activity in the community of pattern recognition and document analysis. A key issue in the interpretation of maps, engineering drawings, diagrams, etc. is the recognition of domain dependent symbols according to a symbol database. In this work we first review the most outstanding symbol recognition methods from two different points of view: application domains and pattern recognition methods. In the second part of the paper, open and unaddressed problems involved in symbol recognition are described, analyzing their current state of art and discussing future research challenges. Thus, issues such as symbol representation, matching, segmentation, learning, scalability of recognition methods and performance evaluation are addressed in this work. Finally, we discuss the perspectives of symbol recognition concerning to new paradigms such as user interfaces in handheld computers or document database and WWW indexing by graphical content.
|
|
|
Josep Llados, Ernest Valveny and Enric Marti. 2000. Symbol Recognition in Document Image Analysis: Methods and Challenges. Recent Research Developments in Pattern Recognition, Transworld Research Network,, 1, 151–178.
|
|
|
Josep Llados. 2006. Computer Vision: Progress of Research and Development.
|
|
|
Josep Llados, Ernest Valveny, Gemma Sanchez and Enric Marti. 2003. A Case Study of Pattern Recognition: Symbol Recognition in Graphic Documentsa. Proceedings of Pattern Recognition in Information Systems. ICEIS Press, 1–13.
|
|