Author |
Mickael Coustaty; Alicia Fornes |
Title |
Document Analysis and Recognition – ICDAR 2023 Workshops |
Type |
Book Whole |
Year |
2023 |
Publication |
Document Analysis and Recognition – ICDAR 2023 Workshops |
Abbreviated Journal |
Volume |
14194 |
Issue |
2 |
Pages |
Keywords |
Abstract |
Address |
San Jose; USA; August 2023 |
Corporate Author |
Thesis |
Publisher |
Place of Publication |
Editor |
Language |
Summary Language |
Original Title |
Series Editor |
Series Title |
Abbreviated Series Title |
Series Volume |
Series Issue |
Edition |
Medium |
Area |
Expedition |
Conference |
Notes |
Approved |
no |
Call Number |
Admin @ si @ CoF2023 |
Serial |
3852 |
Permanent link to this record |
Author |
Juan Ignacio Toledo; Jordi Cucurull; Jordi Puiggali; Alicia Fornes; Josep Llados |
Title |
Document Analysis Techniques for Automatic Electoral Document Processing: A Survey |
Type |
Conference Article |
Year |
2015 |
Publication |
E-Voting and Identity, Proceedings of 5th international conference, VoteID 2015 |
Abbreviated Journal |
Volume |
Issue |
Pages |
139-141 |
Keywords |
Document image analysis; Computer vision; Paper ballots; Paper based elections; Optical scan; Tally |
Abstract |
In this paper, we will discuss the most common challenges in electoral document processing and study the different solutions from the document analysis community that can be applied in each case. We will cover Optical Mark Recognition techniques to detect voter selections in the Australian Ballot, handwritten number recognition for preferential elections and handwriting recognition for write-in areas. We will also propose some particular adjustments that can be made to those general techniques in the specific context of electoral documents. |
Address |
Bern; Switzerland; September 2015 |
Corporate Author |
Thesis |
Publisher |
Place of Publication |
Editor |
Language |
Summary Language |
Original Title |
Series Editor |
Series Title |
Abbreviated Series Title |
Series Volume |
Series Issue |
Edition |
Medium |
Area |
Expedition |
Conference |
VoteID |
Notes |
DAG; 600.061; 602.006; 600.077 |
Approved |
no |
Call Number |
Admin @ si @ TCP2015 |
Serial |
2641 |
Permanent link to this record |
Author |
Albert Gordo; Marçal Rusiñol; Dimosthenis Karatzas; Andrew Bagdanov |
Title |
Document Classification and Page Stream Segmentation for Digital Mailroom Applications |
Type |
Conference Article |
Year |
2013 |
Publication |
12th International Conference on Document Analysis and Recognition |
Abbreviated Journal |
Volume |
Issue |
Pages |
621-625 |
Keywords |
Abstract |
In this paper we present a method for the segmentation of continuous page streams into multipage documents and the simultaneous classification of the resulting documents. We first present an approach to combine the multiple pages of a document into a single feature vector that represents the whole document. Despite its simplicity and low computational cost, the proposed representation yields results comparable to more complex methods in multipage document classification tasks. We then exploit this representation in the context of page stream segmentation. The most plausible segmentation of a page stream into a sequence of multipage documents is obtained by optimizing a statistical model that represents the probability of each segmented multipage document belonging to a particular class. Experimental results are reported on a large sample of real administrative multipage documents. |
Address |
Washington; USA; August 2013 |
Corporate Author |
Thesis |
Publisher |
Place of Publication |
Editor |
Language |
Summary Language |
Original Title |
Series Editor |
Series Title |
Abbreviated Series Title |
Series Volume |
Series Issue |
Edition |
1520-5363 |
Medium |
Area |
Expedition |
Conference |
Notes |
DAG; 600.056; 602.101 |
Approved |
no |
Call Number |
Admin @ si @ GRK2013c |
Serial |
2345 |
Permanent link to this record |
Author |
Albert Gordo; Florent Perronnin; Ernest Valveny |
Title |
Document classification using multiple views |
Type |
Conference Article |
Year |
2012 |
Publication |
10th IAPR International Workshop on Document Analysis Systems |
Abbreviated Journal |
Volume |
Issue |
Pages |
33-37 |
Keywords |
Abstract |
The combination of multiple features or views when representing documents or other kinds of objects usually leads to improved results in classification (and retrieval) tasks. Most systems assume that those views will be available both at training and test time. However, some views may be too 'expensive' to be available at test time. In this paper, we consider the use of Canonical Correlation Analysis to leverage 'expensive' views that are available only at training time. Experimental results show that this information may significantly improve the results in a classification task. |
Address |
Australia |
Corporate Author |
Thesis |
Publisher |
IEEE Computer Society Washington |
Place of Publication |
Editor |
Language |
Summary Language |
Original Title |
Series Editor |
Series Title |
Abbreviated Series Title |
Series Volume |
Series Issue |
Edition |
978-0-7695-4661-2 |
Medium |
Area |
Expedition |
Conference |
Notes |
Approved |
no |
Call Number |
Admin @ si @ GPV2012 |
Serial |
2049 |
Permanent link to this record |
Author |
Ruben Tito; Dimosthenis Karatzas; Ernest Valveny |
Title |
Document Collection Visual Question Answering |
Type |
Conference Article |
Year |
2021 |
Publication |
16th International Conference on Document Analysis and Recognition |
Abbreviated Journal |
Volume |
12822 |
Issue |
Pages |
778-792 |
Keywords |
Document collection; Visual Question Answering |
Abstract |
Current tasks and methods in Document Understanding aim to process documents as single elements. However, documents are usually organized in collections (historical records, purchase invoices) that provide context useful for their interpretation. To address this problem, we introduce Document Collection Visual Question Answering (DocCVQA), a new dataset and related task, where questions are posed over a whole collection of document images and the goal is not only to provide the answer to the given question, but also to retrieve the set of documents that contain the information needed to infer the answer. Along with the dataset, we propose a new evaluation metric and baselines which provide further insights into the new dataset and task. |
Address |
Corporate Author |
Thesis |
Publisher |
Place of Publication |
Editor |
Language |
Summary Language |
Original Title |
Series Editor |
Series Title |
Abbreviated Series Title |
Series Volume |
Series Issue |
Edition |
Medium |
Area |
Expedition |
Conference |
Notes |
DAG; 600.121 |
Approved |
no |
Call Number |
Admin @ si @ TKV2021 |
Serial |
3622 |
Permanent link to this record |
Author |
Mohamed Ali Souibgui |
Title |
Document Image Enhancement and Recognition in Low Resource Scenarios: Application to Ciphers and Handwritten Text |
Type |
Book Whole |
Year |
2022 |
Publication |
PhD Thesis, Universitat Autonoma de Barcelona-CVC |
Abbreviated Journal |
Volume |
Issue |
Pages |
Keywords |
Abstract |
In this thesis, we propose different contributions with the goal of enhancing and recognizing historical handwritten document images, especially the ones with rare scripts, such as cipher documents.
In the first part, some effective end-to-end models for Document Image Enhancement (DIE) using deep learning are presented. First, conditional Generative Adversarial Networks (cGANs) for different tasks (document clean-up, binarization, deblurring, and watermark removal) are explored. Next, we further improve the results by integrating a text recognizer into the cGAN model, so that degraded document images are recovered into a clean form and the generated images are more readable. Afterward, we present a new encoder-decoder architecture based on vision transformers to enhance both machine-printed and handwritten document images in an end-to-end fashion.
The second part of the thesis addresses Handwritten Text Recognition (HTR) in low resource scenarios, i.e. when only a small amount of labeled training data is available. We propose novel methods for recognizing ciphers with rare scripts. First, a method based on few-shot object detection is proposed. Then, we incorporate a progressive learning strategy that automatically assigns pseudo-labels to a set of unlabeled data, reducing the human labor of annotation to a few pages while maintaining the good performance of the model. Secondly, a data generation technique based on Bayesian Program Learning (BPL) is proposed to overcome the lack of data in such rare scripts. Thirdly, we propose a Text-Degradation Invariant Auto Encoder (Text-DIAE). This self-supervised model is designed to tackle two tasks, text recognition and document image enhancement. The proposed model does not exhibit the limitations of previous state-of-the-art methods based on contrastive losses, while at the same time it requires substantially fewer data samples to converge.
In the third part of the thesis, we analyze, from the user perspective, the usage of HTR systems in low resource scenarios. This contrasts with the usual research on HTR, which often focuses on technical aspects only and rarely devotes effort to implementing software tools for scholars in the Humanities. |
Address |
Corporate Author |
Thesis |
Ph.D. thesis |
Publisher |
Place of Publication |
Editor |
Alicia Fornes; Yousri Kessentini |
Language |
Summary Language |
Original Title |
Series Editor |
Series Title |
Abbreviated Series Title |
Series Volume |
Series Issue |
Edition |
978-84-124793-8-6 |
Medium |
Area |
Expedition |
Conference |
Notes |
Approved |
no |
Call Number |
Admin @ si @ Sou2022 |
Serial |
3757 |
Permanent link to this record |
Author |
Albert Gordo |
Title |
Document Image Representation, Classification and Retrieval in Large-Scale Domains |
Type |
Book Whole |
Year |
2013 |
Publication |
PhD Thesis, Universitat Autonoma de Barcelona-CVC |
Abbreviated Journal |
Volume |
Issue |
Pages |
Keywords |
Abstract |
Despite the “paperless office” ideal that started in the decade of the seventies, businesses still strive against an increasing amount of paper documentation. Companies still receive huge amounts of paper documentation that need to be analyzed and processed, mostly in a manual way. A solution for this task consists in, first, automatically scanning the incoming documents. Then, document images can be analyzed and information can be extracted from the data. Documents can also be automatically dispatched to the appropriate workflows, used to retrieve similar documents in the dataset to transfer information, etc.
Due to the nature of this “digital mailroom”, we need document representation methods to be general, i.e., able to cope with very different types of documents. We need the methods to be sound, i.e., able to cope with unexpected types of documents, noise, etc. And we need the methods to be scalable, i.e., able to cope with thousands or millions of documents that need to be processed, stored, and consulted. Unfortunately, current techniques of document representation, classification and retrieval are not suited to this digital mailroom framework, since they do not fulfill some or all of these requirements.
Through this thesis we focus on the problem of document representation aimed at classification and retrieval tasks under this digital mailroom framework. We first propose a novel document representation based on runlength histograms, and extend it to cope with more complex documents such as multiple-page documents, or documents that contain more sources of information such as extracted OCR text. Then we focus on the scalability requirements and propose a novel binarization method which we dubbed PCAE, as well as two general asymmetric distances between binary embeddings that can significantly improve the retrieval results at a minimal extra computational cost. Finally, we note the importance of supervised learning when performing large-scale retrieval, and study several approaches that can significantly boost the results at no extra cost at query time. |
Address |
Barcelona |
Corporate Author |
Thesis |
Ph.D. thesis |
Publisher |
Ediciones Graficas Rey |
Place of Publication |
Editor |
Ernest Valveny; Florent Perronnin |
Language |
Summary Language |
Original Title |
Series Editor |
Series Title |
Abbreviated Series Title |
Series Volume |
Series Issue |
Edition |
Medium |
Area |
Expedition |
Conference |
Notes |
Approved |
no |
Call Number |
Admin @ si @ Gor2013 |
Serial |
2277 |
Permanent link to this record |
Author |
Thanh Ha Do; Salvatore Tabbone; Oriol Ramos Terrades |
Title |
Document noise removal using sparse representations over learned dictionary |
Type |
Conference Article |
Year |
2013 |
Publication |
Symposium on Document engineering |
Abbreviated Journal |
Volume |
Issue |
Pages |
161-168 |
Keywords |
Abstract |
best paper award
In this paper, we propose an algorithm for denoising document images using sparse representations. Given a training set, this algorithm is able to learn the main document characteristics and also the kind of noise present in the documents. In this perspective, we propose to model the noise energy based on the normalized cross-correlation between pairs of noisy and non-noisy documents. Experimental results on several datasets demonstrate the robustness of our method compared with the state-of-the-art. |
Address |
Barcelona; October 2013 |
Corporate Author |
Thesis |
Publisher |
Place of Publication |
Editor |
Language |
Summary Language |
Original Title |
Series Editor |
Series Title |
Abbreviated Series Title |
Series Volume |
Series Issue |
Edition |
978-1-4503-1789-4 |
Medium |
Area |
Expedition |
Conference |
ACM-DocEng |
Notes |
DAG; 600.061 |
Approved |
no |
Call Number |
Admin @ si @ DTR2013a |
Serial |
2330 |
Permanent link to this record |
Author |
Partha Pratim Roy; Umapada Pal; Josep Llados |
Title |
Document Seal Detection Using GHT and Character Proximity Graphs |
Type |
Journal Article |
Year |
2011 |
Publication |
Pattern Recognition |
Abbreviated Journal |
PR |
Volume |
44 |
Issue |
6 |
Pages |
1282-1295 |
Keywords |
Seal recognition; Graphical symbol spotting; Generalized Hough transform; Multi-oriented character recognition |
Abstract |
This paper deals with the automatic detection of seals (stamps) in documents with cluttered backgrounds. Seal detection is a difficult challenge because of the seal's multi-oriented nature, arbitrary shape, partial overlap with signatures, noise, etc. Here, a seal object is characterized by scale and rotation invariant spatial feature descriptors computed from the recognition results of individual connected components (characters). Scale and rotation invariant features are used in a Support Vector Machine (SVM) classifier to recognize multi-scale and multi-oriented text characters. The concept of the generalized Hough transform (GHT) is used to detect the seal, and a voting scheme is designed for finding possible locations of the seal in a document based on the spatial feature descriptors of neighboring component pairs. The peak of votes in the GHT accumulator validates the hypothesis to locate the seal in a document. Experiments are performed on an archive of historical documents of handwritten/printed English text. Experimental results show that the method is robust in locating seal instances of arbitrary shape and orientation in documents, and also efficient in indexing a collection of documents for retrieval purposes. |
Address |
Corporate Author |
Thesis |
Publisher |
Elsevier |
Place of Publication |
Editor |
Language |
Summary Language |
Original Title |
Series Editor |
Series Title |
Abbreviated Series Title |
Series Volume |
Series Issue |
Edition |
Medium |
Area |
Expedition |
Conference |
Notes |
Approved |
no |
Call Number |
Admin @ si @ RPL2011 |
Serial |
1820 |
Permanent link to this record |
Author |
Francisco Cruz; Oriol Ramos Terrades |
Title |
Document segmentation using relative location features |
Type |
Conference Article |
Year |
2012 |
Publication |
21st International Conference on Pattern Recognition |
Abbreviated Journal |
Volume |
Issue |
Pages |
1562-1565 |
Keywords |
Abstract |
In this paper we evaluate the use of Relative Location Features (RLF) on a historical document segmentation task, and compare the quality of the results obtained on structured and unstructured documents with and without RLF. We show that using these features improves the final segmentation on documents with a strong structure, while their application on unstructured documents does not show significant improvement. Although this paper is not focused on segmenting unstructured documents, the results obtained on a benchmark dataset equal or even surpass previous results of similar works. |
Address |
Tsukuba Science City, Japan |
Corporate Author |
Thesis |
Publisher |
Place of Publication |
Editor |
Language |
Summary Language |
Original Title |
Series Editor |
Series Title |
Abbreviated Series Title |
Series Volume |
Series Issue |
Edition |
Medium |
Area |
Expedition |
Conference |
Notes |
Approved |
no |
Call Number |
Admin @ si @ CrR2012 |
Serial |
2051 |
Permanent link to this record |