|
Gemma Sanchez, Ernest Valveny, Josep Llados, Joan Mas and N. Lozano. 2004. A platform to extract knowledge from graphic documents. Application to an architectural sketch understanding scenario.
|
|
|
Adria Molina, Lluis Gomez, Oriol Ramos Terrades and Josep Llados. 2022. A Generic Image Retrieval Method for Date Estimation of Historical Document Collections. Document Analysis Systems. 15th IAPR International Workshop (DAS 2022). 583–597.
Abstract: Date estimation of historical document images is a challenging problem, and several contributions in the literature lack the ability to generalize from one dataset to others. This paper presents a robust date estimation system based on a retrieval approach that generalizes well across heterogeneous collections. We use a ranking loss function named smooth-nDCG to train a Convolutional Neural Network that learns an ordering of documents for each problem. One of the main uses of the presented approach is as a tool for historical contextual retrieval: scholars can perform comparative analyses of historical images from large datasets in terms of the period in which they were produced. We provide an experimental evaluation on different types of documents from real datasets of manuscript and newspaper images.
Keywords: Date estimation; Document retrieval; Image retrieval; Ranking loss; Smooth-nDCG
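The smooth-nDCG loss named in this abstract can be illustrated with a minimal sketch. It assumes a common relaxation in which the non-differentiable rank of each item is replaced by 1 plus a sigmoid-weighted count of the items scored above it, controlled by a temperature `tau`; this is an illustrative assumption, not necessarily the authors' exact formulation. The function names, the default `tau`, and the graded relevance values (which could, for example, reflect how close two documents' dates are) are hypothetical. Plain Python is used for readability; in training, the same expression would be written in an autodiff framework so the network can be optimized against `1 - smooth_ndcg`.

```python
import math
from typing import Sequence


def stable_sigmoid(x: float) -> float:
    """Logistic function written to avoid overflow in math.exp."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def smooth_ndcg(scores: Sequence[float], relevances: Sequence[float], tau: float = 0.01) -> float:
    """Sigmoid-relaxed nDCG for one query (hypothetical sketch, not the paper's code).

    scores: similarity of each database item to the query (higher = ranked earlier).
    relevances: graded relevance of each item to the query (assumed, e.g. date proximity).
    tau: temperature; smaller values approach the hard, non-smooth rank.
    """
    n = len(scores)
    dcg = 0.0
    for i in range(n):
        # Smooth rank of item i: 1 + soft count of items scored above it.
        rank = 1.0 + sum(
            stable_sigmoid((scores[j] - scores[i]) / tau) for j in range(n) if j != i
        )
        dcg += relevances[i] / math.log2(1.0 + rank)
    # Ideal DCG: relevances sorted in decreasing order at integer ranks 1..n.
    ideal = sorted(relevances, reverse=True)
    idcg = sum(rel / math.log2(1.0 + k) for k, rel in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0


def smooth_ndcg_loss(scores: Sequence[float], relevances: Sequence[float], tau: float = 0.01) -> float:
    """Loss to minimize: 1 - smooth-nDCG."""
    return 1.0 - smooth_ndcg(scores, relevances, tau)


if __name__ == "__main__":
    relevances = [3.0, 2.0, 1.0]
    print(smooth_ndcg([3.0, 2.0, 1.0], relevances))  # close to 1.0: ideal ordering
    print(smooth_ndcg([1.0, 2.0, 3.0], relevances))  # lower: reversed ordering
```

Because every rank is a smooth function of the scores, gradients of the loss can flow back to the network that produced them; as `tau` shrinks, the relaxed ranks approach the integer ranks of standard nDCG.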
|
|
|
Josep Brugues Pujolras, Lluis Gomez and Dimosthenis Karatzas. 2022. A Multilingual Approach to Scene Text Visual Question Answering. Document Analysis Systems. 15th IAPR International Workshop (DAS 2022). 65–79.
Abstract: Scene Text Visual Question Answering (ST-VQA) has recently emerged as a hot research topic in Computer Vision. Current ST-VQA models have great potential for many types of applications, but they cannot perform well on more than one language at a time, owing to the scarcity of multilingual data and the use of monolingual word embeddings for training. In this work, we explore the possibility of obtaining bilingual and multilingual VQA models. To that end, we take an established VQA model that uses monolingual word embeddings as part of its pipeline and replace them with FastText and BPEmb multilingual word embeddings that have been aligned to English. Our experiments demonstrate that it is possible to obtain bilingual and multilingual VQA models with a minimal loss in performance in languages not used during training, as well as a multilingual model trained in multiple languages that matches the performance of the respective monolingual baselines.
Keywords: Scene text; Visual question answering; Multilingual word embeddings; Vision and language; Deep learning
|
|
|
Hongxing Gao, Marçal Rusiñol, Dimosthenis Karatzas and Josep Llados. 2014. Fast Structural Matching for Document Image Retrieval through Spatial Databases. Document Recognition and Retrieval XXI.
Abstract: The structure of document images plays a significant role in document analysis; thus, considerable efforts have been made towards extracting and understanding document structure, usually in the form of layout analysis approaches. In this paper, we first employ Distance Transform based MSER (DTMSER) to efficiently extract stable document structural elements in terms of a dendrogram of key-regions. Then a fast structural matching method is proposed to query the structure of a document (dendrogram) based on a spatial database, which facilitates the formulation of advanced spatial queries. The experiments demonstrate a significant improvement in a document retrieval scenario when compared to the use of typical Bag of Words (BoW) and pyramidal BoW descriptors.
Keywords: Document image retrieval; Distance transform; MSER; Spatial database
|
|
|
Marçal Rusiñol, R. Roset, Josep Llados and C. Montaner. 2011. Automatic Index Generation of Digitized Map Series by Coordinate Extraction and Interpretation.
Abstract: Scanned map images are processed with computer vision algorithms to extract relevant geographic information from printed coordinate pairs. The extracted information is then transformed into georeferencing information for each individual map sheet, and the complete set is compiled to produce a graphical index sheet for the map series along with relevant metadata. The whole process is fully automated and trained to attain maximum effectiveness and throughput.
|
|
|
Juan Ignacio Toledo, Jordi Cucurull, Jordi Puiggali, Alicia Fornes and Josep Llados. 2015. Document Analysis Techniques for Automatic Electoral Document Processing: A Survey. E-Voting and Identity: Proceedings of the 5th International Conference, VoteID 2015. 139–141. (LNCS)
Abstract: In this paper, we discuss the most common challenges in electoral document processing and study the different solutions from the document analysis community that can be applied in each case. We cover Optical Mark Recognition techniques to detect voter selections on the Australian ballot, handwritten number recognition for preferential elections, and handwriting recognition for write-in areas. We also propose some particular adjustments that can be made to those general techniques in the specific context of electoral documents.
Keywords: Document image analysis; Computer vision; Paper ballots; Paper based elections; Optical scan; Tally
|
|
|
Ali Furkan Biten, Ruben Tito, Lluis Gomez, Ernest Valveny and Dimosthenis Karatzas. 2022. OCR-IDL: OCR Annotations for Industry Document Library Dataset. ECCV Workshop on Text in Everything.
Abstract: Pretraining has proven successful in Document Intelligence tasks, where a deluge of documents is used to pretrain models that are later fine-tuned on downstream tasks. One of the problems of pretraining approaches is the inconsistent use of pretraining data processed with different OCR engines, which leads to results that cannot be compared across models. In other words, it is not obvious whether the performance gain comes from differences in the amount of data and the OCR engines used or from the proposed models themselves. To remedy this problem, we publicly release OCR annotations for the IDL documents, produced with a commercial OCR engine given the superior performance of commercial engines over open-source OCR models. The contributed dataset (OCR-IDL) has an estimated monetary value of over US$20K. It is our hope that OCR-IDL can be a starting point for future work on Document Intelligence. All of our data, its collection process and the annotations can be found at this https URL.
|
|
|
Ernest Valveny, Robert Benavente, Agata Lapedriza, Miquel Ferrer, Jaume Garcia and Gemma Sanchez. 2012. Adaptation of a computer programming course to the EXHE requirements: evaluation five years later.
|
|
|
Debora Gil, Oriol Ramos Terrades and Raquel Perez. 2021. Topological Radiomics (TOPiomics): Early Detection of Genetic Abnormalities in Cancer Treatment Evolution. Extended Abstracts GEOMVAP 2019, Trends in Mathematics 15. Springer Nature, 89–93.
Abstract: Abnormalities in radiomic measures correlate with genomic alterations that can change the outcome of personalized anti-cancer treatments. TOPiomics is a new method for the early detection of variations in tumor imaging phenotype based on a topological structure in multi-view radiomic spaces.
|
|
|
Gemma Sanchez, Josep Llados and K. Tombre. 2001. An Error-Correction Graph Grammar to Recognize Textured Symbols.
|
|