toggle visibility Search & Display Options

Select All    Deselect All
 |   | 
Details
  Records Links
Author Ruben Tito; Dimosthenis Karatzas; Ernest Valveny edit   pdf
doi  openurl
  Title Hierarchical multimodal transformers for Multi-Page DocVQA Type Journal Article
  Year 2023 Publication Pattern Recognition Abbreviated Journal PR  
  Volume 144 Issue Pages 109834  
  Keywords  
  Abstract Document Visual Question Answering (DocVQA) refers to the task of answering questions from document images. Existing work on DocVQA only considers single-page documents. However, in real scenarios documents are mostly composed of multiple pages that should be processed altogether. In this work we extend DocVQA to the multi-page scenario. For that, we first create a new dataset, MP-DocVQA, where questions are posed over multi-page documents instead of single pages. Second, we propose a new hierarchical method, Hi-VT5, based on the T5 architecture, that overcomes the limitations of current methods to process long multi-page documents. The proposed method is based on a hierarchical transformer architecture where the encoder summarizes the most relevant information of every page and then, the decoder takes this summarized information to generate the final answer. Through extensive experimentation, we demonstrate that our method is able, in a single stage, to answer the questions and provide the page that contains the relevant information to find the answer, which can be used as a kind of explainability measure.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN (down) ISSN 0031-3203 ISBN Medium  
  Area Expedition Conference  
  Notes DAG; 600.155; 600.121 Approved no  
  Call Number Admin @ si @ TKV2023 Serial 3825  
Permanent link to this record
 

 
Author Souhail Bakkali; Zuheng Ming; Mickael Coustaty; Marçal Rusiñol; Oriol Ramos Terrades edit   pdf
doi  openurl
  Title VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification Type Journal Article
  Year 2023 Publication Pattern Recognition Abbreviated Journal PR  
  Volume 139 Issue Pages 109419  
  Keywords  
  Abstract Multimodal learning from document data has achieved great success lately as it allows to pre-train semantically meaningful features as a prior into a learnable downstream approach. In this paper, we approach the document classification problem by learning cross-modal representations through language and vision cues, considering intra- and inter-modality relationships. Instead of merging features from different modalities into a common representation space, the proposed method exploits high-level interactions and learns relevant semantic information from effective attention flows within and across modalities. The proposed learning objective is devised between intra- and inter-modality alignment tasks, where the similarity distribution per task is computed by contracting positive sample pairs while simultaneously contrasting negative ones in the common feature representation space}. Extensive experiments on public document classification datasets demonstrate the effectiveness and the generalization capacity of our model on both low-scale and large-scale datasets.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN (down) ISSN 0031-3203 ISBN Medium  
  Area Expedition Conference  
  Notes DAG; 600.140; 600.121 Approved no  
  Call Number Admin @ si @ BMC2023 Serial 3826  
Permanent link to this record
 

 
Author Albert Berenguel; Oriol Ramos Terrades; Josep Llados; Cristina Cañero edit  doi
openurl 
  Title Evaluation of Texture Descriptors for Validation of Counterfeit Documents Type Conference Article
  Year 2017 Publication 14th International Conference on Document Analysis and Recognition Abbreviated Journal  
  Volume Issue Pages 1237-1242  
  Keywords  
  Abstract This paper describes an exhaustive comparative analysis and evaluation of different existing texture descriptor algorithms to differentiate between genuine and counterfeit documents. We include in our experiments different categories of algorithms and compare them in different scenarios with several counterfeit datasets, comprising banknotes and identity documents. Computational time in the extraction of each descriptor is important because the final objective is to use it in a real industrial scenario. HoG and CNN based descriptors stands out statistically over the rest in terms of the F1-score/time ratio performance.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN (down) 2379-2140 ISBN Medium  
  Area Expedition Conference ICDAR  
  Notes DAG; 600.061; 601.269; 600.097; 600.121 Approved no  
  Call Number Admin @ si @ BRL2017 Serial 3092  
Permanent link to this record
 

 
Author G.Thorvaldsen; Joana Maria Pujadas-Mora; T.Andersen ; L.Eikvil; Josep Llados; Alicia Fornes; Anna Cabre edit  url
openurl 
  Title A Tale of two Transcriptions Type Journal
  Year 2015 Publication Historical Life Course Studies Abbreviated Journal  
  Volume 2 Issue Pages 1-19  
  Keywords Nominative Sources; Census; Vital Records; Computer Vision; Optical Character Recognition; Word Spotting  
  Abstract non-indexed
This article explains how two projects implement semi-automated transcription routines: for census sheets in Norway and marriage protocols from Barcelona. The Spanish system was created to transcribe the marriage license books from 1451 to 1905 for the Barcelona area; one of the world’s longest series of preserved vital records. Thus, in the Project “Five Centuries of Marriages” (5CofM) at the Autonomous University of Barcelona’s Center for Demographic Studies, the Barcelona Historical Marriage Database has been built. More than 600,000 records were transcribed by 150 transcribers working online. The Norwegian material is cross-sectional as it is the 1891 census, recorded on one sheet per person. This format and the underlining of keywords for several variables made it more feasible to semi-automate data entry than when many persons are listed on the same page. While Optical Character Recognition (OCR) for printed text is scientifically mature, computer vision research is now focused on more difficult problems such as handwriting recognition. In the marriage project, document analysis methods have been proposed to automatically recognize the marriage licenses. Fully automatic recognition is still a challenge, but some promising results have been obtained. In Spain, Norway and elsewhere the source material is available as scanned pictures on the Internet, opening up the possibility for further international cooperation concerning automating the transcription of historic source materials. Like what is being done in projects to digitize printed materials, the optimal solution is likely to be a combination of manual transcription and machine-assisted recognition also for hand-written sources.
 
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN (down) 2352-6343 ISBN Medium  
  Area Expedition Conference  
  Notes DAG; 600.077; 602.006 Approved no  
  Call Number Admin @ si @ TPA2015 Serial 2582  
Permanent link to this record
 

 
Author David Fernandez; Pau Riba; Alicia Fornes; Josep Llados edit   pdf
doi  isbn
openurl 
  Title On the Influence of Key Point Encoding for Handwritten Word Spotting Type Conference Article
  Year 2014 Publication 14th International Conference on Frontiers in Handwriting Recognition Abbreviated Journal  
  Volume Issue Pages 476 - 481  
  Keywords Local descriptors; Interest points; Handwritten documents; Word spotting; Historical document analysis  
  Abstract In this paper we evaluate the influence of the selection of key points and the associated features in the performance of word spotting processes. In general, features can be extracted from a number of characteristic points like corners, contours, skeletons, maxima, minima, crossings, etc. A number of descriptors exist in the literature using different interest point detectors. But the intrinsic variability of handwriting vary strongly on the performance if the interest points are not stable enough. In this paper, we analyze the performance of different descriptors for local interest points. As benchmarking dataset we have used the Barcelona Marriage Database that contains handwritten records of marriages over five centuries.  
  Address Creete Island; Grecia; September 2014  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN (down) 2167-6445 ISBN 978-1-4799-4335-7 Medium  
  Area Expedition Conference ICFHR  
  Notes DAG; 600.056; 600.061; 602.006; 600.077 Approved no  
  Call Number Admin @ si @ FRF2014 Serial 2460  
Permanent link to this record
 

 
Author Pau Riba; Jon Almazan; Alicia Fornes; David Fernandez; Ernest Valveny; Josep Llados edit   pdf
doi  isbn
openurl 
  Title e-Crowds: a mobile platform for browsing and searching in historical demographyrelated manuscripts Type Conference Article
  Year 2014 Publication 14th International Conference on Frontiers in Handwriting Recognition Abbreviated Journal  
  Volume Issue Pages 228 - 233  
  Keywords  
  Abstract This paper presents a prototype system running on portable devices for browsing and word searching through historical handwritten document collections. The platform adapts the paradigm of eBook reading, where the narrative is not necessarily sequential, but centered on the user actions. The novelty is to replace digitally born books by digitized historical manuscripts of marriage licenses, so document analysis tasks are required in the browser. With an active reading paradigm, the user can cast queries of people names, so he/she can implicitly follow genealogical links. In addition, the system allows combined searches: the user can refine a search by adding more words to search. As a second contribution, the retrieval functionality involves as a core technology a word spotting module with an unified approach, which allows combined query searches, and also two input modalities: query-by-example, and query-by-string.  
  Address Creete Island; Grecia; September 2014  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN (down) 2167-6445 ISBN 978-1-4799-4335-7 Medium  
  Area Expedition Conference ICFHR  
  Notes DAG; 600.056; 600.045; 600.061; 602.006; 600.077 Approved no  
  Call Number Admin @ si @ RAF2014 Serial 2463  
Permanent link to this record
 

 
Author Arnau Baro; Pau Riba; Alicia Fornes edit   pdf
doi  openurl
  Title Towards the recognition of compound music notes in handwritten music scores Type Conference Article
  Year 2016 Publication 15th international conference on Frontiers in Handwriting Recognition Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract The recognition of handwritten music scores still remains an open problem. The existing approaches can only deal with very simple handwritten scores mainly because of the variability in the handwriting style and the variability in the composition of groups of music notes (i.e. compound music notes). In this work we focus on this second problem and propose a method based on perceptual grouping for the recognition of compound music notes. Our method has been tested using several handwritten music scores of the CVC-MUSCIMA database and compared with a commercial Optical Music Recognition (OMR) software. Given that our method is learning-free, the obtained results are promising.  
  Address Shenzhen; China; October 2016  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN (down) 2167-6445 ISBN Medium  
  Area Expedition Conference ICFHR  
  Notes DAG; 600.097 Approved no  
  Call Number Admin @ si @ BRF2016 Serial 2903  
Permanent link to this record
 

 
Author Ernest Valveny; Oriol Ramos Terrades; Joan Mas; Marçal Rusiñol edit   pdf
url  doi
isbn  openurl
  Title Interactive Document Retrieval and Classification. Type Book Chapter
  Year 2013 Publication Multimodal Interaction in Image and Video Applications Abbreviated Journal  
  Volume 48 Issue Pages 17-30  
  Keywords  
  Abstract In this chapter we describe a system for document retrieval and classification following the interactive-predictive framework. In particular, the system addresses two different scenarios of document analysis: document classification based on visual appearance and logo detection. These two classical problems of document analysis are formulated following the interactive-predictive model, taking the user interaction into account to make easier the process of annotating and labelling the documents. A system implementing this model in a real scenario is presented and analyzed. This system also takes advantage of active learning techniques to speed up the task of labelling the documents.  
  Address  
  Corporate Author Thesis  
  Publisher Springer Berlin Heidelberg Place of Publication Editor Angel Sappa; Jordi Vitria  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN (down) 1868-4394 ISBN 978-3-642-35931-6 Medium  
  Area Expedition Conference  
  Notes DAG Approved no  
  Call Number Admin @ si @ VRM2013 Serial 2341  
Permanent link to this record
 

 
Author Antonio Clavelli; Dimosthenis Karatzas; Josep Llados; Mario Ferraro; Giuseppe Boccignone edit   pdf
doi  openurl
  Title Modelling task-dependent eye guidance to objects in pictures Type Journal Article
  Year 2014 Publication Cognitive Computation Abbreviated Journal CoCom  
  Volume 6 Issue 3 Pages 558-584  
  Keywords Visual attention; Gaze guidance; Value; Payoff; Stochastic fixation prediction  
  Abstract 5Y Impact Factor: 1.14 / 3rd (Computer Science, Artificial Intelligence)
We introduce a model of attentional eye guidance based on the rationale that the deployment of gaze is to be considered in the context of a general action-perception loop relying on two strictly intertwined processes: sensory processing, depending on current gaze position, identifies sources of information that are most valuable under the given task; motor processing links such information with the oculomotor act by sampling the next gaze position and thus performing the gaze shift. In such a framework, the choice of where to look next is task-dependent and oriented to classes of objects embedded within pictures of complex scenes. The dependence on task is taken into account by exploiting the value and the payoff of gazing at certain image patches or proto-objects that provide a sparse representation of the scene objects. The different levels of the action-perception loop are represented in probabilistic form and eventually give rise to a stochastic process that generates the gaze sequence. This way the model also accounts for statistical properties of gaze shifts such as individual scan path variability. Results of the simulations are compared either with experimental data derived from publicly available datasets and from our own experiments.
 
  Address  
  Corporate Author Thesis  
  Publisher Springer US Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN (down) 1866-9956 ISBN Medium  
  Area Expedition Conference  
  Notes DAG; 600.056; 600.045; 605.203; 601.212; 600.077 Approved no  
  Call Number Admin @ si @ CKL2014 Serial 2419  
Permanent link to this record
 

 
Author Jon Almazan; Albert Gordo; Alicia Fornes; Ernest Valveny edit   pdf
doi  openurl
  Title Handwritten Word Spotting with Corrected Attributes Type Conference Article
  Year 2013 Publication 15th IEEE International Conference on Computer Vision Abbreviated Journal  
  Volume Issue Pages 1017-1024  
  Keywords  
  Abstract We propose an approach to multi-writer word spotting, where the goal is to find a query word in a dataset comprised of document images. We propose an attributes-based approach that leads to a low-dimensional, fixed-length representation of the word images that is fast to compute and, especially, fast to compare. This approach naturally leads to an unified representation of word images and strings, which seamlessly allows one to indistinctly perform query-by-example, where the query is an image, and query-by-string, where the query is a string. We also propose a calibration scheme to correct the attributes scores based on Canonical Correlation Analysis that greatly improves the results on a challenging dataset. We test our approach on two public datasets showing state-of-the-art results.  
  Address Sydney; Australia; December 2013  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN (down) 1550-5499 ISBN Medium  
  Area Expedition Conference ICCV  
  Notes DAG Approved no  
  Call Number Admin @ si @ AGF2013 Serial 2327  
Permanent link to this record
Select All    Deselect All
 |   | 
Details

Save Citations:
Export Records: