Author David Aldavert; Marçal Rusiñol; Ricardo Toledo; Josep Llados
  Title Integrating Visual and Textual Cues for Query-by-String Word Spotting Type Conference Article
  Year 2013 Publication 12th International Conference on Document Analysis and Recognition Abbreviated Journal  
  Volume Issue Pages 511 - 515  
  Abstract In this paper, we present a word spotting framework that follows the query-by-string paradigm, where word images are represented by both textual and visual representations. The textual representation is formulated in terms of character $n$-grams, while the visual one is based on the bag-of-visual-words scheme. These two representations are merged together and projected to a sub-vector space. Given a textual query, this transform makes it possible to retrieve word instances that were only represented by the visual modality. Moreover, this statistical representation can be used together with state-of-the-art indexing structures in order to deal with large-scale scenarios. The proposed method is evaluated on a collection of historical documents, where it outperforms state-of-the-art methods.
  Address Washington; USA; August 2013  
  ISSN 1520-5363 ISBN Medium  
  Area Expedition Conference ICDAR  
  Notes DAG; ADAS; 600.045; 600.055; 600.061 Approved no  
  Call Number Admin @ si @ ART2013 Serial 2224  
 

 
Author Thanh Ha Do; Oriol Ramos Terrades; Salvatore Tabbone
  Title DSD: document sparse-based denoising algorithm Type Journal Article
  Year 2019 Publication Pattern Analysis and Applications Abbreviated Journal PAA  
  Volume 22 Issue 1 Pages 177–186  
  Keywords Document denoising; Sparse representations; Sparse dictionary learning; Document degradation models  
  Abstract In this paper, we present a sparse-based denoising algorithm for scanned documents. This method can be applied to any kind of scanned document with satisfactory results. Unlike other approaches, the proposed approach encodes noisy documents through sparse representation and visual dictionary learning techniques without any prior noise model. Moreover, we propose a precision parameter estimator. Experiments on several datasets demonstrate the robustness of the proposed approach compared to state-of-the-art document denoising methods.
  Notes DAG; 600.097; 600.140; 600.121 Approved no  
  Call Number Admin @ si @ DRT2019 Serial 3254  
 

 
Author Marçal Rusiñol; David Aldavert; Ricardo Toledo; Josep Llados
  Title Browsing Heterogeneous Document Collections by a Segmentation-Free Word Spotting Method Type Conference Article
  Year 2011 Publication 11th International Conference on Document Analysis and Recognition Abbreviated Journal  
  Volume Issue Pages 63-67  
  Abstract In this paper, we present a segmentation-free word spotting method that is able to deal with heterogeneous document image collections. We propose a patch-based framework where patches are represented by a bag-of-visual-words model powered by SIFT descriptors. A later refinement of the feature vectors is performed by applying the latent semantic indexing technique. The proposed method performs well on both handwritten and typewritten historical document images. We have also tested our method on documents written in non-Latin scripts.
  Address Beijing, China  
  Area Expedition Conference ICDAR  
  Notes DAG;ADAS Approved no  
  Call Number Admin @ si @ RAT2011 Serial 1788  
 

 
Author Partha Pratim Roy; Umapada Pal; Josep Llados; Mathieu Nicolas Delalandre
  Title Multi-Oriented and Multi-Sized Touching Character Segmentation using Dynamic Programming Type Conference Article
  Year 2009 Publication 10th International Conference on Document Analysis and Recognition Abbreviated Journal  
  Volume Issue Pages 11–15  
  Abstract In this paper, we present a scheme for segmenting English multi-oriented touching strings into individual characters. When two or more characters touch, they generate a large cavity region in the background. Using convex hull information, we use this background information to find initial points for segmenting a touching string into possible primitive segments (a primitive segment consists of a single character or a part of a character). Next, these primitive segments are merged to obtain the optimal segmentation; dynamic programming is applied using the total likelihood of characters as the objective function. An SVM classifier is used to find the likelihood of a character. To handle multi-oriented touching strings, the features used in the SVM are invariant to character orientation. A circular-ring and convex-hull-ring based approach is used, along with angular information of the contour pixels of the character, to make the features rotation invariant. The experiments yielded encouraging results.
  Address Barcelona, Spain  
  ISSN 1520-5363 ISBN 978-1-4244-4500-4 Medium  
  Area Expedition Conference ICDAR  
  Notes DAG Approved no  
  Call Number DAG @ dag @ RPL2009a Serial 1240  
 

 
Author Marçal Rusiñol; Volkmar Frinken; Dimosthenis Karatzas; Andrew Bagdanov; Josep Llados
  Title Multimodal page classification in administrative document image streams Type Journal Article
  Year 2014 Publication International Journal on Document Analysis and Recognition Abbreviated Journal IJDAR  
  Volume 17 Issue 4 Pages 331-341  
  Keywords Digital mail room; Multimodal page classification; Visual and textual document description  
  Abstract In this paper, we present a page classification application in a banking workflow. The proposed architecture represents administrative document images by merging visual and textual descriptions. The visual description is based on a hierarchical representation of the pixel intensity distribution. The textual description uses latent semantic analysis to represent document content as a mixture of topics. Several off-the-shelf classifiers and different strategies for combining visual and textual cues have been evaluated. A final step uses an n-gram model of the page stream allowing a finer-grained classification of pages. The proposed method has been tested in a real large-scale environment and we report results on a dataset of 70,000 pages.
  Publisher Springer Berlin Heidelberg Place of Publication Editor  
  ISSN 1433-2833 ISBN Medium  
  Notes DAG; LAMP; 600.056; 600.061; 601.240; 601.223; 600.077; 600.079 Approved no  
  Call Number Admin @ si @ RFK2014 Serial 2523  
 

 
Author Marçal Rusiñol; David Aldavert; Ricardo Toledo; Josep Llados
  Title Towards Query-by-Speech Handwritten Keyword Spotting Type Conference Article
  Year 2015 Publication 13th International Conference on Document Analysis and Recognition ICDAR2015 Abbreviated Journal  
  Volume Issue Pages 501-505  
  Abstract In this paper, we present a new querying paradigm for handwritten keyword spotting. We propose to represent handwritten word images by both visual and audio representations, enabling a query-by-speech keyword spotting system. The two representations are merged together and projected to a common sub-space in the training phase. Given a spoken query, this transform makes it possible to retrieve word instances that were only represented by the visual modality. In addition, the same method can be used backwards at no additional cost to produce a handwritten text-to-speech system. We present our first results on this new querying mechanism using synthetic voices over the George Washington dataset.
  Address Nancy; France; August 2015  
  Area Expedition Conference ICDAR  
  Notes DAG; 600.084; 600.061; 601.223; 600.077;ADAS Approved no  
  Call Number Admin @ si @ RAT2015b Serial 2682  
 

 
Author Emanuele Vivoli; Ali Furkan Biten; Andres Mafla; Dimosthenis Karatzas; Lluis Gomez
  Title MUST-VQA: MUltilingual Scene-text VQA Type Conference Article
  Year 2022 Publication Proceedings European Conference on Computer Vision Workshops Abbreviated Journal  
  Volume 13804 Issue Pages 345–358  
  Keywords Visual question answering; Scene text; Translation robustness; Multilingual models; Zero-shot transfer; Power of language models  
  Abstract In this paper, we present a framework for Multilingual Scene Text Visual Question Answering that deals with new languages in a zero-shot fashion. Specifically, we consider the task of Scene Text Visual Question Answering (STVQA) in which the question can be asked in different languages and is not necessarily aligned with the scene text language. Thus, we first introduce a natural step towards a more generalized version of STVQA: MUST-VQA. Accounting for this, we discuss two evaluation scenarios in the constrained setting, namely IID and zero-shot, and we demonstrate that the models can perform on par in a zero-shot setting. We further provide extensive experimentation and show the effectiveness of adapting multilingual language models to STVQA tasks.
  Address Tel-Aviv; Israel; October 2022  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Area Expedition Conference ECCVW  
  Notes DAG; 302.105; 600.155; 611.002 Approved no  
  Call Number Admin @ si @ VBM2022 Serial 3770  
 

 
Author Sounak Dey; Pau Riba; Anjan Dutta; Josep Llados; Yi-Zhe Song
  Title Doodle to Search: Practical Zero-Shot Sketch-Based Image Retrieval Type Conference Article
  Year 2019 Publication IEEE Conference on Computer Vision and Pattern Recognition Abbreviated Journal  
  Volume Issue Pages 2179-2188  
  Abstract In this paper, we investigate the problem of zero-shot sketch-based image retrieval (ZS-SBIR), where human sketches are used as queries to retrieve photos from unseen categories. We advance prior art by proposing a novel ZS-SBIR scenario that represents a firm step forward in its practical application. The new setting uniquely recognizes two important yet often neglected challenges of practical ZS-SBIR: (i) the large domain gap between amateur sketch and photo, and (ii) the necessity of moving towards large-scale retrieval. We first contribute to the community a novel ZS-SBIR dataset, QuickDraw-Extended, that consists of 330,000 sketches and 204,000 photos spanning 110 categories. Highly abstract amateur human sketches are purposefully sourced to maximize the domain gap, instead of the ones included in existing datasets, which can often be semi-photorealistic. We then formulate a ZS-SBIR framework to jointly model sketches and photos in a common embedding space. A novel strategy to mine the mutual information among domains is specifically engineered to alleviate the domain gap. External semantic knowledge is further embedded to aid semantic transfer. We show that, rather surprisingly, retrieval performance that significantly outperforms the state of the art on existing datasets can already be achieved using a reduced version of our model. We further demonstrate the superior performance of our full model by comparing it with a number of alternatives on the newly proposed dataset. The new dataset, plus all training and testing code of our model, will be publicly released to facilitate future research.
  Address Long Beach; CA; USA; June 2019
  Area Expedition Conference CVPR  
  Notes DAG; 600.140; 600.121; 600.097 Approved no  
  Call Number Admin @ si @ DRD2019 Serial 3462  
 

 
Author David Aldavert; Marçal Rusiñol
  Title Manuscript text line detection and segmentation using second-order derivatives analysis Type Conference Article
  Year 2018 Publication 13th IAPR International Workshop on Document Analysis Systems Abbreviated Journal  
  Volume Issue Pages 293 - 298  
  Keywords text line detection; text line segmentation; text region detection; second-order derivatives  
  Abstract In this paper, we explore the use of second-order derivatives to detect text lines in handwritten document images. Taking advantage of the fact that the second derivative gives a minimum response when a dark linear element over a bright background has the same orientation as the filter, we use this operator to create a map with the local orientation and strength of putative text lines in the document. Then, we detect line segments by selecting and merging the filter responses that have a similar orientation and scale. Finally, text lines are found by merging the segments that are within the same text region. The proposed segmentation algorithm is learning-free while performing similarly to state-of-the-art methods on publicly available datasets.
  Address Vienna; Austria; April 2018
  Area Expedition Conference DAS  
  Notes DAG; 600.084; 600.129; 302.065; 600.121 Approved no  
  Call Number Admin @ si @ AlR2018a Serial 3104  
 

 
Author Pau Riba; Adria Molina; Lluis Gomez; Oriol Ramos Terrades; Josep Llados
  Title Learning to Rank Words: Optimizing Ranking Metrics for Word Spotting Type Conference Article
  Year 2021 Publication 16th International Conference on Document Analysis and Recognition Abbreviated Journal  
  Volume 12822 Issue Pages 381–395  
  Abstract In this paper, we explore and evaluate the use of ranking-based objective functions for simultaneously learning a word string encoder and a word image encoder. We consider retrieval frameworks in which the user expects a retrieval list ranked according to a defined relevance score. In the context of a word spotting problem, the relevance score is set according to the string edit distance from the query string. We experimentally demonstrate the competitive performance of the proposed model on query-by-string word spotting for both handwritten and real scene word images. We also provide results for query-by-example word spotting, although it is not the main focus of this work.
  Address Lausanne; Switzerland; September 2021
  Area Expedition Conference ICDAR  
  Notes DAG; 600.121; 600.140; 110.312 Approved no  
  Call Number Admin @ si @ RMG2021 Serial 3572  