toggle visibility Search & Display Options

Select All    Deselect All
 |   | 
Details
  Records Links
Author (up) David Aldavert; Marçal Rusiñol edit   pdf
doi  openurl
  Title Manuscript text line detection and segmentation using second-order derivatives analysis Type Conference Article
  Year 2018 Publication 13th IAPR International Workshop on Document Analysis Systems Abbreviated Journal  
  Volume Issue Pages 293 - 298  
  Keywords text line detection; text line segmentation; text region detection; second-order derivatives  
  Abstract In this paper, we explore the use of second-order derivatives to detect text lines on handwritten document images. Taking advantage that the second derivative gives a minimum response when a dark linear element over a
bright background has the same orientation as the filter, we use this operator to create a map with the local orientation and strength of putative text lines in the document. Then, we detect line segments by selecting and merging the filter responses that have a similar orientation and scale. Finally, text lines are found by merging the segments that are within the same text region. The proposed segmentation algorithm, is learning-free while showing a performance similar to the state of the art methods in publicly available datasets.
 
  Address Viena; Austria; April 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference DAS  
  Notes DAG; 600.084; 600.129; 302.065; 600.121 Approved no  
  Call Number Admin @ si @ AlR2018a Serial 3104  
Permanent link to this record
 

 
Author (up) David Aldavert; Marçal Rusiñol edit   pdf
doi  openurl
  Title Synthetically generated semantic codebook for Bag-of-Visual-Words based word spotting Type Conference Article
  Year 2018 Publication 13th IAPR International Workshop on Document Analysis Systems Abbreviated Journal  
  Volume Issue Pages 223 - 228  
  Keywords Word Spotting; Bag of Visual Words; Synthetic Codebook; Semantic Information  
  Abstract Word-spotting methods based on the Bag-ofVisual-Words framework have demonstrated a good retrieval performance even when used in a completely unsupervised manner. Although unsupervised approaches are suitable for
large document collections due to the cost of acquiring labeled data, these methods also present some drawbacks. For instance, having to train a suitable “codebook” for a certain dataset has a high computational cost. Therefore, in
this paper we present a database agnostic codebook which is trained from synthetic data. The aim of the proposed approach is to generate a codebook where the only information required is the type of script used in the document. The use of synthetic data also allows to easily incorporate semantic
information in the codebook generation. So, the proposed method is able to determine which set of codewords have a semantic representation of the descriptor feature space. Experimental results show that the resulting codebook attains a state-of-the-art performance while having a more compact representation.
 
  Address Viena; Austria; April 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference DAS  
  Notes DAG; 600.084; 600.129; 600.121 Approved no  
  Call Number Admin @ si @ AlR2018b Serial 3105  
Permanent link to this record
 

 
Author (up) David Aldavert; Marçal Rusiñol; Ricardo Toledo edit   pdf
doi  openurl
  Title Automatic Static/Variable Content Separation in Administrative Document Images Type Conference Article
  Year 2017 Publication 14th International Conference on Document Analysis and Recognition Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract In this paper we present an automatic method for separating static and variable content from administrative document images. An alignment approach is able to unsupervisedly build probabilistic templates from a set of examples of the same document kind. Such templates define which is the likelihood of every pixel of being either static or variable content. In the extraction step, the same alignment technique is used to match
an incoming image with the template and to locate the positions where variable fields appear. We validate our approach on the public NIST Structured Tax Forms Dataset.
 
  Address Kyoto; Japan; November 2017  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICDAR  
  Notes DAG; 600.084; 600.121 Approved no  
  Call Number Admin @ si @ ART2017 Serial 3001  
Permanent link to this record
 

 
Author (up) David Aldavert; Marçal Rusiñol; Ricardo Toledo; Josep Llados edit   pdf
doi  openurl
  Title Integrating Visual and Textual Cues for Query-by-String Word Spotting Type Conference Article
  Year 2013 Publication 12th International Conference on Document Analysis and Recognition Abbreviated Journal  
  Volume Issue Pages 511 - 515  
  Keywords  
  Abstract In this paper, we present a word spotting framework that follows the query-by-string paradigm where word images are represented both by textual and visual representations. The textual representation is formulated in terms of character $n$-grams while the visual one is based on the bag-of-visual-words scheme. These two representations are merged together and projected to a sub-vector space. This transform allows to, given a textual query, retrieve word instances that were only represented by the visual modality. Moreover, this statistical representation can be used together with state-of-the-art indexation structures in order to deal with large-scale scenarios. The proposed method is evaluated using a collection of historical documents outperforming state-of-the-art performances.  
  Address Washington; USA; August 2013  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN 1520-5363 ISBN Medium  
  Area Expedition Conference ICDAR  
  Notes DAG; ADAS; 600.045; 600.055; 600.061 Approved no  
  Call Number Admin @ si @ ART2013 Serial 2224  
Permanent link to this record
 

 
Author (up) David Aldavert; Marçal Rusiñol; Ricardo Toledo; Josep Llados edit  doi
openurl 
  Title A Study of Bag-of-Visual-Words Representations for Handwritten Keyword Spotting Type Journal Article
  Year 2015 Publication International Journal on Document Analysis and Recognition Abbreviated Journal IJDAR  
  Volume 18 Issue 3 Pages 223-234  
  Keywords Bag-of-Visual-Words; Keyword spotting; Handwritten documents; Performance evaluation  
  Abstract The Bag-of-Visual-Words (BoVW) framework has gained popularity among the document image analysis community, specifically as a representation of handwritten words for recognition or spotting purposes. Although in the computer vision field the BoVW method has been greatly improved, most of the approaches in the document image analysis domain still rely on the basic implementation of the BoVW method disregarding such latest refinements. In this paper, we present a review of those improvements and its application to the keyword spotting task. We thoroughly evaluate their impact against a baseline system in the well-known George Washington dataset and compare the obtained results against nine state-of-the-art keyword spotting methods. In addition, we also compare both the baseline and improved systems with the methods presented at the Handwritten Keyword Spotting Competition 2014.  
  Address  
  Corporate Author Thesis  
  Publisher Springer Berlin Heidelberg Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN 1433-2833 ISBN Medium  
  Area Expedition Conference  
  Notes DAG; ADAS; 600.055; 600.061; 601.223; 600.077; 600.097 Approved no  
  Call Number Admin @ si @ ART2015 Serial 2679  
Permanent link to this record
 

 
Author (up) David Fernandez edit  openurl
  Title Handwritten Word Spotting in Old Manuscript Images using Shape Descriptors Type Report
  Year 2010 Publication CVC Technical Report Abbreviated Journal  
  Volume 161 Issue Pages  
  Keywords  
  Abstract  
  Address  
  Corporate Author Thesis Master's thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes DAG Approved no  
  Call Number Admin @ si @ Fer2010b Serial 1353  
Permanent link to this record
 

 
Author (up) David Fernandez edit  isbn
openurl 
  Title Contextual Word Spotting in Historical Handwritten Documents Type Book Whole
  Year 2014 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract There are countless collections of historical documents in archives and libraries that contain plenty of valuable information for historians and researchers. The extraction of this information has become a central task among the Document Analysis researches and practitioners.
There is an increasing interest to digital preserve and provide access to these kind of documents. But only the digitalization is not enough for the researchers. The extraction and/or indexation of information of this documents has had an increased interest among researchers. In many cases, and in particular in historical manuscripts, the full transcription of these documents is extremely dicult due the inherent de ciencies: poor physical preservation, di erent writing styles, obsolete languages, etc. Word spotting has become a popular an ecient alternative to full transcription. It inherently involves a high level of degradation in the images. The search of words is holistically
formulated as a visual search of a given query shape in a larger image, instead of recognising the input text and searching the query word with an ascii string comparison. But the performance of classical word spotting approaches depend on the degradation level of the images being unacceptable in many cases . In this thesis we have proposed a novel paradigm called contextual word spotting method that uses the contextual/semantic information to achieve acceptable results whereas classical word spotting does not reach. The contextual word spotting framework proposed in this thesis is a segmentation-based word spotting approach, so an ecient word segmentation is needed. Historical handwritten
documents present some common diculties that can increase the diculties the extraction of the words. We have proposed a line segmentation approach that formulates the problem as nding the central part path in the area between two consecutive lines. This is solved as a graph traversal problem. A path nding algorithm is used to nd the optimal path in a graph, previously computed, between the text lines. Once the text lines are extracted, words are localized inside the text lines using a word segmentation technique from the state of the
art. Classical word spotting approaches can be improved using the contextual information of the documents. We have introduced a new framework, oriented to handwritten documents that present a highly structure, to extract information making use of context. The framework is an ecient tool for semi-automatic transcription that uses the contextual information to achieve better results than classical word spotting approaches. The contextual information is
automatically discovered by recognizing repetitive structures and categorizing all the words according to semantic classes. The most frequent words in each semantic cluster are extracted and the same text is used to transcribe all them. The experimental results achieved in this thesis outperform classical word spotting approaches demonstrating the suitability of the proposed ensemble architecture for spotting words in historical handwritten documents using contextual information.
 
  Address  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Josep Llados;Alicia Fornes  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-940902-7-1 Medium  
  Area Expedition Conference  
  Notes DAG; 600.077 Approved no  
  Call Number Admin @ si @ Fer2014 Serial 2573  
Permanent link to this record
 

 
Author (up) David Fernandez; Jon Almazan; Nuria Cirera; Alicia Fornes; Josep Llados edit   pdf
doi  openurl
  Title BH2M: the Barcelona Historical Handwritten Marriages database Type Conference Article
  Year 2014 Publication 22nd International Conference on Pattern Recognition Abbreviated Journal  
  Volume Issue Pages 256 - 261  
  Keywords  
  Abstract This paper presents an image database of historical handwritten marriages records stored in the archives of Barcelona cathedral, and the corresponding meta-data addressed to evaluate the performance of document analysis algorithms. The contribution of this paper is twofold. First, it presents a complete ground truth which covers the whole pipeline of handwriting
recognition research, from layout analysis to recognition and understanding. Second, it is the first dataset in the emerging area of genealogical document analysis, where documents are manuscripts pseudo-structured with specific lexicons and the interest is beyond pure transcriptions but context dependent.
 
  Address Creete Island; Grecia; September 2014  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN 1051-4651 ISBN Medium  
  Area Expedition Conference ICPR  
  Notes DAG; 600.056; 600.061; 602.006; 600.077 Approved no  
  Call Number Admin @ si @ FAC2014 Serial 2461  
Permanent link to this record
 

 
Author (up) David Fernandez; Josep Llados; Alicia Fornes edit  doi
isbn  openurl
  Title Handwritten Word Spotting in Old Manuscript Images Using a Pseudo-Structural Descriptor Organized in a Hash Structure Type Conference Article
  Year 2011 Publication 5th Iberian Conference on Pattern Recognition and Image Analysis Abbreviated Journal  
  Volume 6669 Issue Pages 628-635  
  Keywords  
  Abstract There are lots of historical handwritten documents with information that can be used for several studies and projects. The Document Image Analysis and Recognition community is interested in preserving these documents and extracting all the valuable information from them. Handwritten word-spotting is the pattern classification task which consists in detecting handwriting word images. In this work, we have used a query-by-example formalism: we have matched an input image with one or multiple images from handwritten documents to determine the distance that might indicate a correspondence. We have developed an approach based in characteristic Loci Features stored in a hash structure. Document images of the marriage licences of the Cathedral of Barcelona are used as the benchmarking database.  
  Address Las Palmas de Gran Canaria. Spain  
  Corporate Author Thesis  
  Publisher Place of Publication Editor Jordi Vitria; Joao Miguel Raposo; Mario Hernandez  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-3-642-21256-7 Medium  
  Area Expedition Conference IbPRIA  
  Notes DAG Approved no  
  Call Number Admin @ si @ FLF2011 Serial 1742  
Permanent link to this record
 

 
Author (up) David Fernandez; Josep Llados; Alicia Fornes edit  doi
openurl 
  Title A graph-based approach for segmenting touching lines in historical handwritten documents Type Journal Article
  Year 2014 Publication International Journal on Document Analysis and Recognition Abbreviated Journal IJDAR  
  Volume 17 Issue 3 Pages 293-312  
  Keywords Text line segmentation; Handwritten documents; Document image processing; Historical document analysis  
  Abstract Text line segmentation in handwritten documents is an important task in the recognition of historical documents. Handwritten document images contain text lines with multiple orientations, touching and overlapping characters between consecutive text lines and different document structures, making line segmentation a difficult task. In this paper, we present a new approach for handwritten text line segmentation solving the problems of touching components, curvilinear text lines and horizontally overlapping components. The proposed algorithm formulates line segmentation as finding the central path in the area between two consecutive lines. This is solved as a graph traversal problem. A graph is constructed using the skeleton of the image. Then, a path-finding algorithm is used to find the optimum path between text lines. The proposed algorithm has been evaluated on a comprehensive dataset consisting of five databases: ICDAR2009, ICDAR2013, UMD, the George Washington and the Barcelona Marriages Database. The proposed method outperforms the state-of-the-art considering the different types and difficulties of the benchmarking data.  
  Address  
  Corporate Author Thesis  
  Publisher Springer Berlin Heidelberg Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN 1433-2833 ISBN Medium  
  Area Expedition Conference  
  Notes DAG; 600.056; 600.061; 602.006; 600.077 Approved no  
  Call Number Admin @ si @ FLF2014 Serial 2459  
Permanent link to this record
Select All    Deselect All
 |   | 
Details

Save Citations:
Export Records: