PT Journal
AU L. Rothacker
   Marçal Rusiñol
   Josep Llados
   G.A. Fink
TI A Two-stage Approach to Segmentation-Free Query-by-example Word Spotting
PY 2014
AB With the ongoing progress in digitization, huge document collections and archives have become available to a broad audience. Scanned document images can be transmitted electronically and studied simultaneously throughout the world. While this is very beneficial, it is often impossible to perform automated searches on these document collections. Optical character recognition usually fails when it comes to handwritten or historic documents. In order to address the need for exploring document collections rapidly, researchers are working on word spotting. In query-by-example word spotting scenarios, the user selects an exemplary occurrence of the query word in a document image. The word spotting system then retrieves all regions in the collection that are visually similar to the given example of the query word. The best matching regions are presented to the user and no actual transcription is required. An important property of a word spotting system is the computational speed with which queries can be executed. In our previous work, we presented a relatively slow but high-precision method. In the present work, we will extend this baseline system to an integrated two-stage approach. In a coarse-grained first stage, we will filter document images efficiently in order to identify regions that are likely to contain the query word. In the fine-grained second stage, these regions will be analyzed with our previously presented high-precision method. Finally, we will report recognition results and query times for the well-known George Washington benchmark in our evaluation. We achieve state-of-the-art recognition results while the query times can be reduced to 50% in comparison with our baseline.
ER