PT Unknown AU David Aldavert; Marçal Rusiñol TI Synthetically generated semantic codebook for Bag-of-Visual-Words based word spotting BT 13th IAPR International Workshop on Document Analysis Systems PY 2018 BP 223 EP 228 DI 10.1109/DAS.2018.25 DE Word Spotting; Bag of Visual Words; Synthetic Codebook; Semantic Information AB Word-spotting methods based on the Bag-of-Visual-Words framework have demonstrated good retrieval performance even when used in a completely unsupervised manner. Although unsupervised approaches are suitable for large document collections due to the cost of acquiring labeled data, these methods also present some drawbacks. For instance, having to train a suitable “codebook” for a certain dataset has a high computational cost. Therefore, in this paper we present a database-agnostic codebook which is trained from synthetic data. The aim of the proposed approach is to generate a codebook where the only information required is the type of script used in the document. The use of synthetic data also makes it easy to incorporate semantic information in the codebook generation. So, the proposed method is able to determine which set of codewords have a semantic representation of the descriptor feature space. Experimental results show that the resulting codebook attains state-of-the-art performance while having a more compact representation. ER