PT Journal AU Albert Gordo Florent Perronnin Ernest Valveny TI Large-scale document image retrieval and classification with runlength histograms and binary embeddings SO Pattern Recognition JI PR PY 2013 BP 1898 EP 1905 VL 46 IS 7 DI 10.1016/j.patcog.2012.12.004 DE visual document descriptor; compression; large-scale; retrieval; classification AB We present a new document image descriptor based on multi-scale runlength histograms. This descriptor does not rely on layout analysis and can be computed efficiently. We show how this descriptor can achieve state-of-the-art results on two very different public datasets in classification and retrieval tasks. Moreover, we show how we can compress and binarize these descriptors to make them suitable for large-scale applications. We can achieve state-of-the-art results in classification using binary descriptors of as few as 16 to 64 bits. ER