PT Unknown AU Albert Gordo Ernest Valveny TI The diagonal split: A pre-segmentation step for page layout analysis & classification BT 4th Iberian Conference on Pattern Recognition and Image Analysis PY 2009 BP 290–297 VL 5524 DI 10.1007/978-3-642-02172-5_38 AB Document classification is an important task in all the processes related to document storage and retrieval. In the case of complex documents, structural features are needed to achieve a correct classification. Unfortunately, physical layout analysis is error prone. In this paper we present a pre-segmentation step based on a divide & conquer strategy that can be used to improve the page segmentation results, independently of the segmentation algorithm used. This pre-segmentation step is evaluated in classification and retrieval using the selective CRLA algorithm for layout segmentation together with a clustering based on the Voronoi area diagram, and tested on two different databases, MARG and Girona Archives. ER