|
Mathieu Nicolas Delalandre, Ernest Valveny and Josep Llados. 2008. Performance Evaluation of Symbol Recognition and Spotting Systems: An Overview.
|
|
|
Mathieu Nicolas Delalandre, Ernest Valveny and Josep Llados. 2008. Performance Evaluation of Symbol Recognition and Spotting Systems. Proceedings of the 8th International Workshop on Document Analysis Systems,.497–505.
|
|
|
Masakazu Iwamura, Naoyuki Morimoto, Keishi Tainaka, Dena Bazazian, Lluis Gomez and Dimosthenis Karatzas. 2017. ICDAR2017 Robust Reading Challenge on Omnidirectional Video. 14th International Conference on Document Analysis and Recognition.
Abstract: Results of ICDAR 2017 Robust Reading Challenge on Omnidirectional Video are presented. This competition uses Downtown Osaka Scene Text (DOST) Dataset that was captured in Osaka, Japan with an omnidirectional camera. Hence, it consists of sequential images (videos) of different view angles. Regarding the sequential images as videos (video mode), two tasks of localisation and end-to-end recognition are prepared. Regarding them as a set of still images (still image mode), three tasks of localisation, cropped word recognition and end-to-end recognition are prepared. As the dataset has been captured in Japan, the dataset contains Japanese text but also include text consisting of alphanumeric characters (Latin text). Hence, a submitted result for each task is evaluated in three ways: using Japanese only ground truth (GT), using Latin only GT and using combined GTs of both. Finally, by the submission deadline, we have received two submissions in the text localisation task of the still image mode. We intend to continue the competition in the open mode. Expecting further submissions, in this report we provide baseline results in all the tasks in addition to the submissions from the community.
|
|
|
Marwa Dhiaf and 6 others. 2023. CSSL-MHTR: Continual Self-Supervised Learning for Scalable Multi-script Handwritten Text Recognition.
Abstract: Self-supervised learning has recently emerged as a strong alternative in document analysis. These approaches are now capable of learning high-quality image representations and overcoming the limitations of supervised methods, which require a large amount of labeled data. However, these methods are unable to capture new knowledge in an incremental fashion, where data is presented to the model sequentially, which is closer to the realistic scenario. In this paper, we explore the potential of continual self-supervised learning to alleviate the catastrophic forgetting problem in handwritten text recognition, as an example of sequence recognition. Our method consists in adding intermediate layers called adapters for each task, and efficiently distilling knowledge from the previous model while learning the current task. Our proposed framework is efficient in both computation and memory complexity. To demonstrate its effectiveness, we evaluate our method by transferring the learned model to diverse text recognition downstream tasks, including Latin and non-Latin scripts. As far as we know, this is the first application of continual self-supervised learning for handwritten text recognition. We attain state-of-the-art performance on English, Italian and Russian scripts, whilst adding only a few parameters per task. The code and trained models will be publicly available.
|
|
|
Maria Vanrell, Felipe Lumbreras, A. Pujol, Ramon Baldrich, Josep Llados and Juan J. Villanueva. 2001. Colour Normalisation Based on Background Information..
|
|
|
Marc Sunset Perez, Marc Comino Trinidad, Dimosthenis Karatzas, Antonio Chica Calaf and Pere Pau Vazquez Alcocer. 2016. Development of general‐purpose projection‐based augmented reality systems.
Abstract: Despite the large amount of methods and applications of augmented reality, there is little homogenizatio n on the software platforms that support them. An exception may be the low level control software that is provided by some high profile vendors such as Qualcomm and Metaio. However, these provide fine grain modules for e.g. element tracking. We are more co ncerned on the application framework, that includes the control of the devices working together for the development of the AR experience. In this paper we describe the development of a software framework for AR setups. We concentrate on the modular design of the framework, but also on some hard problems such as the calibration stage, crucial for projection – based AR. The developed framework is suitable and has been tested in AR applications using camera – projector pairs, for both fixed and nomadic setups
|
|
|
Marçal Rusiñol, Volkmar Frinken, Dimosthenis Karatzas, Andrew Bagdanov and Josep Llados. 2014. Multimodal page classification in administrative document image streams. IJDAR, 17(4), 331–341.
Abstract: In this paper, we present a page classification application in a banking workflow. The proposed architecture represents administrative document images by merging visual and textual descriptions. The visual description is based on a hierarchical representation of the pixel intensity distribution. The textual description uses latent semantic analysis to represent document content as a mixture of topics. Several off-the-shelf classifiers and different strategies for combining visual and textual cues have been evaluated. A final step uses an n-gram model of the page stream allowing a finer-grained classification of pages. The proposed method has been tested in a real large-scale environment and we report results on a dataset of 70,000 pages.
Keywords: Digital mail room; Multimodal page classification; Visual and textual document description
|
|
|
Marçal Rusiñol, V. Poulain d'Andecy, Dimosthenis Karatzas and Josep Llados. 2011. Classification of Administrative Document Images by Logo Identification. In proceedings of 9th IAPR Workshop on Graphic Recognition.
Abstract: This paper is focused on the categorization of administrative document images (such as invoices) based on the recognition of the supplier's graphical logo. Two different methods are proposed, the first one uses a bag-of-visual-words model whereas the second one tries to locate logo images described by the blurred shape model descriptor within documents by a sliding-window technique. Preliminar results are reported with a dataset of real administrative documents.
|
|
|
Marçal Rusiñol, V. Poulain d'Andecy, Dimosthenis Karatzas and Josep Llados. 2013. Classification of Administrative Document Images by Logo Identification. 10th IAPR International Workshop on Graphics Recognition.
Abstract: This paper is focused on the categorization of administrative document images (such as invoices) based on the recognition of the supplier's graphical logo. Two different methods are proposed, the first one uses a bag-of-visual-words model whereas the second one tries to locate logo images described by the blurred shape model descriptor within documents by a sliding-window technique. Preliminar results are reported with a dataset of real administrative documents.
|
|
|
Marçal Rusiñol, V. Poulain d'Andecy, Dimosthenis Karatzas and Josep Llados. 2014. Classification of Administrative Document Images by Logo Identification. In Bart Lamiroy and Jean-Marc Ogier, eds. Graphics Recognition. Current Trends and Challenges. Springer Berlin Heidelberg, 49–58.
Abstract: This paper is focused on the categorization of administrative document images (such as invoices) based on the recognition of the supplier’s graphical logo. Two different methods are proposed, the first one uses a bag-of-visual-words model whereas the second one tries to locate logo images described by the blurred shape model descriptor within documents by a sliding-window technique. Preliminar results are reported with a dataset of real administrative documents.
Keywords: Administrative Document Classification; Logo Recognition; Logo Spotting
|
|