|
Mohamed Ali Souibgui, Y.Kessentini and Alicia Fornes. 2020. A conditional GAN based approach for distorted camera captured documents recovery. 4th Mediterranean Conference on Pattern Recognition and Artificial Intelligence.
|
|
|
Mohamed Ali Souibgui, Alicia Fornes, Y.Kessentini and C.Tudor. 2021. A Few-shot Learning Approach for Historical Encoded Manuscript Recognition. 25th International Conference on Pattern Recognition.5413–5420.
Abstract: Encoded (or ciphered) manuscripts are a special type of historical documents that contain encrypted text. The automatic recognition of this kind of documents is challenging because: 1) the cipher alphabet changes from one document to another, 2) there is a lack of annotated corpus for training and 3) touching symbols make the symbol segmentation difficult and complex. To overcome these difficulties, we propose a novel method for handwritten ciphers recognition based on few-shot object detection. Our method first detects all symbols of a given alphabet in a line image, and then a decoding step maps the symbol similarity scores to the final sequence of transcribed symbols. By training on synthetic data, we show that the proposed architecture is able to recognize handwritten ciphers with unseen alphabets. In addition, if few labeled pages with the same alphabet are used for fine tuning, our method surpasses existing unsupervised and supervised HTR methods for ciphers recognition.
|
|
|
Arnau Baro, Alicia Fornes and Carles Badal. 2020. Handwritten Historical Music Recognition by Sequence-to-Sequence with Attention Mechanism. 17th International Conference on Frontiers in Handwriting Recognition.
Abstract: Despite decades of research in Optical Music Recognition (OMR), the recognition of old handwritten music scores remains a challenge because of the variabilities in the handwriting styles, paper degradation, lack of standard notation, etc. Therefore, the research in OMR systems adapted to the particularities of old manuscripts is crucial to accelerate the conversion of music scores existing in archives into digital libraries, fostering the dissemination and preservation of our music heritage. In this paper we explore the adaptation of sequence-to-sequence models with attention mechanism (used in translation and handwritten text recognition) and the generation of specific synthetic data for recognizing old music scores. The experimental validation demonstrates that our approach is promising, especially when compared with long short-term memory neural networks.
|
|
|
Jialuo Chen, M.A.Souibgui, Alicia Fornes and Beata Megyesi. 2020. A Web-based Interactive Transcription Tool for Encrypted Manuscripts. 3rd International Conference on Historical Cryptology.52–59.
Abstract: Manual transcription of handwritten text is a time consuming task. In the case of encrypted manuscripts, the recognition is even more complex due to the huge variety of alphabets and symbol sets. To speed up and ease this process, we present a web-based tool aimed to (semi)-automatically transcribe the encrypted sources. The user uploads one or several images of the desired encrypted document(s) as input, and the system returns the transcription(s). This process is carried out in an interactive fashion with
the user to obtain more accurate results. For discovering and testing, the developed web tool is freely available.
|
|
|
Lei Kang, Marçal Rusiñol, Alicia Fornes, Pau Riba and Mauricio Villegas. 2020. Unsupervised Adaptation for Synthetic-to-Real Handwritten Word Recognition. IEEE Winter Conference on Applications of Computer Vision.
Abstract: Handwritten Text Recognition (HTR) is still a challenging problem because it must deal with two important difficulties: the variability among writing styles, and the scarcity of labelled data. To alleviate such problems, synthetic data generation and data augmentation are typically used to train HTR systems. However, training with such data produces encouraging but still inaccurate transcriptions in real words. In this paper, we propose an unsupervised writer adaptation approach that is able to automatically adjust a generic handwritten word recognizer, fully trained with synthetic fonts, towards a new incoming writer. We have experimentally validated our proposal using five different datasets, covering several challenges (i) the document source: modern and historic samples, which may involve paper degradation problems; (ii) different handwriting styles: single and multiple writer collections; and (iii) language, which involves different character combinations. Across these challenging collections, we show that our system is able to maintain its performance, thus, it provides a practical and generic approach to deal with new document collections without requiring any expensive and tedious manual annotation step.
|
|
|
Lei Kang, Pau Riba, Yaxing Wang, Marçal Rusiñol, Alicia Fornes and Mauricio Villegas. 2020. GANwriting: Content-Conditioned Generation of Styled Handwritten Word Images. 16th European Conference on Computer Vision.
Abstract: Although current image generation methods have reached impressive quality levels, they are still unable to produce plausible yet diverse images of handwritten words. On the contrary, when writing by hand, a great variability is observed across different writers, and even when analyzing words scribbled by the same individual, involuntary variations are conspicuous. In this work, we take a step closer to producing realistic and varied artificially rendered handwritten words. We propose a novel method that is able to produce credible handwritten word images by conditioning the generative process with both calligraphic style features and textual content. Our generator is guided by three complementary learning objectives: to produce realistic images, to imitate a certain handwriting style and to convey a specific textual content. Our model is unconstrained to any predefined vocabulary, being able to render whatever input word. Given a sample writer, it is also able to mimic its calligraphic features in a few-shot setup. We significantly advance over prior art and demonstrate with qualitative, quantitative and human-based evaluations the realistic aspect of our synthetically produced images.
|
|
|
Lei Kang, Pau Riba, Marçal Rusiñol, Alicia Fornes and Mauricio Villegas. 2020. Distilling Content from Style for Handwritten Word Recognition. 17th International Conference on Frontiers in Handwriting Recognition.
Abstract: Despite the latest transcription accuracies reached using deep neural network architectures, handwritten text recognition still remains a challenging problem, mainly because of the large inter-writer style variability. Both augmenting the training set with artificial samples using synthetic fonts, and writer adaptation techniques have been proposed to yield more generic approaches aimed at dodging style unevenness. In this work, we take a step closer to learn style independent features from handwritten word images. We propose a novel method that is able to disentangle the content and style aspects of input images by jointly optimizing a generative process and a handwritten
word recognizer. The generator is aimed at transferring writing style features from one sample to another in an image-to-image translation approach, thus leading to a learned content-centric features that shall be independent to writing style attributes.
Our proposed recognition model is able then to leverage such writer-agnostic features to reach better recognition performances. We advance over prior training strategies and demonstrate with qualitative and quantitative evaluations the performance of both
the generative process and the recognition efficiency in the IAM dataset.
|
|
|
Raul Gomez, Jaume Gibert, Lluis Gomez and Dimosthenis Karatzas. 2020. Location Sensitive Image Retrieval and Tagging. 16th European Conference on Computer Vision.
Abstract: People from different parts of the globe describe objects and concepts in distinct manners. Visual appearance can thus vary across different geographic locations, which makes location a relevant contextual information when analysing visual data. In this work, we address the task of image retrieval related to a given tag conditioned on a certain location on Earth. We present LocSens, a model that learns to rank triplets of images, tags and coordinates by plausibility, and two training strategies to balance the location influence in the final ranking. LocSens learns to fuse textual and location information of multimodal queries to retrieve related images at different levels of location granularity, and successfully utilizes location information to improve image tagging.
|
|
|
Sangeeth Reddy, Minesh Mathew, Lluis Gomez, Marçal Rusiñol, Dimosthenis Karatzas and C.V. Jawahar. 2020. RoadText-1K: Text Detection and Recognition Dataset for Driving Videos. IEEE International Conference on Robotics and Automation.
Abstract: Perceiving text is crucial to understand semantics of outdoor scenes and hence is a critical requirement to build intelligent systems for driver assistance and self-driving. Most of the existing datasets for text detection and recognition comprise still images and are mostly compiled keeping text in mind. This paper introduces a new ”RoadText-1K” dataset for text in driving videos. The dataset is 20 times larger than the existing largest dataset for text in videos. Our dataset comprises 1000 video clips of driving without any bias towards text and with annotations for text bounding boxes and transcriptions in every frame. State of the art methods for text detection,
recognition and tracking are evaluated on the new dataset and the results signify the challenges in unconstrained driving videos compared to existing datasets. This suggests that RoadText-1K is suited for research and development of reading systems, robust enough to be incorporated into more complex downstream tasks like driver assistance and self-driving. The dataset can be found at http://cvit.iiit.ac.in/research/
projects/cvit-projects/roadtext-1k
|
|
|
Albert Berenguel. 2019. Analysis of background textures in banknotes and identity documents for counterfeit detection. (Ph.D. thesis, Ediciones Graficas Rey.)
Abstract: Counterfeiting and piracy are a form of theft that has been steadily growing in recent years. A counterfeit is an unauthorized reproduction of an authentic/genuine object. Banknotes and identity documents are two common objects of counterfeiting. The former is used by organized criminal groups to finance a variety of illegal activities or even to destabilize entire countries due the inflation effect. Generally, in order to run their illicit businesses, counterfeiters establish companies and bank accounts using fraudulent identity documents. The illegal activities generated by counterfeit banknotes and identity documents has a damaging effect on business, the economy and the general population. To fight against counterfeiters, governments and authorities around the globe cooperate and develop security features to protect their security documents. Many of the security features in identity documents can also be found in banknotes. In this dissertation we focus our efforts in detecting the counterfeit banknotes and identity documents by analyzing the security features at the background printing. Background areas on secure documents contain fine-line patterns and designs that are difficult to reproduce without the manufacturers cutting-edge printing equipment. Our objective is to find the loose of resolution between the genuine security document and the printed counterfeit version with a publicly available commercial printer. We first present the most complete survey to date in identity and banknote security features. The compared algorithms and systems are based on computer vision and machine learning. Then we advance to present the banknote and identity counterfeit dataset we have built and use along all this thesis. Afterwards, we evaluate and adapt algorithms in the literature for the security background texture analysis. We study this problem from the point of view of robustness, computational efficiency and applicability into a real and non-controlled industrial scenario, proposing key insights to use these algorithms. Next, within the industrial environment of this thesis, we build a complete service oriented architecture to detect counterfeit documents. The mobile application and the server framework intends to be used even by non-expert document examiners to spot counterfeits. Later, we re-frame the problem of background texture counterfeit detection as a full-reference game of spotting the differences, by alternating glimpses between a counterfeit and a genuine background using recurrent neural networks. Finally, we deal with the lack of counterfeit samples, studying different approaches based on anomaly detection.
|
|