|
Albert Suso, Pau Riba, Oriol Ramos Terrades, & Josep Llados. (2021). A Self-supervised Inverse Graphics Approach for Sketch Parametrization. In 16th International Conference on Document Analysis and Recognition (Vol. 12916, pp. 28–42). LNCS.
Abstract: The study of neural generative models of handwritten text and human sketches is a hot topic in the computer vision field. The landmark SketchRNN provided a breakthrough by sequentially generating sketches as a sequence of waypoints, and more recent articles have managed to generate fully vector sketches by coding the strokes as Bézier curves. However, the previous attempts with this approach all need a ground truth consisting of the sequence of points that make up each stroke, which seriously limits the datasets the model is able to train on. In this work, we present a self-supervised end-to-end inverse graphics approach that learns to embed each image to its best fit of Bézier curves. The self-supervised nature of the training process allows us to train the model on a wider range of datasets, but also to perform better after-training predictions by applying an overfitting process on the input binary image. We report qualitative and quantitative evaluations on the MNIST and the Quick, Draw! datasets.
|
|
|
J. Chazalon, Marçal Rusiñol, Jean-Marc Ogier, & Josep Llados. (2015). A Semi-Automatic Groundtruthing Tool for Mobile-Captured Document Segmentation. In 13th International Conference on Document Analysis and Recognition ICDAR2015 (pp. 621–625).
Abstract: This paper presents a novel way to generate groundtruth data for the evaluation of mobile document capture systems, focusing on the first stage of the image processing pipeline involved: document object detection and segmentation in low-quality preview frames. We introduce and describe a simple, robust and fast technique based on color markers which enables a semi-automated annotation of page corners. We also detail a technique for marker removal. Methods and tools presented in the paper were successfully used to annotate, in a few hours, 24889 frames in 150 video files for the smartDOC competition at ICDAR 2015.
|
|
|
Suman Ghosh, & Ernest Valveny. (2015). A Sliding Window Framework for Word Spotting Based on Word Attributes. In Pattern Recognition and Image Analysis, Proceedings of 7th Iberian Conference, IbPRIA 2015 (Vol. 9117, pp. 652–661). LNCS. Springer International Publishing.
Abstract: In this paper we propose a segmentation-free approach to word spotting. Word images are first encoded into feature vectors using Fisher Vector. Then, these feature vectors are used together with pyramidal histogram of characters labels (PHOC) to learn SVM-based attribute models. Documents are represented by these PHOC based word attributes. To efficiently compute the word attributes over a sliding window, we propose to use an integral image representation of the document using a simplified version of the attribute model. Finally we re-rank the top word candidates using the more discriminative full version of the word attributes. We show state-of-the-art results for segmentation-free query-by-example word spotting in single-writer and multi-writer standard datasets.
Keywords: Word spotting; Sliding window; Word attributes
|
|
|
Petia Radeva, Joan Serrat, & Enric Marti. (1995). A snake for model-based segmentation. In Proc. Conf. Fifth Int Computer Vision (pp. 816–821).
Abstract: Despite the promising results of numerous applications, the hitherto proposed snake techniques share some common problems: snake attraction by spurious edge points, snake degeneration (shrinking and flattening), convergence and stability of the deformation process, snake initialization and local determination of the parameters of elasticity. We argue here that these problems can be solved only when all the snake aspects are considered. The snakes proposed here implement a new potential field and external force in order to provide a deformation convergence, attraction by both near and far edges as well as snake behaviour selective according to the edge orientation. Furthermore, we conclude that in the case of model-based segmentation, the internal force should include structural information about the expected snake shape. Experiments using this kind of snakes for segmenting bones in complex hand radiographs show a significant improvement.
Keywords: snakes; elastic matching; model-based segmentation
|
|
|
Arnau Baro, Pau Riba, & Alicia Fornes. (2018). A Starting Point for Handwritten Music Recognition. In 1st International Workshop on Reading Music Systems (pp. 5–6).
Abstract: In the last years, the interest in Optical Music Recognition (OMR) has reawakened, especially since the appearance of deep learning. However, there are very few works addressing handwritten scores. In this work we describe a full OMR pipeline for handwritten music scores by using Convolutional and Recurrent Neural Networks that could serve as a baseline for the research community.
Keywords: Optical Music Recognition; Long Short-Term Memory; Convolutional Neural Networks; MUSCIMA++; CVC-MUSCIMA
|
|
|
Gemma Sanchez, Josep Llados, & Enric Marti. (1997). A string-based method to recognize symbols and structural textures in architectural plans. In 2nd IAPR Workshop on Graphics Recognition (pp. 91–103).
Abstract: This paper deals with the recognition of symbols and structural textures in architectural plans using string matching techniques. A plan is represented by an attributed graph whose nodes represent characteristic points and whose edges represent segments. Symbols and textures can be seen as a set of regions, i.e. closed loops in the graph, with a particular arrangement. The search for a symbol involves a graph matching between the regions of a model graph and the regions of the graph representing the document. Discriminating a texture means a clustering of neighbouring regions of this graph. Both procedures involve a similarity measure between graph regions. A string codification is used to represent the sequence of outlining edges of a region. Thus, the similarity between two regions is defined in terms of the string edit distance between their boundary strings. The use of string matching allows the recognition method to work also under presence of distortion.
|
|
|
Josep Llados, Gemma Sanchez, & Enric Marti. (1997). A String-Based Method to Recognize Symbols and Structural Textures in Architectural Plans. In Graphics Recognition Algorithms and Systems. GREC 1997. (Vol. 1389, pp. 91–103). LNCS.
|
|
|
Maryam Asadi-Aghbolaghi, Albert Clapes, Marco Bellantonio, Hugo Jair Escalante, Victor Ponce, Xavier Baro, et al. (2017). A survey on deep learning based approaches for action and gesture recognition in image sequences. In 12th IEEE International Conference on Automatic Face and Gesture Recognition.
Abstract: The interest in action and gesture recognition has grown considerably in the last years. In this paper, we present a survey on current deep learning methodologies for action and gesture recognition in image sequences. We introduce a taxonomy that summarizes important aspects of deep learning for approaching both tasks. We review the details of the proposed architectures, fusion strategies, main datasets, and competitions. We summarize and discuss the main works proposed so far with particular interest in how they treat the temporal dimension of data, discussing their main features and identifying opportunities and challenges for future research.
|
|
|
Alicia Fornes, & Josep Llados. (2010). A Symbol-dependent Writer Identification Approach in Old Handwritten Music Scores. In 12th International Conference on Frontiers in Handwriting Recognition (pp. 634–639).
Abstract: Writer identification consists in determining the writer of a piece of handwriting from a set of writers. In this paper we introduce a symbol-dependent approach for identifying the writer of old music scores, which is based on two symbol recognition methods. The main idea is to use the Blurred Shape Model descriptor and a DTW-based method for detecting, recognizing and describing the music clefs and notes. The proposed approach has been evaluated in a database of old music scores, achieving very high writer identification rates.
|
|
|
R. Bertrand, P. Gomez-Krämer, Oriol Ramos Terrades, P. Franco, & Jean-Marc Ogier. (2013). A System Based On Intrinsic Features for Fraudulent Document Detection. In 12th International Conference on Document Analysis and Recognition (pp. 106–110).
Abstract: Paper documents still represent a large amount of information supports used nowadays and may contain critical data. Even though official documents are secured with techniques such as printed patterns or artwork, paper documents suffer from a lack of security. However, the high availability of cheap scanning and printing hardware allows non-experts to easily create fake documents. As the use of a watermarking system added during the document production step is hardly possible, solutions have to be proposed to distinguish a genuine document from a forged one. In this paper, we present an automatic forgery detection method based on document’s intrinsic features at character level. This method is based on the one hand on outlier character detection in a discriminant feature space and on the other hand on the detection of strictly similar characters. Therefore, a feature set is computed for all characters. Then, based on a distance between characters of the same class.
Keywords: paper document; document analysis; fraudulent document; forgery; fake
|
|
|
Enric Marti, Jordi Regincos, Jaime Lopez-Krahe, & Juan J. Villanueva. (1991). A system for interpretation of hand line drawings as three-dimensional scene for CAD input. In Proceedings of the First International Conference on Document Analysis and Recognition (pp. 472–480).
|
|
|
Gemma Sanchez, Ernest Valveny, Josep Llados, Enric Marti, Oriol Ramos Terrades, N. Lozano, et al. (2003). A system for virtual prototyping of architectural projects. In Proceedings of Fifth IAPR International Workshop on Pattern Recognition (pp. 65–74).
|
|
|
Sebastien Mace, Herve Locteau, Ernest Valveny, & Salvatore Tabbone. (2010). A system to detect rooms in architectural floor plan images. In 9th IAPR International Workshop on Document Analysis Systems (pp. 167–174).
Abstract: In this article, a system to detect rooms in architectural floor plan images is described. We first present a primitive extraction algorithm for line detection. It is based on an original coupling of classical Hough transform with image vectorization in order to perform robust and efficient line detection. We show how the lines that satisfy some graphical arrangements are combined into walls. We also present the way we detect some door hypotheses thanks to the extraction of arcs. Walls and door hypotheses are then used by our room segmentation strategy; it consists in recursively decomposing the image until getting nearly convex regions. The notion of convexity is difficult to quantify, and the selection of separation lines between regions can also be rough. We take advantage of knowledge associated to architectural floor plans in order to obtain mostly rectangular rooms. Qualitative and quantitative evaluations performed on a corpus of real documents show promising results.
|
|
|
Partha Pratim Roy, Eduard Vazquez, Josep Llados, Ramon Baldrich, & Umapada Pal. (2007). A System to Retrieve Text/Symbols from Color Maps using Connected Component and Skeleton Analysis. In J.M. Ogier W. L. J. Llados (Ed.), Seventh IAPR International Workshop on Graphics Recognition (79–78).
|
|
|
Pau Torras, Mohamed Ali Souibgui, Jialuo Chen, & Alicia Fornes. (2021). A Transcription Is All You Need: Learning to Align through Attention. In 14th IAPR International Workshop on Graphics Recognition (Vol. 12916, pp. 141–146). LNCS.
Abstract: Historical ciphered manuscripts are a type of document where graphical symbols are used to encrypt their content instead of regular text. Nowadays, expert transcriptions can be found in libraries alongside the corresponding manuscript images. However, those transcriptions are not aligned, so these are barely usable for training deep learning-based recognition methods. To solve this issue, we propose a method to align each symbol in the transcript of an image with its visual representation by using an attention-based Sequence to Sequence (Seq2Seq) model. The core idea is that, by learning to recognise the symbol sequence within a cipher line image, the model also identifies their position implicitly through an attention mechanism. Thus, the resulting symbol segmentation can be later used for training algorithms. The experimental evaluation shows that this method is promising, especially taking into account the small size of the cipher dataset.
|
|