|
Partha Pratim Roy, Umapada Pal and Josep Llados. 2011. Document Seal Detection Using Ght and Character Proximity Graphs. PR, 44(6), 1282–1295.
Abstract: This paper deals with automatic detection of seal (stamp) from documents with cluttered background. Seal detection involves a difficult challenge due to its multi-oriented nature, arbitrary shape, overlapping of its part with signature, noise, etc. Here, a seal object is characterized by scale and rotation invariant spatial feature descriptors computed from recognition result of individual connected components (characters). Scale and rotation invariant features are used in a Support Vector Machine (SVM) classifier to recognize multi-scale and multi-oriented text characters. The concept of generalized Hough transform (GHT) is used to detect the seal and a voting scheme is designed for finding possible location of the seal in a document based on the spatial feature descriptor of neighboring component pairs. The peak of votes in GHT accumulator validates the hypothesis to locate the seal in a document. Experiment is performed in an archive of historical documents of handwritten/printed English text. Experimental results show that the method is robust in locating seal instances of arbitrary shape and orientation in documents, and also efficient in indexing a collection of documents for retrieval purposes.
Keywords: Seal recognition; Graphical symbol spotting; Generalized Hough transform; Multi-oriented character recognition
|
|
|
Partha Pratim Roy, Umapada Pal and Josep Llados. 2010. Touching Text Character Localization in Graphical Documents using SIFT. Graphics Recognition. Achievements, Challenges, and Evolution. 8th International Workshop, GREC 2009. Selected Papers. Springer Berlin Heidelberg, 199–211. (LNCS.)
Abstract: Interpretation of graphical document images is a challenging task as it requires proper understanding of text/graphics symbols present in such documents. Difficulties arise in graphical document recognition when text and symbol overlapped/touched. Intersection of text and symbols with graphical lines and curves occur frequently in graphical documents and hence separation of such symbols is very difficult.
Several pattern recognition and classification techniques exist to recognize isolated text/symbol. But, the touching/overlapping text and symbol recognition has not yet been dealt successfully. An interesting technique, Scale Invariant Feature Transform (SIFT), originally devised for object recognition can take care of overlapping problems. Even if SIFT features have emerged as a very powerful object descriptors, their employment in graphical documents context has not been investigated much. In this paper we present the adaptation of the SIFT approach in the context of text character localization (spotting) in graphical documents. We evaluate the applicability of this technique in such documents and discuss the scope of improvement by combining some state-of-the-art approaches.
Keywords: Support Vector Machine; Text Component; Graphical Line; Document Image; Scale Invariant Feature Transform
|
|
|
Marçal Rusiñol, V. Poulain d'Andecy, Dimosthenis Karatzas and Josep Llados. 2014. Classification of Administrative Document Images by Logo Identification. In Bart Lamiroy and Jean-Marc Ogier, eds. Graphics Recognition. Current Trends and Challenges. Springer Berlin Heidelberg, 49–58.
Abstract: This paper is focused on the categorization of administrative document images (such as invoices) based on the recognition of the supplier’s graphical logo. Two different methods are proposed, the first one uses a bag-of-visual-words model whereas the second one tries to locate logo images described by the blurred shape model descriptor within documents by a sliding-window technique. Preliminar results are reported with a dataset of real administrative documents.
Keywords: Administrative Document Classification; Logo Recognition; Logo Spotting
|
|
|
Marçal Rusiñol, V. Poulain d'Andecy, Dimosthenis Karatzas and Josep Llados. 2011. Classification of Administrative Document Images by Logo Identification. In proceedings of 9th IAPR Workshop on Graphic Recognition.
Abstract: This paper is focused on the categorization of administrative document images (such as invoices) based on the recognition of the supplier's graphical logo. Two different methods are proposed, the first one uses a bag-of-visual-words model whereas the second one tries to locate logo images described by the blurred shape model descriptor within documents by a sliding-window technique. Preliminar results are reported with a dataset of real administrative documents.
|
|
|
Pau Riba, Adria Molina, Lluis Gomez, Oriol Ramos Terrades and Josep Llados. 2021. Learning to Rank Words: Optimizing Ranking Metrics for Word Spotting. 16th International Conference on Document Analysis and Recognition.381–395.
Abstract: In this paper, we explore and evaluate the use of ranking-based objective functions for learning simultaneously a word string and a word image encoder. We consider retrieval frameworks in which the user expects a retrieval list ranked according to a defined relevance score. In the context of a word spotting problem, the relevance score has been set according to the string edit distance from the query string. We experimentally demonstrate the competitive performance of the proposed model on query-by-string word spotting for both, handwritten and real scene word images. We also provide the results for query-by-example word spotting, although it is not the main focus of this work.
|
|
|
Sangeeth Reddy, Minesh Mathew, Lluis Gomez, Marçal Rusiñol, Dimosthenis Karatzas and C.V. Jawahar. 2020. RoadText-1K: Text Detection and Recognition Dataset for Driving Videos. IEEE International Conference on Robotics and Automation.
Abstract: Perceiving text is crucial to understand semantics of outdoor scenes and hence is a critical requirement to build intelligent systems for driver assistance and self-driving. Most of the existing datasets for text detection and recognition comprise still images and are mostly compiled keeping text in mind. This paper introduces a new ”RoadText-1K” dataset for text in driving videos. The dataset is 20 times larger than the existing largest dataset for text in videos. Our dataset comprises 1000 video clips of driving without any bias towards text and with annotations for text bounding boxes and transcriptions in every frame. State of the art methods for text detection,
recognition and tracking are evaluated on the new dataset and the results signify the challenges in unconstrained driving videos compared to existing datasets. This suggests that RoadText-1K is suited for research and development of reading systems, robust enough to be incorporated into more complex downstream tasks like driver assistance and self-driving. The dataset can be found at http://cvit.iiit.ac.in/research/
projects/cvit-projects/roadtext-1k
|
|
|
Youssef El Rhabi, Simon Loic, Brun Luc, Josep Llados and Felipe Lumbreras. 2016. Information Theoretic Rotationwise Robust Binary Descriptor Learning. Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR).368–378.
Abstract: In this paper, we propose a new data-driven approach for binary descriptor selection. In order to draw a clear analysis of common designs, we present a general information-theoretic selection paradigm. It encompasses several standard binary descriptor construction schemes, including a recent state-of-the-art one named BOLD. We pursue the same endeavor to increase the stability of the produced descriptors with respect to rotations. To achieve this goal, we have designed a novel offline selection criterion which is better adapted to the online matching procedure. The effectiveness of our approach is demonstrated on two standard datasets, where our descriptor is compared to BOLD and to several classical descriptors. In particular, it emerges that our approach can reproduce equivalent if not better performance as BOLD while relying on twice shorter descriptors. Such an improvement can be influential for real-time applications.
|
|
|
Youssef El Rhabi, Simon Loic and Brun Luc. 2015. Estimation de la pose d’une caméra à partir d’un flux vidéo en s’approchant du temps réel. 15ème édition d'ORASIS, journées francophones des jeunes chercheurs en vision par ordinateur ORASIS2015.
Abstract: Finding a way to estimate quickly and robustly the pose of an image is essential in augmented reality. Here we will discuss the approach we chose in order to get closer to real time by using SIFT points [4]. We propose a method based on filtering both SIFT points and images on which to focus on. Hence we will focus on relevant data.
Keywords: Augmented Reality; SFM; SLAM; real time pose computation; 2D/3D registration
|
|
|
Pau Riba, Josep Llados and Alicia Fornes. 2020. Hierarchical graphs for coarse-to-fine error tolerant matching. PRL, 134, 116–124.
Abstract: During the last years, graph-based representations are experiencing a growing usage in visual recognition and retrieval due to their ability to capture both structural and appearance-based information. Thus, they provide a greater representational power than classical statistical frameworks. However, graph-based representations leads to high computational complexities usually dealt by graph embeddings or approximated matching techniques. Despite their representational power, they are very sensitive to noise and small variations of the input image. With the aim to cope with the time complexity and the variability present in the generated graphs, in this paper we propose to construct a novel hierarchical graph representation. Graph clustering techniques adapted from social media analysis have been used in order to contract a graph at different abstraction levels while keeping information about the topology. Abstract nodes attributes summarise information about the contracted graph partition. For the proposed representations, a coarse-to-fine matching technique is defined. Hence, small graphs are used as a filtering before more accurate matching methods are applied. This approach has been validated in real scenarios such as classification of colour images or retrieval of handwritten words (i.e. word spotting).
Keywords: Hierarchical graph representation; Coarse-to-fine graph matching; Graph-based retrieval
|
|
|
Pau Riba, Josep Llados and Alicia Fornes. 2017. Error-tolerant coarse-to-fine matching model for hierarchical graphs. In Pasquale Foggia, Cheng-Lin Liu and Mario Vento, eds. 11th IAPR-TC-15 International Workshop on Graph-Based Representations in Pattern Recognition. Springer International Publishing, 107–117.
Abstract: Graph-based representations are effective tools to capture structural information from visual elements. However, retrieving a query graph from a large database of graphs implies a high computational complexity. Moreover, these representations are very sensitive to noise or small changes. In this work, a novel hierarchical graph representation is designed. Using graph clustering techniques adapted from graph-based social media analysis, we propose to generate a hierarchy able to deal with different levels of abstraction while keeping information about the topology. For the proposed representations, a coarse-to-fine matching method is defined. These approaches are validated using real scenarios such as classification of colour images and handwritten word spotting.
Keywords: Graph matching; Hierarchical graph; Graph-based representation; Coarse-to-fine matching
|
|