Lluis Gomez and Dimosthenis Karatzas. 2014. MSER-based Real-Time Text Detection and Tracking. 22nd International Conference on Pattern Recognition. 3110–3115.
Abstract: We present a hybrid algorithm for detecting and tracking text in natural scenes that goes beyond full-detection approaches in terms of time performance optimization.
A state-of-the-art scene text detection module based on Maximally Stable Extremal Regions (MSER) detects text asynchronously, while a separate thread tracks the detected text objects by MSER propagation. The cooperation of these two modules yields real-time video processing at high frame rates, even on low-resource devices.
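The detection module above builds on Maximally Stable Extremal Regions. As a rough illustration of the stability criterion behind MSER (following the standard definition by Matas et al., not the authors' implementation), the sketch below grows the dark extremal region containing a seed pixel over increasing thresholds and scores each threshold by q(t) = (A(t+Δ) − A(t−Δ)) / A(t), where A(t) is the region's area; the most stable region is the one with the smallest q. The function names, the 4-connectivity and the default Δ = 5 are assumptions for illustration, and the asynchronous detection and tracking threads are not modeled.

```python
from collections import deque


def region_area(img, seed, t):
    """Area of the 4-connected region of pixels <= t containing `seed` (0 if seed > t)."""
    h, w = len(img), len(img[0])
    sy, sx = seed
    if img[sy][sx] > t:
        return 0
    seen = {(sy, sx)}
    queue = deque([(sy, sx)])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and (ny, nx) not in seen and img[ny][nx] <= t:
                seen.add((ny, nx))
                queue.append((ny, nx))
    return len(seen)


def most_stable_threshold(img, seed, delta=5):
    """Return (t, q, area) minimising q(t) = (A(t + delta) - A(t - delta)) / A(t)."""
    lo = img[seed[0]][seed[1]]
    hi = max(max(row) for row in img)
    candidates = []
    for t in range(lo + delta, hi - delta + 1):
        area = region_area(img, seed, t)  # >= 1 because t > seed intensity
        q = (region_area(img, seed, t + delta) - region_area(img, seed, t - delta)) / area
        candidates.append((t, q, area))
    if not candidates:
        raise ValueError("intensity range too small for the chosen delta")
    # min() keeps the first threshold among ties.
    return min(candidates, key=lambda c: c[1])
```

For a 7×7 image of intensity 200 with a dark 3×3 blob of intensity 10 in the middle, `most_stable_threshold(img, (3, 3))` returns `(15, 0.0, 9)`: the blob's area does not change across a wide threshold range, which is exactly what makes it "maximally stable".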
Sergio Escalera, Alicia Fornes, Oriol Pujol, Josep Llados and Petia Radeva. 2007. Multi-class Binary Object Categorization using Blurred Shape Models. Progress in Pattern Recognition, Image Analysis and Applications, 12th Iberoamerican Congress on Pattern Recognition. 773–782. (LNCS.)
Sergio Escalera, Alicia Fornes, Oriol Pujol and Petia Radeva. 2009. Multi-class Binary Symbol Classification with Circular Blurred Shape Models. 15th International Conference on Image Analysis and Processing. Springer Berlin Heidelberg, 1005–1014. (LNCS.)
Abstract: Multi-class binary symbol classification requires the use of rich descriptors and robust classifiers. Shape representation is a difficult task because of symbol distortions such as occlusions, elastic deformations, gaps or noise. In this paper, we present the Circular Blurred Shape Model descriptor. This descriptor encodes the spatial arrangement of object parts in a correlogram structure. A prior blurring degree defines the level of distortion allowed for the symbol. Moreover, we learn the new feature space using a set of AdaBoost classifiers, which are combined in the Error-Correcting Output Codes framework to deal with the multi-class categorization problem. The presented work has been validated on different multi-class data sets and compared with state-of-the-art descriptors, showing significant performance improvements.
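The descriptor described above can be approximated, very loosely, by a polar histogram whose bins are rings and sectors around the shape's centroid, where each contour point spreads its vote over neighbouring bins instead of falling into a single one. The sketch below is an assumption-laden simplification (bilinear soft voting, a centroid-centred grid, and radius normalised by the farthest point); it is not the authors' exact Circular Blurred Shape Model, and the AdaBoost/ECOC classification stage is omitted.

```python
import math


def circular_blurred_histogram(points, n_rings=3, n_sectors=8):
    """Soft-voted ring/sector histogram of 2-D contour points (simplified sketch)."""
    if not points:
        raise ValueError("need at least one point")
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    radii = [math.hypot(x - cx, y - cy) for x, y in points]
    r_max = max(radii) or 1.0  # avoid division by zero for a single point
    hist = [[0.0] * n_sectors for _ in range(n_rings)]
    for (x, y), r in zip(points, radii):
        # Continuous ring coordinate; bin centres sit at 0, 1, ..., n_rings - 1.
        rc = min(max(r / r_max * n_rings - 0.5, 0.0), n_rings - 1.0)
        r0 = int(math.floor(rc))
        r1 = min(r0 + 1, n_rings - 1)
        wr = rc - r0
        # Continuous sector coordinate, wrapping around 2*pi.
        theta = math.atan2(y - cy, x - cx) % (2 * math.pi)
        sc = theta / (2 * math.pi) * n_sectors - 0.5
        s0 = math.floor(sc)
        ws = sc - s0
        s0 %= n_sectors
        s1 = (s0 + 1) % n_sectors
        # "Blurring": split the unit vote bilinearly over the 2x2 neighbouring bins.
        for ri, w_r in ((r0, 1.0 - wr), (r1, wr)):
            for si, w_s in ((s0, 1.0 - ws), (s1, ws)):
                hist[ri][si] += w_r * w_s
    total = sum(sum(row) for row in hist)
    return [v / total for row in hist for v in row]
```

Because radii are normalised by the farthest point, scaling a shape leaves the histogram unchanged (up to floating-point error); the soft votes are a stand-in for the tolerance to small distortions that the blurring degree controls in the original descriptor.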
Andres Mafla, Sounak Dey, Ali Furkan Biten, Lluis Gomez and Dimosthenis Karatzas. 2021. Multi-modal reasoning graph for scene-text based fine-grained image classification and retrieval. IEEE Winter Conference on Applications of Computer Vision. 4022–4032.
Partha Pratim Roy, Umapada Pal, Josep Llados and Mathieu Nicolas Delalandre. 2009. Multi-Oriented and Multi-Sized Touching Character Segmentation using Dynamic Programming. 10th International Conference on Document Analysis and Recognition. 11–15.
Abstract: In this paper, we present a scheme for segmenting English multi-oriented touching strings into individual characters. When two or more characters touch, they generate a large cavity region in the background. Using convex hull information, we use this background information to find initial points that segment a touching string into possible primitive segments (a primitive segment consists of a single character or part of a character). Next, these primitive segments are merged to obtain the optimum segmentation, applying dynamic programming with the total likelihood of characters as the objective function. An SVM classifier is used to estimate the likelihood of a character. To handle multi-oriented touching strings, the features used in the SVM are invariant to character orientation: a circular-ring and convex-hull-ring based approach is used, together with angular information of the character's contour pixels, to make the features rotation invariant. Our experiments yielded encouraging results.
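The merging step above can be pictured as a dynamic program over the ordered primitives: each character covers a run of consecutive primitives, and the best segmentation maximises the total score of its characters. In the sketch below, `log_likelihood(i, j)` is a hypothetical stand-in for the SVM-based character likelihood, and two choices are assumptions rather than details from the paper: a character spans at most three primitives, and likelihoods are combined as a sum of log-likelihoods (equivalent to maximising their product).

```python
import math


def best_segmentation(n_primitives, log_likelihood, max_span=3):
    """Group primitives 0..n-1 into characters, maximising the summed log-likelihood.

    log_likelihood(i, j) scores primitives i..j-1 merged into one character.
    Returns (best_score, [(i, j), ...]) with the chosen character spans in order.
    """
    best = [-math.inf] * (n_primitives + 1)
    back = [0] * (n_primitives + 1)
    best[0] = 0.0
    for j in range(1, n_primitives + 1):
        # Try every start i for a last character that ends at primitive j - 1.
        for i in range(max(0, j - max_span), j):
            score = best[i] + log_likelihood(i, j)
            if score > best[j]:
                best[j], back[j] = score, i
    # Walk the back-pointers from the end to recover the character spans.
    spans, j = [], n_primitives
    while j > 0:
        spans.append((back[j], j))
        j = back[j]
    return best[n_primitives], spans[::-1]
```

With four primitives where merging (0, 1) and (2, 3) gives two confident characters, the program picks those two spans over any finer or coarser split, at a cost of O(n · max_span) likelihood evaluations.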
Umapada Pal, Partha Pratim Roy, N. Tripathya and Josep Llados. 2010. Multi-oriented Bangla and Devnagari text recognition. PR, 43(12), 4124–4136.
Abstract: Some printed complex documents contain text lines of different orientations on a single page, or text lines that are curved. As a result, it is difficult to detect the skew of such documents, which makes character segmentation and recognition a complex task. In this paper, we propose a novel scheme that uses background and foreground information to recognize complex Indian documents in Bangla and Devnagari script. In Bangla and Devnagari documents, the characters in a word usually touch and form cavity regions. To handle these cavity regions, the background information of such documents is used; the convex hull and the water reservoir principle are applied for this purpose. First, the characters are segmented from the documents using the background information of the text. Next, individual characters are recognized using rotation-invariant features obtained from the foreground part of the characters.
For character segmentation, the writing mode of a touching component (word) is first detected using features based on the water reservoir principle. Next, depending on the writing mode and the reservoir base-region of the touching component, a set of candidate envelope points is selected from the contour points of the component. Based on these candidate points, the touching component is finally segmented into individual characters. For recognition of multi-sized and multi-oriented characters, features are computed from angular information obtained from the external and internal contour pixels of the characters. This angular information is computed so that it does not depend on the size or rotation of the characters. Circular and convex hull rings are used to divide a character into smaller zones, yielding zone-wise features for higher recognition accuracy. We combine the circular and convex hull features to improve the results and feed them to support vector machines (SVM) for recognition. In our experiments, we obtained recognition accuracies of 99.18% on 7515 Devnagari characters and 98.86% on 7874 Bangla characters.
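The water reservoir principle mentioned above can be illustrated with a simplified top reservoir: if water were poured from the top of a binary component, it would collect in the cavities of the component's upper profile. The sketch below treats each column's topmost foreground pixel as the ground level and applies the classic trapped-water computation. It assumes a single component spanning contiguous columns with no overhangs, and it illustrates the idea only; it is not the authors' reservoir extraction, which also uses reservoir base-regions and the writing mode.

```python
def top_reservoir(rows):
    """Water depth per column when water is poured on top of a binary component.

    rows: list of equal-length strings of '0'/'1', row 0 at the top.
    Simplification: each column's ground level is its topmost '1' pixel.
    """
    h = len(rows)
    heights = []
    for c in range(len(rows[0])):
        top = next((r for r in range(h) if rows[r][c] == "1"), h)
        heights.append(h - top)  # 0 for an empty column
    depth = []
    for c, level in enumerate(heights):
        # Water is held up to the lower of the highest walls on each side.
        wall = min(max(heights[: c + 1]), max(heights[c:]))
        depth.append(wall - level)
    return depth
```

For a "U"-shaped component given as `["1001", "1001", "1111"]`, the function returns `[0, 2, 2, 0]`: a reservoir of area 4 sits in the cavity between the two strokes, which is the kind of background region the scheme exploits when characters touch.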
Partha Pratim Roy and Josep Llados. 2008. Multi-Oriented Character Recognition from Graphical Documents. 2nd International Conference on Cognition and Recognition. 30–35.
Partha Pratim Roy, Umapada Pal and Josep Llados. 2008. Multi-oriented English Text Line Extraction using Background and Foreground Information. Proceedings of the 8th IAPR International Workshop on Document Analysis Systems. 315–322.
Palaiahnakote Shivakumara, Anjan Dutta, Chew Lim Tan and Umapada Pal. 2014. Multi-oriented scene text detection in video based on wavelet and angle projection boundary growing. MTAP, 72(1), 515–539.
Abstract: In this paper, we address two complex issues: 1) text frame classification and 2) multi-oriented text detection in video text frames. We first divide a video frame into 16 blocks and propose a combination of wavelet and median-moment features with k-means clustering at the block level to identify probable text blocks. For each probable text block, the method applies the same combination of features with k-means clustering over a sliding window running through the blocks to identify potential text candidates. We introduce a new notion of symmetry on the text candidates in each block, based on the observation that the pixel distribution in text exhibits a symmetric pattern. The method integrates all blocks containing text candidates in the frame and then maps all text candidates onto a Sobel edge map of the original frame to obtain text representatives. To tackle the multi-orientation problem, we present a new method called Angle Projection Boundary Growing (APBG), an iterative algorithm based on a nearest-neighbor concept. APBG is then applied to the text representatives to fix the bounding boxes of multi-oriented text lines in the video frame. Directional information is used to eliminate false positives. Experimental results on a variety of datasets, including non-horizontal and horizontal data, publicly available data (Hua’s data) and ICDAR-03 competition data (camera images), show that the proposed method outperforms both existing methods proposed for video and state-of-the-art methods for scene text.
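The block-level stage above (16 blocks, a texture feature per block, k-means) can be sketched as follows. As a labeled assumption, the wavelet and median-moment features are replaced by a crude stand-in, the mean absolute horizontal intensity difference inside each block, and k-means runs with k = 2 on that single feature; blocks in the higher-valued cluster are taken as probable text blocks. Symmetry verification, the Sobel edge mapping and APBG are not modeled.

```python
def block_features(frame, grid=4):
    """Mean |I(y, x+1) - I(y, x)| in each of grid x grid blocks (stand-in feature)."""
    h, w = len(frame), len(frame[0])
    bh, bw = h // grid, w // grid
    feats = []
    for by in range(grid):
        for bx in range(grid):
            diffs = [
                abs(frame[y][x + 1] - frame[y][x])
                for y in range(by * bh, (by + 1) * bh)
                for x in range(bx * bw, (bx + 1) * bw - 1)
            ]
            feats.append(sum(diffs) / len(diffs) if diffs else 0.0)
    return feats


def probable_text_blocks(frame, grid=4, iters=100):
    """1-D k-means with k = 2; returns indices of blocks in the high-feature cluster."""
    feats = block_features(frame, grid)
    centres = [min(feats), max(feats)]
    for _ in range(iters):
        groups = ([], [])
        for v in feats:
            groups[0 if abs(v - centres[0]) <= abs(v - centres[1]) else 1].append(v)
        # Keep the old centre if a cluster ends up empty.
        new = [sum(g) / len(g) if g else centres[k] for k, g in enumerate(groups)]
        if new == centres:
            break
        centres = new
    return [i for i, v in enumerate(feats) if abs(v - centres[1]) < abs(v - centres[0])]
```

Initialising the two centres at the minimum and maximum feature keeps the second cluster as the "high-texture" one throughout, so a perfectly flat frame yields no text blocks rather than an arbitrary split.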
Partha Pratim Roy, Umapada Pal, Josep Llados and Mathieu Nicolas Delalandre. 2012. Multi-oriented touching text character segmentation in graphical documents using dynamic programming. PR, 45(5), 1972–1983.
2,292 JCR
Abstract: The touching character segmentation problem becomes complex when touching strings are multi-oriented. Moreover, in graphical documents the characters in a single touching string sometimes have different orientations, and segmenting such complex touching strings is even more challenging. In this paper, we present a scheme for segmenting English multi-oriented touching strings into individual characters. When two or more characters touch, they generate a large cavity region in the background. Based on the convex hull information, we first use this background information to find initial points for segmenting a touching string into possible primitives (a primitive consists of a single character or part of a character). Next, the primitives are merged to obtain the optimum segmentation, using a dynamic programming algorithm with the total likelihood of characters as the objective function. An SVM classifier is used to estimate the likelihood of a character. To handle multi-oriented touching strings, the features used in the SVM are invariant to character orientation. Experiments performed on different databases of real and synthetic touching characters show that the method efficiently segments touching characters of arbitrary orientations and sizes.