Oriol Ramos Terrades, N. Serrano, Albert Gordo, Ernest Valveny, & Alfons Juan-Ciscar. (2010). Interactive-predictive detection of handwritten text blocks. In 17th Document Recognition and Retrieval Conference, part of the IS&T-SPIE Electronic Imaging Symposium (Vol. 7534, 75340Q–75340Q–10).
Abstract: A method for text block detection is introduced for old handwritten documents. The proposed method takes advantage of sequential book structure, taking into account layout information from pages previously transcribed. This glance at the past is used to predict the position of text blocks in the current page with the help of conventional layout analysis methods. The method is integrated into the GIDOC prototype: a first attempt to provide integrated support for interactive-predictive page layout analysis, text line detection and handwritten text transcription. Results are given in a transcription task on a 764-page Spanish manuscript from 1891.
|
Albert Ali Salah, E. Pauwels, R. Tavenard, & Theo Gevers. (2010). T-Patterns Revisited: Mining for Temporal Patterns in Sensor Data. SENS - Sensors, 10(8), 7496–7513.
Abstract: The trend to use large amounts of simple sensors as opposed to a few complex sensors to monitor places and systems creates a need for temporal pattern mining algorithms to work on such data. The methods that try to discover re-usable and interpretable patterns in temporal event data have several shortcomings. We contrast several recent approaches to the problem, and extend the T-Pattern algorithm, which was previously applied for detection of sequential patterns in behavioural sciences. The temporal complexity of the T-pattern approach is prohibitive in the scenarios we consider. We remedy this with a statistical model to obtain a fast and robust algorithm to find patterns in temporal data. We test our algorithm on a recent database collected with passive infrared sensors with millions of events.
Keywords: sensor networks; temporal pattern extraction; T-patterns; Lempel-Ziv; Gaussian mixture model; MERL motion data
|
Jaume Garcia, Albert Andaluz, Debora Gil, & Francesc Carreras. (2010). Decoupled External Forces in a Predictor-Corrector Segmentation Scheme for LV Contours in Tagged MR Images. In 32nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society (pp. 4805–4808).
Abstract: Computation of functional regional scores requires proper identification of LV contours. On one hand, manual segmentation is robust, but it is time consuming and requires high expertise. On the other hand, the tag pattern in TMR sequences is a problem for automatic segmentation of LV boundaries. We propose a segmentation method based on a predictorcorrector (Active Contours – Shape Models) scheme. Special stress is put in the definition of the AC external forces. First, we introduce a semantic description of the LV that discriminates myocardial tissue by using texture and motion descriptors. Second, in order to ensure convergence regardless of the initial contour, the external energy is decoupled according to the orientation of the edges in the image potential. We have validated the model in terms of error in segmented contours and accuracy of regional clinical scores.
|
Jaume Amores. (2010). Vocabulary-based Approaches for Multiple-Instance Data: a Comparative Study. In 20th International Conference on Pattern Recognition (4246–4250).
Abstract: Multiple Instance Learning (MIL) has become a hot topic and many different algorithms have been proposed in the last years. Despite this fact, there is a lack of comparative studies that shed light into the characteristics of the different methods and their behavior in different scenarios. In this paper we provide such an analysis. We include methods from different families, and pay special attention to vocabulary-based approaches, a new family of methods that has not received much attention in the MIL literature. The empirical comparison includes seven databases from four heterogeneous domains, implementations of eight popular MIL methods, and a study of the behavior under synthetic conditions. Based on this analysis, we show that, with an appropriate implementation, vocabulary-based approaches outperform other MIL methods in most of the cases, showing in general a more consistent performance.
|
Joan Mas, Josep Llados, Gemma Sanchez, & J.A. Jorge. (2010). A syntactic approach based on distortion-tolerant Adjacency Grammars and a spatial-directed parser to interpret sketched diagrams. PR - Pattern Recognition, 43(12), 4148–4164.
Abstract: This paper presents a syntactic approach based on Adjacency Grammars (AG) for sketch diagram modeling and understanding. Diagrams are a combination of graphical symbols arranged according to a set of spatial rules defined by a visual language. AG describe visual shapes by productions defined in terms of terminal and non-terminal symbols (graphical primitives and subshapes), and a set functions describing the spatial arrangements between symbols. Our approach to sketch diagram understanding provides three main contributions. First, since AG are linear grammars, there is a need to define shapes and relations inherently bidimensional using a sequential formalism. Second, our parsing approach uses an indexing structure based on a spatial tessellation. This serves to reduce the search space when finding candidates to produce a valid reduction. This allows order-free parsing of 2D visual sentences while keeping combinatorial explosion in check. Third, working with sketches requires a distortion model to cope with the natural variations of hand drawn strokes. To this end we extended the basic grammar with a distortion measure modeled on the allowable variation on spatial constraints associated with grammar productions. Finally, the paper reports on an experimental framework an interactive system for sketch analysis. User tests performed on two real scenarios show that our approach is usable in interactive settings.
Keywords: Syntactic Pattern Recognition; Symbol recognition; Diagram understanding; Sketched diagrams; Adjacency Grammars; Incremental parsing; Spatial directed parsing
|
Umapada Pal, Partha Pratim Roy, N. Tripathya, & Josep Llados. (2010). Multi-oriented Bangla and Devnagari text recognition. PR - Pattern Recognition, 43(12), 4124–4136.
Abstract: There are printed complex documents where text lines of a single page may have different orientations or the text lines may be curved in shape. As a result, it is difficult to detect the skew of such documents and hence character segmentation and recognition of such documents are a complex task. In this paper, using background and foreground information we propose a novel scheme towards the recognition of Indian complex documents of Bangla and Devnagari script. In Bangla and Devnagari documents usually characters in a word touch and they form cavity regions. To take care of these cavity regions, background information of such documents is used. Convex hull and water reservoir principle have been applied for this purpose. Here, at first, the characters are segmented from the documents using the background information of the text. Next, individual characters are recognized using rotation invariant features obtained from the foreground part of the characters.
For character segmentation, at first, writing mode of a touching component (word) is detected using water reservoir principle based features. Next, depending on writing mode and the reservoir base-region of the touching component, a set of candidate envelope points is then selected from the contour points of the component. Based on these candidate points, the touching component is finally segmented into individual characters. For recognition of multi-sized/multi-oriented characters the features are computed from different angular information obtained from the external and internal contour pixels of the characters. These angular information are computed in such a way that they do not depend on the size and rotation of the characters. Circular and convex hull rings have been used to divide a character into smaller zones to get zone-wise features for higher recognition results. We combine circular and convex hull features to improve the results and these features are fed to support vector machines (SVM) for recognition. From our experiment we obtained recognition results of 99.18% (98.86%) accuracy when tested on 7515 (7874) Devnagari (Bangla) characters.
|
Simone Balocco, O. Basset, G. Courbebaisse, E. Boni, Alejandro F. Frangi, P. Tortoli, et al. (2010). Estimation Of Viscoelastic Properties Of Vessel Walls Using a Computational Model and Doppler Ultrasound. PMB - Physics in Medicine and Biology, 55(12), 3557–3575.
Abstract: Human arteries affected by atherosclerosis are characterized by altered wall viscoelastic properties. The possibility of noninvasively assessing arterial viscoelasticity in vivo would significantly contribute to the early diagnosis and prevention of this disease. This paper presents a noniterative technique to estimate the viscoelastic parameters of a vascular wall Zener model. The approach requires the simultaneous measurement of flow variations and wall displacements, which can be provided by suitable ultrasound Doppler instruments. Viscoelastic parameters are estimated by fitting the theoretical constitutive equations to the experimental measurements using an ARMA parameter approach. The accuracy and sensitivity of the proposed method are tested using reference data generated by numerical simulations of arterial pulsation in which the physiological conditions and the viscoelastic parameters of the model can be suitably varied. The estimated values quantitatively agree with the reference values, showing that the only parameter affected by changing the physiological conditions is viscosity, whose relative error was about 27% even when a poor signal-to-noise ratio is simulated. Finally, the feasibility of the method is illustrated through three measurements made at different flow regimes on a cylindrical vessel phantom, yielding a parameter mean estimation error of 25%.
|
Fadi Dornaika, & Bogdan Raducanu. (2010). Person-specific face shape estimation under varying head pose from single snapshots. In 20th International Conference on Pattern Recognition (3496–3499).
Abstract: This paper presents a new method for person-specific face shape estimation under varying head pose of a previously unseen person from a single image. We describe a featureless approach based on a deformable 3D model and a learned face subspace. The proposed approach is based on maximizing a likelihood measure associated with a learned face subspace, which is carried out by a stochastic and genetic optimizer. We conducted the experiments on a subset of Honda Video Database showing the feasibility and robustness of the proposed approach. For this reason, our approach could lend itself nicely to complex frameworks involving 3D face tracking and face gesture recognition in monocular videos.
|
Muhammad Muzzamil Luqman, Thierry Brouard, Jean-Yves Ramel, & Josep Llados. (2010). A Content Spotting System For Line Drawing Graphic Document Images. In 20th International Conference on Pattern Recognition (Vol. 20, 3420–3423).
Abstract: We present a content spotting system for line drawing graphic document images. The proposed system is sufficiently domain independent and takes the keyword based information retrieval for graphic documents, one step forward, to Query By Example (QBE) and focused retrieval. During offline learning mode: we vectorize the documents in the repository, represent them by attributed relational graphs, extract regions of interest (ROIs) from them, convert each ROI to a fuzzy structural signature, cluster similar signatures to form ROI classes and build an index for the repository. During online querying mode: a Bayesian network classifier recognizes the ROIs in the query image and the corresponding documents are fetched by looking up in the repository index. Experimental results are presented for synthetic images of architectural and electronic documents.
|
Josep M. Gonfaus, Xavier Boix, Joost Van de Weijer, Andrew Bagdanov, Joan Serrat, & Jordi Gonzalez. (2010). Harmony Potentials for Joint Classification and Segmentation. In 23rd IEEE Conference on Computer Vision and Pattern Recognition (3280–3287).
Abstract: Hierarchical conditional random fields have been successfully applied to object segmentation. One reason is their ability to incorporate contextual information at different scales. However, these models do not allow multiple labels to be assigned to a single node. At higher scales in the image, this yields an oversimplified model, since multiple classes can be reasonable expected to appear within one region. This simplified model especially limits the impact that observations at larger scales may have on the CRF model. Neglecting the information at larger scales is undesirable since class-label estimates based on these scales are more reliable than at smaller, noisier scales. To address this problem, we propose a new potential, called harmony potential, which can encode any possible combination of class labels. We propose an effective sampling strategy that renders tractable the underlying optimization problem. Results show that our approach obtains state-of-the-art results on two challenging datasets: Pascal VOC 2009 and MSRC-21.
|
Mohammad Rouhani, & Angel Sappa. (2010). Relaxing the 3L Algorithm for an Accurate Implicit Polynomial Fitting. In 23rd IEEE Conference on Computer Vision and Pattern Recognition (pp. 3066–3072).
Abstract: This paper presents a novel method to increase the accuracy of linear fitting of implicit polynomials. The proposed method is based on the 3L algorithm philosophy. The novelty lies on the relaxation of the additional constraints, already imposed by the 3L algorithm. Hence, the accuracy of the final solution is increased due to the proper adjustment of the expected values in the aforementioned additional constraints. Although iterative, the proposed approach solves the fitting problem within a linear framework, which is independent of the threshold tuning. Experimental results, both in 2D and 3D, showing improvements in the accuracy of the fitting are presented. Comparisons with both state of the art algorithms and a geometric based one (non-linear fitting), which is used as a ground truth, are provided.
|
Fernando Barrera, Felipe Lumbreras, & Angel Sappa. (2010). Multimodal Template Matching based on Gradient and Mutual Information using Scale-Space. In 17th IEEE International Conference on Image Processing (2749–2752).
Abstract: This paper presents the combined use of gradient and mutual information for infrared and intensity templates matching. We propose to joint: (i) feature matching in a multiresolution context and (ii) information propagation through scale-space representations. Our method consists in combining mutual information with a shape descriptor based on gradient, and propagate them following a coarse-to-fine strategy. The main contributions of this work are: to offer a theoretical formulation towards a multimodal stereo matching; to show that gradient and mutual information can be reinforced while they are propagated between consecutive levels; and to show that they are valid cost functions in multimodal template matchings. Comparisons are presented showing the improvements and viability of the proposed approach.
|
Mario Rojas, David Masip, A. Todorov, & Jordi Vitria. (2010). Automatic Point-based Facial Trait Judgments Evaluation. In 23rd IEEE Conference on Computer Vision and Pattern Recognition (2715–2720).
Abstract: Humans constantly evaluate the personalities of other people using their faces. Facial trait judgments have been studied in the psychological field, and have been determined to influence important social outcomes of our lives, such as elections outcomes and social relationships. Recent work on textual descriptions of faces has shown that trait judgments are highly correlated. Further, behavioral studies suggest that two orthogonal dimensions, valence and dominance, can describe the basis of the human judgments from faces. In this paper, we used a corpus of behavioral data of judgments on different trait dimensions to automatically learn a trait predictor from facial pixel images. We study whether trait evaluations performed by humans can be learned using machine learning classifiers, and used later in automatic evaluations of new facial images. The experiments performed using local point-based descriptors show promising results in the evaluation of the main traits.
|
Anjan Dutta, Umapada Pal, Alicia Fornes, & Josep Llados. (2010). An Efficient Staff Removal Technique from Printed Musical Documents. In 20th International Conference on Pattern Recognition (1965–1968).
Abstract: Staff removal is an important preprocessing step of the Optical Music Recognition (OMR). The process aims to remove the stafflines from a musical document and retain only the musical symbols, later these symbols are used effectively to identify the music information. This paper proposes a simple but robust method to remove stafflines from printed musical scores. In the proposed methodology we have considered a staffline segment as a horizontal linkage of vertical black runs with uniform height. We have used the neighbouring properties of a staffline segment to validate it as a true segment. We have considered the dataset along with the deformations described in for evaluation purpose. From experimentation we have got encouraging results.
|
Alicia Fornes, Sergio Escalera, Josep Llados, & Ernest Valveny. (2010). Symbol Classification using Dynamic Aligned Shape Descriptor. In 20th International Conference on Pattern Recognition (1957–1960).
Abstract: Shape representation is a difficult task because of several symbol distortions, such as occlusions, elastic deformations, gaps or noise. In this paper, we propose a new descriptor and distance computation for coping with the problem of symbol recognition in the domain of Graphical Document Image Analysis. The proposed D-Shape descriptor encodes the arrangement information of object parts in a circular structure, allowing different levels of distortion. The classification is performed using a cyclic Dynamic Time Warping based method, allowing distortions and rotation. The methodology has been validated on different data sets, showing very high recognition rates.
|