|
P. Wang, V. Eglin, C. Garcia, C. Largeron, Josep Llados, & Alicia Fornes. (2014). A Coarse-to-Fine Word Spotting Approach for Historical Handwritten Documents Based on Graph Embedding and Graph Edit Distance. In 22nd International Conference on Pattern Recognition (pp. 3074–3079).
Abstract: Effective information retrieval on handwritten document images has always been a challenging task, especially historical ones. In the paper, we propose a coarse-to-fine handwritten word spotting approach based on graph representation. The presented model comprises both the topological and morphological signatures of the handwriting. Skeleton-based graphs with the Shape Context labelled vertexes are established for connected components. Each word image is represented as a sequence of graphs. Aiming at developing a practical and efficient word spotting approach for large-scale historical handwritten documents, a fast and coarse comparison is first applied to prune the regions that are not similar to the query based on the graph embedding methodology. Afterwards, the query and regions of interest are compared by graph edit distance based on the Dynamic Time Warping alignment. The proposed approach is evaluated on a public dataset containing 50 pages of historical marriage license records. The results show that the proposed approach achieves a compromise between efficiency and accuracy.
Keywords: word spotting; coarse-to-fine mechamism; graphbased representation; graph embedding; graph edit distance
|
|
|
Alicia Fornes, Josep Llados, Joan Mas, Joana Maria Pujadas-Mora, & Anna Cabre. (2014). A Bimodal Crowdsourcing Platform for Demographic Historical Manuscripts. In Digital Access to Textual Cultural Heritage Conference (pp. 103–108).
Abstract: In this paper we present a crowdsourcing web-based application for extracting information from demographic handwritten document images. The proposed application integrates two points of view: the semantic information for demographic research, and the ground-truthing for document analysis research. Concretely, the application has the contents view, where the information is recorded into forms, and the labeling view, with the word labels for evaluating document analysis techniques. The crowdsourcing architecture allows to accelerate the information extraction (many users can work simultaneously), validate the information, and easily provide feedback to the users. We finally show how the proposed application can be extended to other kind of demographic historical manuscripts.
|
|
|
P. Wang, V. Eglin, C. Garcia, C. Largeron, Josep Llados, & Alicia Fornes. (2014). A Novel Learning-free Word Spotting Approach Based on Graph Representation. In 11th IAPR International Workshop on Document Analysis and Systems (pp. 207–211).
Abstract: Effective information retrieval on handwritten document images has always been a challenging task. In this paper, we propose a novel handwritten word spotting approach based on graph representation. The presented model comprises both topological and morphological signatures of handwriting. Skeleton-based graphs with the Shape Context labelled vertexes are established for connected components. Each word image is represented as a sequence of graphs. In order to be robust to the handwriting variations, an exhaustive merging process based on DTW alignment result is introduced in the similarity measure between word images. With respect to the computation complexity, an approximate graph edit distance approach using bipartite matching is employed for graph matching. The experiments on the George Washington dataset and the marriage records from the Barcelona Cathedral dataset demonstrate that the proposed approach outperforms the state-of-the-art structural methods.
|
|
|
Claudio Baecchi, Francesco Turchini, Lorenzo Seidenari, Andrew Bagdanov, & Alberto del Bimbo. (2014). Fisher vectors over random density forest for object recognition. In 22nd International Conference on Pattern Recognition (pp. 4328–4333).
|
|
|
Federico Bartoli, Giuseppe Lisanti, Svebor Karaman, Andrew Bagdanov, & Alberto del Bimbo. (2014). Unsupervised scene adaptation for faster multi- scale pedestrian detection. In 22nd International Conference on Pattern Recognition (pp. 3534–3539).
|
|
|
Antonio Hernandez, Stan Sclaroff, & Sergio Escalera. (2014). Contextual rescoring for Human Pose Estimation. In 25th British Machine Vision Conference.
Abstract: A contextual rescoring method is proposed for improving the detection of body joints of a pictorial structure model for human pose estimation. A set of mid-level parts is incorporated in the model, and their detections are used to extract spatial and score-related features relative to other body joint hypotheses. A technique is proposed for the automatic discovery of a compact subset of poselets that covers a set of validation images
while maximizing precision. A rescoring mechanism is defined as a set-based boosting classifier that computes a new score for body joint detections, given its relationship to detections of other body joints and mid-level parts in the image. This new score complements the unary potential of a discriminatively trained pictorial structure model. Experiments on two benchmarks show performance improvements when considering the proposed mid-level image representation and rescoring approach in comparison with other pictorial structure-based approaches.
|
|
|
Sergio Escalera, Xavier Baro, Jordi Gonzalez, Miguel Angel Bautista, Meysam Madadi, Miguel Reyes, et al. (2014). ChaLearn Looking at People Challenge 2014: Dataset and Results. In ECCV Workshop on ChaLearn Looking at People (Vol. 8925, pp. 459–473). LNCS.
Abstract: This paper summarizes the ChaLearn Looking at People 2014 challenge data and the results obtained by the participants. The competition was split into three independent tracks: human pose recovery from RGB data, action and interaction recognition from RGB data sequences, and multi-modal gesture recognition from RGB-Depth sequences. For all the tracks, the goal was to perform user-independent recognition in sequences of continuous images using the overlapping Jaccard index as the evaluation measure. In this edition of the ChaLearn challenge, two large novel data sets were made publicly available and the Microsoft Codalab platform were used to manage the competition. Outstanding results were achieved in the three challenge tracks, with accuracy results of 0.20, 0.50, and 0.85 for pose recovery, action/interaction recognition, and multi-modal gesture recognition, respectively.
Keywords: Human Pose Recovery; Behavior Analysis; Action and in- teractions; Multi-modal gestures; recognition
|
|
|
Francisco Cruz, & Oriol Ramos Terrades. (2014). EM-Based Layout Analysis Method for Structured Documents. In 22nd International Conference on Pattern Recognition (pp. 315–320).
Abstract: In this paper we present a method to perform layout analysis in structured documents. We proposed an EM-based algorithm to fit a set of Gaussian mixtures to the different regions according to the logical distribution along the page. After the convergence, we estimate the final shape of the regions according
to the parameters computed for each component of the mixture. We evaluated our method in the task of record detection in a collection of historical structured documents and performed a comparison with other previous works in this task.
|
|
|
Mohammad Rouhani, E. Boyer, & Angel Sappa. (2014). Non-Rigid Registration meets Surface Reconstruction. In International Conference on 3D Vision (pp. 617–624).
Abstract: Non rigid registration is an important task in computer vision with many applications in shape and motion modeling. A fundamental step of the registration is the data association between the source and the target sets. Such association proves difficult in practice, due to the discrete nature of the information and its corruption by various types of noise, e.g. outliers and missing data. In this paper we investigate the benefit of the implicit representations for the non-rigid registration of 3D point clouds. First, the target points are described with small quadratic patches that are blended through partition of unity weighting. Then, the discrete association between the source and the target can be replaced by a continuous distance field induced by the interface. By combining this distance field with a proper deformation term, the registration energy can be expressed in a linear least square form that is easy and fast to solve. This significantly eases the registration by avoiding direct association between points. Moreover, a hierarchical approach can be easily implemented by employing coarse-to-fine representations. Experimental results are provided for point clouds from multi-view data sets. The qualitative and quantitative comparisons show the outperformance and robustness of our framework. %in presence of noise and outliers.
|
|
|
Lluis Gomez, & Dimosthenis Karatzas. (2014). Scene Text Recognition: No Country for Old Men? In 1st International Workshop on Robust Reading.
|
|
|
Xavier Perez Sala, Fernando De la Torre, Laura Igual, Sergio Escalera, & Cecilio Angulo. (2014). Subspace Procrustes Analysis. In ECCV Workshop on ChaLearn Looking at People (Vol. 8925, pp. 654–668). LNCS.
Abstract: Procrustes Analysis (PA) has been a popular technique to align and build 2-D statistical models of shapes. Given a set of 2-D shapes PA is applied to remove rigid transformations. Then, a non-rigid 2-D model is computed by modeling (e.g., PCA) the residual. Although PA has been widely used, it has several limitations for modeling 2-D shapes: occluded landmarks and missing data can result in local minima solutions, and there is no guarantee that the 2-D shapes provide a uniform sampling of the 3-D space of rotations for the object. To address previous issues, this paper proposes Subspace PA (SPA). Given several instances of a 3-D object, SPA computes the mean and a 2-D subspace that can simultaneously model all rigid and non-rigid deformations of the 3-D object. We propose a discrete (DSPA) and continuous (CSPA) formulation for SPA, assuming that 3-D samples of an object are provided. DSPA extends the traditional PA, and produces unbiased 2-D models by uniformly sampling dierent views of the 3-D object. CSPA provides a continuous approach to uniformly sample the space of 3-D rotations, being more ecient in space and time. Experiments using SPA to learn 2-D models of bodies from motion capture data illustrate the benets of our approach.
|
|
|
E. Bondi, L. Sidenari, Andrew Bagdanov, & Alberto del Bimbo. (2014). Real-time people counting from depth imagery of crowded environments. In 11th IEEE International Conference on Advanced Video and Signal based Surveillance (pp. 337–342).
Abstract: In this paper we describe a system for automatic people counting in crowded environments. The approach we propose is a counting-by-detection method based on depth imagery. It is designed to be deployed as an autonomous appliance for crowd analysis in video surveillance application scenarios. Our system performs foreground/background segmentation on depth image streams in order to coarsely segment persons, then depth information is used to localize head candidates which are then tracked in time on an automatically estimated ground plane. The system runs in real-time, at a frame-rate of about 20 fps. We collected a dataset of RGB-D sequences representing three typical and challenging surveillance scenarios, including crowds, queuing and groups. An extensive comparative evaluation is given between our system and more complex, Latent SVM-based head localization for person counting applications.
|
|
|
Thanh Ha Do, Salvatore Tabbone, & Oriol Ramos Terrades. (2014). Spotting Symbol Using Sparsity over Learned Dictionary of Local Descriptors. In 11th IAPR International Workshop on Document Analysis and Systems (pp. 156–160).
Abstract: This paper proposes a new approach to spot symbols into graphical documents using sparse representations. More specifically, a dictionary is learned from a training database of local descriptors defined over the documents. Following their sparse representations, interest points sharing similar properties are used to define interest regions. Using an original adaptation of information retrieval techniques, a vector model for interest regions and for a query symbol is built based on its sparsity in a visual vocabulary where the visual words are columns in the learned dictionary. The matching process is performed comparing the similarity between vector models. Evaluation on SESYD datasets demonstrates that our method is promising.
|
|
|
Marçal Rusiñol, J. Chazalon, & Jean-Marc Ogier. (2014). Combining Focus Measure Operators to Predict OCR Accuracy in Mobile-Captured Document Images. In 11th IAPR International Workshop on Document Analysis and Systems (pp. 181–185).
Abstract: Mobile document image acquisition is a new trend raising serious issues in business document processing workflows. Such digitization procedure is unreliable, and integrates many distortions which must be detected as soon as possible, on the mobile, to avoid paying data transmission fees, and losing information due to the inability to re-capture later a document with temporary availability. In this context, out-of-focus blur is major issue: users have no direct control over it, and it seriously degrades OCR recognition. In this paper, we concentrate on the estimation of focus quality, to ensure a sufficient legibility of a document image for OCR processing. We propose two contributions to improve OCR accuracy prediction for mobile-captured document images. First, we present 24 focus measures, never tested on document images, which are fast to compute and require no training. Second, we show that a combination of those measures enables state-of-the art performance regarding the correlation with OCR accuracy. The resulting approach is fast, robust, and easy to implement in a mobile device. Experiments are performed on a public dataset, and precise details about image processing are given.
|
|
|
Marçal Rusiñol, J. Chazalon, & Jean-Marc Ogier. (2014). Normalisation et validation d'images de documents capturées en mobilité. In Colloque International Francophone sur l'Écrit et le Document (pp. 109–124).
Abstract: Mobile document image acquisition integrates many distortions which must be corrected or detected on the device, before the document becomes unavailable or paying data transmission fees. In this paper, we propose a system to correct perspective and illumination issues, and estimate the sharpness of the image for OCR recognition. The correction step relies on fast and accurate border detection followed by illumination normalization. Its evaluation on a private dataset shows a clear improvement on OCR accuracy. The quality assessment
step relies on a combination of focus measures. Its evaluation on a public dataset shows that this simple method compares well to state of the art, learning-based methods which cannot be embedded on a mobile, and outperforms metric-based methods.
Keywords: mobile document image acquisition; perspective correction; illumination correction; quality assessment; focus measure; OCR accuracy prediction
|
|