|
Alicia Fornes, Josep Llados, Joan Mas, Joana Maria Pujadas-Mora, & Anna Cabre. (2014). A Bimodal Crowdsourcing Platform for Demographic Historical Manuscripts. In Digital Access to Textual Cultural Heritage Conference (pp. 103–108).
Abstract: In this paper we present a crowdsourcing web-based application for extracting information from demographic handwritten document images. The proposed application integrates two points of view: the semantic information for demographic research, and the ground-truthing for document analysis research. Concretely, the application has the contents view, where the information is recorded into forms, and the labeling view, with the word labels for evaluating document analysis techniques. The crowdsourcing architecture makes it possible to accelerate the information extraction (many users can work simultaneously), validate the information, and easily provide feedback to the users. We finally show how the proposed application can be extended to other kinds of demographic historical manuscripts.
|
|
|
P. Wang, V. Eglin, C. Garcia, C. Largeron, Josep Llados, & Alicia Fornes. (2014). A Novel Learning-free Word Spotting Approach Based on Graph Representation. In 11th IAPR International Workshop on Document Analysis and Systems (pp. 207–211).
Abstract: Effective information retrieval on handwritten document images has always been a challenging task. In this paper, we propose a novel handwritten word spotting approach based on graph representation. The presented model comprises both topological and morphological signatures of handwriting. Skeleton-based graphs with the Shape Context labelled vertexes are established for connected components. Each word image is represented as a sequence of graphs. In order to be robust to the handwriting variations, an exhaustive merging process based on DTW alignment result is introduced in the similarity measure between word images. With respect to the computation complexity, an approximate graph edit distance approach using bipartite matching is employed for graph matching. The experiments on the George Washington dataset and the marriage records from the Barcelona Cathedral dataset demonstrate that the proposed approach outperforms the state-of-the-art structural methods.
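The DTW alignment mentioned in this abstract can be illustrated with a minimal sketch. It assumes each sequence element is a plain number compared by absolute difference; in the paper the elements are graphs compared with an approximate graph edit distance, which is not reproduced here. The function name `dtw_distance` is hypothetical.

```python
def dtw_distance(seq_a, seq_b):
    """Classic dynamic time warping cost between two numeric sequences.

    Assumption: the local cost is |a - b|. The paper instead compares
    graphs, so this only illustrates the alignment step.
    """
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] = best cost of aligning the first i items of seq_a
    # with the first j items of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(seq_a[i - 1] - seq_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # seq_a advances, seq_b element repeats
                cost[i][j - 1],      # seq_b advances, seq_a element repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# A stretched copy of the same sequence aligns at zero cost.
stretched = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([0, 0], [1, 1])
```

Because one element may be matched to several elements of the other sequence, the measure tolerates words that split into a different number of components.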
|
|
|
Claudio Baecchi, Francesco Turchini, Lorenzo Seidenari, Andrew Bagdanov, & Alberto del Bimbo. (2014). Fisher vectors over random density forest for object recognition. In 22nd International Conference on Pattern Recognition (pp. 4328–4333).
|
|
|
Federico Bartoli, Giuseppe Lisanti, Svebor Karaman, Andrew Bagdanov, & Alberto del Bimbo. (2014). Unsupervised scene adaptation for faster multi-scale pedestrian detection. In 22nd International Conference on Pattern Recognition (pp. 3534–3539).
|
|
|
Svebor Karaman, Giuseppe Lisanti, Andrew Bagdanov, & Alberto del Bimbo. (2014). From re-identification to identity inference: Labeling consistency by local similarity constraints. In Person Re-Identification (Vol. 2, pp. 287–307). Springer London.
Abstract: In this chapter, we introduce the problem of identity inference as a generalization of person re-identification. It is most appropriate to distinguish identity inference from re-identification in situations where a large number of observations must be identified without knowing a priori that groups of test images represent the same individual. The standard single- and multishot person re-identification common in the literature are special cases of our formulation. We present an approach to solving identity inference by modeling it as a labeling problem in a Conditional Random Field (CRF). The CRF model ensures that the final labeling gives similar labels to detections that are similar in feature space. Experimental results are given on the ETHZ, i-LIDS and CAVIAR datasets. Our approach yields state-of-the-art performance for multishot re-identification, and our results on the more general identity inference problem demonstrate that we are able to infer the identity of very many examples even with very few labeled images in the gallery.
Keywords: re-identification; Identity inference; Conditional random fields; Video surveillance
|
|
|
Antonio Hernandez, Stan Sclaroff, & Sergio Escalera. (2014). Contextual rescoring for Human Pose Estimation. In 25th British Machine Vision Conference.
Abstract: A contextual rescoring method is proposed for improving the detection of body joints of a pictorial structure model for human pose estimation. A set of mid-level parts is incorporated in the model, and their detections are used to extract spatial and score-related features relative to other body joint hypotheses. A technique is proposed for the automatic discovery of a compact subset of poselets that covers a set of validation images while maximizing precision. A rescoring mechanism is defined as a set-based boosting classifier that computes a new score for body joint detections, given its relationship to detections of other body joints and mid-level parts in the image. This new score complements the unary potential of a discriminatively trained pictorial structure model. Experiments on two benchmarks show performance improvements when considering the proposed mid-level image representation and rescoring approach in comparison with other pictorial structure-based approaches.
|
|
|
Cristhian A. Aguilera-Carrasco. (2014). Evaluation of feature detectors and descriptors in VISIBLE-LWIR cross-spectral imaging (Vol. 177). Master's thesis.
Abstract: This thesis evaluates the performance of different state-of-the-art feature detector and descriptor algorithms in the Visible-LWIR cross-spectral scenario. The focus is to determine if current detector and descriptor algorithms can be used to match features between the LWIR spectrum and the visible spectrum in applications such as visual odometry, object recognition, image registration and stereo vision. An outdoor cross-spectral dataset was created to evaluate the suitability of the different algorithms. The results show that the tested algorithms are not suitable for the task of matching features across different spectra. The repeatability ratio was below 30 percent in the best case, and in general matched features were not accurately located. Additionally, these results suggest that it is necessary to create new algorithms that take into account the nature of the different spectra, describing characteristics that exist in both spectra, such as discontinuities.
Keywords: Multi-spectral; Cross-spectral; Visible-LWIR imaging; Multimodal.
|
|
|
Xim Cerda-Company, C. Alejandro Parraga, & Xavier Otazu. (2014). Which tone-mapping is the best? A comparative study of tone-mapping perceived quality. In Perception (Vol. 43, 106).
Abstract: Perception 43 ECVP Abstract Supplement
High-dynamic-range (HDR) imaging refers to the methods designed to increase the brightness dynamic range present in standard digital imaging techniques. This increase is achieved by taking the same picture under different exposure values and mapping the intensity levels into a single image by way of a tone-mapping operator (TMO). Currently, there is no agreement on how to evaluate the quality of different TMOs. In this work we psychophysically evaluate 15 different TMOs, obtaining rankings based on the perceived properties of the resulting tone-mapped images. We performed two different experiments on a calibrated CRT display using 10 subjects: (1) a study of the internal relationships between grey-levels and (2) a pairwise comparison of the resulting 15 tone-mapped images. In (1) observers internally matched the grey-levels to a reference inside the tone-mapped images and in the real scene. In (2) observers performed a pairwise comparison of the tone-mapped images alongside the real scene. We obtained two rankings of the TMOs according to their performance. In (1) the best algorithm was ICAM by J. Kuang et al. (2007) and in (2) the best algorithm was a TMO by Krawczyk et al. (2005). Our results also show no correlation between these two rankings.
|
|
|
Sergio Escalera, Xavier Baro, Jordi Gonzalez, Miguel Angel Bautista, Meysam Madadi, Miguel Reyes, et al. (2014). ChaLearn Looking at People Challenge 2014: Dataset and Results. In ECCV Workshop on ChaLearn Looking at People (Vol. 8925, pp. 459–473). LNCS.
Abstract: This paper summarizes the ChaLearn Looking at People 2014 challenge data and the results obtained by the participants. The competition was split into three independent tracks: human pose recovery from RGB data, action and interaction recognition from RGB data sequences, and multi-modal gesture recognition from RGB-Depth sequences. For all the tracks, the goal was to perform user-independent recognition in sequences of continuous images using the overlapping Jaccard index as the evaluation measure. In this edition of the ChaLearn challenge, two large novel data sets were made publicly available and the Microsoft Codalab platform was used to manage the competition. Outstanding results were achieved in the three challenge tracks, with accuracy results of 0.20, 0.50, and 0.85 for pose recovery, action/interaction recognition, and multi-modal gesture recognition, respectively.
Keywords: Human Pose Recovery; Behavior Analysis; Action and interactions; Multi-modal gestures; recognition
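As a rough illustration of the evaluation measure named in this abstract, the Jaccard index of two sets is the size of their intersection divided by the size of their union. The sketch below assumes a prediction and its ground truth are each given as a collection of frame numbers. That representation and the name `jaccard_index` are assumptions made for illustration, not the challenge's official scoring code.

```python
def jaccard_index(truth_frames, predicted_frames):
    """Overlap between two sets of frame indices: |A & B| / |A | B|.

    Assumption: each annotation is a collection of frame numbers;
    two empty annotations are scored 0.0 by convention here.
    """
    truth = set(truth_frames)
    predicted = set(predicted_frames)
    union = truth | predicted
    if not union:
        return 0.0
    return len(truth & predicted) / len(union)


# Ground truth covers frames 10..19, the prediction covers 15..24:
# 5 shared frames out of 15 distinct frames.
score = jaccard_index(range(10, 20), range(15, 25))
```

A perfect prediction scores 1.0 and a prediction with no overlapping frames scores 0.0.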
|
|
|
Francisco Cruz, & Oriol Ramos Terrades. (2014). EM-Based Layout Analysis Method for Structured Documents. In 22nd International Conference on Pattern Recognition (pp. 315–320).
Abstract: In this paper we present a method to perform layout analysis in structured documents. We propose an EM-based algorithm to fit a set of Gaussian mixtures to the different regions according to the logical distribution along the page. After convergence, we estimate the final shape of the regions according to the parameters computed for each component of the mixture. We evaluated our method in the task of record detection in a collection of historical structured documents and performed a comparison with previous works on this task.
|
|
|
Mohammad Rouhani, E. Boyer, & Angel Sappa. (2014). Non-Rigid Registration meets Surface Reconstruction. In International Conference on 3D Vision (pp. 617–624).
Abstract: Non-rigid registration is an important task in computer vision with many applications in shape and motion modeling. A fundamental step of the registration is the data association between the source and the target sets. Such association proves difficult in practice, due to the discrete nature of the information and its corruption by various types of noise, e.g. outliers and missing data. In this paper we investigate the benefit of the implicit representations for the non-rigid registration of 3D point clouds. First, the target points are described with small quadratic patches that are blended through partition of unity weighting. Then, the discrete association between the source and the target can be replaced by a continuous distance field induced by the interface. By combining this distance field with a proper deformation term, the registration energy can be expressed in a linear least square form that is easy and fast to solve. This significantly eases the registration by avoiding direct association between points. Moreover, a hierarchical approach can be easily implemented by employing coarse-to-fine representations. Experimental results are provided for point clouds from multi-view data sets. The qualitative and quantitative comparisons show the outperformance and robustness of our framework.
|
|
|
Lluis Pere de las Heras, Ernest Valveny, & Gemma Sanchez. (2014). Unsupervised and Notation-Independent Wall Segmentation in Floor Plans Using a Combination of Statistical and Structural Strategies. In Graphics Recognition. Current Trends and Challenges (Vol. 8746, pp. 109–121). LNCS. Springer Berlin Heidelberg.
Abstract: In this paper we present a wall segmentation approach in floor plans that is able to work independently of the graphical notation, does not need any pre-annotated data for learning, and is able to segment multiple-shaped walls such as beams and curved walls. This method results from the combination of the wall segmentation approaches [3, 5] presented recently by the authors. Firstly, potential straight wall segments are extracted in an unsupervised way similar to [3], but restricting even more the wall candidates considered in the original approach. Then, based on [5], these segments are used to learn the texture pattern of walls and spot the lost instances. The presented combination of both methods has been tested on 4 available datasets with different notations and compared qualitatively and quantitatively to the state-of-the-art applied on these collections. Additionally, some qualitative results on floor plans directly downloaded from the Internet are reported in the paper. The overall performance of the method demonstrates its adaptability both to different wall notations and shapes and to different document qualities and resolutions.
Keywords: Graphics recognition; Floor plan analysis; Object segmentation
|
|
|
Lluis Pere de las Heras, David Fernandez, Alicia Fornes, Ernest Valveny, Gemma Sanchez, & Josep Llados. (2014). Runlength Histogram Image Signature for Perceptual Retrieval of Architectural Floor Plans. In Graphics Recognition. Current Trends and Challenges (Vol. 8746, pp. 135–146). LNCS. Springer Berlin Heidelberg.
Abstract: This paper proposes a runlength histogram signature as a perceptual descriptor of architectural plans in a retrieval scenario. The style of an architectural drawing is characterized by the perception of lines, shapes and texture. Such visual stimuli are the basis for defining semantic concepts such as space properties, symmetry, density, etc. We propose runlength histograms extracted in vertical, horizontal and diagonal directions as a characterization of line and space properties in floor plans, so that they can be roughly associated with a description of walls and room structure. A retrieval application illustrates the performance of the proposed approach, where given a plan as a query, similar ones are obtained from a database. A ground truth based on human observation has been constructed to validate the hypothesis. Additional retrieval results on sketched building facades are reported qualitatively in this paper. Its good description and its adaptability to two different sketch drawings despite its simplicity show the interest of the proposed approach and open a challenging research line in graphics recognition.
Keywords: Graphics recognition; Graphics retrieval; Image classification
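The idea of a runlength histogram can be sketched as follows. This is a minimal illustration, assuming a binary image stored as a list of rows with 1 for ink and 0 for background; it covers only the horizontal and vertical directions (the diagonal directions mentioned in the abstract are omitted), and the function names are hypothetical rather than taken from the paper.

```python
from collections import Counter


def run_lengths(line, value=1):
    """Lengths of the maximal runs of `value` along one scan line."""
    runs, current = [], 0
    for pixel in line:
        if pixel == value:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def runlength_histogram(image, direction="horizontal", value=1):
    """Map run length -> number of runs, scanning rows or columns."""
    if direction == "horizontal":
        lines = image
    elif direction == "vertical":
        lines = list(zip(*image))  # transpose: columns become scan lines
    else:
        raise ValueError("direction must be 'horizontal' or 'vertical'")
    histogram = Counter()
    for line in lines:
        histogram.update(run_lengths(line, value))
    return dict(histogram)


image = [
    [1, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
]
horizontal = runlength_histogram(image, "horizontal")
vertical = runlength_histogram(image, "vertical")
```

Calling the same function with `value=0` would count background runs, which corresponds loosely to the space properties the abstract mentions.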
|
|
|
Lluis Gomez, & Dimosthenis Karatzas. (2014). Scene Text Recognition: No Country for Old Men? In 1st International Workshop on Robust Reading.
|
|
|
Xavier Perez Sala, Fernando De la Torre, Laura Igual, Sergio Escalera, & Cecilio Angulo. (2014). Subspace Procrustes Analysis. In ECCV Workshop on ChaLearn Looking at People (Vol. 8925, pp. 654–668). LNCS.
Abstract: Procrustes Analysis (PA) has been a popular technique to align and build 2-D statistical models of shapes. Given a set of 2-D shapes PA is applied to remove rigid transformations. Then, a non-rigid 2-D model is computed by modeling (e.g., PCA) the residual. Although PA has been widely used, it has several limitations for modeling 2-D shapes: occluded landmarks and missing data can result in local minima solutions, and there is no guarantee that the 2-D shapes provide a uniform sampling of the 3-D space of rotations for the object. To address previous issues, this paper proposes Subspace PA (SPA). Given several instances of a 3-D object, SPA computes the mean and a 2-D subspace that can simultaneously model all rigid and non-rigid deformations of the 3-D object. We propose a discrete (DSPA) and continuous (CSPA) formulation for SPA, assuming that 3-D samples of an object are provided. DSPA extends the traditional PA, and produces unbiased 2-D models by uniformly sampling different views of the 3-D object. CSPA provides a continuous approach to uniformly sample the space of 3-D rotations, being more efficient in space and time. Experiments using SPA to learn 2-D models of bodies from motion capture data illustrate the benefits of our approach.
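For context, the classical 2-D Procrustes alignment that this abstract takes as its starting point can be sketched in a few lines. This is not the SPA method proposed in the paper. It is a minimal least-squares similarity alignment of one shape onto another, assuming both shapes list the same landmarks in the same order; treating 2-D points as complex numbers is a convenience chosen for this sketch, and the name `procrustes_align` is hypothetical.

```python
def procrustes_align(shape, reference):
    """Align `shape` to `reference` by translation, rotation and scale.

    Points are (x, y) pairs with corresponding landmarks at the same
    index. With points as complex numbers, rotation plus uniform scale
    is multiplication by one complex factor `a`.
    """
    z = [complex(x, y) for x, y in shape]
    w = [complex(x, y) for x, y in reference]
    z_mean = sum(z) / len(z)
    w_mean = sum(w) / len(w)
    # Removing the centroids takes care of the translation.
    zc = [p - z_mean for p in z]
    wc = [p - w_mean for p in w]
    # Least-squares solution of min_a sum |a * zc - wc|^2.
    a = sum(p.conjugate() * q for p, q in zip(zc, wc)) / sum(abs(p) ** 2 for p in zc)
    aligned = [a * p + w_mean for p in zc]
    return [(p.real, p.imag) for p in aligned]


reference = [(0, 0), (1, 0), (1, 1), (0, 1)]
# The same square rotated by 90 degrees, scaled by 2, shifted by (5, -3).
shape = [(5, -3), (5, -1), (3, -1), (3, -3)]
aligned = procrustes_align(shape, reference)
```

In this noise-free example the aligned shape coincides with the reference up to floating-point error; with real landmark data a residual remains, which is what a subsequent non-rigid model (e.g., PCA) would describe.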
|
|