|
M. Danelljan, Fahad Shahbaz Khan, Michael Felsberg, & Joost Van de Weijer. (2014). Adaptive color attributes for real-time visual tracking. In 27th IEEE Conference on Computer Vision and Pattern Recognition (pp. 1090–1097).
Abstract: Visual tracking is a challenging problem in computer vision. Most state-of-the-art visual trackers either rely on luminance information or use simple color representations for image description. Contrary to visual tracking, for object recognition and detection, sophisticated color features when combined with luminance have been shown to provide excellent performance. Due to the complexity of the tracking problem, the desired color feature should be computationally efficient, and possess a certain amount of photometric invariance while maintaining high discriminative power. This paper investigates the contribution of color in a tracking-by-detection framework. Our results suggest that color attributes provide superior performance for visual tracking. We further propose an adaptive low-dimensional variant of color attributes. Both quantitative and attribute-based evaluations are performed on 41 challenging benchmark color sequences. The proposed approach improves the baseline intensity-based tracker by 24% in median distance precision. Furthermore, we show that our approach outperforms state-of-the-art tracking methods while running at more than 100 frames per second.
|
|
|
C. Alejandro Parraga. (2014). Color Vision, Computational Methods for. In Dieter Jaeger, & Ranu Jung (Eds.), Encyclopedia of Computational Neuroscience (pp. 1–11). Springer-Verlag Berlin Heidelberg.
Abstract: The study of color vision has been aided by a whole battery of computational methods that attempt to describe the mechanisms that lead to our perception of colors in terms of the information-processing properties of the visual system. Their scope is highly interdisciplinary, linking apparently dissimilar disciplines such as mathematics, physics, computer science, neuroscience, cognitive science, and psychology. Since the sensation of color is a feature of our brains, computational approaches usually include biological features of neural systems in their descriptions, from retinal light-receptor interaction to subcortical color opponency, cortical signal decoding, and color categorization. They produce hypotheses that are usually tested by behavioral or psychophysical experiments.
Keywords: Color computational vision; Computational neuroscience of color
|
|
|
Adriana Romero, Carlo Gatta, & Gustavo Camps-Valls. (2014). Unsupervised Deep Feature Extraction Of Hyperspectral Images. In 6th Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing.
Abstract: This paper presents an effective unsupervised sparse feature learning algorithm to train deep convolutional networks on hyperspectral images. Deep convolutional hierarchical representations are learned and then used for pixel classification. Features in lower layers present less abstract representations of data, while higher layers represent more abstract and complex characteristics. We successfully illustrate the performance of the extracted representations in a challenging AVIRIS hyperspectral image classification problem, compared to standard dimensionality reduction methods like principal component analysis (PCA) and its kernel counterpart (kPCA). The proposed method largely outperforms the previous state-of-the-art results on the same experimental setting. Results show that single layer networks can extract powerful discriminative features only when the receptive field accounts for neighboring pixels. Regarding the deep architecture, we can conclude that: (1) additional layers in a deep architecture significantly improve the performance w.r.t. single layer variants; (2) the max-pooling step in each layer is mandatory to achieve satisfactory results; and (3) the performance gain w.r.t. the number of layers is upper bounded, since the spatial resolution is reduced at each pooling, resulting in too spatially coarse output features.
Keywords: Convolutional networks; deep learning; sparse learning; feature extraction; hyperspectral image classification
|
|
|
P. Wang, V. Eglin, C. Garcia, C. Largeron, Josep Llados, & Alicia Fornes. (2014). A Coarse-to-Fine Word Spotting Approach for Historical Handwritten Documents Based on Graph Embedding and Graph Edit Distance. In 22nd International Conference on Pattern Recognition (pp. 3074–3079).
Abstract: Effective information retrieval on handwritten document images has always been a challenging task, especially for historical ones. In this paper, we propose a coarse-to-fine handwritten word spotting approach based on graph representation. The presented model comprises both the topological and morphological signatures of the handwriting. Skeleton-based graphs with Shape Context labelled vertices are established for connected components. Each word image is represented as a sequence of graphs. Aiming at developing a practical and efficient word spotting approach for large-scale historical handwritten documents, a fast and coarse comparison is first applied to prune the regions that are not similar to the query based on the graph embedding methodology. Afterwards, the query and regions of interest are compared by graph edit distance based on the Dynamic Time Warping alignment. The proposed approach is evaluated on a public dataset containing 50 pages of historical marriage license records. The results show that the proposed approach achieves a compromise between efficiency and accuracy.
Keywords: word spotting; coarse-to-fine mechanism; graph-based representation; graph embedding; graph edit distance
|
|
|
Alicia Fornes, Josep Llados, Joan Mas, Joana Maria Pujadas-Mora, & Anna Cabre. (2014). A Bimodal Crowdsourcing Platform for Demographic Historical Manuscripts. In Digital Access to Textual Cultural Heritage Conference (pp. 103–108).
Abstract: In this paper we present a crowdsourcing web-based application for extracting information from demographic handwritten document images. The proposed application integrates two points of view: the semantic information for demographic research, and the ground-truthing for document analysis research. Concretely, the application has the contents view, where the information is recorded into forms, and the labeling view, with the word labels for evaluating document analysis techniques. The crowdsourcing architecture makes it possible to accelerate the information extraction (many users can work simultaneously), validate the information, and easily provide feedback to the users. We finally show how the proposed application can be extended to other kinds of demographic historical manuscripts.
|
|
|
P. Wang, V. Eglin, C. Garcia, C. Largeron, Josep Llados, & Alicia Fornes. (2014). A Novel Learning-free Word Spotting Approach Based on Graph Representation. In 11th IAPR International Workshop on Document Analysis and Systems (pp. 207–211).
Abstract: Effective information retrieval on handwritten document images has always been a challenging task. In this paper, we propose a novel handwritten word spotting approach based on graph representation. The presented model comprises both topological and morphological signatures of handwriting. Skeleton-based graphs with Shape Context labelled vertices are established for connected components. Each word image is represented as a sequence of graphs. In order to be robust to handwriting variations, an exhaustive merging process based on the DTW alignment result is introduced in the similarity measure between word images. With respect to computational complexity, an approximate graph edit distance approach using bipartite matching is employed for graph matching. The experiments on the George Washington dataset and the marriage records from the Barcelona Cathedral dataset demonstrate that the proposed approach outperforms the state-of-the-art structural methods.
|
|
|
Claudio Baecchi, Francesco Turchini, Lorenzo Seidenari, Andrew Bagdanov, & Alberto del Bimbo. (2014). Fisher vectors over random density forest for object recognition. In 22nd International Conference on Pattern Recognition (pp. 4328–4333).
|
|
|
Federico Bartoli, Giuseppe Lisanti, Svebor Karaman, Andrew Bagdanov, & Alberto del Bimbo. (2014). Unsupervised scene adaptation for faster multi-scale pedestrian detection. In 22nd International Conference on Pattern Recognition (pp. 3534–3539).
|
|
|
Antonio Hernandez, Stan Sclaroff, & Sergio Escalera. (2014). Contextual rescoring for Human Pose Estimation. In 25th British Machine Vision Conference.
Abstract: A contextual rescoring method is proposed for improving the detection of body joints of a pictorial structure model for human pose estimation. A set of mid-level parts is incorporated in the model, and their detections are used to extract spatial and score-related features relative to other body joint hypotheses. A technique is proposed for the automatic discovery of a compact subset of poselets that covers a set of validation images while maximizing precision. A rescoring mechanism is defined as a set-based boosting classifier that computes a new score for body joint detections, given its relationship to detections of other body joints and mid-level parts in the image. This new score complements the unary potential of a discriminatively trained pictorial structure model. Experiments on two benchmarks show performance improvements when considering the proposed mid-level image representation and rescoring approach in comparison with other pictorial structure-based approaches.
|
|
|
Francisco Cruz, & Oriol Ramos Terrades. (2014). EM-Based Layout Analysis Method for Structured Documents. In 22nd International Conference on Pattern Recognition (pp. 315–320).
Abstract: In this paper we present a method to perform layout analysis in structured documents. We propose an EM-based algorithm to fit a set of Gaussian mixtures to the different regions according to the logical distribution along the page. After convergence, we estimate the final shape of the regions according to the parameters computed for each component of the mixture. We evaluate our method on the task of record detection in a collection of historical structured documents and compare it with previous work on this task.
|
|
|
Mohammad Rouhani, E. Boyer, & Angel Sappa. (2014). Non-Rigid Registration meets Surface Reconstruction. In International Conference on 3D Vision (pp. 617–624).
Abstract: Non-rigid registration is an important task in computer vision with many applications in shape and motion modeling. A fundamental step of the registration is the data association between the source and the target sets. Such association proves difficult in practice, due to the discrete nature of the information and its corruption by various types of noise, e.g. outliers and missing data. In this paper we investigate the benefit of implicit representations for the non-rigid registration of 3D point clouds. First, the target points are described with small quadratic patches that are blended through partition of unity weighting. Then, the discrete association between the source and the target can be replaced by a continuous distance field induced by the interface. By combining this distance field with a proper deformation term, the registration energy can be expressed in a linear least-squares form that is easy and fast to solve. This significantly eases the registration by avoiding direct association between points. Moreover, a hierarchical approach can be easily implemented by employing coarse-to-fine representations. Experimental results are provided for point clouds from multi-view data sets. The qualitative and quantitative comparisons show the superior performance and robustness of our framework.
|
|
|
Lluis Gomez, & Dimosthenis Karatzas. (2014). Scene Text Recognition: No Country for Old Men? In 1st International Workshop on Robust Reading.
|
|
|
E. Bondi, L. Sidenari, Andrew Bagdanov, & Alberto del Bimbo. (2014). Real-time people counting from depth imagery of crowded environments. In 11th IEEE International Conference on Advanced Video and Signal based Surveillance (pp. 337–342).
Abstract: In this paper we describe a system for automatic people counting in crowded environments. The approach we propose is a counting-by-detection method based on depth imagery. It is designed to be deployed as an autonomous appliance for crowd analysis in video surveillance application scenarios. Our system performs foreground/background segmentation on depth image streams in order to coarsely segment persons; depth information is then used to localize head candidates, which are tracked over time on an automatically estimated ground plane. The system runs in real-time, at a frame-rate of about 20 fps. We collected a dataset of RGB-D sequences representing three typical and challenging surveillance scenarios, including crowds, queuing and groups. An extensive comparative evaluation is given between our system and more complex, Latent SVM-based head localization for person counting applications.
|
|
|
Thanh Ha Do, Salvatore Tabbone, & Oriol Ramos Terrades. (2014). Spotting Symbol Using Sparsity over Learned Dictionary of Local Descriptors. In 11th IAPR International Workshop on Document Analysis and Systems (pp. 156–160).
Abstract: This paper proposes a new approach to spot symbols in graphical documents using sparse representations. More specifically, a dictionary is learned from a training database of local descriptors defined over the documents. Following their sparse representations, interest points sharing similar properties are used to define interest regions. Using an original adaptation of information retrieval techniques, a vector model for interest regions and for a query symbol is built based on its sparsity in a visual vocabulary where the visual words are columns in the learned dictionary. The matching process is performed by comparing the similarity between vector models. Evaluation on SESYD datasets demonstrates that our method is promising.
|
|
|
Marçal Rusiñol, J. Chazalon, & Jean-Marc Ogier. (2014). Combining Focus Measure Operators to Predict OCR Accuracy in Mobile-Captured Document Images. In 11th IAPR International Workshop on Document Analysis and Systems (pp. 181–185).
Abstract: Mobile document image acquisition is a new trend raising serious issues in business document processing workflows. Such a digitization procedure is unreliable and introduces many distortions which must be detected as soon as possible, on the mobile device, to avoid paying data transmission fees and losing information due to the inability to re-capture a temporarily available document later. In this context, out-of-focus blur is a major issue: users have no direct control over it, and it seriously degrades OCR recognition. In this paper, we concentrate on the estimation of focus quality, to ensure sufficient legibility of a document image for OCR processing. We propose two contributions to improve OCR accuracy prediction for mobile-captured document images. First, we present 24 focus measures, never tested on document images, which are fast to compute and require no training. Second, we show that a combination of those measures enables state-of-the-art performance regarding the correlation with OCR accuracy. The resulting approach is fast, robust, and easy to implement in a mobile device. Experiments are performed on a public dataset, and precise details about image processing are given.
|
|