|
Christophe Rigaud, Clement Guerin, Dimosthenis Karatzas, Jean-Christophe Burie, & Jean-Marc Ogier. (2015). Knowledge-driven understanding of images in comic books. IJDAR - International Journal on Document Analysis and Recognition, 18(3), 199–221.
Abstract: Document analysis is an active field of research, which can attain a complete understanding of the semantics of a given document. One example of the document understanding process is enabling a computer to identify the key elements of a comic book story and arrange them according to a predefined domain knowledge. In this study, we propose a knowledge-driven system that can interact with bottom-up and top-down information to progressively understand the content of a document. We model the knowledge of both the comic book domain and the image processing domain for information consistency analysis. In addition, different image processing methods are improved or developed to extract panels, balloons, tails, texts, comic characters and their semantic relations in an unsupervised way.
Keywords: Document Understanding; comics analysis; expert system
|
|
|
Lluis Pere de las Heras, Oriol Ramos Terrades, Sergi Robles, & Gemma Sanchez. (2015). CVC-FP and SGT: a new database for structural floor plan analysis and its groundtruthing tool. IJDAR - International Journal on Document Analysis and Recognition, 18(1), 15–30.
Abstract: Recent results on structured learning methods have shown the impact of structural information in a wide range of pattern recognition tasks. In the field of document image analysis, there is long experience with structural methods for the analysis and information extraction of multiple types of documents. Yet, the lack of conveniently annotated and freely accessible databases has not benefited the progress in some areas such as technical drawing understanding. In this paper, we present a floor plan database, named CVC-FP, that is annotated for the architectural objects and their structural relations. To construct this database, we have implemented a groundtruthing tool, the SGT tool, that allows this sort of information to be specified in a natural manner. This tool has been made for general purpose groundtruthing: it lets users define their own object classes and properties, offers multiple labeling options, supports cooperative work, and provides user and version control. Finally, we have collected some of the recent work on floor plan interpretation and present a quantitative benchmark for this database. Both the CVC-FP database and the SGT tool are freely released to the research community to ease comparisons between methods and boost reproducible research.
|
|
|
Joan M. Nuñez, Jorge Bernal, F. Javier Sanchez, & Fernando Vilariño. (2015). Growing Algorithm for Intersection Detection (GRAID) in branching patterns. MVAP - Machine Vision and Applications, 26(2), 387–400.
Abstract: Analysis of branching structures represents a very important task in fields such as medical diagnosis, road detection or biometrics. Detecting intersection landmarks becomes crucial when capturing the structure of a branching pattern. We present a very simple geometrical model to describe intersections in branching structures based on two conditions: Bounded Tangency condition (BT) and Shortest Branch (SB) condition. The proposed model precisely sets a geometrical characterization of intersections and allows us to introduce a new unsupervised operator for intersection extraction. We propose an implementation that handles the consequences of digital domain operation and that, unlike existing approaches, is not restricted to a particular scale and does not require the computation of the thinned pattern. The new proposal, as well as other existing approaches in the bibliography, is evaluated in a common framework for the first time. The performance analysis is based on two manually segmented image data sets: DRIVE retinal image database and COLON-VESSEL data set, a newly created data set of vascular content in colonoscopy frames. We have created an intersection landmark ground truth for each data set, in addition to comparing our method on the only existing ground truth. Quantitative results confirm that we are able to outperform state-of-the-art performance levels with the advantage that neither training nor parameter tuning is needed.
Keywords: Bifurcation; Crossroad; Intersection; Retina; Vessel
|
|
|
Debora Gil, F. Javier Sanchez, Gloria Fernandez Esparrach, & Jorge Bernal. (2015). 3D Stable Spatio-temporal Polyp Localization in Colonoscopy Videos. In Computer-Assisted and Robotic Endoscopy. Revised selected papers of Second International Workshop, CARE 2015, Held in Conjunction with MICCAI 2015 (Vol. 9515, pp. 140–152). LNCS.
Abstract: Computational intelligent systems could reduce polyp miss rate in colonoscopy for colon cancer diagnosis and, thus, increase the efficiency of the procedure. One of the main problems of existing polyp localization methods is a lack of spatio-temporal stability in their response. We propose to explore the response of a given polyp localization across temporal windows in order to select those image regions presenting the highest stable spatio-temporal response. Spatio-temporal stability is achieved by extracting 3D watershed regions on the temporal window. Stability in localization response is statistically determined by analysis of the variance of the output of the localization method inside each 3D region. We have explored the benefits of considering spatio-temporal stability in two different tasks: polyp localization and polyp detection. Experimental results indicate an average improvement of 21.5% in polyp localization and 43.78% in polyp detection.
Keywords: Colonoscopy, Polyp Detection, Polyp Localization, Region Extraction, Watersheds
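The variance analysis described in this abstract can be illustrated with a minimal sketch. It assumes that each voxel of the temporal window already carries a region label (for instance from a 3D watershed) and a localization response. The names `region_stats` and `most_stable_region`, the `max_variance` threshold, and the rule "highest mean among low-variance regions" are assumptions made for illustration, not the authors' published procedure.

```python
from collections import defaultdict
from statistics import mean, pvariance


def region_stats(labels, responses):
    """Group responses by region label and return {label: (mean, variance)}."""
    grouped = defaultdict(list)
    for label, value in zip(labels, responses):
        grouped[label].append(value)
    return {label: (mean(vals), pvariance(vals)) for label, vals in grouped.items()}


def most_stable_region(stats, max_variance):
    """Pick the region with the highest mean response among regions whose
    response variance does not exceed max_variance (hypothetical rule)."""
    stable = {label: m for label, (m, v) in stats.items() if v <= max_variance}
    return max(stable, key=stable.get) if stable else None


# Toy data: region 1 responds moderately but consistently,
# region 2 responds strongly but erratically across the window.
labels = [1, 1, 1, 2, 2, 2]
responses = [0.6, 0.6, 0.6, 1.0, 0.2, 1.0]
stats = region_stats(labels, responses)
best = most_stable_region(stats, max_variance=0.05)
```

With the toy data, the erratic region is filtered out by the variance threshold even though its mean response is higher, which is the intuition behind preferring stable responses.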
|
|
|
Hanne Kause, Aura Hernandez-Sabate, Patricia Marquez, Andrea Fuster, Luc Florack, Hans van Assen, et al. (2015). Confidence Measures for Assessing the HARP Algorithm in Tagged Magnetic Resonance Imaging. In Statistical Atlases and Computational Models of the Heart. Revised selected papers of Imaging and Modelling Challenges 6th International Workshop, STACOM 2015, Held in Conjunction with MICCAI 2015 (Vol. 9534, pp. 69–79). LNCS. Springer International Publishing.
Abstract: Cardiac deformation and changes therein have been linked to pathologies. Both can be extracted in detail from tagged Magnetic Resonance Imaging (tMRI) using harmonic phase (HARP) images. Although point tracking algorithms have been shown to achieve high accuracy on HARP images, this accuracy varies with position. Detecting and discarding areas with unreliable results is crucial for use in clinical support systems. This paper assesses the capability of two confidence measures (CMs), based on energy and image structure, for detecting locations with reduced accuracy in motion tracking results. These CMs were tested on a database of simulated tMRI images containing the most common artifacts that may affect tracking accuracy. CM performance is assessed based on its capability for HARP tracking error bounding and compared in terms of significant differences detected using a multi-comparison analysis of variance that takes into account the most influential factors on HARP tracking performance. Results showed that the CM based on image structure was better suited to detect unreliable optical flow vectors. In addition, it was shown that CMs can be used to detect optical flow vectors with large errors in order to improve the optical flow obtained with the HARP tracking algorithm.
|
|
|
Aleksandr Setkov, Fabio Martinez Carillo, Michele Gouiffes, Christian Jacquemin, Maria Vanrell, & Ramon Baldrich. (2015). DAcImPro: A Novel Database of Acquired Image Projections and Its Application to Object Recognition. In Advances in Visual Computing. Proceedings of 11th International Symposium, ISVC 2015 Part II (Vol. 9475, pp. 463–473). LNCS. Springer International Publishing.
Abstract: Projector-camera systems are designed to improve the projection quality by comparing original images with their captured projections, which is usually complicated due to high photometric and geometric variations. Many research works address this problem using their own test data, which makes it extremely difficult to compare different proposals. This paper has two main contributions. Firstly, we introduce a new database of acquired image projections (DAcImPro) that, covering photometric and geometric conditions and providing data for ground-truth computation, can serve to evaluate different algorithms in projector-camera systems. Secondly, a new object recognition scenario from acquired projections is presented, which could be of great interest in domains such as home video projections and public presentations. We show that the task is more challenging than the classical recognition problem and thus requires additional pre-processing, such as color compensation or projection area selection.
Keywords: Projector-camera systems; Feature descriptors; Object recognition
|
|
|
Miguel Oliveira, Victor Santos, Angel Sappa, & P. Dias. (2015). Scene Representations for Autonomous Driving: an approach based on polygonal primitives. In 2nd Iberian Robotics Conference ROBOT2015 (Vol. 417, pp. 503–515).
Abstract: In this paper, we present a novel methodology to compute a 3D scene representation. The algorithm uses macro scale polygonal primitives to model the scene. This means that the representation of the scene is given as a list of large scale polygons that describe the geometric structure of the environment. Results show that the approach is capable of producing accurate descriptions of the scene. In addition, the algorithm is very efficient when compared to other techniques.
Keywords: Scene reconstruction; Point cloud; Autonomous vehicles
|
|
|
Marta Nuñez-Garcia, Sonja Simpraga, M.Angeles Jurado, Maite Garolera, Roser Pueyo, & Laura Igual. (2015). FADR: Functional-Anatomical Discriminative Regions for rest fMRI Characterization. In Machine Learning in Medical Imaging, Proceedings of 6th International Workshop, MLMI 2015, Held in Conjunction with MICCAI 2015 (pp. 61–68).
|
|
|
Marc Bolaños, Maite Garolera, & Petia Radeva. (2015). Object Discovery using CNN Features in Egocentric Videos. In Pattern Recognition and Image Analysis, Proceedings of 7th Iberian Conference, ibPRIA 2015 (Vol. 9117, pp. 67–74). LNCS.
Abstract: Lifelogging devices based on photo/video are spreading faster every day. This growth can greatly benefit the development of methods for extracting meaningful information about the user wearing the device and his/her environment. In this paper, we propose a semi-supervised strategy for easily discovering objects relevant to the person wearing a first-person camera. Given the egocentric video sequence acquired by the camera, the method uses both the appearance extracted by means of a deep convolutional neural network and an object refill methodology that allows objects to be discovered even when they appear only rarely in the collection of images. We validate our method on a sequence of 1000 egocentric daily images and obtain results with an F-measure of 0.5, 0.17 better than the state-of-the-art approach.
Keywords: Object discovery; Egocentric videos; Lifelogging; CNN
|
|
|
Suman Ghosh, & Ernest Valveny. (2015). A Sliding Window Framework for Word Spotting Based on Word Attributes. In Pattern Recognition and Image Analysis, Proceedings of 7th Iberian Conference, ibPRIA 2015 (Vol. 9117, pp. 652–661). LNCS. Springer International Publishing.
Abstract: In this paper we propose a segmentation-free approach to word spotting. Word images are first encoded into feature vectors using Fisher Vector. Then, these feature vectors are used together with pyramidal histogram of characters labels (PHOC) to learn SVM-based attribute models. Documents are represented by these PHOC based word attributes. To efficiently compute the word attributes over a sliding window, we propose to use an integral image representation of the document using a simplified version of the attribute model. Finally we re-rank the top word candidates using the more discriminative full version of the word attributes. We show state-of-the-art results for segmentation-free query-by-example word spotting in single-writer and multi-writer standard datasets.
Keywords: Word spotting; Sliding window; Word attributes
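The integral image idea mentioned in this abstract can be sketched generically. The example below builds a summed-area table over a small grid of per-cell scores and then reads the sum inside any rectangular window with four lookups. The grid values and the function names `integral_image` and `window_sum` are illustrative assumptions; the paper's actual attribute model and features are not reproduced here.

```python
def integral_image(grid):
    """Return a summed-area table with one extra zero row and column."""
    rows, cols = len(grid), len(grid[0])
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0  # running sum of the current row up to column c
        for c in range(cols):
            row_sum += grid[r][c]
            table[r + 1][c + 1] = table[r][c + 1] + row_sum
    return table


def window_sum(table, top, left, height, width):
    """Sum of the cells in rows top..top+height-1 and columns
    left..left+width-1, computed with four table lookups."""
    bottom, right = top + height, left + width
    return (table[bottom][right] - table[top][right]
            - table[bottom][left] + table[top][left])


# Toy per-cell scores standing in for a simplified attribute response.
scores = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
table = integral_image(scores)
center_sum = window_sum(table, top=1, left=1, height=2, width=2)
```

Because each window sum costs a constant number of lookups regardless of window size, sliding a window over every position of a document stays cheap, which is why a simplified additive model pairs well with this representation.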
|
|
|
Onur Ferhat, Arcadi Llanza, & Fernando Vilariño. (2015). A Feature-Based Gaze Estimation Algorithm for Natural Light Scenarios. In Pattern Recognition and Image Analysis, Proceedings of 7th Iberian Conference, ibPRIA 2015 (Vol. 9117, pp. 569–576). LNCS. Springer International Publishing.
Abstract: We present an eye tracking system that works with regular webcams. We base our work on the open source CVC Eye Tracker [7] and we propose a number of improvements and a novel gaze estimation method. The new method uses features extracted from iris segmentation and it does not fall into the traditional categorization of appearance-based/model-based methods. Our experiments show that our approach reduces the gaze estimation errors by 34 % in the horizontal direction and by 12 % in the vertical direction compared to the baseline system.
Keywords: Eye tracking; Gaze estimation; Natural light; Webcam
|
|
|
Alejandro Gonzalez Alzate, Gabriel Villalonga, German Ros, David Vazquez, & Antonio Lopez. (2015). 3D-Guided Multiscale Sliding Window for Pedestrian Detection. In Pattern Recognition and Image Analysis, Proceedings of 7th Iberian Conference, ibPRIA 2015 (Vol. 9117, pp. 560–568).
Abstract: The most relevant modules of a pedestrian detector are the candidate generation and the candidate classification. The former aims at presenting image windows to the latter so that they are classified as containing a pedestrian or not. Much attention has been paid to the classification module, while candidate generation has mainly relied on a (multiscale) sliding window pyramid. However, candidate generation is critical for achieving real-time performance. In this paper we assume a context of autonomous driving based on stereo vision. Accordingly, we evaluate the effect of taking into account the 3D information (derived from the stereo) in order to prune the hundreds of thousands of windows per image generated by a classical pyramidal sliding window. For our study we use a multimodal (RGB, disparity) and multi-descriptor (HOG, LBP, HOG+LBP) holistic ensemble based on linear SVM. Evaluation on data from the challenging KITTI benchmark suite shows the effectiveness of using 3D information to dramatically reduce the number of candidate windows, even improving the overall pedestrian detection accuracy.
Keywords: Pedestrian Detection
|
|
|
Estefania Talavera, Mariella Dimiccoli, Marc Bolaños, Maedeh Aghaei, & Petia Radeva. (2015). R-clustering for egocentric video segmentation. In Pattern Recognition and Image Analysis, Proceedings of 7th Iberian Conference, ibPRIA 2015 (Vol. 9117, pp. 327–336). LNCS. Springer International Publishing.
Abstract: In this paper, we present a new method for egocentric video temporal segmentation based on integrating a statistical mean change detector and agglomerative clustering (AC) within an energy-minimization framework. Given the tendency of most AC methods to oversegment video sequences when clustering their frames, we combine the clustering with a concept drift detection technique (ADWIN) that has rigorous performance guarantees. ADWIN serves as a statistical upper bound for the clustering-based video segmentation. We integrate both techniques in an energy-minimization framework that serves to disambiguate the decision of both techniques and to complete the segmentation taking into account the temporal continuity of video frame descriptors. We present experiments over egocentric sets of more than 13,000 images acquired with different wearable cameras, showing that our method outperforms state-of-the-art clustering methods.
Keywords: Temporal video segmentation; Egocentric videos; Clustering
|
|
|
Alejandro Gonzalez Alzate, Sebastian Ramos, David Vazquez, Antonio Lopez, & Jaume Amores. (2015). Spatiotemporal Stacked Sequential Learning for Pedestrian Detection. In Pattern Recognition and Image Analysis, Proceedings of 7th Iberian Conference, ibPRIA 2015 (pp. 3–12).
Abstract: Pedestrian classifiers decide which image windows contain a pedestrian. In practice, such classifiers provide a relatively high response at neighbor windows overlapping a pedestrian, while the responses around potential false positives are expected to be lower. An analogous reasoning applies for image sequences. If there is a pedestrian located within a frame, the same pedestrian is expected to appear close to the same location in neighbor frames. Therefore, such a location is likely to receive high classification scores during several frames, while false positives are expected to be more spurious. In this paper we propose to exploit such correlations for improving the accuracy of base pedestrian classifiers. In particular, we propose to use two-stage classifiers which rely not only on the image descriptors required by the base classifiers but also on the response of such base classifiers in a given spatiotemporal neighborhood. More specifically, we train pedestrian classifiers using a stacked sequential learning (SSL) paradigm. We use a new pedestrian dataset we have acquired from a car to evaluate our proposal at different frame rates. We also test on a well known dataset: Caltech. The obtained results show that our SSL proposal boosts detection accuracy significantly with a minimal impact on the computational cost. Interestingly, SSL improves accuracy most in the most dangerous situations, i.e. when a pedestrian is close to the camera.
Keywords: SSL; Pedestrian Detection
|
|
|
Julie Digne, Mariella Dimiccoli, Neus Sabater, & Philippe Salembier. (2015). Neighborhood Filters and the Recovery of 3D Information. In Handbook of Mathematical Methods in Imaging (pp. 1645–1673). Springer New York.
Abstract: Following their success in image processing (see Chapter Local Smoothing Neighborhood Filters), neighborhood filters have been extended to 3D surface processing. This adaptation is not straightforward. It has led to several variants for surfaces depending on whether the surface is defined as a mesh, or as a raw data point set. The image gray level in the bilateral similarity measure is replaced by geometric information such as the normal or the curvature. The first section of this chapter reviews the variants of 3D mesh bilateral filters and compares them to the simplest possible isotropic filter, the mean curvature motion. In a second part, this chapter reviews applications of the bilateral filter to data composed of a sparse depth map (or of depth cues) and of the image on which they have been computed. Such sparse depth cues can be obtained by stereovision or by psychophysical techniques. The underlying assumption of these applications is that pixels with similar intensity around a region are likely to have similar depths. Therefore, when diffusing depth information with a bilateral filter based on locality and color similarity, the discontinuities in depth are ensured to be consistent with the color discontinuities, which is generally a desirable property. In the reviewed applications, this ends up with the reconstruction of a dense perceptual depth map from the joint data of an image and of depth cues.
|
|