German Ros, Sebastian Ramos, Manuel Granados, Amir Bakhtiary, David Vazquez, & Antonio Lopez. (2015). Vision-based Offline-Online Perception Paradigm for Autonomous Driving. In IEEE Winter Conference on Applications of Computer Vision (pp. 231–238).
Abstract: Autonomous driving is a key factor for future mobility. Properly perceiving the environment of the vehicles is essential for a safe driving, which requires computing accurate geometric and semantic information in real-time. In this paper, we challenge state-of-the-art computer vision algorithms for building a perception system for autonomous driving. An inherent drawback in the computation of visual semantics is the trade-off between accuracy and computational cost. We propose to circumvent this problem by following an offline-online strategy. During the offline stage dense 3D semantic maps are created. In the online stage the current driving area is recognized in the maps via a re-localization process, which allows to retrieve the pre-computed accurate semantics and 3D geometry in realtime. Then, detecting the dynamic obstacles we obtain a rich understanding of the current scene. We evaluate quantitatively our proposal in the KITTI dataset and discuss the related open challenges for the computer vision community.
Keywords: Autonomous Driving; Scene Understanding; SLAM; Semantic Segmentation
|
Frederic Sampedro, & Sergio Escalera. (2015). Spatial codification of label predictions in Multi-scale Stacked Sequential Learning: A case study on multi-class medical volume segmentation. IETCV - IET Computer Vision, 9(3), 439–446.
Abstract: In this study, the authors propose the spatial codification of label predictions within the multi-scale stacked sequential learning (MSSL) framework, a successful learning scheme to deal with non-independent identically distributed data entries. After providing a motivation for this objective, they describe its theoretical framework based on the introduction of the blurred shape model as a smart descriptor to codify the spatial distribution of the predicted labels and define the new extended feature set for the second stacked classifier. They then particularise this scheme to be applied in volume segmentation applications. Finally, they test the implementation of the proposed framework in two medical volume segmentation datasets, obtaining significant performance improvements (with a 95% of confidence) in comparison to standard Adaboost classifier and classical MSSL approaches.
|
Fahad Shahbaz Khan, Muhammad Anwer Rao, Joost Van de Weijer, Michael Felsberg, & J.Laaksonen. (2015). Deep semantic pyramids for human attributes and action recognition. In Image Analysis, Proceedings of 19th Scandinavian Conference , SCIA 2015 (Vol. 9127, pp. 341–353). Springer International Publishing.
Abstract: Describing persons and their actions is a challenging problem due to variations in pose, scale and viewpoint in real-world images. Recently, semantic pyramids approach [1] for pose normalization has shown to provide excellent results for gender and action recognition. The performance of semantic pyramids approach relies on robust image description and is therefore limited due to the use of shallow local features. In the context of object recognition [2] and object detection [3], convolutional neural networks (CNNs) or deep features have shown to improve the performance over the conventional shallow features.
We propose deep semantic pyramids for human attributes and action recognition. The method works by constructing spatial pyramids based on CNNs of different part locations. These pyramids are then combined to obtain a single semantic representation. We validate our approach on the Berkeley and 27 Human Attributes datasets for attributes classification. For action recognition, we perform experiments on two challenging datasets: Willow and PASCAL VOC 2010. The proposed deep semantic pyramids provide a significant gain of 17.2%, 13.9%, 24.3% and 22.6% compared to the standard shallow semantic pyramids on Berkeley, 27 Human Attributes, Willow and PASCAL VOC 2010 datasets respectively. Our results also show that deep semantic pyramids outperform conventional CNNs based on the full bounding box of the person. Finally, we compare our approach with state-of-the-art methods and show a gain in performance compared to best methods in literature.
Keywords: Action recognition; Human attributes; Semantic pyramids
|
Ivan Huerta, Michael Holte, Thomas B. Moeslund, & Jordi Gonzalez. (2015). Chromatic shadow detection and tracking for moving foreground segmentation. IMAVIS - Image and Vision Computing, 41, 42–53.
Abstract: Advanced segmentation techniques in the surveillance domain deal with shadows to avoid distortions when detecting moving objects. Most approaches for shadow detection are still typically restricted to penumbra shadows and cannot cope well with umbra shadows. Consequently, umbra shadow regions are usually detected as part of moving objects, thus aecting the performance of the nal detection. In this paper we address the detection of both penumbra and umbra shadow regions. First, a novel bottom-up approach is presented based on gradient and colour models, which successfully discriminates between chromatic moving cast shadow regions and those regions detected as moving objects. In essence, those regions corresponding to potential shadows are detected based on edge partitioning and colour statistics. Subsequently (i) temporal similarities between textures and (ii) spatial similarities between chrominance angle and brightness distortions are analysed for each potential shadow region for detecting the umbra shadow regions. Our second contribution renes even further the segmentation results: a tracking-based top-down approach increases the performance of our bottom-up chromatic shadow detection algorithm by properly correcting non-detected shadows.
To do so, a combination of motion lters in a data association framework exploits the temporal consistency between objects and shadows to increase
the shadow detection rate. Experimental results exceed current state-of-the-
art in shadow accuracy for multiple well-known surveillance image databases which contain dierent shadowed materials and illumination conditions.
Keywords: Detecting moving objects; Chromatic shadow detection; Temporal local gradient; Spatial and Temporal brightness and angle distortions; Shadow tracking
|
Miguel Oliveira, Victor Santos, & Angel Sappa. (2015). Multimodal Inverse Perspective Mapping. IF - Information Fusion, 24, 108–121.
Abstract: Over the past years, inverse perspective mapping has been successfully applied to several problems in the field of Intelligent Transportation Systems. In brief, the method consists of mapping images to a new coordinate system where perspective effects are removed. The removal of perspective associated effects facilitates road and obstacle detection and also assists in free space estimation. There is, however, a significant limitation in the inverse perspective mapping: the presence of obstacles on the road disrupts the effectiveness of the mapping. The current paper proposes a robust solution based on the use of multimodal sensor fusion. Data from a laser range finder is fused with images from the cameras, so that the mapping is not computed in the regions where obstacles are present. As shown in the results, this considerably improves the effectiveness of the algorithm and reduces computation time when compared with the classical inverse perspective mapping. Furthermore, the proposed approach is also able to cope with several cameras with different lenses or image resolutions, as well as dynamic viewpoints.
Keywords: Inverse perspective mapping; Multimodal sensor fusion; Intelligent vehicles
|
Mariella Dimiccoli, & Petia Radeva. (2015). Lifelogging in the era of outstanding digitization. In International Conference on Digital Presentation and Preservation of Cultural and Scientific Heritage.
Abstract: In this paper, we give an overview on the emerging trend of the digitized self, focusing on visual lifelogging through wearable cameras. This is about continuously recording our life from a first-person view by wearing a camera that passively captures images. On one hand, visual lifelogging has opened the door to a large number of applications, including health. On the other, it has also boosted new challenges in the field of data analysis as well as new ethical concerns. While currently increasing efforts are being devoted to exploit lifelogging data for the improvement of personal well-being, we believe there are still many interesting applications to explore, ranging from tourism to the digitization of human behavior.
|
Miguel Oliveira, L. Seabra Lopes, G. Hyun Lim, S. Hamidreza Kasaei, Angel Sappa, & A. Tom. (2015). Concurrent Learning of Visual Codebooks and Object Categories in Openended Domains. In International Conference on Intelligent Robots and Systems (pp. 2488–2495).
Abstract: In open-ended domains, robots must continuously learn new object categories. When the training sets are created offline, it is not possible to ensure their representativeness with respect to the object categories and features the system will find when operating online. In the Bag of Words model, visual codebooks are constructed from training sets created offline. This might lead to non-discriminative visual words and, as a consequence, to poor recognition performance. This paper proposes a visual object recognition system which concurrently learns in an incremental and online fashion both the visual object category representations as well as the codebook words used to encode them. The codebook is defined using Gaussian Mixture Models which are updated using new object views. The approach contains similarities with the human visual object recognition system: evidence suggests that the development of recognition capabilities occurs on multiple levels and is sustained over large periods of time. Results show that the proposed system with concurrent learning of object categories and codebooks is capable of learning more categories, requiring less examples, and with similar accuracies, when compared to the classical Bag of Words approach using offline constructed codebooks.
Keywords: Visual Learning; Computer Vision; Autonomous Agents
|
G.Blasco, Simone Balocco, J.Puig, J.Sanchez-Gonzalez, W.Ricart, J.Daunis-I-Estadella, et al. (2015). Carotid pulse wave velocity by magnetic resonance imaging is increased in middle-aged subjects with the metabolic syndrome. ICJI - International Journal of Cardiovascular Imaging, 31(3), 603–612.
Abstract: Arterial pulse wave velocity (PWV), an independent predictor of cardiovascular disease, physiologically increases with age; however, growing evidence suggests metabolic syndrome (MetS) accelerates this increase. Magnetic resonance imaging (MRI) enables reliable noninvasive assessment of arterial stiffness by measuring arterial PWV in specific vascular segments. We investigated the association between the presence of MetS and its components with carotid PWV (cPWV) in asymptomatic subjects without diabetes. We assessed cPWV by MRI in 61 individuals (mean age, 55.3 ± 14.1 years; median age, 55 years): 30 with MetS and 31 controls with similar age, sex, body mass index, and LDL-cholesterol levels. The study population was dichotomized by the median age. To remove the physiological association between PWV and age, unpaired t tests and multiple regression analyses were performed using the residuals of the regression between PWV and age. cPWV was higher in middle-aged subjects with MetS than in those without (p = 0.001), but no differences were found in elder subjects (p = 0.313). cPWV was associated with diastolic blood pressure (r = 0.276, p = 0.033) and waist circumference (r = 0.268, p = 0.038). The presence of MetS was associated with increased cPWV regardless of age, sex, blood pressure, and waist (p = 0.007). The MetS components contributing independently to an increased cPWV were hypertension (p = 0.018) and hypertriglyceridemia (p = 0.002). The presence of MetS is associated with an increased cPWV in middle-aged subjects. In particular, hypertension and hypertriglyceridemia may contribute to early progression of carotid stiffness.
Keywords: Metabolic syndrome; Arterial stiffness; Pulse wave velocity; Carotid artery; Magnetic resonance
|
Carles Sanchez, Jorge Bernal, F. Javier Sanchez, Antoni Rosell, Marta Diez-Ferrer, & Debora Gil. (2015). Towards On-line Quantification of Tracheal Stenosis from Videobronchoscopy. IJCAR - International Journal of Computer Assisted Radiology and Surgery, 10(6), 935–945.
|
Wenjuan Gong, W.Zhang, Jordi Gonzalez, Y.Ren, & Z.Li. (2015). Enhanced Asymmetric Bilinear Model for Face Recognition. IJDSN - International Journal of Distributed Sensor Networks, , Article ID 218514.
Abstract: Bilinear models have been successfully applied to separate two factors, for example, pose variances and different identities in face recognition problems. Asymmetric model is a type of bilinear model which models a system in the most concise way. But seldom there are works exploring the applications of asymmetric bilinear model on face recognition problem with illumination changes. In this work, we propose enhanced asymmetric model for illumination-robust face recognition. Instead of initializing the factor probabilities randomly, we initialize them with nearest neighbor method and optimize them for the test data. Above that, we update the factor model to be identified. We validate the proposed method on a designed data sample and extended Yale B dataset. The experiment results show that the enhanced asymmetric models give promising results and good recognition accuracies.
|
Aura Hernandez-Sabate, Meritxell Joanpere, Nuria Gorgorio, & Lluis Albarracin. (2015). Mathematics learning opportunities when playing a Tower Defense Game. IJSG - International Journal of Serious Games, 57–71.
Abstract: A qualitative research study is presented herein with the purpose of identifying mathematics learning opportunities in students between 10 and 12 years old while playing a commercial version of a Tower Defense game. These learning opportunities are understood as mathematicisable moments of the game and involve the establishment of relationships between the game and mathematical problem solving. Based on the analysis of these mathematicisable moments, we conclude that the game can promote problem-solving processes and learning opportunities that can be associated with different mathematical contents that appears in mathematics curricula, thought it seems that teacher or new game elements might be needed to facilitate the processes.
Keywords: Tower Defense game; learning opportunities; mathematics; problem solving; game design
|
Lluis Pere de las Heras, Oriol Ramos Terrades, Sergi Robles, & Gemma Sanchez. (2015). CVC-FP and SGT: a new database for structural floor plan analysis and its groundtruthing tool. IJDAR - International Journal on Document Analysis and Recognition, 18(1), 15–30.
Abstract: Recent results on structured learning methods have shown the impact of structural information in a wide range of pattern recognition tasks. In the field of document image analysis, there is a long experience on structural methods for the analysis and information extraction of multiple types of documents. Yet, the lack of conveniently annotated and free access databases has not benefited the progress in some areas such as technical drawing understanding. In this paper, we present a floor plan database, named CVC-FP, that is annotated for the architectural objects and their structural relations. To construct this database, we have implemented a groundtruthing tool, the SGT tool, that allows to make specific this sort of information in a natural manner. This tool has been made for general purpose groundtruthing: It allows to define own object classes and properties, multiple labeling options are possible, grants the cooperative work, and provides user and version control. We finally have collected some of the recent work on floor plan interpretation and present a quantitative benchmark for this database. Both CVC-FP database and the SGT tool are freely released to the research community to ease comparisons between methods and boost reproducible research.
|
Christophe Rigaud, Clement Guerin, Dimosthenis Karatzas, Jean-Christophe Burie, & Jean-Marc Ogier. (2015). Knowledge-driven understanding of images in comic books. IJDAR - International Journal on Document Analysis and Recognition, 18(3), 199–221.
Abstract: Document analysis is an active field of research, which can attain a complete understanding of the semantics of a given document. One example of the document understanding process is enabling a computer to identify the key elements of a comic book story and arrange them according to a predefined domain knowledge. In this study, we propose a knowledge-driven system that can interact with bottom-up and top-down information to progressively understand the content of a document. We model the comic book’s and the image processing domains knowledge for information consistency analysis. In addition, different image processing methods are improved or developed to extract panels, balloons, tails, texts, comic characters and their semantic relations in an unsupervised way.
Keywords: Document Understanding; comics analysis; expert system
|
David Aldavert, Marçal Rusiñol, Ricardo Toledo, & Josep Llados. (2015). A Study of Bag-of-Visual-Words Representations for Handwritten Keyword Spotting. IJDAR - International Journal on Document Analysis and Recognition, 18(3), 223–234.
Abstract: The Bag-of-Visual-Words (BoVW) framework has gained popularity among the document image analysis community, specifically as a representation of handwritten words for recognition or spotting purposes. Although in the computer vision field the BoVW method has been greatly improved, most of the approaches in the document image analysis domain still rely on the basic implementation of the BoVW method disregarding such latest refinements. In this paper, we present a review of those improvements and its application to the keyword spotting task. We thoroughly evaluate their impact against a baseline system in the well-known George Washington dataset and compare the obtained results against nine state-of-the-art keyword spotting methods. In addition, we also compare both the baseline and improved systems with the methods presented at the Handwritten Keyword Spotting Competition 2014.
Keywords: Bag-of-Visual-Words; Keyword spotting; Handwritten documents; Performance evaluation
|
Dan Norton, Fernando Vilariño, & Onur Ferhat. (2015). Memory Field – Creative Engagement in Digital Collections. In Internet Librarian International Conference.
Abstract: “Memory Fields” is a trans-disciplinary project aiming at the (re)valorisation of digital collections.Its main deliverable is an interface for a dual screen installation, used to access and mix the public library digital collections. The collections being used in this case are a collection of digitised posters from the Spanish Civil War, belonging to the Arxiu General de Catalunya, and a collection of field recordings made by Dan Norton. The system generates visualisations, and the images and sounds are mixed together using narrative primitives of video dj. Users contribute to the digital collections by adding personal memories and observations. The comments and recollections appear as flowers growing in a “memory field” and memories remain public in a Twitter feed (@Memoryfields).
|