|
Daniel Hernandez and 8 others. 2019. Slanted Stixels: A way to represent steep streets. IJCV, 127, 1643–1658.
Abstract: This work presents and evaluates a novel compact scene representation based on Stixels that infers geometric and semantic information. Our approach overcomes the previous rather restrictive geometric assumptions for Stixels by introducing a novel depth model to account for non-flat roads and slanted objects. Both semantic and depth cues are used jointly to infer the scene representation in a sound global energy minimization formulation. Furthermore, a novel approximation scheme is introduced in order to significantly reduce the computational complexity of the Stixel algorithm, and then achieve real-time computation capabilities. The idea is to first perform an over-segmentation of the image, discarding the unlikely Stixel cuts, and apply the algorithm only on the remaining Stixel cuts. This work presents a novel over-segmentation strategy based on a fully convolutional network, which outperforms an approach based on using local extrema of the disparity map. We evaluate the proposed methods in terms of semantic and geometric accuracy as well as run-time on four publicly available benchmark datasets. Our approach maintains accuracy on flat road scene datasets while improving substantially on a novel non-flat road dataset.
|
|
|
Fernando Barrera, Felipe Lumbreras and Angel Sappa. 2012. Multimodal Stereo Vision System: 3D Data Extraction and Algorithm Evaluation. J-STSP, 6(5), 437–446.
Abstract: This paper proposes an imaging system for computing sparse depth maps from multispectral images. A special stereo head consisting of an infrared and a color camera defines the proposed multimodal acquisition system. The cameras are rigidly attached so that their image planes are parallel. Details about the calibration and image rectification procedure are provided. Sparse disparity maps are obtained by the combined use of mutual information enriched with gradient information. The proposed approach is evaluated using a Receiver Operating Characteristics curve. Furthermore, a multispectral dataset, color and infrared images, together with their corresponding ground truth disparity maps, is generated and used as a test bed. Experimental results in real outdoor scenarios are provided showing its viability and that the proposed approach is not restricted to a specific domain.
|
|
|
Fadi Dornaika, Jose Manuel Alvarez, Angel Sappa and Antonio Lopez. 2011. A New Framework for Stereo Sensor Pose through Road Segmentation and Registration. TITS, 12(4), 954–966.
Abstract: This paper proposes a new framework for real-time estimation of the onboard stereo head's position and orientation relative to the road surface, which is required for any advanced driver-assistance application. This framework can be used with all road types: highways, urban, etc. Unlike existing works that rely on feature extraction in either the image domain or 3-D space, we propose a framework that directly estimates the unknown parameters from the stream of stereo pairs' brightness. The proposed approach consists of two stages that are invoked for every stereo frame. The first stage segments the road region in one monocular view. The second stage estimates the camera pose using a featureless registration between the segmented monocular road region and the other view in the stereo pair. This paper has two main contributions. The first contribution combines a road segmentation algorithm with a registration technique to estimate the online stereo camera pose. The second contribution solves the registration using a featureless method, which is carried out using two different optimization techniques: 1) the differential evolution algorithm and 2) the Levenberg-Marquardt (LM) algorithm. We provide experiments and evaluations of performance. The results presented show the validity of our proposed framework.
Keywords: road detection
|
|
|
Fernando Barrera, Felipe Lumbreras and Angel Sappa. 2013. Multispectral Piecewise Planar Stereo using Manhattan-World Assumption. PRL, 34(1), 52–61.
Abstract: This paper proposes a new framework for extracting dense disparity maps from a multispectral stereo rig. The system is constructed with an infrared and a color camera. It is intended to explore novel multispectral stereo matching approaches that will allow further extraction of semantic information. The proposed framework consists of three stages. Firstly, an initial sparse disparity map is generated by using a cost function based on feature matching in a multiresolution scheme. Then, by looking at the color image, a set of planar hypotheses is defined to describe the surfaces on the scene. Finally, the previous stages are combined by reformulating the disparity computation as a global minimization problem. The paper has two main contributions. The first contribution combines mutual information with a shape descriptor based on gradient in a multiresolution scheme. The second contribution, which is based on the Manhattan-world assumption, extracts a dense disparity representation using the graph cut algorithm. Experimental results in outdoor scenarios are provided showing the validity of the proposed framework.
Keywords: Multispectral stereo rig; Dense disparity maps from multispectral stereo; Color and infrared images
|
|
|
Iban Berganzo-Besga and 11 others. 2023. Curriculum learning-based strategy for low-density archaeological mound detection from historical maps in India and Pakistan. ScR, 13, 11257.
Abstract: This paper presents two algorithms for the large-scale automatic detection and instance segmentation of potential archaeological mounds on historical maps. Historical maps present a unique source of information for the reconstruction of ancient landscapes. The last 100 years have seen unprecedented landscape modifications with the introduction and large-scale implementation of mechanised agriculture, channel-based irrigation schemes, and urban expansion to name but a few. Historical maps offer a window onto disappearing landscapes where many historical and archaeological elements that no longer exist today are depicted. The algorithms focus on the detection and shape extraction of mound features with high probability of being archaeological settlements, mounds being one of the most commonly documented archaeological features to be found in the Survey of India historical map series, although not necessarily recognised as such at the time of surveying. Mound features with high archaeological potential are most commonly depicted through hachures or contour-equivalent form-lines, therefore, an algorithm has been designed to detect each of those features. Our proposed approach addresses two of the most common issues in archaeological automated survey, the low-density of archaeological features to be detected, and the small amount of training data available. It has been applied to all types of maps available of the historic 1″ to 1-mile series, thus increasing the complexity of the detection. Moreover, the inclusion of synthetic data, along with a Curriculum Learning strategy, has allowed the algorithm to better understand what the mound features look like. Likewise, a series of filters based on topographic setting, form, and size have been applied to improve the accuracy of the models. The resulting algorithms have a recall value of 52.61% and a precision of 82.31% for the hachure mounds, and a recall value of 70.80% and a precision of 70.29% for the form-line mounds, which allowed the detection of nearly 6000 mound features over an area of 470,500 km2, the largest such approach to have ever been applied. If we restrict our focus to the maps most similar to those used in the algorithm training, we reach recall values greater than 60% and precision values greater than 90%. This approach has shown the potential to implement an adaptive algorithm that allows, after a small amount of retraining with data detected from a new map, a better general mound feature detection in the same map.
|
|
|
Marçal Rusiñol, J. Chazalon and Katerine Diaz. 2018. Augmented Songbook: an Augmented Reality Educational Application for Raising Music Awareness. MTAP, 77(11), 13773–13798.
Abstract: This paper presents the development of an Augmented Reality mobile application which aims at sensibilizing young children to abstract concepts of music. Such concepts are, for instance, the musical notation or the idea of rhythm. Recent studies in Augmented Reality for education suggest that such technologies have multiple benefits for students, including younger ones. As mobile document image acquisition and processing gains maturity on mobile platforms, we explore how it is possible to build a markerless and real-time application to augment the physical documents with didactic animations and interactive virtual content. Given a standard image processing pipeline, we compare the performance of different local descriptors at two key stages of the process. Results suggest alternatives to the SIFT local descriptors, regarding result quality and computational efficiency, both for document model identification and perspective transform estimation. All experiments are performed on an original and public dataset we introduce here.
Keywords: Augmented reality; Document image matching; Educational applications
|
|
|
Monica Piñol, Angel Sappa and Ricardo Toledo. 2015. Adaptive Feature Descriptor Selection based on a Multi-Table Reinforcement Learning Strategy. NEUCOM, 150(A), 106–115.
Abstract: This paper presents and evaluates a framework to improve the performance of visual object classification methods, which are based on the usage of image feature descriptors as inputs. The goal of the proposed framework is to learn the best descriptor for each image in a given database. This goal is reached by means of a reinforcement learning process using the minimum information. The visual classification system used to demonstrate the proposed framework is based on a bag of features scheme, and the reinforcement learning technique is implemented through the Q-learning approach. The behavior of the reinforcement learning with different state definitions is evaluated. Additionally, a method that combines all these states is formulated in order to select the optimal state. Finally, the chosen actions are obtained from the best set of image descriptors in the literature: PHOW, SIFT, C-SIFT, SURF and Spin. Experimental results using two public databases (ETH and COIL) are provided showing both the validity of the proposed approach and comparisons with state of the art. In all the cases the best results are obtained with the proposed approach.
Keywords: Reinforcement learning; Q-learning; Bag of features; Descriptors
|
|
|
Angel Sappa, Fadi Dornaika, Daniel Ponsa, David Geronimo and Antonio Lopez. 2008. An Efficient Approach to Onboard Stereo Vision System Pose Estimation. TITS, 9(3), 476–490.
Abstract: This paper presents an efficient technique for estimating the pose of an onboard stereo vision system relative to the environment’s dominant surface area, which is supposed to be the road surface. Unlike previous approaches, it can be used either for urban or highway scenarios since it is not based on a specific visual traffic feature extraction but on 3-D raw data points. The whole process is performed in the Euclidean space and consists of two stages. Initially, a compact 2-D representation of the original 3-D data points is computed. Then, a RANdom SAmple Consensus (RANSAC) based least-squares approach is used to fit a plane to the road. Fast RANSAC fitting is obtained by selecting points according to a probability function that takes into account the density of points at a given depth. Finally, stereo camera height and pitch angle are computed related to the fitted road plane. The proposed technique is intended to be used in driverassistance systems for applications such as vehicle or pedestrian detection. Experimental results on urban environments, which are the most challenging scenarios (i.e., flat/uphill/downhill driving, speed bumps, and car’s accelerations), are presented. These results are validated with manually annotated ground truth. Additionally, comparisons with previous works are presented to show the improvements in the central processing unit processing time, as well as in the accuracy of the obtained results.
Keywords: Camera extrinsic parameter estimation, ground plane estimation, onboard stereo vision system
|
|
|
Iban Berganzo-Besga, Hector A. Orengo, Felipe Lumbreras, Paloma Aliende and Monica N. Ramsey. 2022. Automated detection and classification of multi-cell Phytoliths using Deep Learning-Based Algorithms. JArchSci, 148, 105654.
Abstract: This paper presents an algorithm for automated detection and classification of multi-cell phytoliths, one of the major components of many archaeological and paleoenvironmental deposits. This identification, based on phytolith wave pattern, is made using a pretrained VGG19 deep learning model. This approach has been tested in three key phytolith genera for the study of agricultural origins in Near East archaeology: Avena, Hordeum and Triticum. Also, this classification has been validated at species-level using Triticum boeoticum and dicoccoides images. Due to the diversity of microscopes, cameras and chemical treatments that can influence images of phytolith slides, three types of data augmentation techniques have been implemented: rotation of the images at 45-degree angles, random colour and brightness jittering, and random blur/sharpen. The implemented workflow has resulted in an overall accuracy of 93.68% for phytolith genera, improving previous attempts. The algorithm has also demonstrated its potential to automatize the classification of phytoliths species with an overall accuracy of 100%. The open code and platforms employed to develop the algorithm assure the method's accessibility, reproducibility and reusability.
|
|
|
Arnau Ramisa, Adriana Tapus, David Aldavert, Ricardo Toledo and Ramon Lopez de Mantaras. 2009. Robust Vision-Based Localization using Combinations of Local Feature Regions Detectors. AR, 27(4), 373–385.
Abstract: This paper presents a vision-based approach for mobile robot localization. The model of the environment is topological. The new approach characterizes a place using a signature. This signature consists of a constellation of descriptors computed over different types of local affine covariant regions extracted from an omnidirectional image acquired rotating a standard camera with a pan-tilt unit. This type of representation permits a reliable and distinctive environment modelling. Our objectives were to validate the proposed method in indoor environments and, also, to find out if the combination of complementary local feature region detectors improves the localization versus using a single region detector. Our experimental results show that if false matches are effectively rejected, the combination of different covariant affine region detectors increases notably the performance of the approach by combining the different strengths of the individual detectors. In order to reduce the localization time, two strategies are evaluated: re-ranking the map nodes using a global similarity measure and using standard perspective view field of 45°.
In order to systematically test topological localization methods, another contribution proposed in this work is a novel method to see the degradation in localization performance as the robot moves away from the point where the original signature was acquired. This allows to know the robustness of the proposed signature. In order for this to be effective, it must be done in several, variated, environments that test all the possible situations in which the robot may have to perform localization.
|
|