|
Daniel Hernandez and 8 others. 2019. Slanted Stixels: A way to represent steep streets. IJCV, 127, 1643–1658.
Abstract: This work presents and evaluates a novel compact scene representation based on Stixels that infers geometric and semantic information. Our approach overcomes the previous rather restrictive geometric assumptions for Stixels by introducing a novel depth model to account for non-flat roads and slanted objects. Both semantic and depth cues are used jointly to infer the scene representation in a sound global energy minimization formulation. Furthermore, a novel approximation scheme is introduced in order to significantly reduce the computational complexity of the Stixel algorithm, and then achieve real-time computation capabilities. The idea is to first perform an over-segmentation of the image, discarding the unlikely Stixel cuts, and apply the algorithm only on the remaining Stixel cuts. This work presents a novel over-segmentation strategy based on a fully convolutional network, which outperforms an approach based on using local extrema of the disparity map. We evaluate the proposed methods in terms of semantic and geometric accuracy as well as run-time on four publicly available benchmark datasets. Our approach maintains accuracy on flat road scene datasets while improving substantially on a novel non-flat road dataset.
|
|
|
Antonio Lopez, Joan Serrat, Cristina Cañero, Felipe Lumbreras and T. Graf. 2010. Robust lane markings detection and road geometry computation. IJAT, 11(3), 395–407.
Abstract: Detection of lane markings based on a camera sensor can be a low-cost solution to lane departure and curve-over-speed warnings. A number of methods and implementations have been reported in the literature. However, reliable detection is still an issue because of, for example, cast shadows, worn and occluded markings, and variable ambient lighting conditions. We focus on increasing detection reliability in two ways. First, we employed an image feature other than the commonly used edges: ridges, which we claim addresses this problem better. Second, we adapted RANSAC, a generic robust estimation method, to fit a parametric model of a pair of lane lines to the image features, based on both ridgeness and ridge orientation. In addition, the model was fitted for the left and right lane lines simultaneously to enforce a consistent result. Four measures of interest for driver assistance applications were directly computed from the fitted parametric model at each frame: lane width, lane curvature, and vehicle yaw angle and lateral offset with regard to the lane medial axis. We qualitatively assessed our method in video sequences captured on several road types and under very different lighting conditions. We also quantitatively assessed it on synthetic but realistic video sequences for which road geometry and vehicle trajectory ground truth are known.
Keywords: lane markings
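Illustrative note: the abstract above names RANSAC as the robust estimator. The sketch below shows only the generic RANSAC loop on a plain 2-D line, not the paper's lane-pair model or its ridgeness and ridge-orientation cues. The function names, the vertical-residual inlier test, the tolerance and the iteration count are all assumptions made for illustration.

```python
import random


def fit_line_least_squares(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ransac_line(points, iterations=200, tolerance=0.5, seed=0):
    """Generic RANSAC: sample minimal sets, keep the largest consensus set.

    Hypothetical helper for illustration; a lane detector would sample a
    richer parametric model instead of a single straight line.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    best_inliers = []
    for _ in range(iterations):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical pair cannot be written as y = a*x + b
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        # Inliers are points whose vertical residual is within the tolerance.
        inliers = [(x, y) for x, y in points
                   if abs(y - (slope * x + intercept)) <= tolerance]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    if len(best_inliers) < 2:
        raise ValueError("no usable model found")
    # Refit on the whole consensus set for a less noisy final estimate.
    slope, intercept = fit_line_least_squares(best_inliers)
    return slope, intercept, best_inliers
```

Refitting on the consensus set at the end is a common design choice: the sampled two-point model only serves to select inliers, and the final estimate uses all of them.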
|
|
|
Miguel Oliveira, Victor Santos and Angel Sappa. 2015. Multimodal Inverse Perspective Mapping. IF, 24, 108–121.
Abstract: Over the past years, inverse perspective mapping has been successfully applied to several problems in the field of Intelligent Transportation Systems. In brief, the method consists of mapping images to a new coordinate system where perspective effects are removed. The removal of perspective-associated effects facilitates road and obstacle detection and also assists in free space estimation. There is, however, a significant limitation in inverse perspective mapping: the presence of obstacles on the road disrupts the effectiveness of the mapping. The current paper proposes a robust solution based on the use of multimodal sensor fusion. Data from a laser range finder is fused with images from the cameras, so that the mapping is not computed in the regions where obstacles are present. As shown in the results, this considerably improves the effectiveness of the algorithm and reduces computation time when compared with the classical inverse perspective mapping. Furthermore, the proposed approach is also able to cope with several cameras with different lenses or image resolutions, as well as dynamic viewpoints.
Keywords: Inverse perspective mapping; Multimodal sensor fusion; Intelligent vehicles
|
|
|
Fadi Dornaika and Angel Sappa. 2009. A Featureless and Stochastic Approach to On-board Stereo Vision System Pose. IMAVIS, 27(9), 1382–1393.
Abstract: This paper presents a direct and stochastic technique for real-time estimation of on-board stereo head’s position and orientation. Unlike existing works which rely on feature extraction either in the image domain or in 3D space, our proposed approach directly estimates the unknown parameters from the stream of stereo pairs’ brightness. The pose parameters are tracked using the particle filtering framework which implicitly enforces the smoothness constraints on the estimated parameters. The proposed technique can be used with driver assistance applications as well as with augmented reality applications. Extended experiments on urban environments with different road geometries are presented. Comparisons with a 3D data-based approach are presented. Moreover, we provide a performance study aiming at evaluating the accuracy of the proposed approach.
Keywords: On-board stereo vision system; Pose estimation; Featureless approach; Particle filtering; Image warping
|
|
|
Carme Julia, Angel Sappa, Felipe Lumbreras, Joan Serrat and Antonio Lopez. 2010. An Iterative Multiresolution Scheme for SFM with Missing Data: single and multiple object scenes. IMAVIS, 28(1), 164–176.
Abstract: Most of the techniques proposed for tackling the Structure from Motion problem (SFM) cannot deal with high percentages of missing data in the matrix of trajectories. Furthermore, an additional problem must be faced when working with multiple object scenes: the rank of the matrix of trajectories should be estimated. This paper presents an iterative multiresolution scheme for SFM with missing data to be used in both the single and multiple object cases. The proposed scheme aims at recovering missing entries in the original input matrix. The objective is to improve the results by applying a factorization technique to the partially or totally filled-in matrix instead of to the original input one. Experimental results obtained with synthetic and real data sequences, containing single and multiple objects, are presented to show the viability of the proposed approach.
|
|
|
Antonio Lopez and 7 others. 2017. Training my car to see using virtual worlds. IMAVIS, 38, 102–118.
Abstract: Computer vision technologies are at the core of different advanced driver assistance systems (ADAS) and will play a key role in oncoming autonomous vehicles too. One of the main challenges for such technologies is to perceive the driving environment, i.e. to detect and track relevant driving information in a reliable manner (e.g. pedestrians in the vehicle route, free space to drive through). Nowadays it is clear that machine learning techniques are essential for developing such visual perception for driving. In particular, the standard working pipeline consists of collecting data (i.e. on-board images), manually annotating the data (e.g. drawing bounding boxes around pedestrians), learning a discriminative data representation taking advantage of such annotations (e.g. a deformable part-based model, a deep convolutional neural network), and then assessing the reliability of such representation with the acquired data. In the last two decades most of the research efforts focused on representation learning (first, designing descriptors and learning classifiers; later doing it end-to-end). Hence, collecting data and, especially, annotating it, is essential for learning good representations. While this has been the case from the very beginning, it was only after the disruptive appearance of deep convolutional neural networks that it became a serious issue, due to their data-hungry nature. In this context, the problem is that manual data annotation is tiresome, error-prone work. Accordingly, in the late 00’s we initiated a research line consisting of training visual models using photo-realistic computer graphics, especially focusing on assisted and autonomous driving. In this paper, we summarize this work and show how it has become a new tendency with increasing acceptance.
|
|
|
Aura Hernandez-Sabate, Debora Gil, Jaume Garcia and Enric Marti. 2011. Image-based Cardiac Phase Retrieval in Intravascular Ultrasound Sequences. T-UFFC, 58(1), 60–72.
Abstract: Longitudinal motion during in vivo pullbacks acquisition of intravascular ultrasound (IVUS) sequences is a major artifact for 3-D exploring of coronary arteries. Most current techniques are based on the electrocardiogram (ECG) signal to obtain a gated pullback without longitudinal motion by using specific hardware or the ECG signal itself. We present an image-based approach for cardiac phase retrieval from coronary IVUS sequences without an ECG signal. A signal reflecting cardiac motion is computed by exploring the image intensity local mean evolution. The signal is filtered by a band-pass filter centered at the main cardiac frequency. Phase is retrieved by computing signal extrema. The average frame processing time using our setup is 36 ms. Comparison to manually sampled sequences encourages a deeper study comparing them to ECG signals.
Keywords: 3-D exploring; ECG; band-pass filter; cardiac motion; cardiac phase retrieval; coronary arteries; electrocardiogram signal; image intensity local mean evolution; image-based cardiac phase retrieval; in vivo pullbacks acquisition; intravascular ultrasound sequences; longitudinal motion; signal extrema; time 36 ms; band-pass filters; biomedical ultrasonics; cardiovascular system; electrocardiography; image motion analysis; image retrieval; image sequences; medical image processing; ultrasonic imaging
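Illustrative note: the abstract above describes a three-step pipeline (a motion signal from the local mean of image intensity, a band-pass filter centered at the main cardiac frequency, then signal extrema). The sketch below walks through those steps on a 1-D signal that is assumed to be already computed per frame. The DFT-mask filter, the choice of the dominant non-DC bin as the "main cardiac frequency", and all function names are assumptions for illustration, not the authors' implementation.

```python
import cmath
import math


def dft(values):
    """Naive discrete Fourier transform (O(n^2), fine for short sequences)."""
    n = len(values)
    return [sum(values[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n)) for k in range(n)]


def band_pass_at_dominant(signal, half_width=1):
    """Keep only DFT bins around the strongest non-DC frequency.

    Assumption: the strongest bin in the positive half of the spectrum
    corresponds to the cardiac frequency.
    """
    n = len(signal)
    mean = sum(signal) / n
    spectrum = dft([s - mean for s in signal])
    peak = max(range(1, n // 2), key=lambda k: abs(spectrum[k]))
    keep = set()
    for k in range(peak - half_width, peak + half_width + 1):
        if 1 <= k < n // 2:
            keep.update((k, n - k))  # keep the mirrored bin so output is real
    masked = [spectrum[k] if k in keep else 0j for k in range(n)]
    filtered = [sum(masked[k] * cmath.exp(2j * math.pi * k * t / n)
                    for k in range(n)).real / n for t in range(n)]
    return filtered, peak


def extrema(values):
    """Indices of interior local maxima and minima."""
    maxima = [i for i in range(1, len(values) - 1)
              if values[i - 1] < values[i] >= values[i + 1]]
    minima = [i for i in range(1, len(values) - 1)
              if values[i - 1] > values[i] <= values[i + 1]]
    return maxima, minima


def retrieve_phase(mean_intensity_per_frame):
    """Filter the per-frame signal, then return frames at its extrema."""
    filtered, _ = band_pass_at_dominant(mean_intensity_per_frame)
    return extrema(filtered)
```

Frames at the returned maxima (or minima) share the same estimated cardiac phase, so sampling one of the two lists yields a gated subsequence.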
|
|
|
Javier Marin, David Vazquez, Antonio Lopez, Jaume Amores and Ludmila I. Kuncheva. 2014. Occlusion handling via random subspace classifiers for human detection. TSMCB, 44(3), 342–354.
Abstract: This paper describes a general method to address partial occlusions for human detection in still images. The Random Subspace Method (RSM) is chosen for building a classifier ensemble robust against partial occlusions. The component classifiers are chosen on the basis of their individual and combined performance. The main contribution of this work lies in our approach’s capability to improve the detection rate when partial occlusions are present without compromising the detection performance on non-occluded data. In contrast to many recent approaches, we propose a method which does not require manual labelling of body parts, defining any semantic spatial components, or using additional data coming from motion or stereo. Moreover, the method can be easily extended to other object classes. The experiments are performed on three large datasets: the INRIA person dataset, the Daimler Multicue dataset, and a new challenging dataset, called PobleSec, in which a considerable number of targets are partially occluded. The different approaches are evaluated at the classification and detection levels for both partially occluded and non-occluded data. The experimental results show that our detector outperforms state-of-the-art approaches in the presence of partial occlusions, while offering performance and reliability similar to those of the holistic approach on non-occluded data. The datasets used in our experiments have been made publicly available for benchmarking purposes.
Keywords: Pedestrian Detection; occlusion handling
|
|
|
Jaume Amores, N. Sebe and Petia Radeva. 2007. Context-Based Object-Class Recognition and Retrieval by Generalized Correlograms.
|
|
|
Jiaolong Xu, Sebastian Ramos, David Vazquez and Antonio Lopez. 2014. Domain Adaptation of Deformable Part-Based Models. TPAMI, 36(12), 2367–2380.
Abstract: The accuracy of object classifiers can significantly drop when the training data (source domain) and the application scenario (target domain) have inherent differences. Therefore, adapting the classifiers to the scenario in which they must operate is of paramount importance. We present novel domain adaptation (DA) methods for object detection. As proof of concept, we focus on adapting the state-of-the-art deformable part-based model (DPM) for pedestrian detection. We introduce an adaptive structural SVM (A-SSVM) that adapts a pre-learned classifier between different domains. By taking into account the inherent structure in feature space (e.g., the parts in a DPM), we propose a structure-aware A-SSVM (SA-SSVM). Neither A-SSVM nor SA-SSVM needs to revisit the source-domain training data to perform the adaptation. Rather, a low number of target-domain training examples (e.g., pedestrians) are used. To address the scenario where there are no target-domain annotated samples, we propose a self-adaptive DPM based on a self-paced learning (SPL) strategy and a Gaussian Process Regression (GPR). Two types of adaptation tasks are assessed: from both synthetic pedestrians and general persons (PASCAL VOC) to pedestrians imaged from an on-board camera. Results show that our proposals avoid accuracy drops as high as 15 points when comparing adapted and non-adapted detectors.
Keywords: Domain Adaptation; Pedestrian Detection
|
|