|
Angel Sappa. (2006). Splitting up Panoramic Range Images into Compact 2½D Representations. International Journal of Imaging Systems and Technology, 16(3): 85–91.
|
|
|
Alejandro Gonzalez Alzate, Zhijie Fang, Yainuvis Socarras, Joan Serrat, David Vazquez, Jiaolong Xu, et al. (2016). Pedestrian Detection at Day/Night Time with Visible and FIR Cameras: A Comparison. SENS - Sensors, 16(6), 820.
Abstract: Despite all the significant advances in pedestrian detection brought by computer vision for driving assistance, it is still a challenging problem. One reason is the extremely varying lighting conditions under which such a detector should operate, namely day and night time. Recent research has shown that the combination of visible and non-visible imaging modalities may increase detection accuracy, where the infrared spectrum plays a critical role. The goal of this paper is to assess the accuracy gain of different pedestrian models (holistic, part-based, patch-based) when training with images in the far infrared spectrum. Specifically, we want to compare detection accuracy on test images recorded at day and nighttime if trained (and tested) using (a) plain color images, (b) just infrared images and (c) both of them. In order to obtain results for the last item we propose an early fusion approach to combine features from both modalities. We base the evaluation on a new dataset we have built for this purpose as well as on the publicly available KAIST multispectral dataset.
Keywords: Pedestrian Detection; FIR
|
|
|
Alejandro Gonzalez Alzate, David Vazquez, Antonio Lopez, & Jaume Amores. (2017). On-Board Object Detection: Multicue, Multimodal, and Multiview Random Forest of Local Experts. Cyber - IEEE Transactions on cybernetics, 47(11), 3980–3990.
Abstract: Despite recent significant advances, object detection continues to be an extremely challenging problem in real scenarios. In order to develop a detector that successfully operates under these conditions, it becomes critical to leverage upon multiple cues, multiple imaging modalities, and a strong multiview (MV) classifier that accounts for different object views and poses. In this paper, we provide an extensive evaluation that gives insight into how each of these aspects (multicue, multimodality, and strong MV classifier) affect accuracy both individually and when integrated together. In the multimodality component, we explore the fusion of RGB and depth maps obtained by high-definition light detection and ranging, a type of modality that is starting to receive increasing attention. As our analysis reveals, although all the aforementioned aspects significantly help in improving the accuracy, the fusion of visible spectrum and depth information allows to boost the accuracy by a much larger margin. The resulting detector not only ranks among the top best performers in the challenging KITTI benchmark, but it is built upon very simple blocks that are easy to implement and computationally efficient.
Keywords: Multicue; multimodal; multiview; object detection
|
|
|
Akhil Gurram, Onay Urfalioglu, Ibrahim Halfaoui, Fahd Bouzaraa, & Antonio Lopez. (2020). Semantic Monocular Depth Estimation Based on Artificial Intelligence. ITSM - IEEE Intelligent Transportation Systems Magazine, 13(4), 99–103.
Abstract: Depth estimation provides essential information to perform autonomous driving and driver assistance. A promising line of work consists of introducing additional semantic information about the traffic scene when training CNNs for depth estimation. In practice, this means that the depth data used for CNN training is complemented with images having pixel-wise semantic labels where the same raw training data is associated with both types of ground truth, i.e., depth and semantic labels. The main contribution of this paper is to show that this hard constraint can be circumvented, i.e., that we can train CNNs for depth estimation by leveraging the depth and semantic information coming from heterogeneous datasets. In order to illustrate the benefits of our approach, we combine KITTI depth and Cityscapes semantic segmentation datasets, outperforming state-of-the-art results on monocular depth estimation.
|
|
|
Akhil Gurram, Ahmet Faruk Tuna, Fengyi Shen, Onay Urfalioglu, & Antonio Lopez. (2021). Monocular Depth Estimation through Virtual-world Supervision and Real-world SfM Self-Supervision. TITS - IEEE Transactions on Intelligent Transportation Systems, 23(8), 12738–12751.
Abstract: Depth information is essential for on-board perception in autonomous driving and driver assistance. Monocular depth estimation (MDE) is very appealing since it allows for appearance and depth being on direct pixelwise correspondence without further calibration. Best MDE models are based on Convolutional Neural Networks (CNNs) trained in a supervised manner, i.e., assuming pixelwise ground truth (GT). Usually, this GT is acquired at training time through a calibrated multi-modal suite of sensors. However, also using only a monocular system at training time is cheaper and more scalable. This is possible by relying on structure-from-motion (SfM) principles to generate self-supervision. Nevertheless, problems of camouflaged objects, visibility changes, static-camera intervals, textureless areas, and scale ambiguity, diminish the usefulness of such self-supervision. In this paper, we perform monocular depth estimation by virtual-world supervision (MonoDEVS) and real-world SfM self-supervision. We compensate the SfM self-supervision limitations by leveraging virtual-world images with accurate semantic and depth supervision and addressing the virtual-to-real domain gap. Our MonoDEVSNet outperforms previous MDE CNNs trained on monocular and even stereo sequences.
|
|