%0 Conference Proceedings
%T Scaling Vision-Based End-to-End Autonomous Driving with Multi-View Attention Learning
%A Yi Xiao
%A Felipe Codevilla
%A Diego Porres
%A Antonio Lopez
%B International Conference on Intelligent Robots and Systems
%D 2023
%F Yi Xiao2023
%O ADAS
%O exported from refbase (http://refbase.cvc.uab.es/show.php?record=3930), last updated on Thu, 25 Jan 2024 15:17:02 +0100
%X In end-to-end driving, human driving demonstrations are used to train perception-based driving models by imitation learning. This process is supervised by vehicle signals (e.g., steering angle, acceleration) but does not require additional costly supervision (human labeling of sensor data). As a representative of such vision-based end-to-end driving models, CILRS is commonly used as a baseline against which new driving models are compared. So far, some recent models have achieved better performance than CILRS by using expensive sensor suites and/or large amounts of human-labeled data for training. Given this performance gap, one may think that vision-based pure end-to-end driving is not worth pursuing. However, we argue that this approach still has great value and potential considering its cost and maintenance advantages. In this paper, we present CIL++, which improves on CILRS by processing higher-resolution images, using a human-inspired horizontal field of view (HFOV) as an inductive bias, and incorporating a proper attention mechanism. CIL++ achieves competitive performance compared to models that are more costly to develop. We propose replacing CILRS with CIL++ as a strong vision-based pure end-to-end driving baseline supervised only by vehicle signals and trained by conditional imitation learning.
%U https://ieeexplore.ieee.org/document/10341506