|
J.Poujol, Cristhian A. Aguilera-Carrasco, E.Danos, Boris X. Vintimilla, Ricardo Toledo and Angel Sappa. 2015. Visible-Thermal Fusion based Monocular Visual Odometry. 2nd Iberian Robotics Conference ROBOT2015. Springer International Publishing, 517–528.
Abstract: The manuscript evaluates the performance of a monocular visual odometry approach when images from different spectra are considered, both independently and fused. The objective behind this evaluation is to analyze if classical approaches can be improved when the given images, which are from different spectra, are fused and represented in new domains. The images in these new domains should have some of the following properties: i) more robust to noisy data; ii) less sensitive to changes (e.g., lighting); iii) more rich in descriptive information, among other. In particular in the current work two different image fusion strategies are considered. Firstly, images from the visible and thermal spectrum are fused using a Discrete Wavelet Transform (DWT) approach. Secondly, a monochrome threshold strategy is considered. The obtained
representations are evaluated under a visual odometry framework, highlighting
their advantages and disadvantages, using different urban and semi-urban scenarios. Comparisons with both monocular-visible spectrum and monocular-infrared spectrum, are also provided showing the validity of the proposed approach.
Keywords: Monocular Visual Odometry; LWIR-RGB cross-spectral Imaging; Image Fusion.
|
|
|
David Vazquez, Antonio Lopez, Daniel Ponsa and Javier Marin. 2011. Virtual Worlds and Active Learning for Human Detection. 13th International Conference on Multimodal Interaction. New York, NY, USA, USA, ACM DL, 393–400.
Abstract: Image based human detection is of paramount interest due to its potential applications in fields such as advanced driving assistance, surveillance and media analysis. However, even detecting non-occluded standing humans remains a challenge of intensive research. The most promising human detectors rely on classifiers developed in the discriminative paradigm, i.e., trained with labelled samples. However, labeling is a manual intensive step, especially in cases like human detection where it is necessary to provide at least bounding boxes framing the humans for training. To overcome such problem, some authors have proposed the use of a virtual world where the labels of the different objects are obtained automatically. This means that the human models (classifiers) are learnt using the appearance of rendered images, i.e., using realistic computer graphics. Later, these models are used for human detection in images of the real world. The results of this technique are surprisingly good. However, these are not always as good as the classical approach of training and testing with data coming from the same camera, or similar ones. Accordingly, in this paper we address the challenge of using a virtual world for gathering (while playing a videogame) a large amount of automatically labelled samples (virtual humans and background) and then training a classifier that performs equal, in real-world images, than the one obtained by equally training from manually labelled real-world samples. For doing that, we cast the problem as one of domain adaptation. In doing so, we assume that a small amount of manually labelled samples from real-world images is required. To collect these labelled samples we propose a non-standard active learning technique. Therefore, ultimately our human model is learnt by the combination of virtual and real world labelled samples (Fig. 1), which has not been done before. We present quantitative results showing that this approach is valid.
Keywords: Pedestrian Detection; Human detection; Virtual; Domain Adaptation; Active Learning
|
|
|
Jose Carlos Rubio, Joan Serrat and Antonio Lopez. 2012. Video Co-segmentation. 11th Asian Conference on Computer Vision. Springer Berlin Heidelberg, 13–24. (LNCS.)
Abstract: Segmentation of a single image is in general a highly underconstrained problem. A frequent approach to solve it is to somehow provide prior knowledge or constraints on how the objects of interest look like (in terms of their shape, size, color, location or structure). Image co-segmentation trades the need for such knowledge for something much easier to obtain, namely, additional images showing the object from other viewpoints. Now the segmentation problem is posed as one of differentiating the similar object regions in all the images from the more varying background. In this paper, for the first time, we extend this approach to video segmentation: given two or more video sequences showing the same object (or objects belonging to the same class) moving in a similar manner, we aim to outline its region in all the frames. In addition, the method works in an unsupervised manner, by learning to segment at testing time. We compare favorably with two state-of-the-art methods on video segmentation and report results on benchmark videos.
|
|
|
Daniel Ponsa and Antonio Lopez. 2007. Vehicle Trajectory Estimation based on Monocular Vision. 3rd Iberian Conference on Pattern Recognition and Image Analysis, LNCS 4477.587–594.
Keywords: vehicle detection
|
|
|
Ferran Diego, Daniel Ponsa, Joan Serrat and Antonio Lopez. 2010. Vehicle geolocalization based on video synchronization. 13th Annual International Conference on Intelligent Transportation Systems.1511–1516.
Abstract: TC8.6
This paper proposes a novel method for estimating the geospatial localization of a vehicle. I uses as input a georeferenced video sequence recorded by a forward-facing camera attached to the windscreen. The core of the proposed method is an on-line video synchronization which finds out the corresponding frame in the georeferenced video sequence to the one recorded at each time by the camera on a second drive through the same track. Once found the corresponding frame in the georeferenced video sequence, we transfer its geospatial information of this frame. The key advantages of this method are: 1) the increase of the update rate and the geospatial accuracy with regard to a standard low-cost GPS and 2) the ability to localize a vehicle even when a GPS is not available or is not reliable enough, like in certain urban areas. Experimental results for an urban environments are presented, showing an average of relative accuracy of 1.5 meters.
Keywords: video alignment
|
|
|
Alex Goldhoorn, Arnau Ramisa, Ramon Lopez de Mantaras and Ricardo Toledo. 2007. Using the Average Landmark Vector Method for Robot Homing. Artificial Intelligence Research and Development, Proceedings of the 10th International Conference of the ACIA.331–338.
|
|
|
Miguel Oliveira, Angel Sappa and V.Santos. 2011. Unsupervised Local Color Correction for Coarsely Registered Images. IEEE conference on Computer Vision and Pattern Recognition.201–208.
Abstract: The current paper proposes a new parametric local color correction technique. Initially, several color transfer functions are computed from the output of the mean shift color segmentation algorithm. Secondly, color influence maps are calculated. Finally, the contribution of every color transfer function is merged using the weights from the color influence maps. The proposed approach is compared with both global and local color correction approaches. Results show that our method outperforms the technique ranked first in a recent performance evaluation on this topic. Moreover, the proposed approach is computed in about one tenth of the time.
|
|
|
David Vazquez, Antonio Lopez and Daniel Ponsa. 2012. Unsupervised Domain Adaptation of Virtual and Real Worlds for Pedestrian Detection. 21st International Conference on Pattern Recognition. Tsukuba Science City, JAPAN, IEEE, 3492–3495.
Abstract: Vision-based object detectors are crucial for different applications. They rely on learnt object models. Ideally, we would like to deploy our vision system in the scenario where it must operate, and lead it to self-learn how to distinguish the objects of interest, i.e., without human intervention. However, the learning of each object model requires labelled samples collected through a tiresome manual process. For instance, we are interested in exploring the self-training of a pedestrian detector for driver assistance systems. Our first approach to avoid manual labelling consisted in the use of samples coming from realistic computer graphics, so that their labels are automatically available [12]. This would make possible the desired self-training of our pedestrian detector. However, as we showed in [14], between virtual and real worlds it may be a dataset shift. In order to overcome it, we propose the use of unsupervised domain adaptation techniques that avoid human intervention during the adaptation process. In particular, this paper explores the use of the transductive SVM (T-SVM) learning algorithm in order to adapt virtual and real worlds for pedestrian detection (Fig. 1).
Keywords: Pedestrian Detection; Domain Adaptation; Virtual worlds
|
|
|
Jose Carlos Rubio, Joan Serrat and Antonio Lopez. 2012. Unsupervised co-segmentation through region matching. 25th IEEE Conference on Computer Vision and Pattern Recognition. IEEE Xplore, 749–756.
Abstract: Co-segmentation is defined as jointly partitioning multiple images depicting the same or similar object, into foreground and background. Our method consists of a multiple-scale multiple-image generative model, which jointly estimates the foreground and background appearance distributions from several images, in a non-supervised manner. In contrast to other co-segmentation methods, our approach does not require the images to have similar foregrounds and different backgrounds to function properly. Region matching is applied to exploit inter-image information by establishing correspondences between the common objects that appear in the scene. Moreover, computing many-to-many associations of regions allow further applications, like recognition of object parts across images. We report results on iCoseg, a challenging dataset that presents extreme variability in camera viewpoint, illumination and object deformations and poses. We also show that our method is robust against large intra-class variability in the MSRC database.
|
|
|
Judit Martinez, Eva Costa, P. Herreros, Antonio Lopez and Juan J. Villanueva. 2003. TV-Screen Quality Inspection by Artificial Vision. Proceedings SPIE 5132, Sixth International Conference on Quality Control by Artificial Vision (QCAV 2003).
Abstract: A real-time vision system for TV screen quality inspection is introduced. The whole system consists of eight cameras and one processor per camera. It acquires and processes 112 images in 6 seconds. The defects to be inspected can be grouped into four main categories (bubble, line-out, line reduction and landing) although there exists a large variability among each particular type of defect. The complexity of the whole inspection process has been reduced by dividing images into smaller ones and grouping the defects into frequency and intensity relevant ones. Tools such as mathematical morphology, Fourier transform, profile analysis and classification have been used. The performance of the system has been successfully proved against human operators in normal production conditions.
|
|