|
Fei Yang, Luis Herranz, Joost Van de Weijer, Jose Antonio Iglesias, Antonio Lopez and Mikhail Mozerov. 2020. Variable Rate Deep Image Compression with Modulated Autoencoder. SPL, 27, 331–335.
Abstract: Variable rate is a requirement for flexible and adaptable image and video compression. However, deep image compression methods (DIC) are optimized for a single fixed rate-distortion (R-D) tradeoff. While this can be addressed by training multiple models for different tradeoffs, the memory requirements increase proportionally to the number of models. Scaling the bottleneck representation of a shared autoencoder can provide variable rate compression with a single shared autoencoder. However, the R-D performance using this simple mechanism degrades in low bitrates, and also shrinks the effective range of bitrates. To address these limitations, we formulate the problem of variable R-D optimization for DIC, and propose modulated autoencoders (MAEs), where the representations of a shared autoencoder are adapted to the specific R-D tradeoff via a modulation network. Jointly training this modulated autoencoder and the modulation network provides an effective way to navigate the R-D operational curve. Our experiments show that the proposed method can achieve almost the same R-D performance of independent models with significantly fewer parameters.
|
|
|
Ferran Diego, Joan Serrat and Antonio Lopez. 2013. Joint spatio-temporal alignment of sequences. TMM, 15(6), 1377–1387.
Abstract: Video alignment is important in different areas of computer vision such as wide baseline matching, action recognition, change detection, video copy detection and frame dropping prevention. Current video alignment methods usually deal with a relatively simple case of fixed or rigidly attached cameras or simultaneous acquisition. Therefore, in this paper we propose a joint video alignment for bringing two video sequences into a spatio-temporal alignment. Specifically, the novelty of the paper is to formulate the video alignment to fold the spatial and temporal alignment into a single alignment framework. This simultaneously satisfies a frame-correspondence and frame-alignment similarity; exploiting the knowledge among neighbor frames by a standard pairwise Markov random field (MRF). This new formulation is able to handle the alignment of sequences recorded at different times by independent moving cameras that follows a similar trajectory, and also generalizes the particular cases that of fixed geometric transformation and/or linear temporal mapping. We conduct experiments on different scenarios such as sequences recorded simultaneously or by moving cameras to validate the robustness of the proposed approach. The proposed method provides the highest video alignment accuracy compared to the state-of-the-art methods on sequences recorded from vehicles driving along the same track at different times.
Keywords: video alignment
|
|
|
Jose Manuel Alvarez, Theo Gevers, Ferran Diego and Antonio Lopez. 2013. Road Geometry Classification by Adaptative Shape Models. TITS, 14(1), 459–468.
Abstract: Vision-based road detection is important for different applications in transportation, such as autonomous driving, vehicle collision warning, and pedestrian crossing detection. Common approaches to road detection are based on low-level road appearance (e.g., color or texture) and neglect of the scene geometry and context. Hence, using only low-level features makes these algorithms highly depend on structured roads, road homogeneity, and lighting conditions. Therefore, the aim of this paper is to classify road geometries for road detection through the analysis of scene composition and temporal coherence. Road geometry classification is proposed by building corresponding models from training images containing prototypical road geometries. We propose adaptive shape models where spatial pyramids are steered by the inherent spatial structure of road images. To reduce the influence of lighting variations, invariant features are used. Large-scale experiments show that the proposed road geometry classifier yields a high recognition rate of 73.57% ± 13.1, clearly outperforming other state-of-the-art methods. Including road shape information improves road detection results over existing appearance-based methods. Finally, it is shown that invariant features and temporal information provide robustness against disturbing imaging conditions.
Keywords: road detection
|
|
|
Joan Serrat, Felipe Lumbreras and Antonio Lopez. 2013. Cost estimation of custom hoses from STL files and CAD drawings. COMPUTIND, 64(3), 299–309.
Abstract: We present a method for the cost estimation of custom hoses from CAD models. They can come in two formats, which are easy to generate: a STL file or the image of a CAD drawing showing several orthogonal projections. The challenges in either cases are, first, to obtain from them a high level 3D description of the shape, and second, to learn a regression function for the prediction of the manufacturing time, based on geometric features of the reconstructed shape. The chosen description is the 3D line along the medial axis of the tube and the diameter of the circular sections along it. In order to extract it from STL files, we have adapted RANSAC, a robust parametric fitting algorithm. As for CAD drawing images, we propose a new technique for 3D reconstruction from data entered on any number of orthogonal projections. The regression function is a Gaussian process, which does not constrain the function to adopt any specific form and is governed by just two parameters. We assess the accuracy of the manufacturing time estimation by k-fold cross validation on 171 STL file models for which the time is provided by an expert. The results show the feasibility of the method, whereby the relative error for 80% of the testing samples is below 15%.
Keywords: On-line quotation; STL format; Regression; Gaussian process
|
|
|
Miguel Oliveira, Victor Santos, Angel Sappa, P. Dias and A. Moreira. 2016. Incremental Scenario Representations for Autonomous Driving using Geometric Polygonal Primitives. RAS, 83, 312–325.
Abstract: When an autonomous vehicle is traveling through some scenario it receives a continuous stream of sensor data. This sensor data arrives in an asynchronous fashion and often contains overlapping or redundant information. Thus, it is not trivial how a representation of the environment observed by the vehicle can be created and updated over time. This paper presents a novel methodology to compute an incremental 3D representation of a scenario from 3D range measurements. We propose to use macro scale polygonal primitives to model the scenario. This means that the representation of the scene is given as a list of large scale polygons that describe the geometric structure of the environment. Furthermore, we propose mechanisms designed to update the geometric polygonal primitives over time whenever fresh sensor data is collected. Results show that the approach is capable of producing accurate descriptions of the scene, and that it is computationally very efficient when compared to other reconstruction techniques.
Keywords: Incremental scene reconstruction; Point clouds; Autonomous vehicles; Polygonal primitives
|
|
|
Jaume Amores. 2015. MILDE: multiple instance learning by discriminative embedding. KAIS, 42(2), 381–407.
Abstract: While the objective of the standard supervised learning problem is to classify feature vectors, in the multiple instance learning problem, the objective is to classify bags, where each bag contains multiple feature vectors. This represents a generalization of the standard problem, and this generalization becomes necessary in many real applications such as drug activity prediction, content-based image retrieval, and others. While the existing paradigms are based on learning the discriminant information either at the instance level or at the bag level, we propose to incorporate both levels of information. This is done by defining a discriminative embedding of the original space based on the responses of cluster-adapted instance classifiers. Results clearly show the advantage of the proposed method over the state of the art, where we tested the performance through a variety of well-known databases that come from real problems, and we also included an analysis of the performance using synthetically generated data.
Keywords: Multi-instance learning; Codebook; Bag of words
|
|
|
Katerine Diaz, Francesc J. Ferri and Aura Hernandez-Sabate. 2018. An overview of incremental feature extraction methods based on linear subspaces. KBS, 145, 219–235.
Abstract: With the massive explosion of machine learning in our day-to-day life, incremental and adaptive learning has become a major topic, crucial to keep up-to-date and improve classification models and their corresponding feature extraction processes. This paper presents a categorized overview of incremental feature extraction based on linear subspace methods which aim at incorporating new information to the already acquired knowledge without accessing previous data. Specifically, this paper focuses on those linear dimensionality reduction methods with orthogonal matrix constraints based on global loss function, due to the extensive use of their batch approaches versus other linear alternatives. Thus, we cover the approaches derived from Principal Components Analysis, Linear Discriminative Analysis and Discriminative Common Vector methods. For each basic method, its incremental approaches are differentiated according to the subspace model and matrix decomposition involved in the updating process. Besides this categorization, several updating strategies are distinguished according to the amount of data used to update and to the fact of considering a static or dynamic number of classes. Moreover, the specific role of the size/dimension ratio in each method is considered. Finally, computational complexity, experimental setup and the accuracy rates according to published results are compiled and analyzed, and an empirical evaluation is done to compare the best approach of each kind.
|
|