|
Carme Julia, Angel Sappa and Felipe Lumbreras. 2008. Aprendiendo a recrear la realidad en 3D.
|
|
|
Arnau Ramisa, Alex Goldhoorn, David Aldavert, Ricardo Toledo and Ramon Lopez de Mantaras. 2011. Combining Invariant Features and the ALV Homing Method for Autonomous Robot Navigation Based on Panoramas. JIRC, 64(3-4), 625–649.
Abstract: Biologically inspired homing methods, such as the Average Landmark Vector, are an interesting solution for local navigation due to its simplicity. However, usually they require a modification of the environment by placing artificial landmarks in order to work reliably. In this paper we combine the Average Landmark Vector with invariant feature points automatically detected in panoramic images to overcome this limitation. The proposed approach has been evaluated first in simulation and, as promising results are found, also in two data sets of panoramas from real world environments.
|
|
|
Fei Yang, Luis Herranz, Joost Van de Weijer, Jose Antonio Iglesias, Antonio Lopez and Mikhail Mozerov. 2020. Variable Rate Deep Image Compression with Modulated Autoencoder. SPL, 27, 331–335.
Abstract: Variable rate is a requirement for flexible and adaptable image and video compression. However, deep image compression methods (DIC) are optimized for a single fixed rate-distortion (R-D) tradeoff. While this can be addressed by training multiple models for different tradeoffs, the memory requirements increase proportionally to the number of models. Scaling the bottleneck representation of a shared autoencoder can provide variable rate compression with a single shared autoencoder. However, the R-D performance using this simple mechanism degrades in low bitrates, and also shrinks the effective range of bitrates. To address these limitations, we formulate the problem of variable R-D optimization for DIC, and propose modulated autoencoders (MAEs), where the representations of a shared autoencoder are adapted to the specific R-D tradeoff via a modulation network. Jointly training this modulated autoencoder and the modulation network provides an effective way to navigate the R-D operational curve. Our experiments show that the proposed method can achieve almost the same R-D performance of independent models with significantly fewer parameters.
|
|
|
Sudeep Katakol, Basem Elbarashy, Luis Herranz, Joost Van de Weijer and Antonio Lopez. 2021. Distributed Learning and Inference with Compressed Images. TIP, 30, 3069–3083.
Abstract: Modern computer vision requires processing large amounts of data, both while training the model and/or during inference, once the model is deployed. Scenarios where images are captured and processed in physically separated locations are increasingly common (e.g. autonomous vehicles, cloud computing). In addition, many devices suffer from limited resources to store or transmit data (e.g. storage space, channel capacity). In these scenarios, lossy image compression plays a crucial role to effectively increase the number of images collected under such constraints. However, lossy compression entails some undesired degradation of the data that may harm the performance of the downstream analysis task at hand, since important semantic information may be lost in the process. Moreover, we may only have compressed images at training time but are able to use original images at inference time, or vice versa, and in such a case, the downstream model suffers from covariate shift. In this paper, we analyze this phenomenon, with a special focus on vision-based perception for autonomous driving as a paradigmatic scenario. We see that loss of semantic information and covariate shift do indeed exist, resulting in a drop in performance that depends on the compression rate. In order to address the problem, we propose dataset restoration, based on image restoration with generative adversarial networks (GANs). Our method is agnostic to both the particular image compression method and the downstream task; and has the advantage of not adding additional cost to the deployed models, which is particularly important in resource-limited devices. The presented experiments focus on semantic segmentation as a challenging use case, cover a broad range of compression rates and diverse datasets, and show how our method is able to significantly alleviate the negative effects of compression on the downstream visual task.
|
|
|
Gabriel Villalonga, Joost Van de Weijer and Antonio Lopez. 2020. Recognizing new classes with synthetic data in the loop: application to traffic sign recognition. SENS, 20(3), 583.
Abstract: On-board vision systems may need to increase the number of classes that can be recognized in a relatively short period. For instance, a traffic sign recognition system may suddenly be required to recognize new signs. Since collecting and annotating samples of such new classes may need more time than we wish, especially for uncommon signs, we propose a method to generate these samples by combining synthetic images and Generative Adversarial Network (GAN) technology. In particular, the GAN is trained on synthetic and real-world samples from known classes to perform synthetic-to-real domain adaptation, but applied to synthetic samples of the new classes. Using the Tsinghua dataset with a synthetic counterpart, SYNTHIA-TS, we have run an extensive set of experiments. The results show that the proposed method is indeed effective, provided that we use a proper Convolutional Neural Network (CNN) to perform the traffic sign recognition (classification) task as well as a proper GAN to transform the synthetic images. Here, a ResNet101-based classifier and domain adaptation based on CycleGAN performed extremely well for a ratio∼ 1/4 for new/known classes; even for more challenging ratios such as∼ 4/1, the results are also very positive.
|
|
|
Fahad Shahbaz Khan, Muhammad Anwer Rao, Joost Van de Weijer, Michael Felsberg and J.Laaksonen. 2015. Compact color texture description for texture classification. PRL, 51, 16–22.
Abstract: Describing textures is a challenging problem in computer vision and pattern recognition. The classification problem involves assigning a category label to the texture class it belongs to. Several factors such as variations in scale, illumination and viewpoint make the problem of texture description extremely challenging. A variety of histogram based texture representations exists in literature.
However, combining multiple texture descriptors and assessing their complementarity is still an open research problem. In this paper, we first show that combining multiple local texture descriptors significantly improves the recognition performance compared to using a single best method alone. This
gain in performance is achieved at the cost of high-dimensional final image representation. To counter this problem, we propose to use an information-theoretic compression technique to obtain a compact texture description without any significant loss in accuracy. In addition, we perform a comprehensive
evaluation of pure color descriptors, popular in object recognition, for the problem of texture classification. Experiments are performed on four challenging texture datasets namely, KTH-TIPS-2a, KTH-TIPS-2b, FMD and Texture-10. The experiments clearly demonstrate that our proposed compact multi-texture approach outperforms the single best texture method alone. In all cases, discriminative color names outperforms other color features for texture classification. Finally, we show that combining discriminative color names with compact texture representation outperforms state-of-the-art methods by 7:8%, 4:3% and 5:0% on KTH-TIPS-2a, KTH-TIPS-2b and Texture-10 datasets respectively.
|
|
|
Xavier Boix, Josep M. Gonfaus, Joost Van de Weijer, Andrew Bagdanov, Joan Serrat and Jordi Gonzalez. 2012. Harmony Potentials: Fusing Global and Local Scale for Semantic Image Segmentation. IJCV, 96(1), 83–102.
Abstract: The Hierarchical Conditional Random Field(HCRF) model have been successfully applied to a number of image labeling problems, including image segmentation. However, existing HCRF models of image segmentation do not allow multiple classes to be assigned to a single region, which limits their ability to incorporate contextual information across multiple scales.
At higher scales in the image, this representation yields an oversimplied model since multiple classes can be reasonably expected to appear within large regions. This simplied model particularly limits the impact of information at higher scales. Since class-label information at these scales is usually more reliable than at lower, noisier scales, neglecting this information is undesirable. To
address these issues, we propose a new consistency potential for image labeling problems, which we call the harmony potential. It can encode any possible combi-
nation of labels, penalizing only unlikely combinations of classes. We also propose an eective sampling strategy over this expanded label set that renders tractable the underlying optimization problem. Our approach obtains state-of-the-art results on two challenging, standard benchmark datasets for semantic image segmentation: PASCAL VOC 2010, and MSRC-21.
|
|
|
Enric Marti, Carme Julia and Debora Gil. 2006. A PBL Experience in the Teaching of Computer Graphics. CGF, 25(1), 95–103.
Abstract: Project-Based Learning (PBL) is an educational strategy to improve student’s learning capability that, in recent years, has had a progressive acceptance in undergraduate studies. This methodology is based on solving a problem or project in a student working group. In this way, PBL focuses on learning the necessary tools to correctly find a solution to given problems. Since the learning initiative is transferred to the student, the PBL method promotes students own abilities. This allows a better assessment of the true workload that carries out the student in the subject. It follows that the methodology conforms to the guidelines of the Bologna document, which quantifies the student workload in a subject by means of the European credit transfer system (ECTS). PBL is currently applied in undergraduate studies needing strong practical training such as medicine, nursing or law sciences. Although this is also the case in engineering studies, amazingly, few experiences have been reported. In this paper we propose to use PBL in the educational organization of the Computer Graphics subjects in the Computer Science degree. Our PBL project focuses in the development of a C++ graphical environment based on the OpenGL libraries for visualization and handling of different graphical objects. The starting point is a basic skeleton that already includes lighting functions, perspective projection with mouse interaction to change the point of view and three predefined objects. Students have to complete this skeleton by adding their own functions to solve the project. A total number of 10 projects have been proposed and successfully solved. The exercises range from human face rendering to articulated objects, such as robot arms or puppets. In the present paper we extensively report the statement and educational objectives for two of the projects: solar system visualization and a chess game. We report our earlier educational experience based on the standard classroom theoretical, problem and practice sessions and the reasons that motivated searching for other learning methods. We have mainly chosen PBL because it improves the student learning initiative. We have applied the PBL educational model since the beginning of the second semester. The student’s feedback increases in his interest for the subject. We present a comparative study of the teachers’ and students’ workload between PBL and the classic teaching approach, which suggests that the workload increase in PBL is not as high as it seems.
|
|
|
Sergio Vera, Debora Gil, Antonio Lopez and Miguel Angel Gonzalez Ballester. 2012. Multilocal Creaseness Measure.
Abstract: This document describes the implementation using the Insight Toolkit of an algorithm for detecting creases (ridges and valleys) in N-dimensional images, based on the Local Structure Tensor of the image. In addition to the filter used to calculate the creaseness image, a filter for the computation of the structure tensor is also included in this submission.
Keywords: Ridges, Valley, Creaseness, Structure Tensor, Skeleton,
|
|
|
Aura Hernandez-Sabate, Debora Gil, Jaume Garcia and Enric Marti. 2011. Image-based Cardiac Phase Retrieval in Intravascular Ultrasound Sequences. T-UFFC, 58(1), 60–72.
Abstract: Longitudinal motion during in vivo pullbacks acquisition of intravascular ultrasound (IVUS) sequences is a major artifact for 3-D exploring of coronary arteries. Most current techniques are based on the electrocardiogram (ECG) signal to obtain a gated pullback without longitudinal motion by using specific hardware or the ECG signal itself. We present an image-based approach for cardiac phase retrieval from coronary IVUS sequences without an ECG signal. A signal reflecting cardiac motion is computed by exploring the image intensity local mean evolution. The signal is filtered by a band-pass filter centered at the main cardiac frequency. Phase is retrieved by computing signal extrema. The average frame processing time using our setup is 36 ms. Comparison to manually sampled sequences encourages a deeper study comparing them to ECG signals.
Keywords: 3-D exploring; ECG; band-pass filter; cardiac motion; cardiac phase retrieval; coronary arteries; electrocardiogram signal; image intensity local mean evolution; image-based cardiac phase retrieval; in vivo pullbacks acquisition; intravascular ultrasound sequences; longitudinal motion; signal extrema; time 36 ms; band-pass filters; biomedical ultrasonics; cardiovascular system; electrocardiography; image motion analysis; image retrieval; image sequences; medical image processing; ultrasonic imaging
|
|