Jorge Charco, Angel Sappa, Boris X. Vintimilla, & Henry Velesaca. (2020). Transfer Learning from Synthetic Data in the Camera Pose Estimation Problem. In 15th International Conference on Computer Vision Theory and Applications.
Abstract: This paper presents a novel Siamese network architecture, a variant of ResNet-50, to estimate the relative camera pose in multi-view environments. To improve the performance of the proposed model, a transfer learning strategy based on synthetic images obtained from a virtual world is considered. The transfer learning consists of first training the network using pairs of images from the virtual-world scenario under different conditions (i.e., weather, illumination, objects, buildings, etc.); then, the learned weights of the network are transferred to the real case, where images from real-world scenarios are considered. Experimental results and comparisons with the state of the art show both improvements in relative pose estimation accuracy using the proposed model and further improvements when the transfer learning strategy (synthetic-world data transfer learning real-world data) is considered to tackle the limitation on training due to the reduced number of pairs of real images in most public data sets.
|
Carola Figueroa Flores, Bogdan Raducanu, David Berga, & Joost Van de Weijer. (2021). Hallucinating Saliency Maps for Fine-Grained Image Classification for Limited Data Domains. In 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (Vol. 4, pp. 163–171).
Abstract: Most saliency methods are evaluated on their ability to generate saliency maps, and not on their functionality in a complete vision pipeline such as image classification. In the current paper, we propose an approach which does not require explicit saliency maps to improve image classification; instead, they are learned implicitly during the training of an end-to-end image classification task. We show that our approach obtains results similar to the case when the saliency maps are provided explicitly. Combining RGB data with saliency maps represents a significant advantage for object recognition, especially when training data is limited. We validate our method on several datasets for fine-grained classification tasks (Flowers, Birds and Cars). In addition, we show that our saliency estimation method, which is trained without any saliency ground-truth data, obtains competitive results on a real-image saliency benchmark (Toronto), and outperforms deep saliency models on synthetic images (SID4VAM).
arXiv: 2007.12562
|
Jorge Charco, Angel Sappa, & Boris X. Vintimilla. (2022). Human Pose Estimation through a Novel Multi-view Scheme. In 17th International Conference on Computer Vision Theory and Applications (VISAPP 2022) (Vol. 5, pp. 855–862).
Abstract: This paper presents a multi-view scheme to tackle the challenging problem of self-occlusion in human pose estimation. The proposed approach first obtains the human body joints from a set of images captured from different views at the same time. Then, it enhances the obtained joints by using a multi-view scheme. Basically, the joints from a given view are used to enhance poorly estimated joints from another view, with the specific aim of tackling self-occlusion cases. A network architecture initially proposed for the monocular case is adapted to be used in the proposed multi-view scheme. Experimental results and comparisons with state-of-the-art approaches on the Human3.6M dataset are presented, showing improvements in the accuracy of body joint estimation.
Keywords: Multi-view Scheme; Human Pose Estimation; Relative Camera Pose; Monocular Approach
|
Rafael E. Rivadeneira, Angel Sappa, & Boris X. Vintimilla. (2022). Multi-Image Super-Resolution for Thermal Images. In 17th International Conference on Computer Vision Theory and Applications (VISAPP 2022) (Vol. 4, pp. 635–642).
Abstract: This paper proposes a novel CNN architecture for the multi-thermal image super-resolution problem. In the proposed scheme, the multi-images are synthetically generated by downsampling and slightly shifting the given image; noise is also added to each of these synthesized images. The proposed architecture uses two attention-block paths to extract high-frequency details, taking advantage of the large amount of information extracted from multiple images of the same scene. Experimental results are provided, showing that the proposed scheme outperforms state-of-the-art approaches.
Keywords: Thermal Images; Multi-view; Multi-frame; Super-Resolution; Deep Learning; Attention Block
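The synthetic multi-image generation mentioned in the abstract above (downsample, slightly shift, add noise) can be sketched as follows. This is only an illustration under stated assumptions and not the authors' implementation: the function name `synthesize_low_res_frames`, block averaging as the downsampling method, integer pixel shifts, border clamping, and Gaussian noise are all hypothetical choices that the abstract does not specify.

```python
import random


def synthesize_low_res_frames(image, scale=2, num_frames=4,
                              max_shift=1, noise_std=2.0, seed=0):
    """Build several low-resolution frames from one 2-D image (list of rows).

    Assumptions (not from the paper): block-average downsampling,
    integer shifts, border clamping, additive Gaussian noise.
    """
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    frames = []
    for _ in range(num_frames):
        # Draw a small random shift for this frame.
        dy = rng.randint(-max_shift, max_shift)
        dx = rng.randint(-max_shift, max_shift)
        frame = []
        for y in range(0, height - scale + 1, scale):
            row = []
            for x in range(0, width - scale + 1, scale):
                # Average one scale x scale block of the shifted image,
                # clamping coordinates that fall outside the image.
                total = 0.0
                for by in range(scale):
                    for bx in range(scale):
                        sy = min(max(y + by + dy, 0), height - 1)
                        sx = min(max(x + bx + dx, 0), width - 1)
                        total += image[sy][sx]
                mean = total / (scale * scale)
                # Add zero-mean Gaussian noise to the downsampled pixel.
                row.append(mean + rng.gauss(0.0, noise_std))
            frame.append(row)
        frames.append(frame)
    return frames


if __name__ == "__main__":
    # 8x8 toy "thermal" image with a simple intensity ramp.
    toy = [[float(x + y) for x in range(8)] for y in range(8)]
    low_res = synthesize_low_res_frames(toy)
    print(len(low_res), len(low_res[0]), len(low_res[0][0]))
```

Each returned frame is a differently shifted, noisy view of the same scene, which is the kind of input a multi-image super-resolution network would consume.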
|
Patricia Suarez, & Angel Sappa. (2024). A Generative Model for Guided Thermal Image Super-Resolution. In 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications.
Abstract: This paper presents a novel approach for thermal super-resolution based on a fusion prior, a low-resolution thermal image, and the H brightness channel of the corresponding visible spectrum image. The method combines bicubic interpolation of the ×8 scale target image with the brightness component. To enhance the guidance process, the original RGB image is converted to HSV, and the brightness channel is extracted. Bicubic interpolation is then applied to the low-resolution thermal image, resulting in a Bicubic-Brightness channel blend. This luminance-bicubic fusion is used as an input image to help the training process. With this fused image, the cyclic adversarial generative network obtains high-resolution thermal image results. Experimental evaluations show that the proposed approach significantly improves spatial resolution and pixel intensity levels compared to other state-of-the-art techniques, making it a promising method to obtain high-resolution thermal images.
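The fusion step described in the abstract above (convert RGB to HSV, take the brightness channel, upsample the thermal image, blend the two) might look roughly like the sketch below. Several details are assumptions rather than facts from the paper: nearest-neighbour replication stands in for bicubic interpolation to keep the example dependency-free, the blend is a plain weighted average with a hypothetical `weight` parameter, and all function names are illustrative.

```python
import colorsys


def brightness_channel(rgb_image):
    """Return the HSV value (V) channel, in [0, 1], of an 8-bit RGB image."""
    return [
        [colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)[2]
         for (r, g, b) in row]
        for row in rgb_image
    ]


def upsample_nearest(image, scale):
    """Upsample by pixel replication (a stand-in for bicubic interpolation)."""
    height, width = len(image), len(image[0])
    return [
        [image[y // scale][x // scale] for x in range(width * scale)]
        for y in range(height * scale)
    ]


def fuse(thermal_lr, rgb_hr, scale=8, weight=0.5):
    """Blend the upsampled thermal image with the visible brightness channel.

    `weight` is a hypothetical mixing factor; the abstract does not say
    how the two components are combined.
    """
    thermal_up = upsample_nearest(thermal_lr, scale)
    value = brightness_channel(rgb_hr)
    return [
        [weight * t + (1.0 - weight) * v for t, v in zip(t_row, v_row)]
        for t_row, v_row in zip(thermal_up, value)
    ]


if __name__ == "__main__":
    thermal = [[0.2]]                                      # 1x1 thermal image in [0, 1]
    visible = [[(255, 255, 255)] * 8 for _ in range(8)]    # 8x8 white RGB image
    fused = fuse(thermal, visible)
    print(len(fused), len(fused[0]))
```

In the paper's pipeline the fused image is then used as input to a generative network; that stage is outside the scope of this sketch.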
|
Hector Laria Mantecon, Kai Wang, Joost Van de Weijer, Bogdan Raducanu, & Kai Wang. (2024). NeRF-Diffusion for 3D-Consistent Face Generation and Editing. In 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications.
Abstract: Generating high-fidelity 3D-aware images without 3D supervision is a valuable capability in various applications. Current methods based on NeRF features, SDF information, or triplane features have limited variation after training. To address this, we propose a novel approach that combines pretrained models for shape and content generation. Our method leverages a pretrained Neural Radiance Field as a shape prior and a diffusion model for content generation. By conditioning the diffusion model with 3D features, we enhance its ability to generate novel views with 3D awareness. We introduce a consistency token shared between the NeRF module and the diffusion model to maintain 3D consistency during sampling. Moreover, our framework allows for text editing of 3D-aware image generation, enabling users to modify the style over 3D views while preserving semantic content. Our contributions include incorporating 3D awareness into a text-to-image model, addressing identity consistency in 3D view synthesis, and enabling text editing of 3D-aware image generation. We provide detailed explanations, including the shape prior based on the NeRF model and the content generation process using the diffusion model. We also discuss challenges such as shape consistency and sampling saturation. Experimental results demonstrate the effectiveness and visual quality of our approach.
|
Bhalaji Nagarajan, Ricardo Marques, Marcos Mejia, & Petia Radeva. (2022). Class-conditional Importance Weighting for Deep Learning with Noisy Labels. In 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (Vol. 5, pp. 679–686).
Abstract: Large-scale, accurate labels are essential for training Deep Neural Networks and ensuring high performance. However, it is very expensive to create a clean dataset, since it usually relies on human interaction. For this reason, the labelling process is often made cheaper at the cost of noisy labels. Learning with Noisy Labels is an active and very challenging area of research. Recent advances in self-supervised learning and robust loss functions have helped advance noisy-label research. In this paper, we propose a loss correction method that relies on dynamic weights computed during model training. We extend the existing Contrast to Divide algorithm coupled with DivideMix using a new class-conditional weighting scheme. We validate the method using the standard noise experiments and achieve encouraging results.
Keywords: Noisy Labeling; Loss Correction; Class-conditional Importance Weighting; Learning with Noisy Labels
|
Cesar Isaza, Joaquin Salas, & Bogdan Raducanu. (2012). Synthetic ground truth dataset to detect shadow cast by static objects in outdoor. In 1st International Workshop on Visual Interfaces for Ground Truth Collection in Computer Vision Applications (art. 11). ACM.
Abstract: In this paper, we propose a precise synthetic ground truth dataset to study the problem of detecting the shadows cast by static objects in outdoor environments over extended periods of time (days). For our dataset, we have created a virtual scenario using rendering software. To increase the realism of the simulated environment, we have defined the scenario in a precise geographical location. In our dataset the sun is by far the main illumination source. The sun position during the simulation time takes into consideration factors related to the geographical location, such as the latitude, longitude, elevation above sea level, and precise image capturing day and time. In our simulation the camera remains fixed. The dataset consists of seven days of simulation, from 10:00am to 5:00pm. Images are captured every 10 seconds. The shadows' ground truth is automatically computed by the rendering software.
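The capture schedule stated in the abstract above (seven days, 10:00am to 5:00pm, one image every 10 seconds) implies a rough dataset size, which the following sketch computes. The abstract does not say whether the frames at both 10:00am and 5:00pm are captured, so the `inclusive` flag is an assumption, and the function name is hypothetical.

```python
from datetime import datetime


def frames_per_day(start="10:00", end="17:00", interval_s=10, inclusive=True):
    """Count capture instants between two clock times at a fixed interval.

    `inclusive` (an assumption) controls whether the final instant at
    `end` is counted in addition to the first one at `start`.
    """
    t0 = datetime.strptime(start, "%H:%M")
    t1 = datetime.strptime(end, "%H:%M")
    steps = int((t1 - t0).total_seconds()) // interval_s
    return steps + 1 if inclusive else steps


if __name__ == "__main__":
    days = 7
    for inclusive in (False, True):
        per_day = frames_per_day(inclusive=inclusive)
        print(inclusive, per_day, days * per_day)
```

Under these assumptions the dataset holds on the order of 2,520 images per day, or about 17,640 over the seven days, with seven more if both endpoints of each day are captured.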
|
Enric Marti, Ferran Poveda, Antoni Gurgui, Jaume Rocarias, Debora Gil, & Aura Hernandez-Sabate. (2013). Una experiencia de estructura, funcionamiento y evaluación de la asignatura de graficos por computador con metodologia de aprendizaje basado en proyectos. In IV Congreso Internacional UNIVEST.
|
Enric Marti, Ferran Poveda, Antoni Gurgui, Jaume Rocarias, & Debora Gil. (2013). Una propuesta de seguimiento, tutorías on line y evaluación en la metodología de Aprendizaje Basado en Proyectos.
|
Carolina Malagelada, F. De Lorio, Fernando Azpiroz, Santiago Segui, Petia Radeva, Anna Accarino, et al. (2010). Intestinal Dysmotility in Patients with Functional Intestinal Disorders Demonstrated by Computer Vision Analysis of Capsule Endoscopy Images. In 18th United European Gastroenterology Week (Vol. 56, pp. A19–20).
|
Gloria Fernandez Esparrach, Jorge Bernal, Cristina Rodriguez de Miguel, Debora Gil, Fernando Vilariño, Henry Cordova, et al. (2015). Colonic polyps are correctly identified by a computer vision method using wm-dova energy maps. In Proceedings of the 23rd United European Gastroenterology Week (UEG Week 2015).
|
Antonio Hernandez, Carlo Gatta, Laura Igual, Sergio Escalera, & Petia Radeva. (2011). Automatic Angiography Segmentation Based on Improved Graph-cut. In Jornada TIC Salut Girona.
|
Laura Igual, Antonio Hernandez, Sergio Escalera, Miguel Reyes, Josep Moya, Joan Carles Soliva, et al. (2011). Automatic Techniques for Studying Attention-Deficit/Hyperactivity Disorder. In Jornada TIC Salut Girona.
|
Pau Baiget, Eric Sommerlade, I. Reid, & Jordi Gonzalez. (2008). Finding Prototypes to Estimate Trajectory Development in Outdoor Scenarios. In First International Workshop on Tracking Humans for the Evaluation of their Motion in Image Sequences, BMVC 2008 (pp. 27–34).
|