Josep Llados. (2007). Advances in Graphics Recognition. In Digital Document Processing, Major Directions and Recent Advances, Advances in Pattern Recognition, B.B. Chaudhuri, ed., 281–304.
|
Niki Aifanti, Angel Sappa, N. Grammalidis, & Sotiris Malassiotis. (2009). Advances in Tracking and Recognition of Human Motion. In Encyclopedia of Information Science and Technology (Vol. I, pp. 65–71).
|
Angel Sappa, Niki Aifanti, N. Grammalidis, & Sotiris Malassiotis. (2004). Advances in Vision-Based Human Body Modeling. In N. Sarris & M. Strintzis (Eds.), 3D Modeling & Animation: Synthesis and Analysis Techniques for the Human Body (pp. 1–26).
|
J. Kuhn, A. Nussbaumer, J. Pirker, Dimosthenis Karatzas, A. Pagani, O. Conlan, et al. (2015). Advancing Physics Learning Through Traversing a Multi-Modal Experimentation Space. In Workshop Proceedings of the 11th International Conference on Intelligent Environments (Vol. 19, pp. 373–380).
Abstract: Translating conceptual knowledge into real-world experiences presents a significant educational challenge. This position paper presents an approach that supports learners in moving seamlessly between conceptual learning and its application in the real world by bringing physical and virtual experiments into everyday settings. Learners are empowered to conduct these situated experiments in a variety of physical settings by leveraging state-of-the-art mobile, augmented reality, and virtual reality technology. A blend of mobile-based multi-sensory physical experiments, augmented reality, and enabling virtual environments can allow learners to bridge their conceptual learning with tangible experiences in a completely novel manner. This approach focuses on the learner by applying self-regulated personalised learning techniques, underpinned by innovative pedagogical approaches and adaptation techniques, to ensure that the needs and preferences of each learner are catered for individually.
|
Yi Xiao. (2023). Advancing Vision-based End-to-End Autonomous Driving (Antonio Lopez, Ed.). Ph.D. thesis, IMPRIMA.
Abstract: In autonomous driving, artificial intelligence (AI) processes the traffic environment to drive the vehicle to a desired destination. Currently, there are different paradigms that address the development of AI-enabled drivers. On the one hand, we find modular pipelines, which divide the driving task into sub-tasks such as perception, maneuver planning, and control. On the other hand, we find end-to-end driving approaches that attempt to learn the direct mapping of raw data from input sensors to vehicle control signals. The latter are relatively less studied but are gaining popularity as they are less demanding in terms of data labeling. Therefore, in this thesis, our goal is to investigate end-to-end autonomous driving.
We propose to evaluate three approaches to tackle the challenge of end-to-end autonomous driving. First, we focus on the input, considering adding depth information as complementary to RGB data, in order to mimic the human ability to estimate the distance to obstacles. Notice that, in the real world, these depth maps can be obtained either from a LiDAR sensor or from a trained monocular depth estimation module, where no human labeling is needed. Then, based on the intuition that the latent space of end-to-end driving models encodes relevant information for driving, we use it as prior knowledge for training an affordance-based driving model. In this case, the trained affordance-based model can achieve good performance while requiring less human-labeled data, and it can provide interpretability regarding driving actions. Finally, we present a new pure vision-based end-to-end driving model termed CIL++, which is trained by imitation learning. CIL++ leverages modern best practices, such as a large horizontal field of view and a self-attention mechanism, which contribute to the agent's understanding of the driving scene and bring it closer to imitating human drivers. Using training data without any human labeling, our model yields almost expert performance on the CARLA NoCrash benchmark and rivals SOTA models that require large amounts of human-labeled data.
|
Aitor Alvarez-Gila, Joost Van de Weijer, & Estibaliz Garrote. (2017). Adversarial Networks for Spatial Context-Aware Spectral Image Reconstruction from RGB. In 1st International Workshop on Physics Based Vision meets Deep Learning.
Abstract: Hyperspectral signal reconstruction aims at recovering the original spectral input that produced a certain trichromatic (RGB) response from a capturing device or observer.
Given the heavily underconstrained, non-linear nature of the problem, traditional techniques leverage different statistical properties of the spectral signal in order to build informative priors from real-world object reflectances for constructing such an RGB-to-spectral signal mapping. However, most of them treat each sample independently, and thus do not benefit from the contextual information that the spatial dimensions can provide. We pose hyperspectral natural image reconstruction as an image-to-image mapping learning problem, and apply a conditional generative adversarial framework to help capture spatial semantics. This is the first time Convolutional Neural Networks, and particularly Generative Adversarial Networks, are used to solve this task. Quantitative evaluation shows a Root Mean Squared Error (RMSE) drop of 44.7% and a Relative RMSE drop of 47.0% on the ICVL natural hyperspectral image dataset.
|
Maria Alberich-Carramiñana, Guillem Alenya, Juan Andrade, E. Martinez, & Carme Torras. (2006). Affine Epipolar Direction from Two Views of a Planar Contour. In Proceedings of the Advanced Concepts for Intelligent Vision Systems Conference, LNCS 4179: 944–955.
|
Pau Rodriguez, Guillem Cucurull, Josep M. Gonfaus, Xavier Roca, & Jordi Gonzalez. (2017). Age and gender recognition in the wild with deep attention. Pattern Recognition, 72, 563–571.
Abstract: Face analysis in images in the wild still poses a challenge for automatic age and gender recognition, mainly due to high variability in resolution, deformation, and occlusion. Although performance has increased greatly thanks to Convolutional Neural Networks (CNNs), it is still far from optimal compared to other image recognition tasks, mainly because of the high sensitivity of CNNs to facial variations. In this paper, inspired by biology and by the recent success of attention mechanisms in visual question answering and fine-grained recognition, we propose a novel feedforward attention mechanism that is able to discover the most informative and reliable parts of a given face for improving age and gender classification. In particular, given a downsampled facial image, the proposed model is trained through a novel end-to-end learning framework to extract the most discriminative patches from the original high-resolution image. Experimental validation on the standard Adience, Images of Groups, and MORPH II benchmarks shows that including attention mechanisms enhances the performance of CNNs in terms of robustness and accuracy.
Keywords: Age recognition; Gender recognition; Deep neural networks; Attention mechanisms
|
Margarita Torre, & Petia Radeva. (2000). Agricultural-Field Extraction on Aerial Images by Region Competition Algorithm. In 15th International Conference on Pattern Recognition (Vol. 1, pp. 313–316).
|
V. Kober, Mikhail Mozerov, Josue Albarez, & I.A. Ovseyevich. (2007). Algorithms for Impulse Noise Removal from Corrupted Color Images.
|
Xavier Otazu, & J. Nuñez. (2001). Algoritmo de Clasificacion no Supervisada Basado en Wavelets [Unsupervised Classification Algorithm Based on Wavelets].
|
Michal Drozdzal, Laura Igual, Petia Radeva, Jordi Vitria, Carolina Malagelada, & Fernando Azpiroz. (2010). Aligning Endoluminal Scene Sequences in Wireless Capsule Endoscopy. In IEEE Computer Society Workshop on Mathematical Methods in Biomedical Image Analysis (pp. 117–124).
Abstract: Intestinal motility analysis is an important examination in the detection of various intestinal malfunctions. One of the big challenges of automatic motility analysis is how to compare sequences of images and extract dynamic patterns while taking into account the high deformability of the intestine wall as well as the capsule motion. From a clinical point of view, the ability to align endoluminal scene sequences will help to find regions of similar intestinal activity and in this way will provide valuable information on intestinal motility problems. This work, for the first time, addresses the problem of aligning endoluminal sequences taking into account the motion and structure of the intestine. To describe motility in the sequence, we propose different descriptors based on the SIFT Flow algorithm, namely: (1) Histograms of SIFT Flow Directions to describe the flow course, (2) SIFT Descriptors to represent image intestine structure, and (3) SIFT Flow Magnitude to quantify intestine deformation. We show that merging all three descriptors provides robust information for sequence description in terms of motility. Moreover, we develop a novel methodology to rank the intestinal sequences based on expert feedback about the relevance of the results. The experimental results show that the selected descriptors are useful for alignment and similarity description, and the proposed method allows the analysis of WCE sequences.
|
Sounak Dey, Anjan Dutta, Suman Ghosh, Ernest Valveny, & Josep Llados. (2018). Aligning Salient Objects to Queries: A Multi-modal and Multi-object Image Retrieval Framework. In 14th Asian Conference on Computer Vision.
Abstract: In this paper we propose an approach for multi-modal image retrieval in multi-labelled images. A multi-modal deep network architecture is formulated to jointly model sketches and text as input query modalities in a common embedding space, which is then further aligned with the image feature space. Our architecture also relies on salient object detection through a supervised LSTM-based visual attention model learned from convolutional features. Both the alignment between the queries and the image and the supervision of the attention on the images are obtained by generalizing the Hungarian Algorithm using different loss functions. This permits encoding the object-based features and their alignment with the query irrespective of whether different objects co-occur in the training set. We validate the performance of our approach on standard single- and multi-object datasets, showing state-of-the-art performance on every dataset.
|
Ferran Diego. (2007). Alignment of Videos Recorded from Moving Vehicles.
|
Joan Serrat, Ferran Diego, Jose Manuel Alvarez, & Felipe Lumbreras. (2007). Alignment of Videos Recorded from Moving Vehicles. In 14th International Conference on Image Analysis and Processing (pp. 512–517).
|