|
F. de la Torre, Jordi Vitria, Petia Radeva, & J. Melenchon. (2000). EigenFiltering for flexible Eigentracking.
|
|
|
X. Varona, Jordi Gonzalez, Xavier Roca, & Juan J. Villanueva. (2000). iTrack: Image-based Probabilistic Tracking of People.
|
|
|
D. Lloret, Joan Serrat, Antonio Lopez, A. Soler, & Juan J. Villanueva. (2000). Retinal image registration using creases as anatomical landmarks.
Abstract: Retinal images are routinely used in ophthalmology to study the optic nerve head and the retina. To objectively assess the evolution of an illness, images taken at different times must be registered. Most methods so far have been designed for a single image modality, such as temporal series or stereo pairs of angiographies, fluorescein angiographies, or scanning laser ophthalmoscope (SLO) images, which makes them prone to fail when conditions vary. In contrast, the method we propose has been shown to be accurate and reliable on all of the former modalities. It has been adapted to 2D from the 3D registration of CT and MR images. Relevant features (also known as landmarks) are extracted by means of a robust creaseness operator, and the resulting images are iteratively transformed until a maximum in their correlation is achieved. Our method has succeeded on more than 100 pairs tried so far, in all cases also including scaling as a parameter to be optimized.
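The correlation-maximization step described in this abstract can be sketched in simplified form. The snippet below is an illustration only, not the authors' method: it omits the creaseness operator and scale optimization, and searches integer translations exhaustively for the shift that maximizes normalized cross-correlation between two feature images.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equally sized feature images."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def best_translation(fixed, moving, max_shift=5):
    """Exhaustive search for the integer shift maximizing NCC (scale omitted)."""
    best_score, best_shift = -2.0, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # Circular shift stands in for a proper resampled transform here
            shifted = np.roll(np.roll(moving, dy, axis=0), dx, axis=1)
            score = ncc(fixed, shifted)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

A real registration pipeline would replace the brute-force loop with an iterative optimizer and include scaling, as the abstract notes.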
|
|
|
Robert Benavente, Gemma Sanchez, Ramon Baldrich, Maria Vanrell, & Josep Llados. (2000). Normalized colour segmentation for human appearance description.
|
|
|
Xose M. Pardo, & Petia Radeva. (2000). Discriminant snakes for 3D reconstruction in medical images.
|
|
|
Ricardo Toledo, X. Orriols, Petia Radeva, X. Binefa, Jordi Vitria, Cristina Cañero, et al. (2000). Eigensnakes for vessel segmentation in angiography.
|
|
|
Cristina Cañero, Petia Radeva, Ricardo Toledo, Juan J. Villanueva, & J. Mauri. (2000). 3D Curve Reconstruction by Biplane Snakes.
|
|
|
Joan Serrat, Antonio Lopez, & D. Lloret. (2000). On ridges and valleys.
|
|
|
A. Pujol, Felipe Lumbreras, X. Varona, & Juan J. Villanueva. (2000). Locating people in indoor scenes for real applications.
|
|
|
Youssef El Rhabi, Simon Loic, & Brun Luc. (2015). Estimation de la pose d’une caméra à partir d’un flux vidéo en s’approchant du temps réel [Camera pose estimation from a video stream while approaching real time]. In 15ème édition d'ORASIS, journées francophones des jeunes chercheurs en vision par ordinateur (ORASIS 2015).
Abstract: Finding a way to estimate the pose of an image quickly and robustly is essential in augmented reality. Here we discuss the approach we chose in order to get closer to real time by using SIFT points [4]. We propose a method based on filtering both the SIFT points and the images on which to focus, so that processing concentrates on relevant data.
Keywords: Augmented Reality; SFM; SLAM; real time pose computation; 2D/3D registration
|
|
|
R. Herault, Franck Davoine, Fadi Dornaika, & Y. Grandvalet. (2006). Simultaneous and robust face and facial action tracking.
|
|
|
Sergio Escalera, Jordi Gonzalez, Xavier Baro, Miguel Reyes, Oscar Lopes, Isabelle Guyon, et al. (2013). Multi-modal Gesture Recognition Challenge 2013: Dataset and Results. In 15th ACM International Conference on Multimodal Interaction (pp. 445–452).
Abstract: The recognition of continuous natural gestures is a complex and challenging problem due to the multi-modal nature of the visual cues involved (e.g. finger and lip movements, subtle facial expressions, body pose, etc.), as well as technical limitations such as spatial and temporal resolution and unreliable depth cues. In order to promote research advances in this field, we organized a challenge on multi-modal gesture recognition. We made available a large video database of 13,858 gestures from a lexicon of 20 Italian gesture categories recorded with a Kinect™ camera, providing the audio, skeletal model, user mask, RGB, and depth images. The focus of the challenge was on user-independent multiple gesture learning. There are no resting positions, and the gestures are performed in continuous sequences lasting 1-2 minutes, containing between 8 and 20 gesture instances each. As a result, the dataset contains around 1,720,800 frames. In addition to the 20 main gesture categories, ‘distracter’ gestures are included, meaning that additional audio and gestures outside the vocabulary are present. The final evaluation of the challenge was defined in terms of the Levenshtein edit distance, where the goal was to indicate the real order of gestures within the sequence. 54 international teams participated in the challenge, and outstanding results were obtained by the first-ranked participants.
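The Levenshtein edit distance used for evaluation in this challenge can be sketched with the standard dynamic-programming recurrence. The gesture labels below are made up for illustration; the metric counts the insertions, deletions, and substitutions needed to turn a predicted gesture sequence into the ground-truth one.

```python
def levenshtein(pred, truth):
    """Edit distance between a predicted and a ground-truth label sequence."""
    m, n = len(pred), len(truth)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all of pred
    for j in range(n + 1):
        d[0][j] = j  # insert all of truth
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if pred[i - 1] == truth[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution (or match)
    return d[m][n]

# One spurious gesture and one missed gesture cost two edits:
print(levenshtein([3, 7, 7, 12, 5], [3, 7, 12, 5, 9]))  # → 2
```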
|
|
|
Victor Ponce, Sergio Escalera, & Xavier Baro. (2013). Multi-modal Social Signal Analysis for Predicting Agreement in Conversation Settings. In 15th ACM International Conference on Multimodal Interaction (pp. 495–502).
Abstract: In this paper we present a non-invasive ambient intelligence framework for the analysis of non-verbal communication applied to conversational settings. In particular, we apply feature extraction techniques to multi-modal audio-RGB-depth data. We compute a set of behavioral indicators that define communicative cues coming from the fields of psychology and observational methodology. We test our methodology on data captured in victim-offender mediation scenarios. Using different state-of-the-art classification approaches, our system achieves over 75% recognition accuracy in predicting agreement among the parties involved in the conversations, using the experts' opinions as ground truth.
|
|
|
Yaxing Wang, Chenshen Wu, Luis Herranz, Joost Van de Weijer, Abel Gonzalez-Garcia, & Bogdan Raducanu. (2018). Transferring GANs: generating images from limited data. In 15th European Conference on Computer Vision (Vol. 11210, pp. 220–236). LNCS.
Abstract: Transferring knowledge of pre-trained networks to new domains by means of fine-tuning is a widely used practice for applications based on discriminative models. To the best of our knowledge this practice has not been studied within the context of generative deep networks. Therefore, we study domain adaptation applied to image generation with generative adversarial networks. We evaluate several aspects of domain adaptation, including the impact of target domain size, the relative distance between source and target domain, and the initialization of conditional GANs. Our results show that using knowledge from pre-trained networks can shorten the convergence time and significantly improve the quality of the generated images, especially when target data is limited. We show that these conclusions can also be drawn for conditional GANs, even when the pre-trained model was trained without conditioning. Our results also suggest that density is more important than diversity: a dataset with one or a few densely sampled classes is a better source than more diverse datasets such as ImageNet or Places.
Keywords: Generative adversarial networks; Transfer learning; Domain adaptation; Image generation
|
|
|
Pau Rodriguez, Josep M. Gonfaus, Guillem Cucurull, Xavier Roca, & Jordi Gonzalez. (2018). Attend and Rectify: A Gated Attention Mechanism for Fine-Grained Recovery. In 15th European Conference on Computer Vision (Vol. 11212, pp. 357–372). LNCS.
Abstract: We propose a novel attention mechanism to enhance Convolutional Neural Networks for fine-grained recognition. It learns to attend to lower-level feature activations without requiring part annotations and uses these activations to update and rectify the output likelihood distribution. In contrast to other approaches, the proposed mechanism is modular, architecture-independent, and efficient both in terms of parameters and computation required. Experiments show that networks augmented with our approach systematically improve their classification accuracy and become more robust to clutter. As a result, Wide Residual Networks augmented with our proposal surpass the state-of-the-art classification accuracies on CIFAR-10, the Adience gender recognition task, Stanford Dogs, and UEC Food-100.
Keywords: Deep Learning; Convolutional Neural Networks; Attention
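The rectification idea described in this abstract, combining the network's original prediction with predictions derived from lower-level attention heads, can be illustrated numerically. This is a simplified sketch, not the paper's exact formulation: it blends class distributions with softmax-normalized gates so the result remains a valid distribution.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_output(original_logits, attention_logits, gate_logits):
    """Convex combination of the original class distribution and per-head
    attention distributions, weighted by learned gates.

    original_logits:  (classes,)       network output before rectification
    attention_logits: (heads, classes) class scores from each attention head
    gate_logits:      (heads + 1,)     one gate per head plus the original
    """
    gates = softmax(gate_logits)                # weights summing to 1
    heads = softmax(attention_logits, axis=-1)  # per-head class distributions
    original = softmax(original_logits)
    return gates[0] * original + (gates[1:, None] * heads).sum(axis=0)
```

Because the gates form a convex combination, a head that disagrees with the original prediction can shift (rectify) the final distribution without ever producing invalid probabilities.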
|
|