|
Cesar de Souza, Adrien Gaidon, Yohann Cabon and Antonio Lopez. 2017. Procedural Generation of Videos to Train Deep Action Recognition Networks. 30th IEEE Conference on Computer Vision and Pattern Recognition. 2594–2604.
Abstract: Deep learning for human action recognition in videos is making significant progress, but is slowed down by its dependency on expensive manual labeling of large video collections. In this work, we investigate the generation of synthetic training data for action recognition, as it has recently shown promising results for a variety of other computer vision tasks. We propose an interpretable parametric generative model of human action videos that relies on procedural generation and other computer graphics techniques of modern game engines. We generate a diverse, realistic, and physically plausible dataset of human action videos, called PHAV for “Procedural Human Action Videos”. It contains a total of 39,982 videos, with more than 1,000 examples for each action of 35 categories. Our approach is not limited to existing motion capture sequences, and we procedurally define 14 synthetic actions. We introduce a deep multi-task representation learning architecture to mix synthetic and real videos, even if the action categories differ. Our experiments on the UCF101 and HMDB51 benchmarks suggest that combining our large set of synthetic videos with small real-world datasets can boost recognition performance, significantly outperforming fine-tuning state-of-the-art unsupervised generative models of videos.
|
|
|
X. Orriols, Ricardo Toledo, X. Binefa, Petia Radeva, Jordi Vitria and Juan J. Villanueva. 2000. Probabilistic Saliency Approach for Elongated Structure Detection using Deformable Models. 15th International Conference on Pattern Recognition. 1006–1009.
|
|
|
Karel Paleček, David Geronimo and Frederic Lerasle. 2012. Pre-attention cues for person detection. Cognitive Behavioural Systems, COST 2102 International Training School. Springer Berlin Heidelberg, 225–235. (LNCS.)
Abstract: Current state-of-the-art person detectors have been proven reliable and achieve very good detection rates. However, the performance is often far from real time, which limits their use to low resolution images only. In this paper, we deal with the candidate window generation problem for person detection, i.e. we want to reduce the computational complexity of a person detector by reducing the number of regions that have to be evaluated. We base our work on Alexe’s paper [1], which introduced several pre-attention cues for generic object detection. We evaluate these cues in the context of person detection and show that their performance degrades rapidly for scenes containing multiple objects of interest, such as pictures from urban environments. We extend this set with new cues, which better suit our class-specific task. The cues are designed to be simple and efficient, so that they can be used in the pre-attention phase of a more complex sliding window based person detector.
|
|
|
Fernando Barrera, Felipe Lumbreras, Cristhian Aguilera and Angel Sappa. 2012. Planar-Based Multispectral Stereo. 11th Quantitative InfraRed Thermography.
|
|
|
Ishaan Gulrajani and 6 others. 2017. PixelVAE: A Latent Variable Model for Natural Images. 5th International Conference on Learning Representations.
Abstract: Natural image modeling is a landmark challenge of unsupervised learning. Variational Autoencoders (VAEs) learn a useful latent representation and generate samples that preserve global structure but tend to suffer from image blurriness. PixelCNNs model sharp contours and details very well, but lack an explicit latent representation and have difficulty modeling large-scale structure in a computationally efficient way. In this paper, we present PixelVAE, a VAE model with an autoregressive decoder based on PixelCNN. The resulting architecture achieves state-of-the-art log-likelihood on binarized MNIST. We extend PixelVAE to a hierarchy of multiple latent variables at different scales; this hierarchical model achieves competitive likelihood on 64x64 ImageNet and generates high-quality samples on LSUN bedrooms.
Keywords: Deep Learning; Unsupervised Learning
|
|
|
Carme Julia, Angel Sappa, Felipe Lumbreras and Joan Serrat. 2008. Photometric Stereo through an Adapted Alternation Approach. IEEE International Conference on Image Processing. 1500–1503.
|
|
|
P. Ricaurte, C. Chilan, Cristhian A. Aguilera-Carrasco, Boris X. Vintimilla and Angel Sappa. 2014. Performance Evaluation of Feature Point Descriptors in the Infrared Domain. 9th International Conference on Computer Vision Theory and Applications. 545–550.
Abstract: This paper presents a comparative evaluation of classical feature point descriptors when they are used in the long-wave infrared spectral band. Robustness to changes in rotation, scaling, blur, and additive noise is evaluated using a state-of-the-art framework. Statistical results using an outdoor image data set are presented together with a discussion about the differences with respect to the results obtained when images from the visible spectrum are considered.
Keywords: Infrared Imaging; Feature Point Descriptors
|
|
|
Diego Cheda, Daniel Ponsa and Antonio Lopez. 2012. Pedestrian Candidates Generation using Monocular Cues. IEEE Intelligent Vehicles Symposium. IEEE Xplore, 7–12.
Abstract: Common techniques for pedestrian candidates generation (e.g., sliding window approaches) are based on an exhaustive search over the image. This implies that the number of windows produced is huge, which translates into a significant time consumption in the classification stage. In this paper, we propose a method that significantly reduces the number of windows to be considered by a classifier. Our method is a monocular one that exploits geometric and depth information available in single images. Both representations of the world are fused together to generate pedestrian candidates based on an underlying model which is focused only on objects standing vertically on the ground plane and having a certain height, according to their depths in the scene. We evaluate our algorithm on a challenging dataset and demonstrate its application for pedestrian detection, where a considerable reduction in the number of candidate windows is reached.
Keywords: pedestrian detection
|
|
|
Cristina Cañero and 16 others. 1999. Optimal Stent Implantation: Three-dimensional Evaluation of the Mutual Position of Stent and Vessel via Intracoronary Ecography. Proceedings of International Conference on Computer in Cardiology (CIC'99).
Abstract: We present a new automatic technique to visualize and quantify the mutual position between the stent and the vessel wall by considering their three-dimensional reconstruction. Two deformable generalized cylinders adapt to the image features in all IVUS planes corresponding to the vessel wall and the stent in order to reconstruct the boundaries of the stent and the vessel in space. The image features that characterize the stent and the vessel wall are determined in terms of edge and ridge image detectors taking into account the gray level of the image pixels. We show that the 3D reconstruction by deformable cylinders is accurate and robust due to the spatial data coherence in the considered volumetric IVUS image. The main clinical utility of the stent and vessel reconstruction by deformable cylinders is that it makes it possible to visualize and assess the optimal stent introduction.
|
|
|
Naveen Onkarappa, Sujay M. Veerabhadrappa and Angel Sappa. 2012. Optical Flow in Onboard Applications: A Study on the Relationship Between Accuracy and Scene Texture. 4th International Conference on Signal and Image Processing. 257–267.
Abstract: Optical flow plays a major role in making advanced driver assistance systems (ADAS) a reality. ADAS applications are expected to perform efficiently in all kinds of environments, since a vehicle is likely to be driven on different kinds of roads and at different times and seasons. In this work, we study the relationship of optical flow with different roads by analyzing optical flow accuracy on different road textures. Three texture measures are evaluated for this purpose. Further, the relation of the regularization weight to the flow accuracy in the presence of different textures is also analyzed. Additionally, we present a framework to generate synthetic sequences of different textures in ADAS scenarios with ground-truth optical flow.
|
|