|
Mohammad Ali Bagheri, Qigang Gao, & Sergio Escalera. (2016). Action Recognition by Pairwise Proximity Function Support Vector Machines with Dynamic Time Warping Kernels. In 29th Canadian Conference on Artificial Intelligence (Vol. 9673, pp. 3–14). Springer International Publishing.
Abstract: In the context of human action recognition using skeleton data, the 3D trajectories of joint points may be considered as multi-dimensional time series. Traditional recognition techniques in the literature are based on time series (dis)similarity measures (such as Dynamic Time Warping). For these general (dis)similarity measures, k-nearest neighbor algorithms are a natural choice. However, k-NN classifiers are known to be sensitive to noise and outliers. In this paper, a new class of Support Vector Machine that is applicable to trajectory classification, such as action recognition, is developed by incorporating an efficient time-series distance measure into the kernel function. More specifically, the derivative variant of the Dynamic Time Warping (DTW) distance measure is employed as the SVM kernel. In addition, the pairwise proximity learning strategy is used in order to make use of non-positive semi-definite (non-PSD) kernels in the SVM formulation. The recognition results of the proposed technique on two action recognition datasets demonstrate that our methodology outperforms state-of-the-art methods. Remarkably, we obtained 89% accuracy on the well-known MSRAction3D dataset using only 3D trajectories of body joints obtained by Kinect.
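The two ingredients named in this abstract can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes the Keogh and Pazzani derivative estimate for derivative DTW, a Euclidean local cost between frames, and the usual pairwise-proximity construction in which each trajectory is represented by its vector of distances to a set of training trajectories, so that an ordinary SVM can be trained on those vectors even though the distance itself does not yield a PSD kernel. All function names are hypothetical.

```python
import math
from typing import List, Sequence

# A trajectory: T frames, each a D-dimensional point (e.g. stacked 3D joint coordinates).
Series = Sequence[Sequence[float]]


def derivative(x: Series) -> List[List[float]]:
    """Keogh-Pazzani derivative estimate per frame; assumes len(x) >= 3.

    The first and last frames reuse the estimates of their neighbours.
    """
    d = [[((x[i][k] - x[i - 1][k]) + (x[i + 1][k] - x[i - 1][k]) / 2) / 2
          for k in range(len(x[i]))]
         for i in range(1, len(x) - 1)]
    return [d[0]] + d + [d[-1]]


def dtw(a: Series, b: Series) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping with Euclidean frame cost."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def ddtw(a: Series, b: Series) -> float:
    """Derivative DTW: DTW applied to the derivative estimates of both trajectories."""
    return dtw(derivative(a), derivative(b))


def proximity_features(samples: Sequence[Series], prototypes: Sequence[Series]) -> List[List[float]]:
    """Pairwise-proximity representation: row i holds DDTW distances from sample i to every prototype."""
    return [[ddtw(s, p) for p in prototypes] for s in samples]


if __name__ == "__main__":
    ramp = [[0.0], [1.0], [2.0], [3.0]]
    shifted = [[10.0], [11.0], [12.0], [13.0]]
    print(dtw(ramp, shifted))   # 40.0: plain DTW sees the constant offset
    print(ddtw(ramp, shifted))  # 0.0: the derivative ignores the offset
```

The rows returned by `proximity_features` could then be passed as ordinary feature vectors to any linear SVM (for example scikit-learn's `LinearSVC`); the choice of prototypes (here, simply the training set) is an assumption.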
|
|
|
Marc Masana, Idoia Ruiz, Joan Serrat, Joost Van de Weijer, & Antonio Lopez. (2018). Metric Learning for Novelty and Anomaly Detection. In 29th British Machine Vision Conference.
Abstract: When neural networks process images that do not resemble the distribution seen during training, so-called out-of-distribution images, they often make wrong predictions, and do so too confidently. The capability to detect out-of-distribution images is therefore crucial for many real-world applications. We divide out-of-distribution detection into novelty detection (images of classes that are not in the training set but are related to those classes) and anomaly detection (images of classes that are unrelated to the training set). By related we mean that they contain the same type of objects, like digits in MNIST and SVHN. Most existing work has focused on anomaly detection and has addressed this problem considering networks trained with the cross-entropy loss. In contrast, we propose to use metric learning, which does not have the drawback of the softmax layer (inherent to cross-entropy methods) that forces the network to divide its prediction power over the learned classes. We perform extensive experiments and evaluate both novelty and anomaly detection, including a relevant application, traffic sign recognition, obtaining comparable or better results than previous works.
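The abstract does not state how the learned metric is turned into a novelty or anomaly score. One common choice, assumed here and not taken from the paper, is to embed every input with the trained network and score it by its distance to the nearest class centroid of the training embeddings; inputs whose score exceeds a threshold chosen on validation data would be flagged as out-of-distribution. The toy 2D embeddings and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def class_centroids(embeddings: Sequence[Sequence[float]], labels: Sequence[int]) -> Dict[int, List[float]]:
    """Mean embedding of each training class (embeddings would come from the metric-learning network)."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for e, y in zip(embeddings, labels):
        acc = sums.setdefault(y, [0.0] * len(e))
        for k, v in enumerate(e):
            acc[k] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def ood_score(embedding: Sequence[float], centroids: Dict[int, List[float]]) -> float:
    """Distance to the closest class centroid; larger means less like any known class."""
    return min(math.dist(embedding, c) for c in centroids.values())


if __name__ == "__main__":
    train = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
    cents = class_centroids(train, [0, 0, 1, 1])  # {0: [0.0, 1.0], 1: [10.0, 1.0]}
    print(ood_score([0.0, 1.0], cents))           # 0.0: on a known class centroid
    print(ood_score([5.0, 1.0], cents))           # 5.0: halfway between both classes
```

Because no softmax is involved, the score is not forced to assign the input to one of the known classes, which is the property the abstract attributes to metric learning.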
|
|
|
Cristina Palmero, Javier Selva, Mohammad Ali Bagheri, & Sergio Escalera. (2018). Recurrent CNN for 3D Gaze Estimation using Appearance and Shape Cues. In 29th British Machine Vision Conference.
Abstract: Gaze behavior is an important non-verbal cue in social signal processing and human-computer interaction. In this paper, we tackle the problem of person- and head-pose-independent 3D gaze estimation from remote cameras, using a multi-modal recurrent convolutional neural network (CNN). We propose to combine face, eyes region, and face landmarks as individual streams in a CNN to estimate gaze in still images. Then, we exploit the dynamic nature of gaze by feeding the learned features of all the frames in a sequence to a many-to-one recurrent module that predicts the 3D gaze vector of the last frame. Our multi-modal static solution is evaluated on a wide range of head poses and gaze directions, achieving a significant improvement of 14.6% over the state of the art on the EYEDIAP dataset, further improved by 4% when the temporal modality is included.
|
|
|
Patricia Suarez, Angel Sappa, Boris X. Vintimilla, & Riad I. Hammoud. (2021). Cycle Generative Adversarial Network: Towards A Low-Cost Vegetation Index Estimation. In 28th IEEE International Conference on Image Processing (pp. 19–22).
Abstract: This paper presents a novel unsupervised approach to estimate the Normalized Difference Vegetation Index (NDVI). The NDVI is obtained as the ratio between information from the visible and near-infrared spectral bands; in the current work, the NDVI is estimated just from an image of the visible spectrum through a Cyclic Generative Adversarial Network (CyclicGAN). This unsupervised architecture learns to estimate the NDVI by means of an image translation between the red channel of a given RGB image and an unpaired NDVI image. The translation is obtained by means of a ResNet architecture and a multiple loss function. Experimental results obtained with this unsupervised scheme show the validity of the implemented model. Additionally, comparisons with state-of-the-art approaches are provided, showing improvements with the proposed approach.
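The abstract describes the NDVI only loosely as a ratio between visible and near-infrared information. The standard definition is the normalized difference (NIR − Red) / (NIR + Red), which lies between −1 and 1 for non-negative reflectances. The sketch below computes it per pixel from co-registered red and NIR bands, i.e. the kind of target image the paper's CyclicGAN learns to approximate from the red channel alone; it is not the paper's network, and the function name and the `eps` guard are assumptions.

```python
from typing import List, Sequence

Band = Sequence[Sequence[float]]  # H x W reflectance values for one spectral band


def ndvi(red: Band, nir: Band, eps: float = 1e-9) -> List[List[float]]:
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red).

    `eps` (an assumption, not from the paper) avoids division by zero on dark pixels.
    """
    return [[(n - r) / (n + r + eps) for r, n in zip(red_row, nir_row)]
            for red_row, nir_row in zip(red, nir)]


if __name__ == "__main__":
    red = [[0.1, 0.3], [0.0, 0.2]]
    nir = [[0.5, 0.3], [0.0, 0.1]]
    for row in ndvi(red, nir):
        print([round(v, 3) for v in row])
```

Healthy vegetation reflects strongly in the NIR band and absorbs red light, so it yields values close to 1, while bare or dark surfaces stay near or below 0.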
|
|
|
Jorge Bernal, F. Javier Sanchez, & Fernando Vilariño. (2010). Reduction of Pattern Search Area in Colonoscopy Images by Merging Non-Informative Regions. In 28th Congreso Anual de la Sociedad Española de Ingeniería Biomédica.
Abstract: One of the first usual steps in pattern recognition schemes is image segmentation, which reduces the dimensionality of the problem and the quantity of data to manage. In our case, as we are pursuing real-time colon cancer polyp detection, this step is crucial. In this paper we present a non-informative region estimation algorithm that lets us discard parts of the image where we do not expect to find colon cancer polyps. The performance of our approach is measured in terms of both the elimination of non-informative areas and the preservation of polyp areas. The results obtained show the importance of correct non-informative region estimation in order to speed up the whole recognition process.
|
|
|
Antonio Esteban Lansaque, Carles Sanchez, Agnes Borras, Marta Diez-Ferrer, Antoni Rosell, & Debora Gil. (2016). Stable Airway Center Tracking for Bronchoscopic Navigation. In 28th Conference of the International Society for Medical Innovation and Technology.
Abstract: Bronchoscopists use X-ray fluoroscopy to guide bronchoscopes to the lesion to be biopsied without any incision. Reducing exposure to X-rays is important for both patients and doctors, but alternatives like electromagnetic navigation require specific equipment and increase the cost of the clinical procedure. We propose a guiding system based on the extraction of airway centers from intra-operative videos. Such anatomical landmarks could be matched to the airway centerline extracted from a pre-planned CT to indicate the best path to the lesion. We present a method for extracting lumen centers from intra-operative videos based on tracking maximal stable regions of energy maps.
|
|
|
Carles Sanchez, Debora Gil, T. Gache, N. Koufos, Marta Diez-Ferrer, & Antoni Rosell. (2016). SENSA: a System for Endoscopic Stenosis Assessment. In 28th Conference of the International Society for Medical Innovation and Technology.
Abstract: Documenting the severity of a static or dynamic Central Airway Obstruction (CAO) is crucial to establish a proper diagnosis and treatment, predict possible treatment effects and better follow up patients. Subjective visual evaluation of a stenosis during video-bronchoscopy remains the most common way to assess a CAO, despite a consensus among experts on the need to standardize all calculations [1].
The Computer Vision Center, in cooperation with the «Hospital de Bellvitge», has developed a System for Endoscopic Stenosis Assessment (SENSA), which computes CAO directly by analyzing standard bronchoscopic data without the need for other imaging technologies.
|
|
|
Daniel Hernandez, Lukas Schneider, Antonio Espinosa, David Vazquez, Antonio Lopez, Uwe Franke, et al. (2017). Slanted Stixels: Representing San Francisco's Steepest Streets. In 28th British Machine Vision Conference.
Abstract: In this work we present a novel compact scene representation based on Stixels that infers geometric and semantic information. Our approach overcomes the previous rather restrictive geometric assumptions for Stixels by introducing a novel depth model to account for non-flat roads and slanted objects. Both semantic and depth cues are used jointly to infer the scene representation in a sound global energy minimization formulation. Furthermore, a novel approximation scheme is introduced that uses an extremely efficient over-segmentation. In doing so, the computational complexity of the Stixel inference algorithm is reduced significantly, achieving real-time computation capabilities with only a slight drop in accuracy. We evaluate the proposed approach in terms of semantic and geometric accuracy as well as run-time on four publicly available benchmark datasets. Our approach maintains accuracy on flat road scene datasets while improving substantially on a novel non-flat road dataset.
|
|
|
Arash Akbarinia, Raquel Gil Rodriguez, & C. Alejandro Parraga. (2017). Colour Constancy: Biologically-inspired Contrast Variant Pooling Mechanism. In 28th British Machine Vision Conference.
Abstract: Pooling is a ubiquitous operation in image processing algorithms that allows higher-level processes to collect relevant low-level features from a region of interest. Currently, max-pooling is one of the most commonly used operators in the computational literature. However, it can lack robustness to outliers because it relies merely on the peak of a function. Pooling mechanisms are also present in the primate visual cortex, where neurons of higher cortical areas pool signals from lower ones. The receptive fields of these neurons have been shown to vary according to the contrast by aggregating signals over a larger region in the presence of low contrast stimuli. We hypothesise that this contrast-variant-pooling mechanism can address some of the shortcomings of max-pooling. We modelled this contrast variation through a histogram clipping in which the percentage of pooled signal is inversely proportional to the local contrast of an image. We tested our hypothesis by applying it to the phenomenon of colour constancy, where a number of popular algorithms utilise a max-pooling step (e.g. White-Patch, Grey-Edge and Double-Opponency). For each of these methods, we investigated the consequences of replacing their original max-pooling by the proposed contrast-variant-pooling. Our experiments on three colour constancy benchmark datasets suggest that previous results can be significantly improved by adopting a contrast-variant-pooling mechanism.
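The pooling rule described in this abstract can be sketched for a single region as follows. This is a hypothetical simplification, not the authors' model: it measures local contrast as the standard deviation of the region, sets the pooled fraction to `k / contrast` clamped to `[min_frac, max_frac]` so that it is inversely proportional to contrast, and averages that top fraction of the values (clipping the histogram from above). The parameter names and default values are assumptions.

```python
import math
from typing import Sequence


def contrast_variant_pool(values: Sequence[float], k: float = 0.05,
                          min_frac: float = 0.01, max_frac: float = 0.5) -> float:
    """Average the top fraction of a non-empty region; the fraction shrinks as contrast grows.

    Hypothetical choices: contrast = population standard deviation,
    fraction = k / contrast clamped to [min_frac, max_frac].
    """
    n = len(values)
    mean = sum(values) / n
    contrast = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    frac = max_frac if contrast == 0 else min(max_frac, max(min_frac, k / contrast))
    count = max(1, math.ceil(frac * n))         # always keep at least the peak value
    top = sorted(values, reverse=True)[:count]  # clip the histogram from above
    return sum(top) / count
```

When only one value survives the clipping, the result equals max-pooling; in low-contrast regions a larger share of the strongest responses is averaged, which is where the robustness to isolated outliers would come from. In a White-Patch-style illuminant estimate, such a function would replace the per-channel maximum.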
|
|
|
Rada Deeb, Damien Muselet, Mathieu Hebert, Alain Tremeau, & Joost Van de Weijer. (2017). 3D color charts for camera spectral sensitivity estimation. In 28th British Machine Vision Conference.
Abstract: Estimating spectral data such as camera sensor responses or illuminant spectral power distribution from raw RGB camera outputs is crucial in many computer vision applications. Usually, 2D color charts with various patches of known spectral reflectance are used as a reference for this purpose. Deducing n-D spectral data (n ≫ 3) from 3D RGB inputs is an ill-posed problem that requires a high number of inputs. Unfortunately, most natural color surfaces have spectral reflectances that are well described by low-dimensional linear models, i.e. each spectral reflectance can be approximated by a weighted sum of the others. It has been shown that adding patches to color charts does not help in practice, because the information they add is redundant with the information provided by the first set of patches. In this paper, we propose to use spectral data of higher dimensionality by using 3D color charts that create inter-reflections between the surfaces. These inter-reflections produce multiplications between natural spectral curves and so provide non-linear spectral curves. We show that such data provide enough information for accurate spectral data estimation.
|
|
|
Mikhail Mozerov. (2006). An Effective Stereo Matching Algorithm with Optimal Path Cost Aggregation. In 28th Annual Symposium of the German Association for Pattern Recognition, LNCS 4174: 617–626.
|
|
|
Dani Rowe, I. Reid, Jordi Gonzalez, & Juan J. Villanueva. (2006). Unconstrained Multiple-People Tracking. In 28th Annual Symposium of the German Association for Pattern Recognition, LNCS 4174: 505–514, ISBN 978-3-540-44412-1.
|
|
|
Mikhail Mozerov, Ignasi Rius, Xavier Roca, & Jordi Gonzalez. (2006). 3D Human Motion Sequences Synchronization Using Dense Matching Algorithm. In 28th Annual Symposium of the German Association for Pattern Recognition, LNCS 4174: 485–494, ISBN 978-3-540-44412-1.
|
|
|
B. Zhou, Agata Lapedriza, J. Xiao, A. Torralba, & A. Oliva. (2014). Learning Deep Features for Scene Recognition using Places Database. In 28th Annual Conference on Neural Information Processing Systems (pp. 487–495).
|
|
|
David Berga, & Xavier Otazu. (2019). Computations of inhibition of return mechanisms by modulating V1 dynamics. In 28th Annual Computational Neuroscience Meeting.
Abstract: In this study we present a unified model of the visual cortex for predicting visual attention using real image scenes. Feedforward mechanisms from RGC and LGN have been functionally modeled using wavelet filters at distinct orientations and scales for each chromatic pathway (Magno-, Parvo-, Konio-cellular) and polarity (ON-/OFF-center), by processing image components in the CIE Lab space. In V1, we process cortical interactions with an excitatory-inhibitory network of firing rate neurons, initially proposed by (Li, 1999) and later extended by (Penacchio et al. 2013). Firing rates from the model’s output have been used as predictors of neuronal activity to be projected in a map in the superior colliculus (with WTA-like computations), determining the locations of visual fixations. These locations are considered as already visited areas for future saccades; therefore, we integrated a spatiotemporal function of inhibition of return (IoR) mechanisms (for which LIP/FEF is responsible) to feed the model with spatial memory for the next saccades. Foveation mechanisms have been simulated with a cortical magnification function, which distorts spatial viewing properties for each fixation. Results show lower prediction errors than in cases without IoR (Fig. 1), and the model is functionally consistent with human psychophysical measurements. Our model follows a biologically constrained architecture, previously shown to reproduce visual saliency (Berga & Otazu, 2018), visual discomfort (Penacchio et al. 2016), brightness (Penacchio et al. 2013) and chromatic induction (Cerda & Otazu, 2016).
|
|