|
Jaume Amores. 2013. Multiple Instance Classification: review, taxonomy and comparative study. AI, 201, 81–105.
Abstract: Multiple Instance Learning (MIL) has become an important topic in the pattern recognition community, and many solutions to this problemhave been proposed until now. Despite this fact, there is a lack of comparative studies that shed light into the characteristics and behavior of the different methods. In this work we provide such an analysis focused on the classification task (i.e.,leaving out other learning tasks such as regression). In order to perform our study, we implemented
fourteen methods grouped into three different families. We analyze the performance of the approaches across a variety of well-known databases, and we also study their behavior in synthetic scenarios in order to highlight their characteristics. As a result of this analysis, we conclude that methods that extract global bag-level information show a clearly superior performance in general. In this sense, the analysis permits us to understand why some types of methods are more successful than others, and it permits us to establish guidelines in the design of new MIL
methods.
Keywords: Multi-instance learning; Codebook; Bag-of-Words
|
|
|
Xavier Soria, Angel Sappa and Riad I. Hammoud. 2018. Wide-Band Color Imagery Restoration for RGB-NIR Single Sensor Images. SENS, 18(7), 2059.
Abstract: Multi-spectral RGB-NIR sensors have become ubiquitous in recent years. These sensors allow the visible and near-infrared spectral bands of a given scene to be captured at the same time. With such cameras, the acquired imagery has a compromised RGB color representation due to near-infrared bands (700–1100 nm) cross-talking with the visible bands (400–700 nm).
This paper proposes two deep learning-based architectures to recover the full RGB color images, thus removing the NIR information from the visible bands. The proposed approaches directly restore the high-resolution RGB image by means of convolutional neural networks. They are evaluated with several outdoor images; both architectures reach a similar performance when evaluated in different
scenarios and using different similarity metrics. Both of them improve the state of the art approaches.
Keywords: RGB-NIR sensor; multispectral imaging; deep learning; CNNs
|
|
|
Carme Julia, Angel Sappa, Felipe Lumbreras, Joan Serrat and Antonio Lopez. 2010. An Iterative Multiresolution Scheme for SFM with Missing Data: single and multiple object scenes. IMAVIS, 28(1), 164–176.
Abstract: Most of the techniques proposed for tackling the Structure from Motion problem (SFM) cannot deal with high percentages of missing data in the matrix of trajectories. Furthermore, an additional problem should be faced up when working with multiple object scenes: the rank of the matrix of trajectories should be estimated. This paper presents an iterative multiresolution scheme for SFM with missing data to be used in both the single and multiple object cases. The proposed scheme aims at recovering missing entries in the original input matrix. The objective is to improve the results by applying a factorization technique to the partially or totally filled in matrix instead of to the original input one. Experimental results obtained with synthetic and real data sequences, containing single and multiple objects, are presented to show the viability of the proposed approach.
|
|
|
Fahad Shahbaz Khan, Joost Van de Weijer, Muhammad Anwer Rao, Andrew Bagdanov, Michael Felsberg and Jorma. 2018. Scale coding bag of deep features for human attribute and action recognition. MVAP, 29(1), 55–71.
Abstract: Most approaches to human attribute and action recognition in still images are based on image representation in which multi-scale local features are pooled across scale into a single, scale-invariant encoding. Both in bag-of-words and the recently popular representations based on convolutional neural networks, local features are computed at multiple scales. However, these multi-scale convolutional features are pooled into a single scale-invariant representation. We argue that entirely scale-invariant image representations are sub-optimal and investigate approaches to scale coding within a bag of deep features framework. Our approach encodes multi-scale information explicitly during the image encoding stage. We propose two strategies to encode multi-scale information explicitly in the final image representation. We validate our two scale coding techniques on five datasets: Willow, PASCAL VOC 2010, PASCAL VOC 2012, Stanford-40 and Human Attributes (HAT-27). On all datasets, the proposed scale coding approaches outperform both the scale-invariant method and the standard deep features of the same network. Further, combining our scale coding approaches with standard deep features leads to consistent improvement over the state of the art.
Keywords: Action recognition; Attribute recognition; Bag of deep features
|
|
|
Sudeep Katakol, Basem Elbarashy, Luis Herranz, Joost Van de Weijer and Antonio Lopez. 2021. Distributed Learning and Inference with Compressed Images. TIP, 30, 3069–3083.
Abstract: Modern computer vision requires processing large amounts of data, both while training the model and/or during inference, once the model is deployed. Scenarios where images are captured and processed in physically separated locations are increasingly common (e.g. autonomous vehicles, cloud computing). In addition, many devices suffer from limited resources to store or transmit data (e.g. storage space, channel capacity). In these scenarios, lossy image compression plays a crucial role to effectively increase the number of images collected under such constraints. However, lossy compression entails some undesired degradation of the data that may harm the performance of the downstream analysis task at hand, since important semantic information may be lost in the process. Moreover, we may only have compressed images at training time but are able to use original images at inference time, or vice versa, and in such a case, the downstream model suffers from covariate shift. In this paper, we analyze this phenomenon, with a special focus on vision-based perception for autonomous driving as a paradigmatic scenario. We see that loss of semantic information and covariate shift do indeed exist, resulting in a drop in performance that depends on the compression rate. In order to address the problem, we propose dataset restoration, based on image restoration with generative adversarial networks (GANs). Our method is agnostic to both the particular image compression method and the downstream task; and has the advantage of not adding additional cost to the deployed models, which is particularly important in resource-limited devices. The presented experiments focus on semantic segmentation as a challenging use case, cover a broad range of compression rates and diverse datasets, and show how our method is able to significantly alleviate the negative effects of compression on the downstream visual task.
|
|
|
Aura Hernandez-Sabate, Debora Gil, Jaume Garcia and Enric Marti. 2011. Image-based Cardiac Phase Retrieval in Intravascular Ultrasound Sequences. T-UFFC, 58(1), 60–72.
Abstract: Longitudinal motion during in vivo pullbacks acquisition of intravascular ultrasound (IVUS) sequences is a major artifact for 3-D exploring of coronary arteries. Most current techniques are based on the electrocardiogram (ECG) signal to obtain a gated pullback without longitudinal motion by using specific hardware or the ECG signal itself. We present an image-based approach for cardiac phase retrieval from coronary IVUS sequences without an ECG signal. A signal reflecting cardiac motion is computed by exploring the image intensity local mean evolution. The signal is filtered by a band-pass filter centered at the main cardiac frequency. Phase is retrieved by computing signal extrema. The average frame processing time using our setup is 36 ms. Comparison to manually sampled sequences encourages a deeper study comparing them to ECG signals.
Keywords: 3-D exploring; ECG; band-pass filter; cardiac motion; cardiac phase retrieval; coronary arteries; electrocardiogram signal; image intensity local mean evolution; image-based cardiac phase retrieval; in vivo pullbacks acquisition; intravascular ultrasound sequences; longitudinal motion; signal extrema; time 36 ms; band-pass filters; biomedical ultrasonics; cardiovascular system; electrocardiography; image motion analysis; image retrieval; image sequences; medical image processing; ultrasonic imaging
|
|
|
Joan Serrat, Felipe Lumbreras, Francisco Blanco, Manuel Valiente and Montserrat Lopez-Mesas. 2017. myStone: A system for automatic kidney stone classification. ESA, 89, 41–51.
Abstract: Kidney stone formation is a common disease and the incidence rate is constantly increasing worldwide. It has been shown that the classification of kidney stones can lead to an important reduction of the recurrence rate. The classification of kidney stones by human experts on the basis of certain visual color and texture features is one of the most employed techniques. However, the knowledge of how to analyze kidney stones is not widespread, and the experts learn only after being trained on a large number of samples of the different classes. In this paper we describe a new device specifically designed for capturing images of expelled kidney stones, and a method to learn and apply the experts knowledge with regard to their classification. We show that with off the shelf components, a carefully selected set of features and a state of the art classifier it is possible to automate this difficult task to a good degree. We report results on a collection of 454 kidney stones, achieving an overall accuracy of 63% for a set of eight classes covering almost all of the kidney stones taxonomy. Moreover, for more than 80% of samples the real class is the first or the second most probable class according to the system, being then the patient recommendations for the two top classes similar. This is the first attempt towards the automatic visual classification of kidney stones, and based on the current results we foresee better accuracies with the increase of the dataset size.
Keywords: Kidney stone; Optical device; Computer vision; Image classification
|
|
|
Jose Carlos Rubio, Joan Serrat, Antonio Lopez and Daniel Ponsa. 2012. Multiple target tracking for intelligent headlights control. TITS, 13(2), 594–605.
Abstract: Intelligent vehicle lighting systems aim at automatically regulating the headlights' beam to illuminate as much of the road ahead as possible while avoiding dazzling other drivers. A key component of such a system is computer vision software that is able to distinguish blobs due to vehicles' headlights and rear lights from those due to road lamps and reflective elements such as poles and traffic signs. In a previous work, we have devised a set of specialized supervised classifiers to make such decisions based on blob features related to its intensity and shape. Despite the overall good performance, there remain challenging that have yet to be solved: notably, faint and tiny blobs corresponding to quite distant vehicles. In fact, for such distant blobs, classification decisions can be taken after observing them during a few frames. Hence, incorporating tracking could improve the overall lighting system performance by enforcing the temporal consistency of the classifier decision. Accordingly, this paper focuses on the problem of constructing blob tracks, which is actually one of multiple-target tracking (MTT), but under two special conditions: We have to deal with frequent occlusions, as well as blob splits and merges. We approach it in a novel way by formulating the problem as a maximum a posteriori inference on a Markov random field. The qualitative (in video form) and quantitative evaluation of our new MTT method shows good tracking results. In addition, we will also see that the classification performance of the problematic blobs improves due to the proposed MTT algorithm.
Keywords: Intelligent Headlights
|
|
|
Ferran Diego, Daniel Ponsa, Joan Serrat and Antonio Lopez. 2011. Video Alignment for Change Detection. TIP, 20(7), 1858–1869.
Abstract: In this work, we address the problem of aligning two video sequences. Such alignment refers to synchronization, i.e., the establishment of temporal correspondence between frames of the first and second video, followed by spatial registration of all the temporally corresponding frames. Video synchronization and alignment have been attempted before, but most often in the relatively simple cases of fixed or rigidly attached cameras and simultaneous acquisition. In addition, restrictive assumptions have been applied, including linear time correspondence or the knowledge of the complete trajectories of corresponding scene points; to some extent, these assumptions limit the practical applicability of any solutions developed. We intend to solve the more general problem of aligning video sequences recorded by independently moving cameras that follow similar trajectories, based only on the fusion of image intensity and GPS information. The novelty of our approach is to pose the synchronization as a MAP inference problem on a Bayesian network including the observations from these two sensor types, which have been proved complementary. Alignment results are presented in the context of videos recorded from vehicles driving along the same track at different times, for different road types. In addition, we explore two applications of the proposed video alignment method, both based on change detection between aligned videos. One is the detection of vehicles, which could be of use in ADAS. The other is online difference spotting videos of surveillance rounds.
Keywords: video alignment
|
|
|
T. Mouats, N. Aouf, Angel Sappa, Cristhian A. Aguilera-Carrasco and Ricardo Toledo. 2015. Multi-Spectral Stereo Odometry. TITS, 16(3), 1210–1224.
Abstract: In this paper, we investigate the problem of visual odometry for ground vehicles based on the simultaneous utilization of multispectral cameras. It encompasses a stereo rig composed of an optical (visible) and thermal sensors. The novelty resides in the localization of the cameras as a stereo setup rather
than two monocular cameras of different spectrums. To the best of our knowledge, this is the first time such task is attempted. Log-Gabor wavelets at different orientations and scales are used to extract interest points from both images. These are then described using a combination of frequency and spatial information within the local neighborhood. Matches between the pairs of multimodal images are computed using the cosine similarity function based
on the descriptors. Pyramidal Lucas–Kanade tracker is also introduced to tackle temporal feature matching within challenging sequences of the data sets. The vehicle egomotion is computed from the triangulated 3-D points corresponding to the matched features. A windowed version of bundle adjustment incorporating
Gauss–Newton optimization is utilized for motion estimation. An outlier removal scheme is also included within the framework to deal with outliers. Multispectral data sets were generated and used as test bed. They correspond to real outdoor scenarios captured using our multimodal setup. Finally, detailed results validating the proposed strategy are illustrated.
Keywords: Egomotion estimation; feature matching; multispectral odometry (MO); optical flow; stereo odometry; thermal imagery
|
|