|
Fares Alnajar, Theo Gevers, Roberto Valenti, & Sennay Ghebreab. (2013). Calibration-free Gaze Estimation using Human Gaze Patterns. In 15th IEEE International Conference on Computer Vision (pp. 137–144).
Abstract: We present a novel method to auto-calibrate gaze estimators based on gaze patterns obtained from other viewers. Our method is based on the observation that the gaze patterns of humans are indicative of where a new viewer will look [12]. When a new viewer is looking at a stimulus, we first estimate a topology of gaze points (initial gaze points). Next, these points are transformed so that they match the gaze patterns of other humans to find the correct gaze points. In a flexible uncalibrated setup with a web camera and no chin rest, the proposed method was tested on ten subjects and ten images. The method estimates the gaze points after looking at a stimulus for a few seconds with an average accuracy of 4.3°. Although the reported performance is lower than what could be achieved with dedicated hardware or a calibrated setup, the proposed method still provides sufficient accuracy to trace the viewer's attention. This is promising considering that auto-calibration is done in a flexible setup, without the use of a chin rest, and based only on a few seconds of gaze initialization data. To the best of our knowledge, this is the first work to use human gaze patterns in order to auto-calibrate gaze estimators.
|
|
|
Hamdi Dibeklioglu, Albert Ali Salah, & Theo Gevers. (2013). Like Father, Like Son: Facial Expression Dynamics for Kinship Verification. In 15th IEEE International Conference on Computer Vision (pp. 1497–1504).
Abstract: Kinship verification from facial appearance is a difficult problem. This paper explores the possibility of employing facial expression dynamics in this problem. By using features that describe facial dynamics and spatio-temporal appearance over smile expressions, we show that it is possible to improve the state of the art in this problem, and verify that it is indeed possible to recognize kinship by resemblance of facial expressions. The proposed method is tested on different kin relationships. On average, 72.89% verification accuracy is achieved on spontaneous smiles.
|
|
|
Jasper Uijlings, Koen E.A. van de Sande, Theo Gevers, & Arnold Smeulders. (2013). Selective Search for Object Recognition. IJCV - International Journal of Computer Vision, 104(2), 154–171.
Abstract: This paper addresses the problem of generating possible object locations for use in object recognition. We introduce selective search, which combines the strengths of both an exhaustive search and segmentation. Like segmentation, we use the image structure to guide our sampling process. Like exhaustive search, we aim to capture all possible object locations. Instead of a single technique to generate possible object locations, we diversify our search and use a variety of complementary image partitionings to deal with as many image conditions as possible. Our selective search results in a small set of data-driven, class-independent, high-quality locations, yielding 99% recall and a Mean Average Best Overlap of 0.879 at 10,097 locations. The reduced number of locations compared to an exhaustive search enables the use of stronger machine learning techniques and stronger appearance models for object recognition. In this paper we show that our selective search enables the use of the powerful Bag-of-Words model for recognition. The selective search software is made publicly available (Software: http://disi.unitn.it/~uijlings/SelectiveSearch.html).
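The two quality measures quoted in this abstract, recall and Mean Average Best Overlap, can be illustrated with a minimal sketch. It assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max), intersection-over-union as the overlap measure, a single shared proposal list, and a recall threshold of 0.5; the function names and these simplifications are illustrative assumptions, not taken from the published software.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed layout


def overlap(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def best_overlaps(ground_truth: List[Box], proposals: List[Box]) -> List[float]:
    """For each ground-truth box, the highest overlap achieved by any proposal."""
    return [max((overlap(g, p) for p in proposals), default=0.0) for g in ground_truth]


def average_best_overlap(ground_truth: List[Box], proposals: List[Box]) -> float:
    """Mean of the best overlaps over the ground-truth boxes of one class."""
    scores = best_overlaps(ground_truth, proposals)
    return sum(scores) / len(scores)


def mean_average_best_overlap(gt_by_class: Dict[str, List[Box]], proposals: List[Box]) -> float:
    """Average of the per-class average best overlap."""
    per_class = [average_best_overlap(boxes, proposals) for boxes in gt_by_class.values()]
    return sum(per_class) / len(per_class)


def recall(ground_truth: List[Box], proposals: List[Box], threshold: float = 0.5) -> float:
    """Fraction of ground-truth boxes covered by a proposal at the assumed threshold."""
    scores = best_overlaps(ground_truth, proposals)
    return sum(s >= threshold for s in scores) / len(scores)


# Toy data: one proposal covers half of the "cat" box, one matches the "dog" box exactly.
proposals = [(0, 0, 10, 5), (20, 20, 30, 30)]
gt_by_class = {"cat": [(0, 0, 10, 10)], "dog": [(20, 20, 30, 30)]}
mabo = mean_average_best_overlap(gt_by_class, proposals)  # (0.5 + 1.0) / 2 = 0.75
```

A higher Mean Average Best Overlap means the proposals fit the annotated objects more tightly, while recall only records whether each object was covered at all.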
|
|
|
Zeynep Yucel, Albert Ali Salah, Çetin Meriçli, Tekin Meriçli, Roberto Valenti, & Theo Gevers. (2013). Joint Attention by Gaze Interpolation and Saliency. T-CIBER - IEEE Transactions on Cybernetics, 829–842.
Abstract: Joint attention, which is the ability of coordination of a common point of reference with the communicating party, emerges as a key factor in various interaction scenarios. This paper presents an image-based method for establishing joint attention between an experimenter and a robot. The precise analysis of the experimenter's eye region requires stability and high-resolution image acquisition, which is not always available. We investigate regression-based interpolation of the gaze direction from the head pose of the experimenter, which is easier to track. Gaussian process regression and neural networks are contrasted to interpolate the gaze direction. Then, we combine gaze interpolation with image-based saliency to improve the target point estimates and test three different saliency schemes. We demonstrate the proposed method on a human-robot interaction scenario. Cross-subject evaluations, as well as experiments under adverse conditions (such as dimmed or artificial illumination or motion blur), show that our method generalizes well and achieves rapid gaze estimation for establishing joint attention.
|
|
|
Jorge Bernal, Fernando Vilariño, & F. Javier Sanchez. (2011). Towards Intelligent Systems for Colonoscopy. In Paul Miskovitz (Ed.), Colonoscopy (Vol. 1, pp. 257–282). Intech.
Abstract: In this chapter we present tools that can be used to build intelligent systems for colonoscopy.
The idea is to add significant value to the colonoscopy procedure by using methods based on computer vision and artificial intelligence. Intelligent systems are being used to assist in other medical interventions.
|
|
|
Jorge Bernal, F. Javier Sanchez, & Fernando Vilariño. (2011). Integration of Valley Orientation Distribution for Polyp Region Identification in Colonoscopy. In MICCAI 2011 Workshop on Computational and Clinical Applications in Abdominal Imaging (Vol. 6668, pp. 76–83). Lecture Notes in Computer Science. Springer Link.
Abstract: This work presents a region descriptor based on the integration of the information that the depth of valleys image provides. The depth of valleys image is based on the presence of intensity valleys around polyps due to the image acquisition. Our proposed method consists of defining, for each point, a series of radial sectors around it and then accumulating the maxima of the depth of valleys image only if the orientation of the intensity valley coincides with the orientation of the sector above. We apply our descriptor to a prior segmentation of the images and we present promising results on polyp detection, outperforming other approaches that also integrate depth of valleys information.
|
|
|
Petia Radeva, Jordi Vitria, Fernando Vilariño, Panagiota Spyridonos, Fernando Azpiroz, Juan Malagelada, et al. (2009). Cascade analysis for intestinal contraction detection. US Patent Office.
Abstract: A method and system of cascade analysis for intestinal contraction detection is provided by extracting from image frames captured in vivo. The method and system also relate to the detection of turbid liquids in intestinal tracts, to automatic detection of video image frames taken in the gastrointestinal tract including a field of view obstructed by turbid media, and more particularly, to extraction of image data obstructed by turbid media.
|
|
|
Panagiota Spyridonos, Fernando Vilariño, Jordi Vitria, Petia Radeva, Fernando Azpiroz, & Juan Malagelada. (2011). Device, system and method for automatic detection of contractile activity in an image frame.
Abstract: A device, system and method for automatic detection of contractile activity of a body lumen in an image frame is provided, wherein image frames during contractile activity are captured and/or image frames including contractile activity are automatically detected, such as through pattern recognition and/or feature extraction to trace image frames including contractions, e.g., with wrinkle patterns. A manual procedure of annotation of contractions, e.g. tonic contractions in capsule endoscopy, may consist of the visualization of the whole video by a specialist, and the labeling of the contraction frames. Embodiments of the present invention may be suitable for implementation in an in vivo imaging system.
|
|
|
Fernando Vilariño, Panagiota Spyridonos, Petia Radeva, Jordi Vitria, Fernando Azpiroz, & Juan Malagelada. (2010). Method for automatic classification of in vivo images.
Abstract: A method for automatically detecting a post-duodenal boundary in an image stream of the gastrointestinal (GI) tract. The image stream is sampled to obtain a reduced set of images for processing. The reduced set of images is filtered to remove non-valid frames or non-valid portions of frames, thereby generating a filtered set of valid images. A polar representation of the valid images is generated. Textural features of the polar representation are processed to detect the post-duodenal boundary of the GI tract.
|
|
|
Gerard Lacey, & Fernando Vilariño. (2011). Endoscopy system with motion sensors.
Abstract: An endoscopy system (1) comprises an endoscope (2) with a camera (3) at its tip. The endoscope extends through an endoscope guide (4) for guiding movement of the endoscope and for measurement of its movement as it enters the body. The guide (4) comprises a generally conical body (5) having a through passage (105) through which the endoscope (2) extends. A motion sensor comprises an optical transmitter (7) and a detector (8) mounted alongside the passage (105) to measure the insertion-withdrawal linear motion and also rotation of the endoscope by the endoscopist's hand. The system (1) also comprises a flexure controller (10) having wheels operated by the endoscopist. The camera (3), the motion sensor (7/8), and the flexure controller (10) are all connected to a processor (11) which feeds a display.
|
|
|
Fernando Vilariño, Panagiota Spyridonos, Petia Radeva, Jordi Vitria, Fernando Azpiroz, & Juan Malagelada. (2009). Device, system and method for measurement and analysis of contractile activity.
Abstract: A method and system for determining intestinal dysfunction condition are provided by classifying and analyzing image frames captured in-vivo. The method and system also relate to the detection of contractile activity in intestinal tracts, to automatic detection of video image frames taken in the gastrointestinal tract including contractile activity, and more particularly to measurement and analysis of contractile activity of the GI tract based on image intensity of in vivo image data.
|
|
|
Pierluigi Casale, Oriol Pujol, & Petia Radeva. (2012). Personalization and User Verification in Wearable Systems using Biometric Walking Patterns. PUC - Personal and Ubiquitous Computing, 16(5), 563–580.
Abstract: In this article, a novel technique for user authentication and verification using gait as an unobtrusive biometric pattern is proposed. The method is based on a two-stage pipeline. First, a general activity recognition classifier is personalized for a specific user using a small sample of her/his walking pattern. As a result, the system is much more selective with respect to the new walking pattern. A second stage verifies whether the user is an authorized one or not. This stage is defined as a one-class classification problem. In order to solve this problem, a four-layer architecture is built around the geometric concept of the convex hull. This architecture improves robustness to outliers, models non-convex shapes, and takes temporal coherence information into account. Two different scenarios are proposed as validation with two different wearable systems. First, a custom high-performance wearable system is built and used in a free environment. A second dataset is acquired from an Android-based commercial device in a ‘wild’ scenario with rough terrains, adversarial conditions, crowded places and obstacles. Results on both systems and datasets are very promising, reducing the verification error rates by an order of magnitude with respect to the state-of-the-art technologies.
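The geometric idea behind one-class verification with a convex hull can be sketched as follows: a new sample is accepted only if it falls inside the hull of the enrolled user's samples. This is a minimal illustration under strong assumptions (two-dimensional features and a single hull); it does not reproduce the four-layer architecture described in the abstract, and all names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # assumed 2-D feature vector, for illustration only


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def is_accepted(hull: List[Point], sample: Point) -> bool:
    """Accept a sample that lies inside or on the boundary of the hull."""
    if len(hull) < 3:
        return False  # degenerate hull: too few distinct enrolment samples
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], sample) >= 0 for i in range(n))


# Toy enrolment data for one user; the interior point (2, 2) is not a hull vertex.
enrolled = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2)]
user_hull = convex_hull(enrolled)
genuine = is_accepted(user_hull, (1, 1))   # inside the hull
impostor = is_accepted(user_hull, (5, 5))  # outside the hull
```

A plain hull is sensitive to outliers and cannot follow non-convex class shapes, which are the limitations the abstract says its layered design addresses.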
|
|
|
Marco Pedersoli, Jordi Gonzalez, Andrew Bagdanov, & Xavier Roca. (2011). Efficient Discriminative Multiresolution Cascade for Real-Time Human Detection Applications. PRL - Pattern Recognition Letters, 32(13), 1581–1587.
Abstract: Human detection is fundamental in many machine vision applications, like video surveillance, driving assistance, action recognition and scene understanding. However, in most of these applications real-time performance is necessary, and this is not yet achieved by current detection methods.
This paper presents a new method for human detection based on a multiresolution cascade of Histograms of Oriented Gradients (HOG) that can greatly reduce the computational cost of the detection search without affecting accuracy. The method consists of a cascade of sliding window detectors. Each detector is a linear Support Vector Machine (SVM) composed of HOG features at different resolutions, from coarse at the first level to fine at the last one.
In contrast to previous methods, our approach uses a non-uniform stride of the sliding window that is defined by the feature resolution and allows the detection to be incrementally refined when going from coarse to fine resolution. In this way, the speed-up of the cascade is due not only to the smaller number of features computed at the first levels of the cascade, but also to the reduced number of windows that need to be evaluated at the coarse resolution. Experimental results show that our method reaches a detection rate comparable with the state of the art of detectors based on HOG features, while at the same time the detection search is up to 23 times faster.
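The source of the speed-up, fewer windows at the coarse levels and refinement only around surviving candidates, can be sketched with a toy search. The sketch assumes that the stride halves at each level and that each survivor is refined over its 3x3 neighbourhood; the scoring functions stand in for the HOG/SVM detectors and are purely hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Position = Tuple[int, int]
Scorer = Callable[[int, int], float]  # hypothetical stand-in for one cascade level


def coarse_to_fine_search(
    width: int,
    height: int,
    scorers: Sequence[Scorer],
    thresholds: Sequence[float],
    coarse_stride: int,
) -> Tuple[List[Position], int]:
    """Return the positions accepted by every level and the number of windows scored."""
    if len(scorers) != len(thresholds):
        raise ValueError("one threshold per cascade level is required")
    stride = coarse_stride
    # Level 0 scans the whole image, but only on the coarse grid.
    candidates = [(x, y) for x in range(0, width, stride) for y in range(0, height, stride)]
    evaluations = 0
    survivors: List[Position] = []
    for level, (score, threshold) in enumerate(zip(scorers, thresholds)):
        survivors = []
        for x, y in candidates:
            evaluations += 1
            if score(x, y) >= threshold:
                survivors.append((x, y))
        if level == len(scorers) - 1:
            break
        # Assumed refinement rule: halve the stride and revisit each survivor's neighbours.
        stride = max(1, stride // 2)
        refined = set()
        for x, y in survivors:
            for dx in (-stride, 0, stride):
                for dy in (-stride, 0, stride):
                    nx, ny = x + dx, y + dy
                    if 0 <= nx < width and 0 <= ny < height:
                        refined.add((nx, ny))
        candidates = sorted(refined)
    return survivors, evaluations


def closeness(x: int, y: int) -> float:
    """Toy score: negative Chebyshev distance to a single target at (5, 6)."""
    return -float(max(abs(x - 5), abs(y - 6)))


hits, evaluations = coarse_to_fine_search(
    16, 16, [closeness, closeness, closeness], [-2.0, -1.0, 0.0], coarse_stride=4
)
exhaustive_windows = 16 * 16  # a stride-1 scan at a single level
```

In this toy setting the cascade scores 46 windows in total, compared with 256 for an exhaustive stride-1 scan, and still localises the target exactly.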
|
|
|
Palaiahnakote Shivakumara, Anjan Dutta, Trung Quy Phan, Chew Lim Tan, & Umapada Pal. (2011). A Novel Mutual Nearest Neighbor based Symmetry for Text Frame Classification in Video. PR - Pattern Recognition, 44(8), 1671–1683.
Abstract: In the field of multimedia retrieval in video, text frame classification is essential for text detection, event detection, event boundary detection, etc. We propose a new text frame classification method that introduces a combination of wavelet and median moment with k-means clustering to select probable text blocks among 16 equally sized blocks of a video frame. The same feature combination is used with a new Max–Min clustering at the pixel level to choose probable dominant text pixels in the selected probable text blocks. For the probable text pixels, a so-called mutual nearest neighbor based symmetry is explored with a four-quadrant formation centered at the centroid of the probable dominant text pixels to determine whether a block is a true text block or not. If a frame produces at least one true text block, it is considered a text frame; otherwise it is a non-text frame. Experimental results on different text and non-text datasets, including two public datasets and our own created data, show that the proposed method gives promising results in terms of recall and precision at the block and frame levels. Further, we also show how existing text detection methods tend to misclassify non-text frames as text frames in terms of recall and precision at both the block and frame levels.
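The frame-level decision rule stated in this abstract (a frame is a text frame if at least one of its 16 blocks is a true text block) can be sketched as below. The 4x4 grid layout and the block predicate are assumptions for illustration; the predicate merely stands in for the wavelet, clustering and symmetry steps, which are not reproduced here.

```python
from typing import Callable, List, Sequence

Frame = Sequence[Sequence[float]]  # grayscale frame as rows of pixel values
Block = List[Sequence[float]]


def split_into_blocks(frame: Frame, rows: int = 4, cols: int = 4) -> List[Block]:
    """Split a frame into rows * cols equally sized blocks (16 by default)."""
    height, width = len(frame), len(frame[0])
    if height % rows or width % cols:
        raise ValueError("frame size must be divisible by the grid size")
    bh, bw = height // rows, width // cols
    return [
        [row[c * bw:(c + 1) * bw] for row in frame[r * bh:(r + 1) * bh]]
        for r in range(rows)
        for c in range(cols)
    ]


def is_text_frame(frame: Frame, is_true_text_block: Callable[[Block], bool]) -> bool:
    """A frame counts as a text frame if at least one block is a true text block."""
    return any(is_true_text_block(block) for block in split_into_blocks(frame))


def bright_block(block: Block) -> bool:
    """Hypothetical placeholder predicate: flags blocks containing a bright pixel."""
    return max(max(row) for row in block) > 128


blank = [[0] * 8 for _ in range(8)]
with_mark = [row[:] for row in blank]
with_mark[5][2] = 255  # a single bright pixel in one block
```

With these toy frames, `is_text_frame(with_mark, bright_block)` is true and `is_text_frame(blank, bright_block)` is false.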
|
|
|
Victor Ponce, Sergio Escalera, & Xavier Baro. (2013). Multi-modal Social Signal Analysis for Predicting Agreement in Conversation Settings. In 15th ACM International Conference on Multimodal Interaction (pp. 495–502).
Abstract: In this paper we present a non-invasive ambient intelligence framework for the analysis of non-verbal communication applied to conversational settings. In particular, we apply feature extraction techniques to multi-modal audio-RGB-depth data. We compute a set of behavioral indicators that define communicative cues coming from the fields of psychology and observational methodology. We test our methodology on data captured in victim-offender mediation scenarios. Using different state-of-the-art classification approaches, our system achieves upon 75% recognition when predicting agreement among the parties involved in the conversations, using the experts' opinions as ground truth.
|
|