Abel Gonzalez-Garcia, Robert Benavente, Olivier Penacchio, Javier Vazquez, Maria Vanrell, & C. Alejandro Parraga. (2013). Coloresia: An Interactive Colour Perception Device for the Visually Impaired. In Multimodal Interaction in Image and Video Applications (Vol. 48, pp. 47–66). Springer Berlin Heidelberg.
Abstract: A significant percentage of the human population suffers from impairments in their capacity to distinguish or even see colours. For them, everyday tasks like navigating through a train or metro network map become demanding. We present a novel technique for extracting colour information from everyday natural stimuli and presenting it to visually impaired users as pleasant, non-invasive sound. This technique was implemented inside a Personal Digital Assistant (PDA) portable device. In this implementation, colour information is extracted from the input image and categorised according to how human observers segment the colour space. This information is subsequently converted into sound and sent to the user via speakers or headphones. In the original implementation, it is possible for the user to send feedback to reconfigure the system; however, several features such as these were not implemented because the current technology is limited. We are confident that the full implementation will be possible in the near future as PDA technology improves.
|
Muhammad Anwer Rao. (2013). Color for Object Detection and Action Recognition (Antonio Lopez, & Joost Van de Weijer, Eds.). Ph.D. thesis, Ediciones Graficas Rey.
Abstract: Recognizing object categories in real world images is a challenging problem in computer vision. The deformable part based framework is currently the most successful approach for object detection. Generally, HOG features are used for image representation within the part-based framework. For action recognition, the bag-of-words framework has been shown to provide promising results. Within the bag-of-words framework, local image patches are described by the SIFT descriptor. Contrary to object detection and action recognition, combining color and shape has been shown to provide the best performance for object and scene recognition.
In the first part of this thesis, we analyze the problem of person detection in still images. Standard person detection approaches rely on intensity based features for image representation while ignoring color. Channel based descriptors are among the most commonly used approaches in object recognition. This inspires us to evaluate incorporating color information using the channel based fusion approach for the task of person detection.
In the second part of the thesis, we investigate the problem of object detection in still images. Due to high dimensionality, channel based fusion increases the computational cost. Moreover, channel based fusion has been found to obtain inferior results for object categories where one of the visual cues varies significantly. On the other hand, late fusion is known to provide improved results for a wide range of object categories. A consequence of the late fusion strategy is the need for a pure color descriptor. Therefore, we propose to use color attributes as an explicit color representation for object detection. Color attributes are compact and computationally efficient. Consequently, color attributes are combined with traditional shape features, providing excellent results for the object detection task.
Finally, we focus on the problem of action detection and classification in still images. We investigate the potential of color for action classification and detection in still images. We also evaluate different fusion approaches for combining color and shape information for action recognition. Additionally, an analysis is performed to validate the contribution of color for action recognition. Our results clearly demonstrate that combining color and shape information significantly improves the performance of both action classification and detection in still images.
|
Marçal Rusiñol, V. Poulain d'Andecy, Dimosthenis Karatzas, & Josep Llados. (2013). Classification of Administrative Document Images by Logo Identification. In 10th IAPR International Workshop on Graphics Recognition.
Abstract: This paper is focused on the categorization of administrative document images (such as invoices) based on the recognition of the supplier's graphical logo. Two different methods are proposed: the first uses a bag-of-visual-words model, whereas the second tries to locate logo images described by the blurred shape model descriptor within documents by a sliding-window technique. Preliminary results are reported with a dataset of real administrative documents.
|
Jordi Roca, C. Alejandro Parraga, & Maria Vanrell. (2013). Chromatic settings and the structural color constancy index. JV - Journal of Vision, 13(4-3), 1–26.
Abstract: Color constancy is usually measured by achromatic setting, asymmetric matching, or color naming paradigms, whose results are interpreted in terms of indexes and models that arguably do not capture the full complexity of the phenomenon. Here we propose a new paradigm, chromatic setting, which allows a more comprehensive characterization of color constancy through the measurement of multiple points in color space under immersive adaptation. We demonstrated its feasibility by assessing the consistency of subjects' responses over time. The paradigm was applied to two-dimensional (2-D) Mondrian stimuli under three different illuminants, and the results were used to fit a set of linear color constancy models. The use of multiple colors improved the precision of more complex linear models compared to the popular diagonal model computed from gray. Our results show that a diagonal plus translation matrix that models mechanisms other than cone gain might be best suited to explain the phenomenon. Additionally, we calculated a number of color constancy indices for several points in color space, and our results suggest that interrelations among colors are not as uniform as previously believed. To account for this variability, we developed a new structural color constancy index that takes into account the magnitude and orientation of the chromatic shift in addition to the interrelations among colors and memory effects.
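Note: the two linear models compared in this abstract can be written compactly as below. This is a sketch only; the symbols (c for a colour under the reference illuminant, c' for the same colour under the test illuminant, D for the gain matrix, t for the offset) are assumed here for illustration and are not taken from the paper.

```latex
% Diagonal model: an independent gain per channel
\mathbf{c}' = D\,\mathbf{c}, \qquad D = \operatorname{diag}(d_1, d_2, d_3)

% Diagonal plus translation model: per-channel gains plus an additive offset
\mathbf{c}' = D\,\mathbf{c} + \mathbf{t}, \qquad \mathbf{t} = (t_1, t_2, t_3)^{\top}
```

Fitting the second form needs at least two measured colours per channel, which is consistent with the abstract's point that measuring multiple points in colour space supports more complex models than one computed from gray alone.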
|
Sandra Jimenez, Xavier Otazu, Valero Laparra, & Jesus Malo. (2013). Chromatic induction and contrast masking: similar models, different goals? In Human Vision and Electronic Imaging XVIII (Vol. 8651).
Abstract: Normalization of signals coming from linear sensors is a ubiquitous mechanism of neural adaptation. Local interaction between sensors tuned to a particular feature at a certain spatial position and neighboring sensors explains a wide range of psychophysical facts including (1) masking of spatial patterns, (2) non-linearities of motion sensors, (3) adaptation of color perception, (4) brightness and chromatic induction, and (5) image quality assessment. Although the above models have formal and qualitative similarities, it does not necessarily mean that the mechanisms involved are pursuing the same statistical goal. For instance, in the case of chromatic mechanisms (disregarding spatial information), different parameters in the normalization give rise to optimal discrimination or adaptation, and different non-linearities may give rise to error minimization or component independence. In the case of spatial sensors (disregarding color information), a number of studies have pointed out the benefits of masking in statistical independence terms. However, such statistical analysis has not been performed for spatio-chromatic induction models where chromatic perception depends on spatial configuration. In this work we investigate whether successful spatio-chromatic induction models increase component independence in the same way as previously reported for masking models. Mutual information analysis suggests that seeking an efficient chromatic representation may explain the prevalence of induction effects in spatially simple images.
|
Fares Alnajar, Theo Gevers, Roberto Valenti, & Sennay Ghebreab. (2013). Calibration-free Gaze Estimation using Human Gaze Patterns. In 15th IEEE International Conference on Computer Vision (pp. 137–144).
Abstract: We present a novel method to auto-calibrate gaze estimators based on gaze patterns obtained from other viewers. Our method is based on the observation that the gaze patterns of humans are indicative of where a new viewer will look [12]. When a new viewer is looking at a stimulus, we first estimate a topology of gaze points (initial gaze points). Next, these points are transformed so that they match the gaze patterns of other humans to find the correct gaze points. In a flexible uncalibrated setup with a web camera and no chin rest, the proposed method was tested on ten subjects and ten images. The method estimates the gaze points after looking at a stimulus for a few seconds with an average accuracy of 4.3°. Although the reported performance is lower than what could be achieved with dedicated hardware or a calibrated setup, the proposed method still provides sufficient accuracy to trace the viewer's attention. This is promising considering the fact that auto-calibration is done in a flexible setup, without the use of a chin rest, and based only on a few seconds of gaze initialization data. To the best of our knowledge, this is the first work to use human gaze patterns in order to auto-calibrate gaze estimators.
|
Joan M. Nuñez, Jorge Bernal, F. Javier Sanchez, & Fernando Vilariño. (2013). Blood Vessel Characterization in Colonoscopy Images to Improve Polyp Localization. In Proceedings of the International Conference on Computer Vision Theory and Applications (Vol. 1, pp. 162–171). SciTePress.
Abstract: This paper presents an approach to mitigate the contribution of blood vessels to the energy image used in different tasks of automatic colonoscopy image analysis. This goal is achieved by introducing a characterization of endoluminal scene objects which allows us to differentiate between the trace of 2-dimensional visual objects, such as vessels, and shades from 3-dimensional visual objects, such as folds. The proposed characterization is based on the influence that the object shape has in the resulting visual feature, and it leads to the development of a blood vessel attenuation algorithm. A database consisting of manually labelled masks was built in order to test the performance of our method, which shows an encouraging success in blood vessel mitigation while keeping other structures intact. Moreover, by extending our method to the only available polyp localization algorithm tested on a public database, blood vessel mitigation proved to have a positive influence on the overall performance.
Keywords: Colonoscopy; Blood vessel; Linear features; Valley detection
|
Santiago Segui, Laura Igual, & Jordi Vitria. (2013). Bagged One Class Classifiers in the Presence of Outliers. IJPRAI - International Journal of Pattern Recognition and Artificial Intelligence, 27(5), 1350014–1350035.
Abstract: The problem of training classifiers only with target data arises in many applications where non-target data are too costly, difficult to obtain, or not available at all. Several one-class classification methods have been presented to solve this problem, but most of the methods are highly sensitive to the presence of outliers in the target class. Ensemble methods have therefore been proposed as a powerful way to improve the classification performance of binary/multi-class learning algorithms by introducing diversity into classifiers. However, their application to one-class classification has been rather limited. In this paper, we present a new ensemble method based on a non-parametric weighted bagging strategy for one-class classification, to improve accuracy in the presence of outliers. While the standard bagging strategy assumes a uniform data distribution, the method we propose here estimates a probability density based on a forest structure of the data. This assumption allows the estimation of data distribution from the computation of simple univariate and bivariate kernel densities. Experiments using original and noisy versions of 20 different datasets show that bagging ensemble methods applied to different one-class classifiers outperform base one-class classification methods. Moreover, we show that, in noisy versions of the datasets, the non-parametric weighted bagging strategy we propose outperforms the classical bagging strategy in a statistically significant way.
Keywords: One-class Classifier; Ensemble Methods; Bagging and Outliers
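Note: the idea of weighted bagging, where bootstrap samples favour points in dense regions so that outliers are drawn less often, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: a plain 1-D Gaussian kernel density stands in for the forest-structured density estimate described in the abstract, and the names `density_weights`, `weighted_bootstrap`, and `bandwidth` are hypothetical.

```python
import math
import random


def density_weights(points, bandwidth=1.0):
    """Return normalized Gaussian kernel density values, one per 1-D point.

    Assumption: a simple kernel density replaces the paper's
    forest-structured estimate.
    """
    dens = []
    for x in points:
        dens.append(sum(math.exp(-0.5 * ((x - y) / bandwidth) ** 2) for y in points))
    total = sum(dens)
    return [d / total for d in dens]


def weighted_bootstrap(points, n_bags, rng, bandwidth=1.0):
    """Draw n_bags bootstrap samples, each point chosen with probability
    proportional to its estimated density (classical bagging would use
    uniform probabilities instead)."""
    weights = density_weights(points, bandwidth)
    return [rng.choices(points, weights=weights, k=len(points)) for _ in range(n_bags)]
```

Each bag would then train one base one-class classifier, and the ensemble output would combine their decisions.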
|
L. Rothacker, Marçal Rusiñol, & G.A. Fink. (2013). Bag-of-Features HMMs for segmentation-free word spotting in handwritten documents. In 12th International Conference on Document Analysis and Recognition (pp. 1305–1309).
Abstract: Recent HMM-based approaches to handwritten word spotting require large amounts of learning samples and mostly rely on a prior segmentation of the document. We propose to use Bag-of-Features HMMs, estimated from a single sample, in a patch-based segmentation-free framework. Bag-of-Features HMMs use statistics of local image feature representatives. Therefore, they can be considered a variant of discrete HMMs that allows modeling the observation of a number of features at a point in time. The discrete nature enables us to estimate a query model with only a single example of the query provided by the user. This makes our method very flexible with respect to the availability of training data. Furthermore, we are able to outperform state-of-the-art results on the George Washington dataset.
|
Christophe Rigaud, Dimosthenis Karatzas, Joost Van de Weijer, Jean-Christophe Burie, & Jean-Marc Ogier. (2013). Automatic text localisation in scanned comic books. In Proceedings of the International Conference on Computer Vision Theory and Applications (pp. 814–819).
Abstract: Comic books constitute an important cultural heritage asset in many countries. Digitization combined with subsequent document understanding enables direct content-based search as opposed to metadata-only search (e.g. album title or author name). Few studies have been done in this direction. In this work we detail a novel approach for automatic text localization in scanned comic book pages, an essential step towards fully automatic comic book understanding. We focus on speech text as it is semantically important and represents the majority of the text present in comics. The approach is compared with existing methods of text localization found in the literature and results are presented.
Keywords: Text localization; comics; text/graphic separation; complex background; unstructured document
|
Marina Alberti, Simone Balocco, Xavier Carrillo, J. Mauri, & Petia Radeva. (2013). Automatic non-rigid temporal alignment of IVUS sequences: method and quantitative validation. UMB - Ultrasound in Medicine and Biology, 39(9), 1698–1712.
Abstract: Clinical studies on atherosclerosis regression/progression performed by intravascular ultrasound analysis would benefit from accurate alignment of sequences of the same patient before and after clinical interventions and at follow-up. In this article, a methodology for automatic alignment of intravascular ultrasound sequences based on the dynamic time warping technique is proposed. The non-rigid alignment is adapted to the specific task by applying it to multidimensional signals describing the morphologic content of the vessel. Moreover, dynamic time warping is embedded into a framework comprising a strategy to address partial overlapping between acquisitions and a term that regularizes non-physiologic temporal compression/expansion of the sequences. Extensive validation is performed on both synthetic and in vivo data. The proposed method reaches alignment errors of approximately 0.43 mm for pairs of sequences acquired during the same intervention phase and 0.77 mm for pairs of sequences acquired at successive intervention stages.
Keywords: Intravascular ultrasound; Dynamic time warping; Non-rigid alignment; Sequence matching; Partial overlapping strategy
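Note: the core of dynamic time warping on multidimensional signals can be sketched as follows. This is the textbook recurrence only; the partial-overlap strategy and the regularization term mentioned in the abstract are not included, and the function name `dtw_distance` is an assumption made for illustration.

```python
import math


def dtw_distance(a, b):
    """Cumulative cost of the best monotonic alignment between two
    sequences of feature vectors (tuples of equal dimension)."""
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between frames
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

Two sequences that differ only by a repeated frame align at zero cost, which is the non-rigid temporal behaviour the alignment relies on.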
|
Vitaliy Konovalov, Albert Clapes, & Sergio Escalera. (2013). Automatic Hand Detection in RGB-Depth Data Sequences. In 16th Catalan Conference on Artificial Intelligence (pp. 91–100). LNCS.
Abstract: Detecting hands in multi-modal RGB-Depth visual data has become a challenging Computer Vision problem with several applications of interest. This task involves dealing with changes in illumination, viewpoint variations, the articulated nature of the human body, the high flexibility of the wrist articulation, and the deformability of the hand itself. In this work, we propose an accurate and efficient automatic hand detection scheme to be applied in Human-Computer Interaction (HCI) applications in which the user is seated at the desk and, thus, only the upper body is visible. Our main hypothesis is that hand landmarks remain at a nearly constant geodesic distance from an automatically located anatomical reference point. In a given frame, the human body is segmented first in the depth image. Then, a graph representation of the body is built in which the geodesic paths are computed from the reference point. The dense optical flow vectors on the corresponding RGB image are used to reduce ambiguities of the geodesic paths’ connectivity, allowing the elimination of false edges interconnecting different body parts. Finally, we are able to detect the position of both hands based on invariant geodesic distances and optical flow within the body region, without involving costly learning procedures.
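Note: the notion of geodesic distance over a segmented body region can be sketched as a breadth-first search on a 2-D binary mask. This is a simplified illustration, not the paper's pipeline: it ignores depth values and optical flow, uses 4-connected pixel steps as edge lengths, and the name `geodesic_distances` is hypothetical.

```python
from collections import deque


def geodesic_distances(mask, start):
    """Return a dict mapping each reachable (row, col) cell to its
    shortest path length from start, moving only through truthy cells."""
    rows, cols = len(mask), len(mask[0])
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and mask[nr][nc] and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist
```

On a U-shaped region, two cells that are close in the image can be far apart geodesically, because the path must stay inside the region.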
|
Miguel Reyes, Albert Clapes, Jose Ramirez, Juan R Revilla, & Sergio Escalera. (2013). Automatic Digital Biometry Analysis based on Depth Maps. COMPUTIND - Computers in Industry, 64(9), 1316–1325.
Abstract: The World Health Organization estimates that 80% of the world population is affected by back-related disorders during their lifetime. Current practices to analyze musculo-skeletal disorders (MSDs) are expensive, subjective, and invasive. In this work, we propose a tool for static body posture analysis and dynamic range of movement estimation of the skeleton joints based on 3D anthropometric information from multi-modal data. Given a set of keypoints, RGB and depth data are aligned, the depth surface is reconstructed, keypoints are matched, and accurate measurements of posture and spinal curvature are computed. Given a set of joints, range of movement measurements are also obtained. Moreover, gesture recognition based on joint movements is performed to check the correctness of physical exercises. The system shows high precision and reliable measurements, being useful for posture reeducation purposes to prevent MSDs, as well as for tracking the posture evolution of patients in rehabilitation treatments.
Keywords: Multi-modal data fusion; Depth maps; Posture analysis; Anthropometric data; Musculo-skeletal disorders; Gesture analysis
|
Francesco Brughi. (2013). Artistic Heritage Motive Retrieval: an Explorative Study (Vol. 176). Master's thesis.
|
Anastasios Doulamis, Nikolaos Doulamis, Marco Bertini, Jordi Gonzalez, & Thomas B. Moeslund. (2013). Analysis and Retrieval of Tracked Events and Motion in Imagery Streams.
|