|
Ernest Valveny, & Enric Marti. (1999). Application of deformable template matching to symbol recognition in hand-written architectural draw. In Proceedings of the Fifth International Conference on. Bangalore (India).
Abstract: We propose to use deformable template matching as a new approach to recognize characters and lineal symbols in hand-written line drawings, instead of traditional methods based on vectorization and feature extraction. The Bayesian formulation of deformable template matching allows combining fidelity to the ideal shape of the symbol with maximum flexibility to get the best fit to the input image. The lineal nature of symbols can be exploited to define a suitable representation of models and the set of deformations to be applied to them. Matching, however, is done over the original binary image to avoid losing relevant features during vectorization. We have applied this method to hand-written architectural drawings, and experimental results demonstrate that symbols with high distortions from the ideal shape can be accurately identified.
|
|
|
Ruth Aylett, Ginevra Castellano, Bogdan Raducanu, Ana Paiva, & Marc Hanheide. (2011). Long-term socially perceptive and interactive robot companions: challenges and future perspectives. In 13th International Conference on Multimodal Interaction (pp. 323–326). ACM.
Abstract: This paper gives a brief overview of the challenges for multimodal perception and generation applied to robot companions located in human social environments. It reviews the current position in both perception and generation and the immediate technical challenges and goes on to consider the extra issues raised by embodiment and social context. Finally, it briefly discusses the impact of systems that must function continually over months rather than just for a few hours.
Keywords: human-robot interaction, multimodal interaction, social robotics
|
|
|
Cesar Isaza, Joaquin Salas, & Bogdan Raducanu. (2012). Synthetic ground truth dataset to detect shadow cast by static objects in outdoor. In 1st International Workshop on Visual Interfaces for Ground Truth Collection in Computer Vision Applications (art. 11). ACM.
Abstract: In this paper, we propose a precise synthetic ground truth dataset to study the problem of detecting the shadows cast by static objects in outdoor environments during extended periods of time (days). For our dataset, we have created a virtual scenario using rendering software. To increase the realism of the simulated environment, we have defined the scenario in a precise geographical location. In our dataset the sun is by far the main illumination source. The sun position during the simulation time takes into consideration factors related to the geographical location, such as the latitude, longitude, elevation above sea level, and precise image capturing day and time. In our simulation the camera remains fixed. The dataset consists of seven days of simulation, from 10:00am to 5:00pm. Images are captured every 10 seconds. The shadows' ground truth is automatically computed by the rendering software.
|
|
|
Miguel Angel Bautista, Oriol Pujol, Xavier Baro, & Sergio Escalera. (2011). Introducing the Separability Matrix for Error Correcting Output Codes Coding. In Carlo Sansone, Josef Kittler, & Fabio Roli (Eds.), 10th International Conference on Multiple Classifier Systems (Vol. 6713, pp. 227–236). LNCS. Springer-Verlag Berlin Heidelberg.
Abstract: Error Correcting Output Codes (ECOC) have proven to be a powerful tool for treating multi-class problems. Nevertheless, predefined ECOC designs may not benefit from error-correcting principles for particular multi-class data. In this paper, we introduce the Separability matrix as a tool to study and enhance designs for ECOC coding. In addition, a novel problem-dependent coding design based on the Separability matrix is tested over a wide set of challenging multi-class problems, obtaining very satisfactory results.
|
|
|
Maria Vanrell, Naila Murray, Robert Benavente, C. Alejandro Parraga, Xavier Otazu, & Ramon Baldrich. (2011). Perception Based Representations for Computational Colour. In Alain Trémeau S. T. Raimondo Schettini (Ed.), 3rd International Workshop on Computational Color Imaging (Vol. 6626, pp. 16–30). LNCS. Springer-Verlag.
Abstract: The perceived colour of a stimulus is dependent on multiple factors stemming either from the context of the stimulus or from idiosyncrasies of the observer. The complexity involved in combining these multiple effects is the main reason for the gap between classical calibrated colour spaces from colour science and colour representations used in computer vision, where colour is just one more visual cue immersed in a digital image where surfaces, shadows and illuminants interact seemingly out of control. With the aim of advancing a few steps towards bridging this gap, we present some results on computational representations of colour for computer vision. They have been developed by introducing perceptual considerations derived from the interaction of the colour of a point with its context. We show some techniques to represent the colour of a point influenced by assimilation and contrast effects due to the image surround, and we show some results on how colour saliency can be derived in real images. We outline a model for automatic assignment of colour names to image points directly trained on psychophysical data. We show how colour segments can be perceptually grouped in the image by imposing shading coherence in the colour space.
Keywords: colour perception, induction, naming, psychophysical data, saliency, segmentation
|
|
|
Sergio Escalera, Alicia Fornes, Oriol Pujol, Alberto Escudero, & Petia Radeva. (2009). Circular Blurred Shape Model for Symbol Spotting in Documents. In 16th IEEE International Conference on Image Processing (pp. 1985–1988).
Abstract: The symbol spotting problem requires feature extraction strategies able to generalize from training samples and to localize the target object while discarding most of the image. In the case of document analysis, symbol spotting techniques have to deal with a high variability in the appearance of symbols. In this paper, we propose the Circular Blurred Shape Model descriptor. Feature extraction is performed by capturing the spatial arrangement of significant object characteristics in a correlogram structure. Shape information from objects is shared among correlogram regions, making the descriptor tolerant to irregular deformations. Descriptors are learnt using a cascade of classifiers with AdaBoost as the base classifier. Finally, symbol spotting is performed by means of a windowing strategy using the learnt cascade over plan and old musical score documents. Spotting and multi-class categorization results show better performance compared with state-of-the-art descriptors.
|
|
|
Sergio Escalera, Oriol Pujol, & Petia Radeva. (2010). Error-Correcting Output Codes Library. JMLR - Journal of Machine Learning Research, 11, 661–664.
Abstract: In this paper, we present an open source Error-Correcting Output Codes (ECOC) library. The ECOC framework is a powerful tool to deal with multi-class categorization problems. This library contains both state-of-the-art coding (one-versus-one, one-versus-all, dense random, sparse random, DECOC, forest-ECOC, and ECOC-ONE) and decoding designs (hamming, euclidean, inverse hamming, laplacian, β-density, attenuated, loss-based, probabilistic kernel-based, and loss-weighted) with the parameters defined by the authors, as well as the option to include your own coding, decoding, and base classifier.
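As a rough illustration of two of the designs named in the abstract, the following sketch builds a one-versus-all coding matrix and applies Hamming decoding. It is not the library's interface: the function names, the +1/-1 encoding, and the example predictions are all assumptions made here for illustration.

```python
from typing import List, Sequence


def one_versus_all_matrix(n_classes: int) -> List[List[int]]:
    """Build a one-versus-all coding matrix (hypothetical helper).

    Row i is the codeword of class i: +1 in column i (the binary
    classifier that separates class i from the rest), -1 elsewhere.
    """
    return [[1 if col == row else -1 for col in range(n_classes)]
            for row in range(n_classes)]


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions where two codewords disagree."""
    return sum(1 for x, y in zip(a, b) if x != y)


def hamming_decode(predictions: Sequence[int],
                   coding_matrix: Sequence[Sequence[int]]) -> int:
    """Return the index of the class whose codeword is closest to the
    vector of binary classifier outputs (ties go to the lowest index)."""
    distances = [hamming_distance(predictions, codeword)
                 for codeword in coding_matrix]
    return distances.index(min(distances))


# Four classes; the binary classifiers output +1 only for class 2.
codes = one_versus_all_matrix(4)
decoded = hamming_decode([-1, -1, 1, -1], codes)
```

Here `decoded` is the index of the matching codeword. The other coding and decoding designs listed in the abstract would replace the matrix construction or the distance measure, respectively.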
|
|
|
Xavier Baro, Sergio Escalera, Petia Radeva, & Jordi Vitria. (2009). Visual Content Layer for Scalable Recognition in Urban Image Databases, Internet Multimedia Search and Mining. In 10th IEEE International Conference on Multimedia and Expo (pp. 1616–1619).
Abstract: Rich online map interaction represents a useful tool to get multimedia information related to physical places. With this type of system, users can automatically compute the optimal route for a trip or look for entertainment places or hotels near their current position. Standard maps are defined as a fusion of layers, where each one contains specific data such as height, streets, or a particular business location. In this paper we propose the construction of a visual content layer which describes the visual appearance of geographic locations in a city. We captured, by means of a Mobile Mapping system, a huge set of georeferenced images (> 500K) which cover the whole city of Barcelona. For each image, hundreds of region descriptions are computed off-line and described as a hash code. This allows an efficient and scalable way of accessing maps by visual content.
|
|
|
Partha Pratim Roy, Umapada Pal, & Josep Llados. (2010). Seal Object Detection in Document Images using GHT of Local Component Shapes. In 10th ACM Symposium On Applied Computing (pp. 23–27).
Abstract: Due to noise, overlapped text/signature and multi-oriented nature, seal (stamp) object detection is a difficult challenge. This paper deals with automatic detection of seals in documents with cluttered background. Here, a seal object is characterized by scale and rotation invariant spatial feature descriptors (distance and angular position) computed from the recognition result of individual connected components (characters). Recognition of multi-scale and multi-oriented components is done using a Support Vector Machine classifier. The Generalized Hough Transform (GHT) is used to detect the seal, and votes are cast for finding the possible location of the seal object in a document based on these spatial feature descriptors of component pairs. The peak of votes in the GHT accumulator validates the hypothesis to locate the seal object in a document. Experimental results show that the method is efficient at locating seal instances of arbitrary shape and orientation in documents.
|
|
|
Sergio Escalera, Petia Radeva, Jordi Vitria, Xavier Baro, & Bogdan Raducanu. (2010). Modelling and Analyzing Multimodal Dyadic Interactions Using Social Networks. In 12th International Conference on Multimodal Interfaces and 7th Workshop on Machine Learning for Multimodal Interaction.
Abstract: Social network analysis has become a common technique used to model and quantify the properties of social interactions. In this paper, we propose an integrated framework to explore the characteristics of a social network extracted from multimodal dyadic interactions. First, speech detection is performed through an audio/visual fusion scheme based on stacked sequential learning. In the audio domain, speech is detected through clustering of audio features. Clusters are modelled by means of a one-state Hidden Markov Model containing a diagonal covariance Gaussian Mixture Model. In the visual domain, speech detection is performed through differential-based feature extraction from the segmented mouth region, and a dynamic programming matching procedure. Second, in order to model the dyadic interactions, we employed the Influence Model whose states encode the previous integrated audio/visual data. Third, the social network is extracted based on the estimated influences. For our study, we used a set of videos belonging to New York Times’ Blogging Heads opinion blog. The results are reported both in terms of accuracy of the audio/visual data fusion and centrality measures used to characterize the social network.
Keywords: Social interaction; Multimodal fusion; Influence model; Social network analysis
|
|
|
Fadi Dornaika, & Angel Sappa. (2009). A Featureless and Stochastic Approach to On-board Stereo Vision System Pose. IMAVIS - Image and Vision Computing, 27(9), 1382–1393.
Abstract: This paper presents a direct and stochastic technique for real-time estimation of an on-board stereo head’s position and orientation. Unlike existing works which rely on feature extraction either in the image domain or in 3D space, our proposed approach directly estimates the unknown parameters from the stream of stereo pairs’ brightness. The pose parameters are tracked using the particle filtering framework, which implicitly enforces the smoothness constraints on the estimated parameters. The proposed technique can be used with driver assistance applications as well as with augmented reality applications. Extended experiments on urban environments with different road geometries are presented. Comparisons with a 3D data-based approach are presented. Moreover, we provide a performance study aiming at evaluating the accuracy of the proposed approach.
Keywords: On-board stereo vision system; Pose estimation; Featureless approach; Particle filtering; Image warping
|
|
|
R. Valenti, N. Sebe, & Theo Gevers. (2012). What are you looking at? Improving Visual gaze Estimation by Saliency. IJCV - International Journal of Computer Vision, 98(3), 324–334.
Abstract: In this paper we present a novel mechanism to obtain enhanced gaze estimation for subjects looking at a scene or an image. The system makes use of prior knowledge about the scene (e.g. an image on a computer screen), to define a probability map of the scene the subject is gazing at, in order to find the most probable location. The proposed system helps in correcting the fixations which are erroneously estimated by the gaze estimation device by employing a saliency framework to adjust the resulting gaze point vector. The system is tested on three scenarios: using eye tracking data, enhancing a low accuracy webcam based eye tracker, and using a head pose tracker. The correlation between the subjects in the commercial eye tracking data is improved by an average of 13.91%. The correlation on the low accuracy eye gaze tracker is improved by 59.85%, and for the head pose tracker we obtain an improvement of 10.23%. These results show the potential of the system as a way to enhance and self-calibrate different visual gaze estimation systems.
Notes: Impact factor 2010: 5.15; impact factor 2011/12?: 5.36
|
|
|
Pedro Martins, Carlo Gatta, & Paulo Carvalho. (2012). Feature-driven Maximally Stable Extremal Regions. In 7th International Conference on Computer Vision Theory and Applications (pp. 490–497).
|
|
|
Joost Van de Weijer, & Shida Beigpour. (2011). The Dichromatic Reflection Model: Future Research Directions and Applications. In José L. and B. Mestetskiy (Ed.), International Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications. SciTePress.
Abstract: The dichromatic reflection model (DRM) predicts that color distributions form a parallelogram in color space, whose shape is defined by the body reflectance and the illuminant color. In this paper we summarize the assumptions which led to the DRM and briefly recall two of its main application domains: color image segmentation and photometric invariant feature computation. After having introduced the model we discuss several limitations of the theory, especially those which arise when working on real-world uncalibrated images. In addition, we summarize recent extensions of the model which make it possible to handle more complicated light interactions. Finally, we suggest some future research directions which would further extend its applicability.
|
|