|
Zeynep Yucel, Albert Ali Salah, Çetin Meriçli, Tekin Meriçli, Roberto Valenti, & Theo Gevers. (2013). Joint Attention by Gaze Interpolation and Saliency. T-CIBER - IEEE Transactions on cybernetics, 829–842.
Abstract: Joint attention, which is the ability of coordination of a common point of reference with the communicating party, emerges as a key factor in various interaction scenarios. This paper presents an image-based method for establishing joint attention between an experimenter and a robot. The precise analysis of the experimenter's eye region requires stability and high-resolution image acquisition, which is not always available. We investigate regression-based interpolation of the gaze direction from the head pose of the experimenter, which is easier to track. Gaussian process regression and neural networks are contrasted to interpolate the gaze direction. Then, we combine gaze interpolation with image-based saliency to improve the target point estimates and test three different saliency schemes. We demonstrate the proposed method on a human-robot interaction scenario. Cross-subject evaluations, as well as experiments under adverse conditions (such as dimmed or artificial illumination or motion blur), show that our method generalizes well and achieves rapid gaze estimation for establishing joint attention.
|
|
|
Yainuvis Socarras, Sebastian Ramos, David Vazquez, Antonio Lopez, & Theo Gevers. (2013). Adapting Pedestrian Detection from Synthetic to Far Infrared Images. In ICCV Workshop on Visual Domain Adaptation and Dataset Bias. Sydney, Australy.
Abstract: We present different techniques to adapt a pedestrian classifier trained with synthetic images and the corresponding automatically generated annotations to operate with far infrared (FIR) images. The information contained in this kind of images allow us to develop a robust pedestrian detector invariant to extreme illumination changes.
Keywords: Domain Adaptation; Far Infrared; Pedestrian Detection
|
|
|
Xavier Baro, David Masip, Elena Planas, & Julia Minguillon. (2013). PeLP: Plataforma para el Aprendizaje de Lenguajes de Programación.
|
|
|
Wenjuan Gong. (2013). 3D Motion Data aided Human Action Recognition and Pose Estimation (Jordi Gonzalez, & Xavier Roca, Eds.). Ph.D. thesis, Ediciones Graficas Rey, .
Abstract: In this work, we explore human action recognition and pose estimation prob-
lems. Different from traditional works of learning from 2D images or video
sequences and their annotated output, we seek to solve the problems with ad-
ditional 3D motion capture information, which helps to fill the gap between 2D
image features and human interpretations.
We first compare two different schools of approaches commonly used for 3D
pose estimation from 2D pose configuration: modeling and learning methods.
By looking into experiments results and considering our problems, we fixed a
learning method as the following approaches to do pose estimation. We then
establish a framework by adding a module of detecting 2D pose configuration
from images with varied background, which widely extend the application of
the approach. We also seek to directly estimate 3D poses from image features,
instead of estimating 2D poses as a intermediate module. We explore a robust
input feature, which combined with the proposed distance measure, provides
a solution for noisy or corrupted inputs. We further utilize the above method
to estimate weak poses,which is a concise representation of the original poses
by using dimension deduction technologies, from image features. Weak pose
space is where we calculate vocabulary and label action types using a bog of
words pipeline. Temporal information of an action is taken into consideration by
considering several consecutive frames as a single unit for computing vocabulary
and histogram assignments.
|
|
|
Volkmar Frinken, Andreas Fischer, & Carlos David Martinez Hinarejos. (2013). Handwriting Recognition in Historical Documents using Very Large Vocabularies. In 2nd International Workshop on Historical Document Imaging and Processing (pp. 67–72).
Abstract: Language models are used in automatic transcription system to resolve ambiguities. This is done by limiting the vocabulary of words that can be recognized as well as estimating the n-gram probability of the words in the given text. In the context of historical documents, a non-unified spelling and the limited amount of written text pose a substantial problem for the selection of the recognizable vocabulary as well as the computation of the word probabilities. In this paper we propose for the transcription of historical Spanish text to keep the corpus for the n-gram limited to a sample of the target text, but expand the vocabulary with words gathered from external resources. We analyze the performance of such a transcription system with different sizes of external vocabularies and demonstrate the applicability and the significant increase in recognition accuracy of using up to 300 thousand external words.
|
|
|
Vitaliy Konovalov, Albert Clapes, & Sergio Escalera. (2013). Automatic Hand Detection in RGB-Depth Data Sequences. In 16th Catalan Conference on Artificial Intelligence (pp. 91–100). LNCS.
Abstract: Detecting hands in multi-modal RGB-Depth visual data has become a challenging Computer Vision problem with several applications of interest. This task involves dealing with changes in illumination, viewpoint variations, the articulated nature of the human body, the high flexibility of the wrist articulation, and the deformability of the hand itself. In this work, we propose an accurate and efficient automatic hand detection scheme to be applied in Human-Computer Interaction (HCI) applications in which the user is seated at the desk and, thus, only the upper body is visible. Our main hypothesis is that hand landmarks remain at a nearly constant geodesic distance from an automatically located anatomical reference point.
In a given frame, the human body is segmented first in the depth image. Then, a
graph representation of the body is built in which the geodesic paths are computed from the reference point. The dense optical flow vectors on the corresponding RGB image are used to reduce ambiguities of the geodesic paths’ connectivity, allowing to eliminate false edges interconnecting different body parts. Finally, we are able to detect the position of both hands based on invariant geodesic distances and optical flow within the body region, without involving costly learning procedures.
|
|
|
Victor Ponce, Sergio Escalera, & Xavier Baro. (2013). Multi-modal Social Signal Analysis for Predicting Agreement in Conversation Settings. In 15th ACM International Conference on Multimodal Interaction (pp. 495–502).
Abstract: In this paper we present a non-invasive ambient intelligence framework for the analysis of non-verbal communication applied to conversational settings. In particular, we apply feature extraction techniques to multi-modal audio-RGB-depth data. We compute a set of behavioral indicators that define communicative cues coming from the fields of psychology and observational methodology. We test our methodology over data captured in victim-offender mediation scenarios. Using different state-of-the-art classification approaches, our system achieve upon 75% of recognition predicting agreement among the parts involved in the conversations, using as ground truth the experts opinions.
|
|
|
Victor Borjas, Jordi Vitria, & Petia Radeva. (2013). Gradient Histogram Background Modeling for People Detection in Stationary Camera Environments. In 13th IAPR Conference on Machine Vision Applications.
Abstract: Best Poster AwardOne of the big challenges of today person detectors is the decreasing of the false positive rate. In this paper, we propose a novel framework to customize person detectors in static camera scenarios in order to reduce this rate. This scheme includes background modeling for subtraction based on gradient histograms and Mean-Shift clustering. Our experiments show that the detection improved compared to using only the output from the pedestrian detector reducing 87% of the false positives and therefore the overall precision of the detection
was increased signicantly.
|
|
|
Veronica Romero, Alicia Fornes, Nicolas Serrano, Joan Andreu Sanchez, A.H. Toselli, Volkmar Frinken, et al. (2013). The ESPOSALLES database: An ancient marriage license corpus for off-line handwriting recognition. PR - Pattern Recognition, 46(6), 1658–1669.
Abstract: Historical records of daily activities provide intriguing insights into the life of our ancestors, useful for demography studies and genealogical research. Automatic processing of historical documents, however, has mostly been focused on single works of literature and less on social records, which tend to have a distinct layout, structure, and vocabulary. Such information is usually collected by expert demographers that devote a lot of time to manually transcribe them. This paper presents a new database, compiled from a marriage license books collection, to support research in automatic handwriting recognition for historical documents containing social records. Marriage license books are documents that were used for centuries by ecclesiastical institutions to register marriage licenses. Books from this collection are handwritten and span nearly half a millennium until the beginning of the 20th century. In addition, a study is presented about the capability of state-of-the-art handwritten text recognition systems, when applied to the presented database. Baseline results are reported for reference in future studies.
|
|
|
V.C.Kieu, Alicia Fornes, M. Visani, N.Journet, & Anjan Dutta. (2013). The ICDAR/GREC 2013 Music Scores Competition on Staff Removal. In 10th IAPR International Workshop on Graphics Recognition.
Abstract: The first competition on music scores that was organized at ICDAR and GREC in 2011 awoke the interest of researchers, who participated both at staff removal and writer identification tasks. In this second edition, we propose a staff removal competition where we simulate old music scores. Thus, we have created a new set of images, which contain noise and 3D distortions. This paper describes the distortion methods, metrics, the participant’s methods and the obtained results.
Keywords: Competition; Music scores; Staff Removal
|
|
|
Thanh Ha Do, Salvatore Tabbone, & Oriol Ramos Terrades. (2013). New Approach for Symbol Recognition Combining Shape Context of Interest Points with Sparse Representation. In 12th International Conference on Document Analysis and Recognition (pp. 265–269).
Abstract: In this paper, we propose a new approach for symbol description. Our method is built based on the combination of shape context of interest points descriptor and sparse representation. More specifically, we first learn a dictionary describing shape context of interest point descriptors. Then, based on information retrieval techniques, we build a vector model for each symbol based on its sparse representation in a visual vocabulary whose visual words are columns in the learneddictionary. The retrieval task is performed by ranking symbols based on similarity between vector models. Evaluation of our method, using benchmark datasets, demonstrates the validity of our approach and shows that it outperforms related state-of-theart methods.
|
|
|
Thanh Ha Do, Salvatore Tabbone, & Oriol Ramos Terrades. (2013). Document noise removal using sparse representations over learned dictionary. In Symposium on Document engineering (pp. 161–168).
Abstract: best paper award
In this paper, we propose an algorithm for denoising document images using sparse representations. Following a training set, this algorithm is able to learn the main document characteristics and also, the kind of noise included into the documents. In this perspective, we propose to model the noise energy based on the normalized cross-correlation between pairs of noisy and non-noisy documents. Experimental
results on several datasets demonstrate the robustness of our method compared with the state-of-the-art.
|
|
|
Shida Beigpour, Marc Serra, Joost Van de Weijer, Robert Benavente, Maria Vanrell, Olivier Penacchio, et al. (2013). Intrinsic Image Evaluation On Synthetic Complex Scenes. In 20th IEEE International Conference on Image Processing (pp. 285–289).
Abstract: Scene decomposition into its illuminant, shading, and reflectance intrinsic images is an essential step for scene understanding. Collecting intrinsic image groundtruth data is a laborious task. The assumptions on which the ground-truth
procedures are based limit their application to simple scenes with a single object taken in the absence of indirect lighting and interreflections. We investigate synthetic data for intrinsic image research since the extraction of ground truth is straightforward, and it allows for scenes in more realistic situations (e.g, multiple illuminants and interreflections). With this dataset we aim to motivate researchers to further explore intrinsic image decomposition in complex scenes.
|
|
|
Shida Beigpour. (2013). Illumination and object reflectance modeling (Joost Van de Weijer, & Ernest Valveny, Eds.). Ph.D. thesis, Ediciones Graficas Rey, .
Abstract: More realistic and accurate models of the scene illumination and object reflectance can greatly improve the quality of many computer vision and computer graphics tasks. Using such model, a more profound knowledge about the interaction of light with object surfaces can be established which proves crucial to a variety of computer vision applications. In the current work, we investigate the various existing approaches to illumination and reflectance modeling and form an analysis on their shortcomings in capturing the complexity of real-world scenes. Based on this analysis we propose improvements to different aspects of reflectance and illumination estimation in order to more realistically model the real-world scenes in the presence of complex lighting phenomena (i.e, multiple illuminants, interreflections and shadows). Moreover, we captured our own multi-illuminant dataset which consists of complex scenes and illumination conditions both outdoor and in laboratory conditions. In addition we investigate the use of synthetic data to facilitate the construction of datasets and improve the process of obtaining ground-truth information.
|
|
|
Sezer Karaoglu, Jan van Gemert, & Theo Gevers. (2013). Con-text: text detection using background connectivity for fine-grained object classification. In 21ST ACM International Conference on Multimedia (pp. 757–760).
|
|