|
Albert Gordo, & Ernest Valveny. (2009). A rotation invariant page layout descriptor for document classification and retrieval. In 10th International Conference on Document Analysis and Recognition (pp. 481–485).
Abstract: Document classification usually requires structural features such as the physical layout to obtain good accuracy rates on complex documents. This paper introduces a descriptor of the layout and a distance measure based on the cyclic dynamic time warping which can be computed in O(n²). This descriptor is translation invariant and can be easily modified to be scale and rotation invariant. Experiments with this descriptor and its rotation invariant modification are performed on the Girona archives database and compared against another common layout distance, the minimum weight edge cover. The experiments show that these methods outperform the MWEC both in accuracy and speed, particularly on rotated documents.
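To illustrate the idea of a cyclic dynamic time warping distance mentioned in the abstract, here is a minimal Python sketch. It is not the authors' algorithm: it is a brute-force version that tries every cyclic shift of the first sequence, so for sequences of length n it costs O(n³) rather than the O(n²) reported in the paper. The function names, the use of scalar sequences, and the absolute-difference local cost are assumptions made for illustration only.

```python
from math import inf


def dtw(a, b):
    """Classic dynamic time warping cost between two numeric sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = best cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local cost (assumed: absolute difference)
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def cyclic_dtw_bruteforce(a, b):
    """Minimum DTW cost over all cyclic shifts of a non-empty list `a`.

    Hypothetical helper: a naive baseline, not the paper's O(n^2) scheme.
    """
    return min(dtw(a[s:] + a[:s], b) for s in range(len(a)))
```

With this sketch, a sequence and any rotation of it are at distance zero, which is the property that makes a cyclic variant attractive for rotation-invariant matching.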
|
|
|
Alicia Fornes, Josep Llados, Gemma Sanchez, & Horst Bunke. (2009). On the use of textural features for writer identification in old handwritten music scores. In 10th International Conference on Document Analysis and Recognition (pp. 996–1000).
Abstract: Writer identification consists in determining the writer of a piece of handwriting from a set of writers. In this paper we present a system for writer identification in old handwritten music scores which uses only music notation to determine the author. The steps of the proposed system are the following. First of all, the music sheet is preprocessed for obtaining a music score without the staff lines. Afterwards, four different methods for generating texture images from music symbols are applied. Every approach uses a different spatial variation when combining the music symbols to generate the textures. Finally, Gabor filters and Grey-scale Co-occurrence matrices are used to obtain the features. The classification is performed using a k-NN classifier based on Euclidean distance. The proposed method has been tested on a database of old music scores from the 17th to 19th centuries, achieving encouraging identification rates.
|
|
|
D. Perez, L. Tarazon, N. Serrano, F.M. Castro, Oriol Ramos Terrades, & A. Juan. (2009). The GERMANA Database. In 10th International Conference on Document Analysis and Recognition (pp. 301–305).
Abstract: A new handwritten text database, GERMANA, is presented to facilitate empirical comparison of different approaches to text line extraction and off-line handwriting recognition. GERMANA is the result of digitising and annotating a 764-page Spanish manuscript from 1891, in which most pages only contain nearly calligraphed text written on ruled sheets of well-separated lines. To our knowledge, it is the first publicly available database for handwriting research, mostly written in Spanish and comparable in size to standard databases. Due to its sequential book structure, it is also well-suited for realistic assessment of interactive handwriting recognition systems. To provide baseline results for reference in future studies, empirical results are also reported, using standard techniques and tools for preprocessing, feature extraction, HMM-based image modelling, and language modelling.
|
|
|
Ivan Huerta, Michael Holte, Thomas B. Moeslund, & Jordi Gonzalez. (2009). Detection and Removal of Chromatic Moving Shadows in Surveillance Scenarios. In 12th International Conference on Computer Vision (pp. 1499–1506).
Abstract: Segmentation in the surveillance domain has to deal with shadows to avoid distortions when detecting moving objects. Most segmentation approaches dealing with shadow detection are typically restricted to penumbra shadows. Therefore, such techniques cannot cope well with umbra shadows. Consequently, umbra shadows are usually detected as part of moving objects. In this paper we present a novel technique based on gradient and colour models for separating chromatic moving cast shadows from detected moving objects. Firstly, both a chromatic invariant colour cone model and an invariant gradient model are built to perform automatic segmentation while detecting potential shadows. In a second step, regions corresponding to potential shadows are grouped by considering “a bluish effect” and an edge partitioning. Lastly, (i) temporal similarities between textures and (ii) spatial similarities between chrominance angle and brightness distortions are analysed for all potential shadow regions in order to finally identify umbra shadows. Unlike other approaches, our method does not make any a-priori assumptions about camera location, surface geometries, surface textures, shapes and types of shadows, objects, and background. Experimental results show the performance and accuracy of our approach in different shadowed materials and illumination conditions.
|
|
|
Bogdan Raducanu, Jordi Vitria, & D. Gatica-Perez. (2009). You are Fired! Nonverbal Role Analysis in Competitive Meetings. In IEEE International Conference on Audio, Speech and Signal Processing (pp. 1949–1952).
Abstract: This paper addresses the problem of social interaction analysis in competitive meetings, using nonverbal cues. For our study, we made use of “The Apprentice” reality TV show, which features a competition for a real, highly paid corporate job. Our analysis is centered around two tasks regarding a person's role in a meeting: predicting the person with the highest status and predicting the fired candidates. The current study was carried out using nonverbal audio cues. Results obtained from the analysis of a full season of the show, representing around 90 minutes of audio data, are very promising (up to 85.7% of accuracy in the first case and up to 92.8% in the second case). Our approach is based only on the nonverbal interaction dynamics during the meeting without relying on the spoken words.
|
|
|
Sergio Escalera, Eloi Puertas, Petia Radeva, & Oriol Pujol. (2009). Multimodal laughter recognition in video conversations. In 2nd IEEE Workshop on CVPR for Human communicative Behavior analysis (pp. 110–115).
Abstract: Laughter detection is an important area of interest in the Affective Computing and Human-computer Interaction fields. In this paper, we propose a multi-modal methodology based on the fusion of audio and visual cues to deal with the laughter recognition problem in face-to-face conversations. The audio features are extracted from the spectrogram and the video features are obtained by estimating the mouth movement degree and using a smile and laughter classifier. Finally, the multi-modal cues are included in a sequential classifier. Results over videos from the public discussion blog of the New York Times show that both types of features perform better when considered together by the classifier. Moreover, the sequential methodology is shown to significantly outperform the results obtained by an Adaboost classifier.
|
|
|
Sergio Escalera, R. M. Martinez, Jordi Vitria, Petia Radeva, & Maria Teresa Anguera. (2009). Dominance Detection in Face-to-face Conversations. In 2nd IEEE Workshop on CVPR for Human communicative Behavior analysis (pp. 97–102).
Abstract: Dominance refers to the level of influence a person has in a conversation. Dominance is an important research area in social psychology, but the problem of its automatic estimation is a very recent topic in the contexts of social and wearable computing. In this paper, we focus on dominance detection from visual cues. We estimate the correlation among observers by categorizing the dominant people in a set of face-to-face conversations. Different dominance indicators from gestural communication are defined, manually annotated, and compared to the observers' opinions. Moreover, the considered indicators are automatically extracted from video sequences and learnt by using binary classifiers. Results from the three analyses show a high correlation and allow the categorization of dominant people in public discussion video sequences.
|
|
|
Jose Manuel Alvarez, Theo Gevers, & Antonio Lopez. (2009). Learning Photometric Invariance from Diversified Color Model Ensembles. In 22nd IEEE Conference on Computer Vision and Pattern Recognition (pp. 565–572).
Abstract: Color is a powerful visual cue for many computer vision applications such as image segmentation and object recognition. However, most of the existing color models depend on the imaging conditions, negatively affecting the performance of the task at hand. Often, a reflection model (e.g., Lambertian or dichromatic reflectance) is used to derive color invariant models. However, those reflection models might be too restricted to model real-world scenes in which different reflectance mechanisms may hold simultaneously. Therefore, in this paper, we aim to derive color invariance by learning from color models to obtain diversified color invariant ensembles. First, a photometrically orthogonal and non-redundant color model set, composed of both color variants and invariants, is taken as input. Then, the proposed method combines and weights these color models to arrive at a diversified color ensemble yielding a proper balance between invariance (repeatability) and discriminative power (distinctiveness). To achieve this, the fusion method uses a multi-view approach to minimize the estimation error. In this way, the method is robust to data uncertainty and produces properly diversified color invariant ensembles. Experiments are conducted on three different image datasets to validate the method. From the theoretical and experimental results, it is concluded that the method is robust against severe variations in imaging conditions. The method is not restricted to a certain reflection model or parameter tuning. Further, the method outperforms state-of-the-art detection techniques in the field of object, skin and road recognition.
Keywords: road detection
|
|
|
Miquel Ferrer, Ernest Valveny, & F. Serratosa. (2009). Median graph: A new exact algorithm using a distance based on the maximum common subgraph. PRL - Pattern Recognition Letters, 30(5), 579–588.
Abstract: Median graphs have been presented as a useful tool for capturing the essential information of a set of graphs. Nevertheless, computation of optimal solutions is a very hard problem. In this work we present a new and more efficient optimal algorithm for the median graph computation. With the use of a particular cost function that permits the definition of the graph edit distance in terms of the maximum common subgraph, and a prediction function in the backtracking algorithm, we reduce the size of the search space, avoiding the evaluation of a large number of states and still obtaining the exact median. We present a set of experiments comparing our new algorithm against the previously existing exact algorithm using synthetic data. In addition, we present the first application of the exact median graph computation to real data and we compare the results against an approximate algorithm based on genetic search. These experimental results show that our algorithm outperforms the previously existing exact algorithm and in addition show the potential applicability of the exact solutions to real problems.
|
|
|
Fadi Dornaika, & Angel Sappa. (2009). Instantaneous 3D motion from image derivatives using the Least Trimmed Square Regression. PRL - Pattern Recognition Letters, 30(5), 535–543.
Abstract: This paper presents a new technique for instantaneous 3D motion estimation. The main contributions are as follows. First, we show that the 3D camera or scene velocity can be retrieved from image derivatives only, assuming that the scene contains a dominant plane. Second, we propose a new robust algorithm that simultaneously provides the Least Trimmed Square solution and the percentage of inliers (the non-contaminated data). Experiments on both synthetic and real image sequences demonstrated the effectiveness of the developed method. Those experiments show that the new robust approach can outperform classical robust schemes.
|
|
|
Sergio Escalera, Oriol Pujol, & Petia Radeva. (2009). Separability of Ternary Codes for Sparse Designs of Error-Correcting Output Codes. PRL - Pattern Recognition Letters, 30(3), 285–297.
Abstract: Error Correcting Output Codes (ECOC) represent a successful framework to deal with multi-class categorization problems based on combining binary classifiers. In this paper, we present a new formulation of the ternary ECOC distance and the error-correcting capabilities in the ternary ECOC framework. Based on the new measure, we stress how to design coding matrices that prevent codification ambiguity and propose a new Sparse Random coding matrix with ternary distance maximization. The results on the UCI Repository and in a real speed traffic categorization problem show that when the coding design satisfies the new ternary measures, significant performance improvement is obtained independently of the decoding strategy applied.
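As a rough illustration of decoding in a ternary ECOC setting, the Python sketch below compares a vector of binary classifier outputs against each class codeword while skipping the positions coded as 0 (classes a given classifier was not trained on). This is a generic, commonly used normalized Hamming-style distance chosen for illustration; it is not the new ternary distance formulated in the paper. The function names and the example coding matrix are hypothetical.

```python
from math import inf


def ternary_distance(codeword, outputs):
    """Fraction of disagreements over the non-zero positions of `codeword`.

    `codeword` has entries in {-1, 0, +1}; `outputs` has entries in {-1, +1}.
    Positions where the codeword is 0 are ignored (assumed convention).
    """
    active = [(c, o) for c, o in zip(codeword, outputs) if c != 0]
    if not active:
        return inf  # an all-zero codeword carries no information
    return sum(1 for c, o in active if c != o) / len(active)


def decode(coding_matrix, outputs):
    """Return the index of the class whose codeword is closest to `outputs`."""
    distances = [ternary_distance(row, outputs) for row in coding_matrix]
    return distances.index(min(distances))


# Hypothetical one-versus-one coding matrix for three classes:
# each column is a binary problem, 0 marks a class left out of that problem.
ONE_VS_ONE = [
    [1, 1, 0],
    [-1, 0, 1],
    [0, -1, -1],
]
```

Normalizing by the number of active positions is one simple way to keep sparse codewords from being favored merely because they have fewer positions to disagree on.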
|
|
|
Ignasi Rius, Jordi Gonzalez, Javier Varona, & Xavier Roca. (2009). Action-specific motion prior for efficient bayesian 3D human body tracking. PR - Pattern Recognition, 42(11), 2907–2921.
Abstract: In this paper, we aim to reconstruct the 3D motion parameters of a human body model from the known 2D positions of a reduced set of joints in the image plane. Towards this end, an action-specific motion model is trained from a database of real motion-captured performances. The learnt motion model is used within a particle filtering framework as a priori knowledge on human motion. First, our dynamic model guides the particles according to similar situations previously learnt. Then, the solution space is constrained so only feasible human postures are accepted as valid solutions at each time step. As a result, we are able to track the 3D configuration of the full human body from several cycles of walking motion sequences using only the 2D positions of a very reduced set of joints from lateral or frontal viewpoints.
|
|
|
Jose Antonio Rodriguez, & Florent Perronnin. (2009). Handwritten word-spotting using hidden Markov models and universal vocabularies. PR - Pattern Recognition, 42(9), 2103–2116.
Abstract: Handwritten word-spotting is traditionally viewed as an image matching task between one or multiple query word-images and a set of candidate word-images in a database. This is a typical instance of the query-by-example paradigm. In this article, we introduce a statistical framework for the word-spotting problem which employs hidden Markov models (HMMs) to model keywords and a Gaussian mixture model (GMM) for score normalization. We explore the use of two types of HMMs for the word modeling part: continuous HMMs (C-HMMs) and semi-continuous HMMs (SC-HMMs), i.e. HMMs with a shared set of Gaussians. We show on a challenging multi-writer corpus that the proposed statistical framework is always superior to a traditional matching system which uses dynamic time warping (DTW) for word-image distance computation. A very important finding is that the SC-HMM is superior when labeled training data is scarce—as low as one sample per keyword—thanks to the prior information which can be incorporated in the shared set of Gaussians.
Keywords: Word-spotting; Hidden Markov model; Score normalization; Universal vocabulary; Handwriting recognition
|
|
|
Misael Rosales, Petia Radeva, Oriol Rodriguez-Leor, & Debora Gil. (2009). Modelling of image-catheter motion for 3-D IVUS. MIA - Medical image analysis, 13(1), 91–104.
Abstract: Three-dimensional intravascular ultrasound (IVUS) allows visualization and volumetric measurement of coronary lesions through an exploration of the cross sections and longitudinal views of arteries. However, the visualization and subsequent morpho-geometric measurements in IVUS longitudinal cuts are subject to distortion caused by periodic image/vessel motion around the IVUS catheter. Usually, to overcome the image motion artifact, ECG-gating and image-gated approaches are proposed, which slow the pullback acquisition or disregard part of the IVUS data. In this paper, we argue that the image motion is due to 3-D vessel geometry as well as cardiac dynamics, and propose a dynamic model based on the tracking of an elliptical vessel approximation to recover the rigid transformation and align IVUS images without losing any IVUS data. We report an extensive validation with synthetic simulated data and in vivo IVUS sequences of 30 patients achieving an average reduction of the image artifact of 97% in synthetic data and 79% in real data. Our study shows that IVUS alignment improves longitudinal analysis of the IVUS data and is a necessary step towards accurate reconstruction and volumetric measurements of 3-D IVUS.
Keywords: Intravascular ultrasound (IVUS); Motion estimation; Motion decomposition; Fourier
|
|
|
Jordi Gonzalez, Dani Rowe, Javier Varona, & Xavier Roca. (2009). Understanding Dynamic Scenes based on Human Sequence Evaluation. IMAVIS - Image and Vision Computing, 27(10), 1433–1444.
Abstract: In this paper, a Cognitive Vision System (CVS) is presented, which explains the human behaviour of monitored scenes using natural-language texts. This cognitive analysis of human movements recorded in image sequences is here referred to as Human Sequence Evaluation (HSE) which defines a set of transformation modules involved in the automatic generation of semantic descriptions from pixel values. In essence, the trajectories of human agents are obtained to generate textual interpretations of their motion, and also to infer the conceptual relationships of each agent w.r.t. its environment. For this purpose, a human behaviour model based on Situation Graph Trees (SGTs) is considered, which permits both bottom-up (hypothesis generation) and top-down (hypothesis refinement) analysis of dynamic scenes. The resulting system prototype interprets different kinds of behaviour and reports textual descriptions in multiple languages.
Keywords: Image Sequence Evaluation; High-level processing of monitored scenes; Segmentation and tracking in complex scenes; Event recognition in dynamic scenes; Human motion understanding; Human behaviour interpretation; Natural-language text generation; Realistic demonstrators
|
|