|
Alicia Fornes, & Gemma Sanchez. (2014). Analysis and Recognition of Music Scores. In D. Doermann, & K. Tombre (Eds.), Handbook of Document Image Processing and Recognition (Vol. E, pp. 749–774). Springer London.
Abstract: The analysis and recognition of music scores has attracted the interest of researchers for decades. Optical Music Recognition (OMR) is a classical research field of Document Image Analysis and Recognition (DIAR), whose aim is to extract information from music scores. Music scores contain both graphical and textual information, and for this reason, techniques are closely related to graphics recognition and text recognition. Since music scores use a particular diagrammatic notation that follow the rules of music theory, many approaches make use of context information to guide the recognition and solve ambiguities. This chapter overviews the main Optical Music Recognition (OMR) approaches. Firstly, the different methods are grouped according to the OMR stages, namely, staff removal, music symbol recognition, and syntactical analysis. Secondly, specific approaches for old and handwritten music scores are reviewed. Finally, online approaches and commercial systems are also commented.
|
|
|
Lluis Pere de las Heras, Ernest Valveny, & Gemma Sanchez. (2014). Unsupervised and Notation-Independent Wall Segmentation in Floor Plans Using a Combination of Statistical and Structural Strategies. In Graphics Recognition. Current Trends and Challenges (Vol. 8746, pp. 109–121). LNCS. Springer Berlin Heidelberg.
Abstract: In this paper we present a wall segmentation approach in floor plans that is able to work independently to the graphical notation, does not need any pre-annotated data for learning, and is able to segment multiple-shaped walls such as beams and curved-walls. This method results from the combination of the wall segmentation approaches [3, 5] presented recently by the authors. Firstly, potential straight wall segments are extracted in an unsupervised way similar to [3], but restricting even more the wall candidates considered in the original approach. Then, based on [5], these segments are used to learn the texture pattern of walls and spot the lost instances. The presented combination of both methods has been tested on 4 available datasets with different notations and compared qualitatively and quantitatively to the state-of-the-art applied on these collections. Additionally, some qualitative results on floor plans directly downloaded from the Internet are reported in the paper. The overall performance of the method demonstrates either its adaptability to different wall notations and shapes, and to document qualities and resolutions.
Keywords: Graphics recognition; Floor plan analysis; Object segmentation
|
|
|
Marçal Rusiñol, Volkmar Frinken, Dimosthenis Karatzas, Andrew Bagdanov, & Josep Llados. (2014). Multimodal page classification in administrative document image streams. IJDAR - International Journal on Document Analysis and Recognition, 17(4), 331–341.
Abstract: In this paper, we present a page classification application in a banking workflow. The proposed architecture represents administrative document images by merging visual and textual descriptions. The visual description is based on a hierarchical representation of the pixel intensity distribution. The textual description uses latent semantic analysis to represent document content as a mixture of topics. Several off-the-shelf classifiers and different strategies for combining visual and textual cues have been evaluated. A final step uses an n-gram model of the page stream allowing a finer-grained classification of pages. The proposed method has been tested in a real large-scale environment and we report results on a dataset of 70,000 pages.
Keywords: Digital mail room; Multimodal page classification; Visual and textual document description
|
|
|
Xavier Perez Sala, Sergio Escalera, Cecilio Angulo, & Jordi Gonzalez. (2014). A survey on model based approaches for 2D and 3D visual human pose recovery. SENS - Sensors, 14(3), 4189–4210.
Abstract: Human Pose Recovery has been studied in the field of Computer Vision for the last 40 years. Several approaches have been reported, and significant improvements have been obtained in both data representation and model design. However, the problem of Human Pose Recovery in uncontrolled environments is far from being solved. In this paper, we define a general taxonomy to group model based approaches for Human Pose Recovery, which is composed of five main modules: appearance, viewpoint, spatial relations, temporal consistence, and behavior. Subsequently, a methodological comparison is performed following the proposed taxonomy, evaluating current SoA approaches in the aforementioned five group categories. As a result of this comparison, we discuss the main advantages and drawbacks of the reviewed literature.
Keywords: human pose recovery; human body modelling; behavior analysis; computer vision
|
|
|
Sergio Escalera, Xavier Baro, Jordi Gonzalez, Miguel Angel Bautista, Meysam Madadi, Miguel Reyes, et al. (2014). ChaLearn Looking at People Challenge 2014: Dataset and Results. In ECCV Workshop on ChaLearn Looking at People (Vol. 8925, pp. 459–473). LNCS.
Abstract: This paper summarizes the ChaLearn Looking at People 2014 challenge data and the results obtained by the participants. The competition was split into three independent tracks: human pose recovery from RGB data, action and interaction recognition from RGB data sequences, and multi-modal gesture recognition from RGB-Depth sequences. For all the tracks, the goal was to perform user-independent recognition in sequences of continuous images using the overlapping Jaccard index as the evaluation measure. In this edition of the ChaLearn challenge, two large novel data sets were made publicly available and the Microsoft Codalab platform were used to manage the competition. Outstanding results were achieved in the three challenge tracks, with accuracy results of 0.20, 0.50, and 0.85 for pose recovery, action/interaction recognition, and multi-modal gesture recognition, respectively.
Keywords: Human Pose Recovery; Behavior Analysis; Action and in- teractions; Multi-modal gestures; recognition
|
|
|
Oscar Lopes, Miguel Reyes, Sergio Escalera, & Jordi Gonzalez. (2014). Spherical Blurred Shape Model for 3-D Object and Pose Recognition: Quantitative Analysis and HCI Applications in Smart Environments. TSMCB - IEEE Transactions on Systems, Man and Cybernetics (Part B), 44(12), 2379–2390.
Abstract: The use of depth maps is of increasing interest after the advent of cheap multisensor devices based on structured light, such as Kinect. In this context, there is a strong need of powerful 3-D shape descriptors able to generate rich object representations. Although several 3-D descriptors have been already proposed in the literature, the research of discriminative and computationally efficient descriptors is still an open issue. In this paper, we propose a novel point cloud descriptor called spherical blurred shape model (SBSM) that successfully encodes the structure density and local variabilities of an object based on shape voxel distances and a neighborhood propagation strategy. The proposed SBSM is proven to be rotation and scale invariant, robust to noise and occlusions, highly discriminative for multiple categories of complex objects like the human hand, and computationally efficient since the SBSM complexity is linear to the number of object voxels. Experimental evaluation in public depth multiclass object data, 3-D facial expressions data, and a novel hand poses data sets show significant performance improvements in relation to state-of-the-art approaches. Moreover, the effectiveness of the proposal is also proved for object spotting in 3-D scenes and for real-time automatic hand pose recognition in human computer interaction scenarios.
|
|
|
Miguel Angel Bautista, Sergio Escalera, & Oriol Pujol. (2014). On the Design of an ECOC-Compliant Genetic Algorithm. PR - Pattern Recognition, 47(2), 865–884.
Abstract: Genetic Algorithms (GA) have been previously applied to Error-Correcting Output Codes (ECOC) in state-of-the-art works in order to find a suitable coding matrix. Nevertheless, none of the presented techniques directly take into account the properties of the ECOC matrix. As a result the considered search space is unnecessarily large. In this paper, a novel Genetic strategy to optimize the ECOC coding step is presented. This novel strategy redefines the usual crossover and mutation operators in order to take into account the theoretical properties of the ECOC framework. Thus, it reduces the search space and lets the algorithm to converge faster. In addition, a novel operator that is able to enlarge the code in a smart way is introduced. The novel methodology is tested on several UCI datasets and four challenging computer vision problems. Furthermore, the analysis of the results done in terms of performance, code length and number of Support Vectors shows that the optimization process is able to find very efficient codes, in terms of the trade-off between classification performance and the number of classifiers. Finally, classification performance per dichotomizer results shows that the novel proposal is able to obtain similar or even better results while defining a more compact number of dichotomies and SVs compared to state-of-the-art approaches.
|
|
|
Frederic Sampedro, Anna Domenech, & Sergio Escalera. (2014). Obtaining quantitative global tumoral state indicators based on whole-body PET/CT scans: A breast cancer case study. NMC - Nuclear Medicine Communications, 35(4), 362–371.
Abstract: Objectives: In this work we address the need for the computation of quantitative global tumoral state indicators from oncological whole-body PET/computed tomography scans. The combination of such indicators with other oncological information such as tumor markers or biopsy results would prove useful in oncological decision-making scenarios.
Materials and methods: From an ordering of 100 breast cancer patients on the basis of oncological state through visual analysis by a consensus of nuclear medicine specialists, a set of numerical indicators computed from image analysis of the PET/computed tomography scan is presented, which attempts to summarize a patient’s oncological state in a quantitative manner taking into consideration the total tumor volume, aggressiveness, and spread.
Results: Results obtained by comparative analysis of the proposed indicators with respect to the experts’ evaluation show up to 87% Pearson’s correlation coefficient when providing expert-guided PET metabolic tumor volume segmentation and 64% correlation when using completely automatic image analysis techniques.
Conclusion: Global quantitative tumor information obtained by whole-body PET/CT image analysis can prove useful in clinical nuclear medicine settings and oncological decision-making scenarios. The completely automatic computation of such indicators would improve its impact as time efficiency and specialist independence would be achieved.
|
|
|
Mohammad Ali Bagheri, Qigang Gao, & Sergio Escalera. (2014). Generic Subclass Ensemble: A Novel Approach to Ensemble Classification. In 22nd International Conference on Pattern Recognition (pp. 1254–1259).
Abstract: Multiple classifier systems, also known as classifier ensembles, have received great attention in recent years because of their improved classification accuracy in different applications. In this paper, we propose a new general approach to ensemble classification, named generic subclass ensemble, in which each base classifier is trained with data belonging to a subset of classes, and thus discriminates among a subset of target categories. The ensemble classifiers are then fused using a combination rule. The proposed approach differs from existing methods that manipulate the target attribute, since in our approach individual classification problems are not restricted to two-class problems. We perform a series of experiments to evaluate the efficiency of the generic subclass approach on a set of benchmark datasets. Experimental results with multilayer perceptrons show that the proposed approach presents a viable alternative to the most commonly used ensemble classification approaches.
|
|
|
Mohammad Ali Bagheri, Gang Hu, Qigang Gao, & Sergio Escalera. (2014). A Framework of Multi-Classifier Fusion for Human Action Recognition. In 22nd International Conference on Pattern Recognition (pp. 1260–1265).
Abstract: The performance of different action-recognition methods using skeleton joint locations have been recently studied by several computer vision researchers. However, the potential improvement in classification through classifier fusion by ensemble-based methods has remained unattended. In this work, we evaluate the performance of an ensemble of five action learning techniques, each performing the recognition task from a different perspective. The underlying rationale of the fusion approach is that different learners employ varying structures of input descriptors/features to be trained. These varying structures cannot be attached and used by a single learner. In addition, combining the outputs of several learners can reduce the risk of an unfortunate selection of a poorly performing learner. This leads to having a more robust and general-applicable framework. Also, we propose two simple, yet effective, action description techniques. In order to improve the recognition performance, a powerful combination strategy is utilized based on the Dempster-Shafer theory, which can effectively make use of diversity of base learners trained on different sources of information. The recognition results of the individual classifiers are compared with those obtained from fusing the classifiers' output, showing advanced performance of the proposed methodology.
|
|
|
Antonio Hernandez, Stan Sclaroff, & Sergio Escalera. (2014). Contextual rescoring for Human Pose Estimation. In 25th British Machine Vision Conference.
Abstract: A contextual rescoring method is proposed for improving the detection of body joints of a pictorial structure model for human pose estimation. A set of mid-level parts is incorporated in the model, and their detections are used to extract spatial and score-related features relative to other body joint hypotheses. A technique is proposed for the automatic discovery of a compact subset of poselets that covers a set of validation images
while maximizing precision. A rescoring mechanism is defined as a set-based boosting classifier that computes a new score for body joint detections, given its relationship to detections of other body joints and mid-level parts in the image. This new score complements the unary potential of a discriminatively trained pictorial structure model. Experiments on two benchmarks show performance improvements when considering the proposed mid-level image representation and rescoring approach in comparison with other pictorial structure-based approaches.
|
|
|
Frederic Sampedro, Anna Domenech, & Sergio Escalera. (2014). Static and dynamic computational cancer spread quantification in whole body FDG-PET/CT scans. JMIHI - Journal of Medical Imaging and Health Informatics, 4(6), 825–831.
Abstract: In this work we address the computational cancer spread quantification scenario in whole body FDG-PET/CT scans. At the static level, this setting can be modeled as a clustering problem on the set of 3D connected components of the whole body PET tumoral segmentation mask carried out by nuclear medicine physicians. At the dynamic level, and ad-hoc algorithm is proposed in order to quantify the cancer spread time evolution which, when combined with other existing indicators, gives rise to the metabolic tumor volume-aggressiveness-spread time evolution chart, a novel tool that we claim that would prove useful in nuclear medicine and oncological clinical or research scenarios. Good performance results of the proposed methodologies both at the clinical and technological level are shown using a dataset of 48 segmented whole body FDG-PET/CT scans.
Keywords: CANCER SPREAD; COMPUTER AIDED DIAGNOSIS; MEDICAL IMAGING; TUMOR QUANTIFICATION
|
|
|
Frederic Sampedro, Sergio Escalera, & Anna Puig. (2014). Iterative Multiclass Multiscale Stacked Sequential Learning: definition and application to medical volume segmentation. PRL - Pattern Recognition Letters, 46, 1–10.
Abstract: In this work we present the iterative multi-class multi-scale stacked sequential learning framework (IMMSSL), a novel learning scheme that is particularly suited for medical volume segmentation applications. This model exploits the inherent voxel contextual information of the structures of interest in order to improve its segmentation performance results. Without any feature set or learning algorithm prior assumption, the proposed scheme directly seeks to learn the contextual properties of a region from the predicted classifications of previous classifiers within an iterative scheme. Performance results regarding segmentation accuracy in three two-class and multi-class medical volume datasets show a significant improvement with respect to state of the art alternatives. Due to its easiness of implementation and its independence of feature space and learning algorithm, the presented machine learning framework could be taken into consideration as a first choice in complex volume segmentation scenarios.
Keywords: Machine learning; Sequential learning; Multi-class problems; Contextual learning; Medical volume segmentation
|
|
|
Eloi Puertas, Miguel Angel Bautista, Daniel Sanchez, Sergio Escalera, & Oriol Pujol. (2014). Learning to Segment Humans by Stacking their Body Parts,. In ECCV Workshop on ChaLearn Looking at People (Vol. 8925, pp. 685–697). LNCS.
Abstract: Human segmentation in still images is a complex task due to the wide range of body poses and drastic changes in environmental conditions. Usually, human body segmentation is treated in a two-stage fashion. First, a human body part detection step is performed, and then, human part detections are used as prior knowledge to be optimized by segmentation strategies. In this paper, we present a two-stage scheme based on Multi-Scale Stacked Sequential Learning (MSSL). We define an extended feature set by stacking a multi-scale decomposition of body
part likelihood maps. These likelihood maps are obtained in a first stage
by means of a ECOC ensemble of soft body part detectors. In a second stage, contextual relations of part predictions are learnt by a binary classifier, obtaining an accurate body confidence map. The obtained confidence map is fed to a graph cut optimization procedure to obtain the final segmentation. Results show improved segmentation when MSSL is included in the human segmentation pipeline.
Keywords: Human body segmentation; Stacked Sequential Learning
|
|
|
Frederic Sampedro, Sergio Escalera, Anna Domenech, & Ignasi Carrio. (2014). A computational framework for cancer response assessment based on oncological PET-CT scans. CBM - Computers in Biology and Medicine, 55, 92–99.
Abstract: In this work we present a comprehensive computational framework to help in the clinical assessment of cancer response from a pair of time consecutive oncological PET-CT scans. In this scenario, the design and implementation of a supervised machine learning system to predict and quantify cancer progression or response conditions by introducing a novel feature set that models the underlying clinical context is described. Performance results in 100 clinical cases (corresponding to 200 whole body PET-CT scans) in comparing expert-based visual analysis and classifier decision making show up to 70% accuracy within a completely automatic pipeline and 90% accuracy when providing the system with expert-guided PET tumor segmentation masks.
Keywords: Computer aided diagnosis; Nuclear medicine; Machine learning; Image processing; Quantitative analysis
|
|