Lluis Pere de las Heras, Joan Mas, Gemma Sanchez, & Ernest Valveny. (2011). Wall Patch-Based Segmentation in Architectural Floorplans. In 11th International Conference on Document Analysis and Recognition (pp. 1270–1274).
Abstract: Segmentation of architectural floor plans is a challenging task, mainly because of the large variability in the notation between different plans. In general, traditional techniques, usually based on analyzing and grouping structural primitives obtained by vectorization, are only able to handle a reduced range of similar notations. In this paper we propose an alternative patch-based segmentation approach working at pixel level, without need of vectorization. The image is divided into a set of patches and a set of features is extracted for every patch. Then, each patch is assigned to a visual word of a previously learned vocabulary and given a probability of belonging to each class of objects. Finally, a post-process assigns the final label for every pixel. This approach has been applied to the detection of walls on two datasets of architectural floor plans with different notations, achieving high accuracy rates.
|
Dimosthenis Karatzas, Sergi Robles, Joan Mas, Farshad Nourbakhsh, & Partha Pratim Roy. (2011). ICDAR 2011 Robust Reading Competition – Challege 1: Reading Text in Born-Digital Images (Web and Email). In 11th International Conference on Document Analysis and Recognition (pp. 1485–1490).
Abstract: This paper presents the results of the first Challenge of ICDAR 2011 Robust Reading Competition. Challenge 1 is focused on the extraction of text from born-digital images, specifically from images found in Web pages and emails. The challenge was organized in terms of three tasks that look at different stages of the process: text localization, text segmentation and word recognition. In this paper we present the results of the challenge for all three tasks, and make an open call for continuous participation outside the context of ICDAR 2011.
|
Alicia Fornes, Anjan Dutta, Albert Gordo, & Josep Llados. (2011). The ICDAR 2011 Music Scores Competition: Staff Removal and Writer Identification. In 11th International Conference on Document Analysis and Recognition (pp. 1511–1515).
Abstract: In the last years, there has been a growing interest in the analysis of handwritten music scores. In this sense, our goal has been to foster the interest in the analysis of handwritten music scores by the proposal of two different competitions: Staff removal and Writer Identification. Both competitions have been tested on the CVC-MUSCIMA database: a ground-truth of handwritten music score images. This paper describes the competition details, including the dataset and ground-truth, the evaluation metrics, and a short description of the participants, their methods, and the obtained results.
|
Klaus Broelemann, Anjan Dutta, Xiaoyi Jiang, & Josep Llados. (2013). Plausibility-Graphs for Symbol Spotting in Graphical Documents. In 10th IAPR International Workshop on Graphics Recognition.
Abstract: Graph representation of graphical documents often suffers from noise viz. spurious nodes and spurios edges of graph and their discontinuity etc. In general these errors occur during the low-level image processing viz. binarization, skeletonization, vectorization etc. Hierarchical graph representation is a nice and efficient way to solve this kind of problem by hierarchically merging node-node and node-edge depending on the distance.
But the creation of hierarchical graph representing the graphical information often uses hard thresholds on the distance to create the hierarchical nodes (next state) of the lower nodes (or states) of a graph. As a result the representation often loses useful information. This paper introduces plausibilities to the nodes of hierarchical graph as a function of distance and proposes a modified algorithm for matching subgraphs of the hierarchical
graphs. The plausibility-annotated nodes help to improve the performance of the matching algorithm on two hierarchical structures. To show the potential of this approach, we conduct an experiment with the SESYD dataset.
|
Sergio Escalera. (2013). Multi-Modal Human Behaviour Analysis from Visual Data Sources. ERCIM - ERCIM News journal, 21–22.
Abstract: The Human Pose Recovery and Behaviour Analysis group (HuPBA), University of Barcelona, is developing a line of research on multi-modal analysis of humans in visual data. The novel technology is being applied in several scenarios with high social impact, including sign language recognition, assisted technology and supported diagnosis for the elderly and people with mental/physical disabilities, fitness conditioning, and Human Computer Interaction.
|
Francesco Ciompi, A. Palaioroutas, M. Loeve, Oriol Pujol, Petia Radeva, H. Tiddens, et al. (2011). Lung Tissue Classification in Severe Advanced Cystic Fibrosis from CT Scans. In In MICCAI 2011 4th International Workshop on Pulmonary Image Analysis.
|
Simone Balocco, Carlo Gatta, Xavier Carrillo, F. Mauri, & Petia Radeva. (2011). Plaque Type, Plaque Burden and Wall Shear Stress Relation in Coronary Arteries Assessed by X-ray Angiography and Intravascular Ultrasound: a Qualitative Study. In 14th International Symposium on Applied Sciences in Biomedical and Communication Technologies.
Abstract: In this paper, we present a complete framework that automatically provides fluid-dynamic and plaque analysis from IVUS and Angiographic sequences. Such framework is used to analyze, in three coronary arteries, the relation between wall shear stress with type and amount of plaque. Preliminary qualitative results show an inverse relation between the wall shear stress and the plaque burden, which is confirmed by the fact that the plaque growth is higher on the wall having concave curvature. Regarding the plaque type it was observed that regions having low shear stress are predominantly fibro-lipidic while the heavy calcifications are in general located in areas of the vessel having high WSS.
|
Sergio Escalera, Xavier Baro, Oriol Pujol, Jordi Vitria, & Petia Radeva. (2011). Traffic-Sign Recognition Systems. Springer London.
|
E. Serradell, Adriana Romero, R. Leta, Carlo Gatta, & Francesc Moreno-Noguer. (2011). Simultaneous Correspondence and Non-Rigid 3D Reconstruction of the Coronary Tree from Single X-Ray Images. In 13th IEEE International Conference on Computer Vision (pp. 850–857).
|
Bogdan Raducanu, & Fadi Dornaika. (2011). A Discriminative Non-Linear Manifold Learning Technique for Face Recognition. In Informatics Engineering and Information Science (Vol. 254, pp. 339–353). Springer Berlin Heidelberg.
Abstract: In this paper we propose a novel non-linear discriminative analysis technique for manifold learning. The proposed approach is a discriminant version of Laplacian Eigenmaps which takes into account the class label information in order to guide the procedure of non-linear dimensionality reduction. By following the large margin concept, the graph Laplacian is split in two components: within-class graph and between-class graph to better characterize the discriminant property of the data.
Our approach has been tested on several challenging face databases and it has been conveniently compared with other linear and non-linear techniques. The experimental results confirm that our method outperforms, in general, the existing ones. Although we have concentrated in this paper on the face recognition problem, the proposed approach could also be applied to other category of objects characterized by large variance in their appearance.
|
Palaiahnakote Shivakumara, Anjan Dutta, Chew Lim Tan, & Umapada Pal. (2014). Multi-oriented scene text detection in video based on wavelet and angle projection boundary growing. MTAP - Multimedia Tools and Applications, 72(1), 515–539.
Abstract: In this paper, we address two complex issues: 1) Text frame classification and 2) Multi-oriented text detection in video text frame. We first divide a video frame into 16 blocks and propose a combination of wavelet and median-moments with k-means clustering at the block level to identify probable text blocks. For each probable text block, the method applies the same combination of feature with k-means clustering over a sliding window running through the blocks to identify potential text candidates. We introduce a new idea of symmetry on text candidates in each block based on the observation that pixel distribution in text exhibits a symmetric pattern. The method integrates all blocks containing text candidates in the frame and then all text candidates are mapped on to a Sobel edge map of the original frame to obtain text representatives. To tackle the multi-orientation problem, we present a new method called Angle Projection Boundary Growing (APBG) which is an iterative algorithm and works based on a nearest neighbor concept. APBG is then applied on the text representatives to fix the bounding box for multi-oriented text lines in the video frame. Directional information is used to eliminate false positives. Experimental results on a variety of datasets such as non-horizontal, horizontal, publicly available data (Hua’s data) and ICDAR-03 competition data (camera images) show that the proposed method outperforms existing methods proposed for video and the state of the art methods for scene text as well.
|
Anjan Dutta, Josep Llados, Horst Bunke, & Umapada Pal. (2013). Near Convex Region Adjacency Graph and Approximate Neighborhood String Matching for Symbol Spotting in Graphical Documents. In 12th International Conference on Document Analysis and Recognition (pp. 1078–1082).
Abstract: This paper deals with a subgraph matching problem in Region Adjacency Graph (RAG) applied to symbol spotting in graphical documents. RAG is a very important, efficient and natural way of representing graphical information with a graph but this is limited to cases where the information is well defined with perfectly delineated regions. What if the information we are interested in is not confined within well defined regions? This paper addresses this particular problem and solves it by defining near convex grouping of oriented line segments which results in near convex regions. Pure convexity imposes hard constraints and can not handle all the cases efficiently. Hence to solve this problem we have defined a new type of convexity of regions, which allows convex regions to have concavity to some extend. We call this kind of regions Near Convex Regions (NCRs). These NCRs are then used to create the Near Convex Region Adjacency Graph (NCRAG) and with this representation we have formulated the problem of symbol spotting in graphical documents as a subgraph matching problem. For subgraph matching we have used the Approximate Edit Distance Algorithm (AEDA) on the neighborhood string, which starts working after finding a key node in the input or target graph and iteratively identifies similar nodes of the query graph in the neighborhood of the key node. The experiments are performed on artificial, real and distorted datasets.
|
Anjan Dutta, Josep Llados, Horst Bunke, & Umapada Pal. (2013). A Product graph based method for dual subgraph matching applied to symbol spotting. In 10th IAPR International Workshop on Graphics Recognition.
Abstract: Product graph has been shown to be an efficient way for matching subgraphs. This paper reports the extension of the product graph methodology for subgraph matching applied to symbol spotting in graphical documents. This paper focuses on the two major limitations of the previous version of product graph: (1) Spurious nodes and edges in the graph representation and (2) Inefficient node and edge attributes. To deal with noisy information of vectorized graphical documents, we consider a dual graph representation on the original graph representing the graphical information and the product graph is computed between the dual graphs of the query graphs and the input graph.
The dual graph with redundant edges is helpful for efficient and tolerating encoding of the structural information of the graphical documents. The adjacency matrix of the product graph locates similar path information of two graphs and exponentiating the adjacency matrix finds similar paths of greater lengths. Nodes joining similar paths between two graphs are found by combining different exponentials of adjacency matrices. An experimental investigation reveals that the recall obtained by this approach is quite encouraging.
|
Nataliya Shapovalova, Carles Fernandez, Xavier Roca, & Jordi Gonzalez. (2011). Semantics of Human Behavior in Image Sequences. In Albert Ali Salah, & (Ed.), Computer Analysis of Human Behavior (pp. 151–182). Springer London.
Abstract: Human behavior is contextualized and understanding the scene of an action is crucial for giving proper semantics to behavior. In this chapter we present a novel approach for scene understanding. The emphasis of this work is on the particular case of Human Event Understanding. We introduce a new taxonomy to organize the different semantic levels of the Human Event Understanding framework proposed. Such a framework particularly contributes to the scene understanding domain by (i) extracting behavioral patterns from the integrative analysis of spatial, temporal, and contextual evidence and (ii) integrative analysis of bottom-up and top-down approaches in Human Event Understanding. We will explore how the information about interactions between humans and their environment influences the performance of activity recognition, and how this can be extrapolated to the temporal domain in order to extract higher inferences from human events observed in sequences of images.
|
Bhaskar Chakraborty, Michael Holte, Thomas B. Moeslund, Jordi Gonzalez, & Xavier Roca. (2011). A Selective Spatio-Temporal Interest Point Detector for Human Action Recognition in Complex Scenes. In 13th IEEE International Conference on Computer Vision (pp. 1776–1783).
Abstract: Recent progress in the field of human action recognition points towards the use of Spatio-Temporal Interest Points (STIPs) for local descriptor-based recognition strategies. In this paper we present a new approach for STIP detection by applying surround suppression combined with local and temporal constraints. Our method is significantly different from existing STIP detectors and improves the performance by detecting more repeatable, stable and distinctive STIPs for human actors, while suppressing unwanted background STIPs. For action representation we use a bag-of-visual words (BoV) model of local N-jet features to build a vocabulary of visual-words. To this end, we introduce a novel vocabulary building strategy by combining spatial pyramid and vocabulary compression techniques, resulting in improved performance and efficiency. Action class specific Support Vector Machine (SVM) classifiers are trained for categorization of human actions. A comprehensive set of experiments on existing benchmark datasets, and more challenging datasets of complex scenes, validate our approach and show state-of-the-art performance.
|