Publicacions CVC -- Query Results

Details

Records
Author	Marc Bolaños; Maite Garolera; Petia Radeva
Title	Object Discovery using CNN Features in Egocentric Videos			Type	Conference Article
Year	2015	Publication	Pattern Recognition and Image Analysis, Proceedings of 7th Iberian Conference, IbPRIA 2015	Abbreviated Journal
Volume	9117	Issue		Pages	67-74
Keywords	Object discovery; Egocentric videos; Lifelogging; CNN
Abstract	Lifelogging devices based on photo/video are spreading faster every day. This growth can greatly benefit the development of methods for extracting meaningful information about the user wearing the device and his/her environment. In this paper, we propose a semi-supervised strategy for easily discovering objects relevant to the person wearing a first-person camera. Given an egocentric video sequence acquired by the camera, the method uses both the appearance extracted by means of a deep convolutional neural network and an object refill methodology that allows objects to be discovered even when they appear only rarely in the collection of images. We validate our method on a sequence of 1000 egocentric daily images and obtain an F-measure of 0.5, which is 0.17 better than the state-of-the-art approach.
Address	Santiago de Compostela; Spain; June 2015
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title	LNCS
Series Volume		Series Issue		Edition
ISSN	0302-9743	ISBN	978-3-319-19389-2	Medium
Area		Expedition		Conference	IbPRIA
Notes	MILAB			Approved	no
Call Number	Admin @ si @ BGR2015			Serial	2596



Author	Estefania Talavera; Mariella Dimiccoli; Marc Bolaños; Maedeh Aghaei; Petia Radeva
Title	R-clustering for egocentric video segmentation			Type	Conference Article
Year	2015	Publication	Pattern Recognition and Image Analysis, Proceedings of 7th Iberian Conference, IbPRIA 2015	Abbreviated Journal
Volume	9117	Issue		Pages	327-336
Keywords	Temporal video segmentation; Egocentric videos; Clustering
Abstract	In this paper, we present a new method for egocentric video temporal segmentation based on integrating a statistical mean change detector and agglomerative clustering (AC) within an energy-minimization framework. Given the tendency of most AC methods to oversegment video sequences when clustering their frames, we combine the clustering with a concept drift detection technique (ADWIN) that has rigorous performance guarantees. ADWIN serves as a statistical upper bound for the clustering-based video segmentation. We integrate both techniques in an energy-minimization framework that serves to disambiguate the decisions of both techniques and to complete the segmentation, taking into account the temporal continuity of the video frame descriptors. We present experiments over egocentric sets of more than 13,000 images acquired with different wearable cameras, showing that our method outperforms state-of-the-art clustering methods.
Address	Santiago de Compostela; Spain; June 2015
Corporate Author				Thesis
Publisher	Springer International Publishing	Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title	LNCS
Series Volume		Series Issue		Edition
ISSN	0302-9743	ISBN	978-3-319-19389-2	Medium
Area		Expedition		Conference	IbPRIA
Notes	MILAB			Approved	no
Call Number	Admin @ si @ TDB2015			Serial	2597



Author	Onur Ferhat; Arcadi Llanza; Fernando Vilariño
Title	A Feature-Based Gaze Estimation Algorithm for Natural Light Scenarios			Type	Conference Article
Year	2015	Publication	Pattern Recognition and Image Analysis, Proceedings of 7th Iberian Conference, IbPRIA 2015	Abbreviated Journal
Volume	9117	Issue		Pages	569-576
Keywords	Eye tracking; Gaze estimation; Natural light; Webcam
Abstract	We present an eye tracking system that works with regular webcams. We base our work on the open source CVC Eye Tracker [7] and propose a number of improvements and a novel gaze estimation method. The new method uses features extracted from iris segmentation and does not fall into the traditional categorization of appearance-based/model-based methods. Our experiments show that our approach reduces the gaze estimation errors by 34% in the horizontal direction and by 12% in the vertical direction compared to the baseline system.
Address	Santiago de Compostela; June 2015
Corporate Author				Thesis
Publisher	Springer International Publishing	Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title	LNCS
Series Volume		Series Issue		Edition
ISSN	0302-9743	ISBN	978-3-319-19389-2	Medium
Area		Expedition		Conference	IbPRIA
Notes	MV; SIAI			Approved	no
Call Number	Admin @ si @ FLV2015a			Serial	2646



Author	Suman Ghosh; Ernest Valveny
Title	A Sliding Window Framework for Word Spotting Based on Word Attributes			Type	Conference Article
Year	2015	Publication	Pattern Recognition and Image Analysis, Proceedings of 7th Iberian Conference, IbPRIA 2015	Abbreviated Journal
Volume	9117	Issue		Pages	652-661
Keywords	Word spotting; Sliding window; Word attributes
Abstract	In this paper we propose a segmentation-free approach to word spotting. Word images are first encoded into feature vectors using Fisher Vector. Then, these feature vectors are used together with pyramidal histogram of characters labels (PHOC) to learn SVM-based attribute models. Documents are represented by these PHOC based word attributes. To efficiently compute the word attributes over a sliding window, we propose to use an integral image representation of the document using a simplified version of the attribute model. Finally we re-rank the top word candidates using the more discriminative full version of the word attributes. We show state-of-the-art results for segmentation-free query-by-example word spotting in single-writer and multi-writer standard datasets.
Address	Santiago de Compostela; June 2015
Corporate Author				Thesis
Publisher	Springer International Publishing	Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title	LNCS
Series Volume		Series Issue		Edition
ISSN	0302-9743	ISBN	978-3-319-19389-2	Medium
Area		Expedition		Conference	IbPRIA
Notes	DAG; 600.077			Approved	no
Call Number	Admin @ si @ GhV2015b			Serial	2716



Author	E. Talavera; Mariella Dimiccoli; Marc Bolaños; Maedeh Aghaei; Petia Radeva
Title	Regularized Clustering for Egocentric Video Segmentation			Type	Book Chapter
Year	2015	Publication	Pattern Recognition and Image Analysis	Abbreviated Journal
Volume		Issue		Pages	327-336
Keywords	Temporal video segmentation; Egocentric videos; Clustering
Abstract	In this paper, we present a new method for egocentric video temporal segmentation based on integrating a statistical mean change detector and agglomerative clustering (AC) within an energy-minimization framework. Given the tendency of most AC methods to oversegment video sequences when clustering their frames, we combine the clustering with a concept drift detection technique (ADWIN) that has rigorous performance guarantees. ADWIN serves as a statistical upper bound for the clustering-based video segmentation. We integrate both techniques in an energy-minimization framework that serves to disambiguate the decisions of both techniques and to complete the segmentation, taking into account the temporal continuity of the video frame descriptors. We present experiments over egocentric sets of more than 13,000 images acquired with different wearable cameras, showing that our method outperforms state-of-the-art clustering methods.
Address
Corporate Author				Thesis
Publisher	Springer International Publishing	Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title	LNCS
Series Volume		Series Issue		Edition
ISSN		ISBN	978-3-319-19390-8	Medium
Area		Expedition		Conference
Notes	MILAB			Approved	no
Call Number	Admin @ si @ TDB2015a			Serial	2781



Author	Fahad Shahbaz Khan; Muhammad Anwer Rao; Joost Van de Weijer; Michael Felsberg; J. Laaksonen
Title	Deep semantic pyramids for human attributes and action recognition			Type	Conference Article
Year	2015	Publication	Image Analysis, Proceedings of 19th Scandinavian Conference, SCIA 2015	Abbreviated Journal
Volume	9127	Issue		Pages	341-353
Keywords	Action recognition; Human attributes; Semantic pyramids
Abstract	Describing persons and their actions is a challenging problem due to variations in pose, scale and viewpoint in real-world images. Recently, the semantic pyramids approach [1] for pose normalization has been shown to provide excellent results for gender and action recognition. The performance of the semantic pyramids approach relies on robust image description and is therefore limited by the use of shallow local features. In the context of object recognition [2] and object detection [3], convolutional neural networks (CNNs) or deep features have been shown to improve the performance over the conventional shallow features. We propose deep semantic pyramids for human attributes and action recognition. The method works by constructing spatial pyramids based on CNNs of different part locations. These pyramids are then combined to obtain a single semantic representation. We validate our approach on the Berkeley and 27 Human Attributes datasets for attributes classification. For action recognition, we perform experiments on two challenging datasets: Willow and PASCAL VOC 2010. The proposed deep semantic pyramids provide a significant gain of 17.2%, 13.9%, 24.3% and 22.6% compared to the standard shallow semantic pyramids on the Berkeley, 27 Human Attributes, Willow and PASCAL VOC 2010 datasets, respectively. Our results also show that deep semantic pyramids outperform conventional CNNs based on the full bounding box of the person. Finally, we compare our approach with state-of-the-art methods and show a gain in performance compared to the best methods in the literature.
Address	Copenhagen; Denmark; June 2015
Corporate Author				Thesis
Publisher	Springer International Publishing	Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN	0302-9743	ISBN	978-3-319-19664-0	Medium
Area		Expedition		Conference	SCIA
Notes	LAMP; 600.068; 600.079; ADAS			Approved	no
Call Number	Admin @ si @ KRW2015b			Serial	2672



Author	Firat Ismailoglu; Ida G. Sprinkhuizen-Kuyper; Evgueni Smirnov; Sergio Escalera; Ralf Peeters
Title	Fractional Programming Weighted Decoding for Error-Correcting Output Codes			Type	Conference Article
Year	2015	Publication	Multiple Classifier Systems, Proceedings of 12th International Workshop, MCS 2015	Abbreviated Journal
Volume		Issue		Pages	38-50
Keywords
Abstract	In order to increase the classification performance obtained using Error-Correcting Output Codes (ECOC) designs, introducing weights in the decoding phase of the ECOC has attracted a lot of interest. In this work, we present a method for ECOC designs that focuses on increasing the hypothesis margin on the data samples given a base classifier. While achieving this, we implicitly reward the base classifiers with high performance and penalize those with low performance. The resulting objective function is of the fractional programming type, and we solve this problem using Dinkelbach's algorithm. Tests conducted on well-known UCI datasets show that the presented method is superior to unweighted decoding and that it outperforms state-of-the-art weighted decoding methods in most of the experiments performed.
Address	Gunzburg; Germany; June 2015
Corporate Author				Thesis
Publisher	Springer International Publishing	Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN	978-3-319-20247-1	Medium
Area		Expedition		Conference	MCS
Notes	HuPBA; MILAB			Approved	no
Call Number	Admin @ si @ ISS2015			Serial	2601



Author	Dennis G. Romero; Anselmo Frizera; Angel Sappa; Boris X. Vintimilla; Teodiano F. Bastos
Title	A predictive model for human activity recognition by observing actions and context			Type	Conference Article
Year	2015	Publication	Advanced Concepts for Intelligent Vision Systems, Proceedings of 16th International Conference, ACIVS 2015	Abbreviated Journal
Volume	9386	Issue		Pages	323-333
Keywords
Abstract	This paper presents a novel model to estimate human activities, where a human activity is defined by a set of human actions. The proposed approach is based on the use of Recurrent Neural Networks (RNN) and Bayesian inference through the continuous monitoring of human actions and their surrounding environment. In the current work, human activities are inferred considering not only visual analysis but also additional resources; external sources of information, such as context information, are incorporated to contribute to the activity estimation. The novelty of the proposed approach lies in the way the information is encoded, so that it can later be associated according to a predefined semantic structure. Hence, a pattern representing a given activity can be defined by a set of actions, plus contextual information or other kinds of information that could be relevant to describe the activity. Experimental results with real data are provided, showing the validity of the proposed approach.
Address	Catania; Italy; October 2015
Corporate Author				Thesis
Publisher	Springer International Publishing	Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title	LNCS
Series Volume		Series Issue		Edition
ISSN	0302-9743	ISBN	978-3-319-25902-4	Medium
Area		Expedition		Conference	ACIVS
Notes	ADAS; 600.076			Approved	no
Call Number	Admin @ si @ RFS2015			Serial	2661



Author	J. Poujol; Cristhian A. Aguilera-Carrasco; E. Danos; Boris X. Vintimilla; Ricardo Toledo; Angel Sappa
Title	Visible-Thermal Fusion based Monocular Visual Odometry			Type	Conference Article
Year	2015	Publication	2nd Iberian Robotics Conference ROBOT2015	Abbreviated Journal
Volume	417	Issue		Pages	517-528
Keywords	Monocular Visual Odometry; LWIR-RGB cross-spectral Imaging; Image Fusion
Abstract	The manuscript evaluates the performance of a monocular visual odometry approach when images from different spectra are considered, both independently and fused. The objective behind this evaluation is to analyze whether classical approaches can be improved when the given images, which are from different spectra, are fused and represented in new domains. The images in these new domains should have some of the following properties: i) more robust to noisy data; ii) less sensitive to changes (e.g., lighting); iii) richer in descriptive information, among others. In particular, two different image fusion strategies are considered in the current work. Firstly, images from the visible and thermal spectrum are fused using a Discrete Wavelet Transform (DWT) approach. Secondly, a monochrome threshold strategy is considered. The obtained representations are evaluated under a visual odometry framework, highlighting their advantages and disadvantages, using different urban and semi-urban scenarios. Comparisons with both monocular-visible spectrum and monocular-infrared spectrum are also provided, showing the validity of the proposed approach.
Address	Lisbon; Portugal; November 2015
Corporate Author				Thesis
Publisher	Springer International Publishing	Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN	2194-5357	ISBN	978-3-319-27145-3	Medium
Area		Expedition		Conference	ROBOT
Notes	ADAS; 600.076; 600.086			Approved	no
Call Number	Admin @ si @ PAD2015			Serial	2663



Author	Aleksandr Setkov; Fabio Martinez Carillo; Michele Gouiffes; Christian Jacquemin; Maria Vanrell; Ramon Baldrich
Title	DAcImPro: A Novel Database of Acquired Image Projections and Its Application to Object Recognition			Type	Conference Article
Year	2015	Publication	Advances in Visual Computing. Proceedings of 11th International Symposium, ISVC 2015 Part II	Abbreviated Journal
Volume	9475	Issue		Pages	463-473
Keywords	Projector-camera systems; Feature descriptors; Object recognition
Abstract	Projector-camera systems are designed to improve the projection quality by comparing original images with their captured projections, which is usually complicated due to high photometric and geometric variations. Many research works address this problem using their own test data, which makes it extremely difficult to compare different proposals. This paper has two main contributions. Firstly, we introduce a new database of acquired image projections (DAcImPro) that, covering photometric and geometric conditions and providing data for ground-truth computation, can serve to evaluate different algorithms in projector-camera systems. Secondly, a new object recognition scenario from acquired projections is presented, which could be of great interest in such domains as home video projections and public presentations. We show that the task is more challenging than the classical recognition problem and thus requires additional pre-processing, such as color compensation or projection area selection.
Address
Corporate Author				Thesis
Publisher	Springer International Publishing	Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title	LNCS
Series Volume		Series Issue		Edition
ISSN	0302-9743	ISBN	978-3-319-27862-9	Medium
Area		Expedition		Conference	ISVC
Notes	CIC			Approved	no
Call Number	Admin @ si @ SMG2015			Serial	2736



Author	Hanne Kause; Aura Hernandez-Sabate; Patricia Marquez; Andrea Fuster; Luc Florack; Hans van Assen; Debora Gil
Title	Confidence Measures for Assessing the HARP Algorithm in Tagged Magnetic Resonance Imaging			Type	Book Chapter
Year	2015	Publication	Statistical Atlases and Computational Models of the Heart. Revised selected papers of Imaging and Modelling Challenges 6th International Workshop, STACOM 2015, Held in Conjunction with MICCAI 2015	Abbreviated Journal
Volume	9534	Issue		Pages	69-79
Keywords
Abstract	Cardiac deformation and changes therein have been linked to pathologies. Both can be extracted in detail from tagged Magnetic Resonance Imaging (tMRI) using harmonic phase (HARP) images. Although point tracking algorithms have been shown to achieve high accuracy on HARP images, this accuracy varies with position. Detecting and discarding areas with unreliable results is crucial for use in clinical support systems. This paper assesses the capability of two confidence measures (CMs), based on energy and image structure, for detecting locations with reduced accuracy in motion tracking results. These CMs were tested on a database of simulated tMRI images containing the most common artifacts that may affect tracking accuracy. CM performance is assessed based on its capability for HARP tracking error bounding and compared in terms of significant differences detected using a multi-comparison analysis of variance that takes into account the most influential factors on HARP tracking performance. Results showed that the CM based on image structure was better suited to detect unreliable optical flow vectors. In addition, it was shown that CMs can be used to detect optical flow vectors with large errors in order to improve the optical flow obtained with the HARP tracking algorithm.
Address	Munich; Germany; January 2015
Corporate Author				Thesis
Publisher	Springer International Publishing	Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title	LNCS
Series Volume		Series Issue		Edition
ISSN	0302-9743	ISBN	978-3-319-28711-9	Medium
Area		Expedition		Conference	STACOM
Notes	ADAS; IAM; 600.075; 600.076; 600.060; 601.145			Approved	no
Call Number	Admin @ si @ KHM2015			Serial	2734



Author	Pau Riba; Alicia Fornes; Josep Llados
Title	Towards the Alignment of Handwritten Music Scores			Type	Conference Article
Year	2015	Publication	11th IAPR International Workshop on Graphics Recognition	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	It is very common to find different versions of the same music work in the archives of opera theaters. These differences correspond to modifications and annotations from the musicians. From the musicologist's point of view, these variations are very interesting and deserve study. This paper explores the alignment of music scores as a tool for automatically detecting the passages that contain such differences. Given the difficulties in the recognition of handwritten music scores, our goal is to align the music scores and, at the same time, avoid the recognition of music elements as much as possible. After removing the staff lines, braces and ties, the bar lines are detected. Then, the bar units are described as a whole using the Blurred Shape Model. The alignment of bar units is performed using Dynamic Time Warping. The analysis of the alignment path is used to detect the variations in the music scores. The method has been evaluated on a subset of the CVC-MUSCIMA dataset, showing encouraging results.
Address	Nancy; France; August 2015
Corporate Author				Thesis
Publisher	Springer International Publishing	Place of Publication		Editor	Bart Lamiroy; Rafael Dueire Lins
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title	LNCS
Series Volume		Series Issue		Edition
ISSN		ISBN	978-3-319-52158-9	Medium
Area		Expedition		Conference	GREC
Notes	DAG			Approved	no
Call Number	Admin @ si @			Serial	2874



Author	C. Alejandro Parraga
Title	Perceptual Psychophysics			Type	Book Chapter
Year	2015	Publication	Biologically-Inspired Computer Vision: Fundamentals and Applications	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor	G. Cristobal; M. Keil; L. Perrinet
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN	978-3-527-41264-8	Medium
Area		Expedition		Conference
Notes	CIC; 600.074			Approved	no
Call Number	Admin @ si @ Par2015			Serial	2600



Author	Antonio Hernandez
Title	From pixels to gestures: learning visual representations for human analysis in color and depth data sequences			Type	Book Whole
Year	2015	Publication	PhD Thesis, Universitat de Barcelona-CVC	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	The visual analysis of humans from images is an important topic of interest due to its relevance to many computer vision applications like pedestrian detection, monitoring and surveillance, human-computer interaction, e-health or content-based image retrieval, among others. In this dissertation we are interested in learning different visual representations of the human body that are helpful for the visual analysis of humans in images and video sequences. To that end, we analyze both RGB and depth image modalities and address the problem from three different research lines, at different levels of abstraction; from pixels to gestures: human segmentation, human pose estimation and gesture recognition. First, we show how binary segmentation (object vs. background) of the human body in image sequences is helpful to remove all the background clutter present in the scene. The presented method, based on Graph cuts optimization, enforces spatio-temporal consistency of the produced segmentation masks among consecutive frames. Secondly, we present a framework for multi-label segmentation for obtaining much more detailed segmentation masks: instead of just obtaining a binary representation separating the human body from the background, finer segmentation masks can be obtained separating the different body parts. At a higher level of abstraction, we aim for a simpler yet descriptive representation of the human body. Human pose estimation methods usually rely on skeletal models of the human body, formed by segments (or rectangles) that represent the body limbs, appropriately connected following the kinematic constraints of the human body. In practice, such skeletal models must fulfill some constraints in order to allow for efficient inference, while actually limiting the expressiveness of the model. 
In order to cope with this, we introduce a top-down approach for predicting the position of the body parts in the model, using a mid-level part representation based on Poselets. Finally, we propose a framework for gesture recognition based on the bag of visual words framework. We leverage the benefits of RGB and depth image modalities by combining modality-specific visual vocabularies in a late fusion fashion. A new rotation-variant depth descriptor is presented, yielding better results than other state-of-the-art descriptors. Moreover, spatio-temporal pyramids are used to encode rough spatial and temporal structure. In addition, we present a probabilistic reformulation of Dynamic Time Warping for gesture segmentation in video sequences. A Gaussian-based probabilistic model of a gesture is learnt, implicitly encoding possible deformations in both spatial and time domains.
Address	January 2015
Corporate Author				Thesis	Ph.D. thesis
Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Sergio Escalera; Stan Sclaroff
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN	978-84-940902-0-2	Medium
Area		Expedition		Conference
Notes	HuPBA; MILAB			Approved	no
Call Number	Admin @ si @ Her2015			Serial	2576



Author	Hongxing Gao
Title	Focused Structural Document Image Retrieval in Digital Mailroom Applications			Type	Book Whole
Year	2015	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	In this work, we develop a generic framework that is able to handle the document retrieval problem in various scenarios, such as searching for full page matches or retrieving the counterparts for specific document areas, focusing on their structural similarity or letting their visual resemblance play a dominant role. Based on the spatial indexing technique, we propose to search for matches of local key-region pairs carrying both structural and visual information from the collection, and we present a scheme that allows adjusting the relative contribution of structural and visual similarity. Based on the fact that the structure of documents is tightly linked with the distance among their elements, we first introduce an efficient detector named Distance Transform based Maximally Stable Extremal Regions (DTMSER). We illustrate that this detector is able to efficiently extract the structure of a document image as a dendrogram (hierarchical tree) of multi-scale key-regions that roughly correspond to letters, words and paragraphs. We demonstrate that, without benefiting from the structure information, the key-regions extracted by the DTMSER algorithm achieve better results than state-of-the-art methods while employing far fewer key-regions. We subsequently propose a pair-wise Bag of Words (BoW) framework to efficiently embed the explicit structure extracted by the DTMSER algorithm. We represent each document as a list of key-region pairs that correspond to the edges in the dendrogram, where the inclusion relationship is encoded. By employing those structural key-region pairs as the pooling elements for generating the histogram of features, the proposed method is able to encode the explicit inclusion relations into a BoW representation. The experimental results illustrate that the pair-wise BoW, powered by the embedded structural information, achieves remarkable improvement over the conventional BoW and spatial pyramidal BoW methods.
To handle various retrieval scenarios in one framework, we propose to directly query a series of key-region pairs, carrying both structure and visual information, from the collection. We introduce spatial indexing techniques to the document retrieval community to speed up the structural relationship computation for key-region pairs. We first test the proposed framework in a full page retrieval scenario where structurally similar matches are expected. In this case, the pair-wise querying method achieves notable improvement over the BoW and spatial pyramidal BoW frameworks. Furthermore, we illustrate that the proposed method is also able to handle focused retrieval situations where the queries are defined as specific partial areas of interest in the images. We examine our method on two types of focused queries: structure-focused and exact queries. The experimental results show that the proposed generic framework obtains nearly perfect precision on both types of focused queries, and it is the first framework able to tackle structure-focused queries, setting a new state of the art in the field. In addition, we introduce a line verification method to check the spatial consistency among the matched key-region pairs. We propose a computationally efficient version of line verification through a two-step implementation. We first compute tentative localizations of the query and subsequently employ them to divide the matched key-region pairs into several groups; line verification is then performed within each group while more precise bounding boxes are computed. We demonstrate that, compared with the standard approach (based on RANSAC), the proposed line verification generally achieves much higher recall with a slight loss in precision on specific queries.
Address	January 2015
Corporate Author				Thesis	Ph.D. thesis
Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Josep Llados; Dimosthenis Karatzas; Marçal Rusiñol
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN	978-84-943427-0-7	Medium
Area		Expedition		Conference
Notes	DAG; 600.077			Approved	no
Call Number	Admin @ si @ Gao2015			Serial	2577