Home | [1–10] << 11 12 13 14 15 16 17 18 19 20 >> [21–30] |
Records | |||||
---|---|---|---|---|---|
Author | Olivier Penacchio | ||||
Title | Mixed Hodge Structures and Equivariant Sheaves on the Projective Plane | Type | Journal Article | ||
Year | 2011 | Publication | Mathematische Nachrichten | Abbreviated Journal | MN |
Volume | 284 | Issue | 4 | Pages | 526-542 |
Keywords | Mixed Hodge structures, equivariant sheaves, MSC (2010) Primary: 14C30, Secondary: 14F05, 14M25 | ||||
Abstract | We describe an equivalence of categories between the category of mixed Hodge structures and a category of equivariant vector bundles on a toric model of the complex projective plane which verify some semistability condition. We then apply this correspondence to define an invariant which generalizes the notion of R-split mixed Hodge structure and give calculations for the first group of cohomology of possibly non smooth or non-complete curves of genus 0 and 1. Finally, we describe some extension groups of mixed Hodge structures in terms of equivariant extensions of coherent sheaves. © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | WILEY-VCH Verlag | Place of Publication | Editor | R. Mennicken | |
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 1522-2616 | ISBN | Medium | ||
Area | Expedition | Conference | |||
Notes | CIC | Approved | no | ||
Call Number | Admin @ si @ Pen2011 | Serial | 1721 | ||
Permanent link to this record | |||||
Author | Carles Fernandez; Pau Baiget; Xavier Roca; Jordi Gonzalez | ||||
Title | Determining the Best Suited Semantic Events for Cognitive Surveillance | Type | Journal Article | ||
Year | 2011 | Publication | Expert Systems with Applications | Abbreviated Journal | EXSY |
Volume | 38 | Issue | 4 | Pages | 4068–4079 |
Keywords | Cognitive surveillance; Event modeling; Content-based video retrieval; Ontologies; Advanced user interfaces | ||||
Abstract | State-of-the-art systems on cognitive surveillance identify and describe complex events in selected domains, thus providing end-users with tools to easily access the contents of massive video footage. Nevertheless, as the complexity of events increases in semantics and the types of indoor/outdoor scenarios diversify, it becomes difficult to assess which events describe better the scene, and how to model them at a pixel level to fulfill natural language requests. We present an ontology-based methodology that guides the identification, step-by-step modeling, and generalization of the most relevant events to a specific domain. Our approach considers three steps: (1) end-users provide textual evidence from surveilled video sequences; (2) transcriptions are analyzed top-down to build the knowledge bases for event description; and (3) the obtained models are used to generalize event detection to different image sequences from the surveillance domain. This framework produces user-oriented knowledge that improves on existing advanced interfaces for video indexing and retrieval, by determining the best suited events for video understanding according to end-users. We have conducted experiments with outdoor and indoor scenes showing thefts, chases, and vandalism, demonstrating the feasibility and generalization of this proposal. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Elsevier | Place of Publication | Editor | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | ISE | Approved | no | ||
Call Number | Admin @ si @ FBR2011a | Serial | 1722 | ||
Permanent link to this record | |||||
Author | Carles Fernandez; Pau Baiget; Xavier Roca; Jordi Gonzalez | ||||
Title | Augmenting Video Surveillance Footage with Virtual Agents for Incremental Event Evaluation | Type | Journal Article | ||
Year | 2011 | Publication | Pattern Recognition Letters | Abbreviated Journal | PRL |
Volume | 32 | Issue | 6 | Pages | 878–889 |
Keywords | |||||
Abstract | The fields of segmentation, tracking and behavior analysis demand for challenging video resources to test, in a scalable manner, complex scenarios like crowded environments or scenes with high semantics. Nevertheless, existing public databases cannot scale the presence of appearing agents, which would be useful to study long-term occlusions and crowds. Moreover, creating these resources is expensive and often too particularized to specific needs. We propose an augmented reality framework to increase the complexity of image sequences in terms of occlusions and crowds, in a scalable and controllable manner. Existing datasets can be increased with augmented sequences containing virtual agents. Such sequences are automatically annotated, thus facilitating evaluation in terms of segmentation, tracking, and behavior recognition. In order to easily specify the desired contents, we propose a natural language interface to convert input sentences into virtual agent behaviors. Experimental tests and validation in indoor, street, and soccer environments are provided to show the feasibility of the proposed approach in terms of robustness, scalability, and semantics. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Elsevier | Place of Publication | Editor | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | ISE | Approved | no | ||
Call Number | Admin @ si @ FBR2011b | Serial | 1723 | ||
Permanent link to this record | |||||
Author | Arjan Gijsenij; Theo Gevers | ||||
Title | Color Constancy Using Natural Image Statistics and Scene Semantics | Type | Journal Article | ||
Year | 2011 | Publication | IEEE Transactions on Pattern Analysis and Machine Intelligence | Abbreviated Journal | TPAMI |
Volume | 33 | Issue | 4 | Pages | 687-698 |
Keywords | |||||
Abstract | Existing color constancy methods are all based on specific assumptions such as the spatial and spectral characteristics of images. As a consequence, no algorithm can be considered as universal. However, with the large variety of available methods, the question is how to select the method that performs best for a specific image. To achieve selection and combining of color constancy algorithms, in this paper natural image statistics are used to identify the most important characteristics of color images. Then, based on these image characteristics, the proper color constancy algorithm (or best combination of algorithms) is selected for a specific image. To capture the image characteristics, the Weibull parameterization (e.g., grain size and contrast) is used. It is shown that the Weibull parameterization is related to the image attributes to which the used color constancy methods are sensitive. An MoG-classifier is used to learn the correlation and weighting between the Weibull-parameters and the image attributes (number of edges, amount of texture, and SNR). The output of the classifier is the selection of the best performing color constancy method for a certain image. Experimental results show a large improvement over state-of-the-art single algorithms. On a data set consisting of more than 11,000 images, an increase in color constancy performance up to 20 percent (median angular error) can be obtained compared to the best-performing single algorithm. Further, it is shown that for certain scene categories, one specific color constancy algorithm can be used instead of the classifier considering several algorithms. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 0162-8828 | ISBN | Medium | ||
Area | Expedition | Conference | |||
Notes | ISE | Approved | no | ||
Call Number | Admin @ si @ GiG2011 | Serial | 1724 | ||
Permanent link to this record | |||||
Author | Albert Ali Salah; Theo Gevers; Nicu Sebe; Alessandro Vinciarelli | ||||
Title | Computer Vision for Ambient Intelligence | Type | Journal Article | ||
Year | 2011 | Publication | Journal of Ambient Intelligence and Smart Environments | Abbreviated Journal | JAISE |
Volume | 3 | Issue | 3 | Pages | 187-191 |
Keywords | |||||
Abstract | |||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | ISE | Approved | no | ||
Call Number | Admin @ si @ SGS2011a | Serial | 1725 | ||
Permanent link to this record | |||||
Author | Arnau Ramisa; Alex Goldhoorn; David Aldavert; Ricardo Toledo; Ramon Lopez de Mantaras | ||||
Title | Combining Invariant Features and the ALV Homing Method for Autonomous Robot Navigation Based on Panoramas | Type | Journal Article | ||
Year | 2011 | Publication | Journal of Intelligent and Robotic Systems | Abbreviated Journal | JIRC |
Volume | 64 | Issue | 3-4 | Pages | 625-649 |
Keywords | |||||
Abstract | Biologically inspired homing methods, such as the Average Landmark Vector, are an interesting solution for local navigation due to its simplicity. However, usually they require a modification of the environment by placing artificial landmarks in order to work reliably. In this paper we combine the Average Landmark Vector with invariant feature points automatically detected in panoramic images to overcome this limitation. The proposed approach has been evaluated first in simulation and, as promising results are found, also in two data sets of panoramas from real world environments. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Springer Netherlands | Place of Publication | Editor | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 0921-0296 | ISBN | Medium | ||
Area | Expedition | Conference | |||
Notes | RV;ADAS | Approved | no | ||
Call Number | Admin @ si @ RGA2011 | Serial | 1728 | ||
Permanent link to this record | |||||
Author | Koen E.A. van de Sande; Theo Gevers; Cees G.M. Snoek | ||||
Title | Empowering Visual Categorization with the GPU | Type | Journal Article | ||
Year | 2011 | Publication | IEEE Transactions on Multimedia | Abbreviated Journal | TMM |
Volume | 13 | Issue | 1 | Pages | 60-70 |
Keywords | |||||
Abstract | Visual categorization is important to manage large collections of digital images and video, where textual meta-data is often incomplete or simply unavailable. The bag-of-words model has become the most powerful method for visual categorization of images and video. Despite its high accuracy, a severe drawback of this model is its high computational cost. As the trend to increase computational power in newer CPU and GPU architectures is to increase their level of parallelism, exploiting this parallelism becomes an important direction to handle the computational cost of the bag-of-words approach. When optimizing a system based on the bag-of-words approach, the goal is to minimize the time it takes to process batches of images. Additionally, we also consider power usage as an evaluation metric. In this paper, we analyze the bag-of-words model for visual categorization in terms of computational cost and identify two major bottlenecks: the quantization step and the classification step. We address these two bottlenecks by proposing two efficient algorithms for quantization and classification by exploiting the GPU hardware and the CUDA parallel programming model. The algorithms are designed to (1) keep categorization accuracy intact, (2) decompose the problem and (3) give the same numerical results. In the experiments on large scale datasets it is shown that, by using a parallel implementation on the Geforce GTX260 GPU, classifying unseen images is 4.8 times faster than a quad-core CPU version on the Core i7 920, while giving the exact same numerical results. In addition, we show how the algorithms can be generalized to other applications, such as text retrieval and video retrieval. Moreover, when the obtained speedup is used to process extra video frames in a video retrieval benchmark, the accuracy of visual categorization is improved by 29%. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | ISE | Approved | no | ||
Call Number | Admin @ si @ SGS2011b | Serial | 1729 | ||
Permanent link to this record | |||||
Author | C. Alejandro Parraga; Jordi Roca; Maria Vanrell | ||||
Title | Do Basic Colors Influence Chromatic Adaptation? | Type | Journal Article | ||
Year | 2011 | Publication | Journal of Vision | Abbreviated Journal | VSS |
Volume | 11 | Issue | 11 | Pages | 85 |
Keywords | |||||
Abstract | Color constancy (the ability to perceive colors relatively stable under different illuminants) is the result of several mechanisms spread across different neural levels and responding to several visual scene cues. It is usually measured by estimating the perceived color of a grey patch under an illuminant change. In this work, we hypothesize whether chromatic adaptation (without a reference white or grey) could be driven by certain colors, specifically those corresponding to the universal color terms proposed by Berlin and Kay (1969). To this end we have developed a new psychophysical paradigm in which subjects adjust the color of a test patch (in CIELab space) to match their memory of the best example of a given color chosen from the universal terms list (grey, red, green, blue, yellow, purple, pink, orange and brown). The test patch is embedded inside a Mondrian image and presented on a calibrated CRT screen inside a dark cabin. All subjects were trained to “recall” their most exemplary colors reliably from memory and asked to always produce the same basic colors when required under several adaptation conditions. These include achromatic and colored Mondrian backgrounds, under a simulated D65 illuminant and several colored illuminants. A set of basic colors were measured for each subject under neutral conditions (achromatic background and D65 illuminant) and used as “reference” for the rest of the experiment. The colors adjusted by the subjects in each adaptation condition were compared to the reference colors under the corresponding illuminant and a “constancy index” was obtained for each of them. Our results show that for some colors the constancy index was better than for grey. The set of best adapted colors in each condition were common to a majority of subjects and were dependent on the chromaticity of the illuminant and the chromatic background considered. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 1534-7362 | ISBN | Medium | ||
Area | Expedition | Conference | |||
Notes | CIC | Approved | no | ||
Call Number | Admin @ si @ PRV2011 | Serial | 1759 | ||
Permanent link to this record | |||||
Author | Bhaskar Chakraborty; Michael Holte; Thomas B. Moeslund; Jordi Gonzalez | ||||
Title | Selective Spatio-Temporal Interest Points | Type | Journal Article | ||
Year | 2012 | Publication | Computer Vision and Image Understanding | Abbreviated Journal | CVIU |
Volume | 116 | Issue | 3 | Pages | 396-410 |
Keywords | |||||
Abstract | Recent progress in the field of human action recognition points towards the use of Spatio-TemporalInterestPoints (STIPs) for local descriptor-based recognition strategies. In this paper, we present a novel approach for robust and selective STIP detection, by applying surround suppression combined with local and temporal constraints. This new method is significantly different from existing STIP detection techniques and improves the performance by detecting more repeatable, stable and distinctive STIPs for human actors, while suppressing unwanted background STIPs. For action representation we use a bag-of-video words (BoV) model of local N-jet features to build a vocabulary of visual-words. To this end, we introduce a novel vocabulary building strategy by combining spatial pyramid and vocabulary compression techniques, resulting in improved performance and efficiency. Action class specific Support Vector Machine (SVM) classifiers are trained for categorization of human actions. A comprehensive set of experiments on popular benchmark datasets (KTH and Weizmann), more challenging datasets of complex scenes with background clutter and camera motion (CVC and CMU), movie and YouTube video clips (Hollywood 2 and YouTube), and complex scenes with multiple actors (MSR I and Multi-KTH), validates our approach and show state-of-the-art performance. Due to the unavailability of ground truth action annotation data for the Multi-KTH dataset, we introduce an actor specific spatio-temporal clustering of STIPs to address the problem of automatic action annotation of multiple simultaneous actors. Additionally, we perform cross-data action recognition by training on source datasets (KTH and Weizmann) and testing on completely different and more challenging target datasets (CVC, CMU, MSR I and Multi-KTH). This documents the robustness of our proposed approach in the realistic scenario, using separate training and test datasets, which in general has been a shortcoming in the performance evaluation of human action recognition techniques. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Elsevier | Place of Publication | Editor | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 1077-3142 | ISBN | Medium | ||
Area | Expedition | Conference | |||
Notes | ISE | Approved | no | ||
Call Number | Admin @ si @ CHM2012 | Serial | 1806 | ||
Permanent link to this record | |||||
Author | Noha Elfiky; Fahad Shahbaz Khan; Joost Van de Weijer; Jordi Gonzalez | ||||
Title | Discriminative Compact Pyramids for Object and Scene Recognition | Type | Journal Article | ||
Year | 2012 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 45 | Issue | 4 | Pages | 1627-1636 |
Keywords | |||||
Abstract | Spatial pyramids have been successfully applied to incorporating spatial information into bag-of-words based image representation. However, a major drawback is that it leads to high dimensional image representations. In this paper, we present a novel framework for obtaining compact pyramid representation. First, we investigate the usage of the divisive information theoretic feature clustering (DITC) algorithm in creating a compact pyramid representation. In many cases this method allows us to reduce the size of a high dimensional pyramid representation up to an order of magnitude with little or no loss in accuracy. Furthermore, comparison to clustering based on agglomerative information bottleneck (AIB) shows that our method obtains superior results at significantly lower computational costs. Moreover, we investigate the optimal combination of multiple features in the context of our compact pyramid representation. Finally, experiments show that the method can obtain state-of-the-art results on several challenging data sets. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 0031-3203 | ISBN | Medium | ||
Area | Expedition | Conference | |||
Notes | ISE; CAT;CIC | Approved | no | ||
Call Number | Admin @ si @ EKW2012 | Serial | 1807 | ||
Permanent link to this record | |||||
Author | Sergio Escalera; Alicia Fornes; Oriol Pujol; Josep Llados; Petia Radeva | ||||
Title | Circular Blurred Shape Model for Multiclass Symbol Recognition | Type | Journal Article | ||
Year | 2011 | Publication | IEEE Transactions on Systems, Man and Cybernetics (Part B) (IEEE) | Abbreviated Journal | TSMCB |
Volume | 41 | Issue | 2 | Pages | 497-506 |
Keywords | |||||
Abstract | In this paper, we propose a circular blurred shape model descriptor to deal with the problem of symbol detection and classification as a particular case of object recognition. The feature extraction is performed by capturing the spatial arrangement of significant object characteristics in a correlogram structure. The shape information from objects is shared among correlogram regions, where a prior blurring degree defines the level of distortion allowed in the symbol, making the descriptor tolerant to irregular deformations. Moreover, the descriptor is rotation invariant by definition. We validate the effectiveness of the proposed descriptor in both the multiclass symbol recognition and symbol detection domains. In order to perform the symbol detection, the descriptors are learned using a cascade of classifiers. In the case of multiclass categorization, the new feature space is learned using a set of binary classifiers which are embedded in an error-correcting output code design. The results over four symbol data sets show the significant improvements of the proposed descriptor compared to the state-of-the-art descriptors. In particular, the results are even more significant in those cases where the symbols suffer from elastic deformations. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 1083-4419 | ISBN | Medium | ||
Area | Expedition | Conference | |||
Notes | MILAB; DAG;HuPBA | Approved | no | ||
Call Number | Admin @ si @ EFP2011 | Serial | 1784 | ||
Permanent link to this record | |||||
Author | Xavier Carrillo; E Fernandez-Nofrerias; Francesco Ciompi; Oriol Rodriguez-Leor; Petia Radeva; Neus Salvatella; Oriol Pujol; J. Mauri; A. Bayes | ||||
Title | Changes in Radial Artery Volume Assessed Using Intravascular Ultrasound: A Comparison of Two Vasodilator Regimens in Transradial Coronary Intervention | Type | Journal Article | ||
Year | 2011 | Publication | Journal of Invasive Cardiology | Abbreviated Journal | JOIC |
Volume | 23 | Issue | 10 | Pages | 401-404 |
Keywords | radial; vasodilator treatment; percutaneous coronary intervention; IVUS; volumetric IVUS analysis | ||||
Abstract | OBJECTIVES:
This study used intravascular ultrasound (IVUS) to evaluate radial artery volume changes after intraarterial administration of nitroglycerin and/or verapamil. BACKGROUND: Radial artery spasm, which is associated with radial artery size, is the main limitation of the transradial approach in percutaneous coronary interventions (PCI). METHODS: This prospective, randomized study compared the effect of two intra-arterial vasodilator regimens on radial artery volume: 0.2 mg of nitroglycerin plus 2.5 mg of verapamil (Group 1; n = 15) versus 2.5 mg of verapamil alone (Group 2; n = 15). Radial artery lumen volume was assessed using IVUS at two time points: at baseline (5 minutes after sheath insertion) and post-vasodilator (1 minute after drug administration). The luminal volume of the radial artery was computed using ECOC Random Fields (ECOC-RF), a technique used for automatic segmentation of luminal borders in longitudinal cut images from IVUS sequences. RESULTS: There was a significant increase in arterial lumen volume in both groups, with an increase from 451 ± 177 mm³ to 508 ± 192 mm³ (p = 0.001) in Group 1 and from 456 ± 188 mm³ to 509 ± 170 mm³ (p = 0.001) in Group 2. There were no significant differences between the groups in terms of absolute volume increase (58 mm³ versus 53 mm³, respectively; p = 0.65) or in relative volume increase (14% versus 20%, respectively; p = 0.69). CONCLUSIONS: Administration of nitroglycerin plus verapamil or verapamil alone to the radial artery resulted in similar increases in arterial lumen volume according to ECOC-RF IVUS measurements. |
||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | MILAB;HuPBA | Approved | no | ||
Call Number | Admin @ si @ CFC2011 | Serial | 1797 | ||
Permanent link to this record | |||||
Author | Miguel Angel Bautista; Sergio Escalera; Xavier Baro; Petia Radeva; Jordi Vitria; Oriol Pujol | ||||
Title | Minimal Design of Error-Correcting Output Codes | Type | Journal Article | ||
Year | 2011 | Publication | Pattern Recognition Letters | Abbreviated Journal | PRL |
Volume | 33 | Issue | 6 | Pages | 693-702 |
Keywords | Multi-class classification; Error-correcting output codes; Ensemble of classifiers | ||||
Abstract | IF JCR CCIA 1.303 2009 54/103
The classification of large number of object categories is a challenging trend in the pattern recognition field. In literature, this is often addressed using an ensemble of classifiers. In this scope, the Error-correcting output codes framework has demonstrated to be a powerful tool for combining classifiers. However, most state-of-the-art ECOC approaches use a linear or exponential number of classifiers, making the discrimination of a large number of classes unfeasible. In this paper, we explore and propose a minimal design of ECOC in terms of the number of classifiers. Evolutionary computation is used for tuning the parameters of the classifiers and looking for the best minimal ECOC code configuration. The results over several public UCI datasets and different multi-class computer vision problems show that the proposed methodology obtains comparable (even better) results than state-of-the-art ECOC methodologies with far less number of dichotomizers. |
||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Elsevier | Place of Publication | Editor | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 0167-8655 | ISBN | Medium | ||
Area | Expedition | Conference | |||
Notes | MILAB; OR;HuPBA;MV | Approved | no | ||
Call Number | Admin @ si @ BEB2011a | Serial | 1800 | ||
Permanent link to this record | |||||
Author | Carlo Gatta; Eloi Puertas; Oriol Pujol | ||||
Title | Multi-Scale Stacked Sequential Learning | Type | Journal Article | ||
Year | 2011 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 44 | Issue | 10-11 | Pages | 2414-2416 |
Keywords | Stacked sequential learning; Multiscale; Multiresolution; Contextual classification | ||||
Abstract | One of the most widely used assumptions in supervised learning is that data is independent and identically distributed. This assumption does not hold true in many real cases. Sequential learning is the discipline of machine learning that deals with dependent data such that neighboring examples exhibit some kind of relationship. In the literature, there are different approaches that try to capture and exploit this correlation, by means of different methodologies. In this paper we focus on meta-learning strategies and, in particular, the stacked sequential learning approach. The main contribution of this work is two-fold: first, we generalize the stacked sequential learning. This generalization reflects the key role of neighboring interactions modeling. Second, we propose an effective and efficient way of capturing and exploiting sequential correlations that takes into account long-range interactions by means of a multi-scale pyramidal decomposition of the predicted labels. Additionally, this new method subsumes the standard stacked sequential learning approach. We tested the proposed method on two different classification tasks: text lines classification in a FAQ data set and image classification. Results on these tasks clearly show that our approach outperforms the standard stacked sequential learning. Moreover, we show that the proposed method allows to control the trade-off between the detail and the desired range of the interactions. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Elsevier | Place of Publication | Editor | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | LNCS | ||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | MILAB;HuPBA | Approved | no | ||
Call Number | Admin @ si @ GPP2011 | Serial | 1802 | ||
Permanent link to this record | |||||
Author | Palaiahnakote Shivakumara; Anjan Dutta; Chew Lim Tan; Umapada Pal | ||||
Title | Multi-oriented scene text detection in video based on wavelet and angle projection boundary growing | Type | Journal Article | ||
Year | 2014 | Publication | Multimedia Tools and Applications | Abbreviated Journal | MTAP |
Volume | 72 | Issue | 1 | Pages | 515-539 |
Keywords | |||||
Abstract | In this paper, we address two complex issues: 1) Text frame classification and 2) Multi-oriented text detection in video text frame. We first divide a video frame into 16 blocks and propose a combination of wavelet and median-moments with k-means clustering at the block level to identify probable text blocks. For each probable text block, the method applies the same combination of feature with k-means clustering over a sliding window running through the blocks to identify potential text candidates. We introduce a new idea of symmetry on text candidates in each block based on the observation that pixel distribution in text exhibits a symmetric pattern. The method integrates all blocks containing text candidates in the frame and then all text candidates are mapped on to a Sobel edge map of the original frame to obtain text representatives. To tackle the multi-orientation problem, we present a new method called Angle Projection Boundary Growing (APBG) which is an iterative algorithm and works based on a nearest neighbor concept. APBG is then applied on the text representatives to fix the bounding box for multi-oriented text lines in the video frame. Directional information is used to eliminate false positives. Experimental results on a variety of datasets such as non-horizontal, horizontal, publicly available data (Hua’s data) and ICDAR-03 competition data (camera images) show that the proposed method outperforms existing methods proposed for video and the state of the art methods for scene text as well. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Springer US | Place of Publication | Editor | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 1380-7501 | ISBN | Medium | ||
Area | Expedition | Conference | |||
Notes | DAG; 600.077 | Approved | no | ||
Call Number | Admin @ si @ SDT2014 | Serial | 2357 | ||
Permanent link to this record |