Publicacions CVC -- Query Results

Records
Author	Murad Al Haj; Carles Fernandez; Zhanwu Xiong; Ivan Huerta; Jordi Gonzalez; Xavier Roca
Title	Beyond the Static Camera: Issues and Trends in Active Vision			Type	Book Chapter
Year	2011	Publication	Visual Analysis of Humans: Looking at People	Abbreviated Journal
Volume		Issue	2	Pages	11-30
Keywords
Abstract	Maximizing both the area coverage and the resolution per target is highly desirable in many applications of computer vision. However, with a limited number of cameras viewing a scene, the two objectives are contradictory. This chapter is dedicated to active vision systems, trying to achieve a trade-off between these two aims and examining the use of high-level reasoning in such scenarios. The chapter starts by introducing different approaches to active camera configurations. Later, a single active camera system to track a moving object is developed, offering the reader first-hand understanding of the issues involved. Another section discusses practical considerations in building an active vision platform, taking as an example a multi-camera system developed for a European project. The last section of the chapter reflects upon the future trends of using semantic factors to drive smartly coordinated active systems.
Address
Corporate Author				Thesis
Publisher	Springer London	Place of Publication		Editor	Th.B. Moeslund; A. Hilton; V. Krüger; L. Sigal
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN	978-0-85729-996-3	Medium
Area		Expedition		Conference
Notes	ISE			Approved	no
Call Number	Admin @ si @ AFX2011			Serial	1814



Author	Sergio Vera; Debora Gil; Agnes Borras; F. Javier Sanchez; Frederic Perez; Marius G. Linguraru
Title	Computation and Evaluation of Medial Surfaces for Shape Representation of Abdominal Organs			Type	Conference Article
Year	2011	Publication	Workshop on Computational and Clinical Applications in Abdominal Imaging	Abbreviated Journal
Volume	7029	Issue		Pages	223-230
Keywords
Abstract	Medial representations are powerful tools for describing and parameterizing the volumetric shape of anatomical structures. Existing methods show excellent results when applied to 2D objects, but their quality drops across dimensions. This paper contributes to the computation of medial manifolds in two aspects. First, we provide a standard scheme for the computation of medial manifolds that avoids degenerated medial axis segments; second, we introduce an energy-based method which performs independently of the dimension. We quantitatively evaluate the performance of our method with respect to existing approaches by applying them to synthetic shapes of known medial geometry. Finally, we show results on shape representation of multiple abdominal organs, exploring the use of medial manifolds for the representation of multi-organ relations.
Address	Nice, France
Corporate Author				Thesis
Publisher	Springer Berlin Heidelberg	Place of Publication		Editor	H. Yoshida et al.
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ABDI
Notes	IAM; MV			Approved	no
Call Number	VGB2011			Serial	2036



Author	Mario Rojas; David Masip; Jordi Vitria
Title	Automatic Detection of Facial Feature Points via HOGs and Geometric Prior Models			Type	Conference Article
Year	2011	Publication	5th Iberian Conference on Pattern Recognition and Image Analysis	Abbreviated Journal
Volume	6669	Issue		Pages	371-378
Keywords
Abstract	Most applications dealing with problems involving the face require a robust estimation of the facial salient points. Nevertheless, this estimation is not usually an automated preprocessing step in applications dealing with facial expression recognition. In this paper we present a simple method to detect salient points in the face. It is based on a prior Point Distribution Model and a robust object descriptor. The model learns the distribution of the points from the training data, as well as the amount of variation in location each point exhibits. Using this model, we reduce the search areas to look for each point. In addition, we also exploit the global consistency of the point constellation, increasing the detection accuracy. The method was tested on two separate data sets and the results, in some cases, outperform the state of the art.
Address	Las Palmas de Gran Canaria, Spain
Corporate Author				Thesis
Publisher	Springer Berlin Heidelberg	Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN	0302-9743	ISBN	978-3-642-21256-7	Medium
Area		Expedition		Conference	IbPRIA
Notes	OR;MV			Approved	no
Call Number	Admin @ si @ RMV2011a			Serial	1731



Author	Antonio Hernandez; Carlos Primo; Sergio Escalera
Title	Automatic user interaction correction via Multi-label Graph cuts			Type	Conference Article
Year	2011	Publication	ICCV 2011 1st IEEE International Workshop on Human Interaction in Computer Vision HICV	Abbreviated Journal
Volume		Issue		Pages	1276-1281
Keywords
Abstract	Most image segmentation applications require user interaction in order to achieve accurate results. However, users want to achieve the desired segmentation accuracy while reducing the effort of manual labelling. In this work, we extend the standard multi-label α-expansion Graph Cut algorithm so that it analyzes the interaction of the user in order to modify the object model and improve the final segmentation of objects. The approach is inspired by the fact that fast user interactions may introduce some pixel errors that confuse object and background. Our results with different degrees of user interaction and input errors show the high performance of the proposed approach on a multi-label human limb segmentation problem compared with the classical α-expansion algorithm.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN	978-1-4673-0062-9	Medium
Area		Expedition		Conference	HICV
Notes	MILAB; HuPBA			Approved	no
Call Number	Admin @ si @ HPE2011			Serial	1892



Author	M. Visani; Oriol Ramos Terrades; Salvatore Tabbone
Title	A Protocol to Characterize the Descriptive Power and the Complementarity of Shape Descriptors			Type	Journal Article
Year	2011	Publication	International Journal on Document Analysis and Recognition	Abbreviated Journal	IJDAR
Volume	14	Issue	1	Pages	87-100
Keywords	Document analysis; Shape descriptors; Symbol description; Performance characterization; Complementarity analysis
Abstract	Most document analysis applications rely on the extraction of shape descriptors, which may be grouped into different categories, each category having its own advantages and drawbacks (O.R. Terrades et al. in Proceedings of ICDAR’07, pp. 227–231, 2007). In order to improve the richness of their description, many authors choose to combine multiple descriptors. Yet, most of the authors who propose a new descriptor content themselves with comparing its performance to the performance of a set of single state-of-the-art descriptors in a specific applicative context (e.g. symbol recognition, symbol spotting...). This results in a proliferation of the shape descriptors proposed in the literature. In this article, we propose an innovative protocol, the originality of which is to be as independent of the final application as possible and which relies on new quantitative and qualitative measures. We introduce two types of measures: while the measures of the first type are intended to characterize the descriptive power (in terms of uniqueness, distinctiveness and robustness towards noise) of a descriptor, the second type of measures characterizes the complementarity between multiple descriptors. Characterizing upstream the complementarity of shape descriptors is an alternative to the usual approach where the descriptors to be combined are selected by trial and error, considering the performance characteristics of the overall system. To illustrate the contribution of this protocol, we performed experimental studies using a set of descriptors and a set of symbols which are widely used by the community namely ART and SC descriptors and the GREC 2003 database.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	DAG; IF 1.091			Approved	no
Call Number	Admin @ si @ VRT2011			Serial	1856



Author	Gemma Roig; Xavier Boix; F. de la Torre; Joan Serrat; C. Vilella
Title	Hierarchical CRF with product label spaces for parts-based Models			Type	Conference Article
Year	2011	Publication	IEEE Conference on Automatic Face and Gesture Recognition	Abbreviated Journal
Volume		Issue		Pages	657-664
Keywords	Shape; Computational modeling; Principal component analysis; Random variables; Color; Upper bound; Facial features
Abstract	Non-rigid object detection is a challenging and open research problem in computer vision. It is a critical part in many applications such as image search, surveillance, human-computer interaction or image auto-annotation. Most successful approaches to non-rigid object detection make use of part-based models. In particular, Conditional Random Fields (CRF) have been successfully embedded into a discriminative parts-based model framework due to their effectiveness for learning and inference (usually based on a tree structure). However, CRF-based approaches do not incorporate global constraints and only model pairwise interactions. This is especially important when modeling object classes that may have complex part interactions (e.g. facial features or body articulations), because neglecting them yields an oversimplified model with suboptimal performance. To overcome this limitation, this paper proposes a novel hierarchical CRF (HCRF). The main contribution is to build a hierarchy of part combinations by extending the label set to a hierarchy of product label spaces. In order to keep the inference computation tractable, we propose an effective method to reduce the new label set. We test our method on two applications: facial feature detection on the Multi-PIE database and human pose estimation on the Buffy dataset.
Address	Santa Barbara, CA, USA, 2011
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	FG
Notes	ADAS			Approved	no
Call Number	Admin @ si @ RBT2011			Serial	1862



Author	Jaime Moreno
Title	Perceptual Criteria on Image Compression			Type	Book Whole
Year	2011	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	Nowadays, digital images are used in many areas of everyday life, but they tend to be large. This increasing amount of information leads to the problem of image data storage. For example, it is common to represent a color pixel as a 24-bit number, where the red, green, and blue channels employ 8 bits each. Consequently, this kind of color pixel can specify one of 2^24 ≈ 16.78 million colors. Therefore, an image at a resolution of 512 × 512 that allocates 24 bits per pixel occupies 786,432 bytes. That is why image compression is important. An important feature of image compression is that it can be lossy or lossless. A compressed image is acceptable provided these losses of image information are not perceived by the eye. It is possible to assume that a portion of this information is redundant. Lossless image compression is defined as mathematically decoding the same image that was encoded. Lossy image compression needs to identify two features inside the image: the redundancy and the irrelevancy of information. Thus, lossy compression modifies the image data in such a way that, when they are encoded and decoded, the recovered image is similar enough to the original one. How similar the recovered image is to the original image is defined prior to the compression process, and it depends on the implementation to be performed. In lossy compression, current image compression schemes remove information considered irrelevant by using mathematical criteria. One of the problems of these schemes is that although the numerical quality of the compressed image is low, it shows a high visual image quality, e.g. it does not show a lot of visible artifacts. This is because these mathematical criteria, used to remove information, do not take into account whether the viewed information is perceived by the Human Visual System. 
Therefore, the aim of an image compression scheme designed to obtain images that do not show artifacts, although their numerical quality can be low, is to eliminate the information that is not visible to the Human Visual System. Hence, this Ph.D. thesis proposes to exploit the visual redundancy existing in an image by reducing those features that can be imperceptible to the Human Visual System. First, we define an image quality assessment, which is highly correlated with the psychophysical experiments performed by human observers. The proposed CwPSNR metric weights the well-known PSNR by using a particular perceptual low-level model of the Human Visual System, e.g. the Chromatic Induction Wavelet Model (CIWaM). Second, we propose an image compression algorithm (called Hi-SET), which exploits the high correlation and self-similarity of pixels in a given area or neighborhood by means of a fractal function. Hi-SET possesses the main features that modern image compressors have, that is, it is an embedded coder, which allows a progressive transmission. Third, we propose a perceptual quantizer (ρSQ), which is a modification of the uniform scalar quantizer. The ρSQ is applied to a pixel set in a certain wavelet sub-band, that is, a global quantization. Unlike this, the proposed modification allows performing a local pixel-by-pixel forward and inverse quantization, introducing into this process a perceptual distortion which depends on the surrounding spatial information of the pixel. Combining the ρSQ method with the Hi-SET image compressor, we define a perceptual image compressor, called ΦSET. Finally, a coding method for Region of Interest areas is presented, ρGBbBShift, which perceptually weights pixels in these areas and maintains only the more important perceivable features in the rest of the image. 
Results presented in this report show that CwPSNR is the best-ranked image quality method when it is applied to the most common image compression distortions such as JPEG and JPEG2000. CwPSNR shows the best correlation with the judgement of human observers, which is based on the results of psychophysical experiments obtained for relevant image quality databases such as TID2008, LIVE, CSIQ and IVC. Furthermore, the Hi-SET coder obtains better results both for compression ratios and perceptual image quality than the JPEG2000 coder and other coders that use a Hilbert fractal for image compression. Hence, when the proposed perceptual quantization is introduced to the Hi-SET coder, our compressor improves its numerical and perceptual efficiency. When the ρGBbBShift method applied to Hi-SET is compared against the MaxShift method applied to the JPEG2000 standard and Hi-SET, the images coded by our ROI method get the best results when the overall image quality is estimated. Both the proposed perceptual quantization and the ρGBbBShift method are generalized algorithms that can be applied to other wavelet-based image compression algorithms such as JPEG2000, SPIHT or SPECK.
Address
Corporate Author				Thesis	Ph.D. thesis
Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Xavier Otazu
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN	978-84-938351-3-2	Medium
Area		Expedition		Conference
Notes	CIC			Approved	no
Call Number	Admin @ si @ Mor2011			Serial	1786



Author	Arnau Ramisa; David Aldavert; Shrihari Vasudevan; Ricardo Toledo; Ramon Lopez de Mantaras
Title	The IIIA30 Mobile Robot Object Recognition Dataset			Type	Conference Article
Year	2011	Publication	11th Portuguese Robotics Open	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	Object perception is a key feature in order to make mobile robots able to perform high-level tasks. However, research aimed at addressing the constraints and limitations encountered in a mobile robotics scenario, like low image resolution, motion blur or tight computational constraints, is still very scarce. In order to facilitate future research in this direction, in this work we present an object detection and recognition dataset acquired using a mobile robotic platform. As a baseline for the dataset, we evaluated the cascade of weak classifiers object detection method from Viola and Jones.
Address	Lisboa
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	Robotica
Notes	RV;ADAS			Approved	no
Call Number	Admin @ si @ RAV2011			Serial	1777



Author	Shida Beigpour; Joost Van de Weijer
Title	Object Recoloring Based on Intrinsic Image Estimation			Type	Conference Article
Year	2011	Publication	13th IEEE International Conference on Computer Vision	Abbreviated Journal
Volume		Issue		Pages	327-334
Keywords
Abstract	Object recoloring is one of the most popular photo-editing tasks. The problem of object recoloring is highly under-constrained, and existing recoloring methods limit their application to objects lit by a white illuminant. Application of these methods to real-world scenes lit by colored illuminants, multiple illuminants, or interreflections, results in unrealistic recoloring of objects. In this paper, we focus on the recoloring of single-colored objects presegmented from their background. The single-color constraint allows us to fit a more comprehensive physical model to the object. We demonstrate that this permits us to perform realistic recoloring of objects lit by non-white illuminants, and multiple illuminants. Moreover, the model allows for more realistic handling of illuminant alteration of the scene. Recoloring results captured by uncalibrated cameras demonstrate that the proposed framework obtains realistic recoloring for complex natural images. Furthermore we use the model to transfer color between objects and show that the results are more realistic than existing color transfer methods.
Address	Barcelona
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN	1550-5499	ISBN	978-1-4577-1101-5	Medium
Area		Expedition		Conference	ICCV
Notes	CIC			Approved	no
Call Number	Admin @ si @ BeW2011			Serial	1781



Author	Xavier Carrillo; E Fernandez-Nofrerias; Francesco Ciompi; Oriol Rodriguez-Leor; Petia Radeva; Neus Salvatella; Oriol Pujol; J. Mauri; A. Bayes
Title	Changes in Radial Artery Volume Assessed Using Intravascular Ultrasound: A Comparison of Two Vasodilator Regimens in Transradial Coronary Intervention			Type	Journal Article
Year	2011	Publication	Journal of Invasive Cardiology	Abbreviated Journal	JOIC
Volume	23	Issue	10	Pages	401-404
Keywords	radial; vasodilator treatment; percutaneous coronary intervention; IVUS; volumetric IVUS analysis
Abstract	OBJECTIVES: This study used intravascular ultrasound (IVUS) to evaluate radial artery volume changes after intraarterial administration of nitroglycerin and/or verapamil. BACKGROUND: Radial artery spasm, which is associated with radial artery size, is the main limitation of the transradial approach in percutaneous coronary interventions (PCI). METHODS: This prospective, randomized study compared the effect of two intra-arterial vasodilator regimens on radial artery volume: 0.2 mg of nitroglycerin plus 2.5 mg of verapamil (Group 1; n = 15) versus 2.5 mg of verapamil alone (Group 2; n = 15). Radial artery lumen volume was assessed using IVUS at two time points: at baseline (5 minutes after sheath insertion) and post-vasodilator (1 minute after drug administration). The luminal volume of the radial artery was computed using ECOC Random Fields (ECOC-RF), a technique used for automatic segmentation of luminal borders in longitudinal cut images from IVUS sequences. RESULTS: There was a significant increase in arterial lumen volume in both groups, with an increase from 451 ± 177 mm³ to 508 ± 192 mm³ (p = 0.001) in Group 1 and from 456 ± 188 mm³ to 509 ± 170 mm³ (p = 0.001) in Group 2. There were no significant differences between the groups in terms of absolute volume increase (58 mm³ versus 53 mm³, respectively; p = 0.65) or in relative volume increase (14% versus 20%, respectively; p = 0.69). CONCLUSIONS: Administration of nitroglycerin plus verapamil or verapamil alone to the radial artery resulted in similar increases in arterial lumen volume according to ECOC-RF IVUS measurements.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	MILAB;HuPBA			Approved	no
Call Number	Admin @ si @ CFC2011			Serial	1797



Author	Carlo Gatta; Eloi Puertas; Oriol Pujol
Title	Multi-Scale Stacked Sequential Learning			Type	Journal Article
Year	2011	Publication	Pattern Recognition	Abbreviated Journal	PR
Volume	44	Issue	10-11	Pages	2414-2416
Keywords	Stacked sequential learning; Multiscale; Multiresolution; Contextual classification
Abstract	One of the most widely used assumptions in supervised learning is that data is independent and identically distributed. This assumption does not hold true in many real cases. Sequential learning is the discipline of machine learning that deals with dependent data such that neighboring examples exhibit some kind of relationship. In the literature, there are different approaches that try to capture and exploit this correlation by means of different methodologies. In this paper we focus on meta-learning strategies and, in particular, the stacked sequential learning approach. The main contribution of this work is two-fold: first, we generalize stacked sequential learning. This generalization reflects the key role of modeling neighboring interactions. Second, we propose an effective and efficient way of capturing and exploiting sequential correlations that takes into account long-range interactions by means of a multi-scale pyramidal decomposition of the predicted labels. Additionally, this new method subsumes the standard stacked sequential learning approach. We tested the proposed method on two different classification tasks: text line classification in a FAQ data set and image classification. Results on these tasks clearly show that our approach outperforms standard stacked sequential learning. Moreover, we show that the proposed method allows controlling the trade-off between the detail and the desired range of the interactions.
Address
Corporate Author				Thesis
Publisher	Elsevier	Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title	LNCS
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	MILAB;HuPBA			Approved	no
Call Number	Admin @ si @ GPP2011			Serial	1802



Author	Patricia Marquez; Debora Gil; Aura Hernandez-Sabate
Title	A Confidence Measure for Assessing Optical Flow Accuracy in the Absence of Ground Truth			Type	Conference Article
Year	2011	Publication	IEEE International Conference on Computer Vision – Workshops	Abbreviated Journal
Volume		Issue		Pages	2042-2049
Keywords
Abstract	Optical flow is a valuable tool for motion analysis in autonomous navigation systems. A reliable application requires determining the accuracy of the computed optical flow. This is a major challenge given the absence of ground truth in real-world sequences. This paper introduces a measure of optical flow accuracy for Lucas-Kanade based flows in terms of the numerical stability of the data term. We call this measure the optical flow condition number. A statistical analysis over ground-truth data shows a good statistical correlation between the condition number and optical flow error. Experiments on driving sequences illustrate its potential for autonomous navigation systems.
Address
Corporate Author				Thesis
Publisher	IEEE	Place of Publication	Barcelona (Spain)	Editor
Language	English	Summary Language	English	Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ICCVW
Notes	IAM; ADAS			Approved	no
Call Number	IAM @ iam @ MGH2011			Serial	1682



Author	Kaida Xiao; Sophie Wuerger; Chenyang Fu; Dimosthenis Karatzas
Title	Unique Hue Data for Colour Appearance Models. Part I: Loci of Unique Hues and Hue Uniformity			Type	Journal Article
Year	2011	Publication	Color Research & Application	Abbreviated Journal	CRA
Volume	36	Issue	5	Pages	316-323
Keywords	unique hues; colour appearance models; CIECAM02; hue uniformity
Abstract	Psychophysical experiments were conducted to assess unique hues on a CRT display for a large sample of colour-normal observers (n = 185). These data were then used to evaluate the most commonly used colour appearance model, CIECAM02, by transforming the CIEXYZ tristimulus values of the unique hues to the CIECAM02 colour appearance attributes, lightness, chroma and hue angle. We report two findings: (1) the hue angles derived from our unique hue data are inconsistent with the commonly used Natural Color System hues that are incorporated in the CIECAM02 model. We argue that our predicted unique hue angles (derived from our large dataset) provide a more reliable standard for colour management applications when the precise specification of these salient colours is important. (2) We test hue uniformity for CIECAM02 in all four unique hues and show significant disagreements for all hues, except for unique red which seems to be invariant under lightness changes. Our dataset is useful to improve the CIECAM02 model as it provides reliable data for benchmarking.
Address
Corporate Author				Thesis
Publisher	Wiley Periodicals Inc	Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	DAG			Approved	no
Call Number	Admin @ si @ XWF2011			Serial	1816



Author	Fahad Shahbaz Khan
Title	Coloring bag-of-words based image representations			Type	Book Whole
Year	2011	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	Put succinctly, the bag-of-words based image representation is the most successful approach for object and scene recognition. Within the bag-of-words framework, the optimal fusion of multiple cues, such as shape, texture and color, still remains an active research domain. There exist two main approaches to combine color and shape information within the bag-of-words framework. The first approach, called early fusion, fuses color and shape at the feature level, as a result of which a joint color-shape vocabulary is produced. The second approach, called late fusion, concatenates histogram representations of both color and shape, obtained independently. In the first part of this thesis, we analyze the theoretical implications of both early and late feature fusion. We demonstrate that both these approaches are suboptimal for a subset of object categories. Consequently, we propose a novel method for recognizing object categories when using multiple cues by separately processing the shape and color cues and combining them by modulating the shape features by category-specific color attention. Color is used to compute bottom-up and top-down attention maps. Subsequently, the color attention maps are used to modulate the weights of the shape features. Shape features are given more weight in regions with higher attention and vice versa. The approach is tested on several benchmark object recognition data sets and the results clearly demonstrate the effectiveness of our proposed method. In the second part of the thesis, we investigate the problem of obtaining compact spatial pyramid representations for object and scene recognition. Spatial pyramids have been successfully applied to incorporate spatial information into bag-of-words based image representation. However, a major drawback of spatial pyramids is that they lead to high-dimensional image representations. We present a novel framework for obtaining compact pyramid representations. 
The approach reduces the size of a high-dimensional pyramid representation by up to an order of magnitude without any significant reduction in accuracy. Moreover, we also investigate the optimal combination of multiple features such as color and shape within the context of our compact pyramid representation. Finally, we describe a novel technique to build discriminative visual words from multiple cues learned independently from training images. To this end, we use an information-theoretic vocabulary compression technique to find discriminative combinations of visual cues; the resulting visual vocabulary is compact, has the cue binding property, and supports individual weighting of cues in the final image representation. The approach is tested on standard object recognition data sets. The results obtained clearly demonstrate the effectiveness of our approach.
Address
Corporate Author				Thesis	Ph.D. thesis
Publisher		Place of Publication		Editor	Joost Van de Weijer; Maria Vanrell
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	CIC			Approved	no
Call Number	Admin @ si @ Kha2011			Serial	1838



Author	Bhaskar Chakraborty; Michael Holte; Thomas B. Moeslund; Jordi Gonzalez; Xavier Roca
Title	A Selective Spatio-Temporal Interest Point Detector for Human Action Recognition in Complex Scenes			Type	Conference Article
Year	2011	Publication	13th IEEE International Conference on Computer Vision	Abbreviated Journal
Volume		Issue		Pages	1776-1783
Keywords
Abstract	Recent progress in the field of human action recognition points towards the use of Spatio-Temporal Interest Points (STIPs) for local descriptor-based recognition strategies. In this paper we present a new approach for STIP detection by applying surround suppression combined with local and temporal constraints. Our method is significantly different from existing STIP detectors and improves the performance by detecting more repeatable, stable and distinctive STIPs for human actors, while suppressing unwanted background STIPs. For action representation we use a bag-of-visual words (BoV) model of local N-jet features to build a vocabulary of visual-words. To this end, we introduce a novel vocabulary building strategy by combining spatial pyramid and vocabulary compression techniques, resulting in improved performance and efficiency. Action class specific Support Vector Machine (SVM) classifiers are trained for categorization of human actions. A comprehensive set of experiments on existing benchmark datasets, and more challenging datasets of complex scenes, validate our approach and show state-of-the-art performance.
Address	Barcelona
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN	1550-5499	ISBN	978-1-4577-1101-5	Medium
Area		Expedition		Conference	ICCV
Notes	ISE			Approved	no
Call Number	Admin @ si @ CHM2011			Serial	1811