Author Noha Elfiky; Fahad Shahbaz Khan; Joost Van de Weijer; Jordi Gonzalez
Title Discriminative Compact Pyramids for Object and Scene Recognition Type Journal Article
Year 2012 Publication Pattern Recognition Abbreviated Journal PR
Volume 45 Issue 4 Pages 1627-1636
Keywords
Abstract Spatial pyramids have been successfully applied to incorporate spatial information into bag-of-words based image representations. However, a major drawback is that they lead to high-dimensional image representations. In this paper, we present a novel framework for obtaining compact pyramid representations. First, we investigate the use of the divisive information-theoretic feature clustering (DITC) algorithm in creating a compact pyramid representation. In many cases this method allows us to reduce the size of a high-dimensional pyramid representation by up to an order of magnitude with little or no loss in accuracy. Furthermore, comparison to clustering based on the agglomerative information bottleneck (AIB) shows that our method obtains superior results at significantly lower computational cost. Moreover, we investigate the optimal combination of multiple features in the context of our compact pyramid representation. Finally, experiments show that the method can obtain state-of-the-art results on several challenging data sets.
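To make the dimensionality problem concrete, here is a minimal Python sketch (vocabulary size, pyramid depth and the random word-to-cluster assignment are illustrative assumptions, not the paper's setup): a spatial pyramid concatenates one bag-of-words histogram per cell, and a DITC-style compression can be emulated by pooling histogram bins that share a cluster index. DITC itself learns the merging so as to preserve class-discriminative information, which the random assignment below does not.

    import numpy as np

    V = 1000                                  # visual-word vocabulary size (assumed)
    levels = 3                                # pyramid levels 0..2 -> 1 + 4 + 16 cells
    cells = sum(4**l for l in range(levels))
    print("pyramid dimension:", V * cells)    # 21000 for this toy setup

    # Stand-in for DITC: pool the bins of each cell histogram according to a
    # word-to-cluster assignment (here random; DITC would learn it).
    rng = np.random.default_rng(0)
    C = 100                                   # compact vocabulary size (assumed)
    assign = rng.integers(0, C, size=V)

    def compress(pyramid_hist):
        per_cell = pyramid_hist.reshape(cells, V)
        return np.stack([np.bincount(assign, weights=h, minlength=C)
                         for h in per_cell]).ravel()

    h = rng.random(cells * V)
    print("compressed dimension:", compress(h).size)   # 2100, 10x smaller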
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 0031-3203 ISBN Medium
Area Expedition Conference
Notes ISE; CAT;CIC Approved no
Call Number Admin @ si @ EKW2012 Serial 1807
 

 
Author Joan M. Nuñez
Title Computer vision techniques for characterization of finger joints in X-ray image Type Report
Year 2011 Publication CVC Technical Report Abbreviated Journal
Volume 165 Issue Pages
Keywords Rheumatoid arthritis, X-ray, Sharp Van der Heijde, joint characterization, sclerosis detection, bone detection, edge, ridge
Abstract Rheumatoid arthritis (RA) is an autoimmune inflammatory type of arthritis which mainly affects the hands in its early stages. Although it is a chronic disease with no cure, treatment requires an accurate assessment of the evolution of the illness. Such assessment is based on the evaluation of hand X-ray images using one of the several available semi-quantitative methods, a task that requires highly trained medical personnel; automating the assessment would therefore save professionals time and effort. Two stages are involved in this task: first, joint detection; afterwards, joint characterization. Unlike the limited previous work, this contribution clearly separates those two stages and sets the foundations of a modular assessment system focusing on the characterization stage. A hand-joint dataset is created and a thorough data analysis is performed in order to identify relevant features. Since sclerosis and the lower bone were found to be the most important features, different computer vision techniques were used to develop a detector for each of them. Joint space width measures are provided and their correlation with the Sharp-Van der Heijde score is verified.
Address Bellaterra (Barcelona)
Corporate Author Computer Vision Center Thesis Master's thesis
Publisher Place of Publication Editor Dr. Fernando Vilariño and Dra. Debora Gil
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MV;IAM; Approved no
Call Number IAM @ iam @ Nuñ2011 Serial 1795
 

 
Author Victor Ponce; Mario Gorga; Xavier Baro; Petia Radeva; Sergio Escalera
Title Analisis de la Expresion Oral y Gestual en Proyectos Fin de Carrera Via un Sistema de Vision Artificial Type Miscellaneous
Year 2011 Publication Revista electronica de la asociacion de enseñantes universitarios de la informatica AENUI Abbreviated Journal ReVision
Volume 4 Issue 1 Pages 8-18
Keywords
Abstract Oral communication and expression is a competence of special relevance in the European Higher Education Area (EEES). Nevertheless, in many higher-education degrees the practice of this competence has been relegated mainly to the presentation of final-year projects. Within a teaching-innovation project, a software tool has been developed to extract objective information for the analysis of the students' oral and gestural expression. The goal is to give students feedback that allows them to improve the quality of their presentations. The initial prototype presented in this work automatically extracts audiovisual information and analyses it by means of machine-learning techniques. The system has been applied to 15 final-year projects and 15 presentations within a fourth-year course. The results obtained show the viability of the system for suggesting factors that contribute both to the success of the communication and to the evaluation criteria.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 1989-1199 ISBN Medium
Area Expedition Conference
Notes MILAB;HuPBA;MV Approved no
Call Number Admin @ si @ PGB2011c Serial 1783
 

 
Author Sergio Escalera; Alicia Fornes; Oriol Pujol; Josep Llados; Petia Radeva
Title Circular Blurred Shape Model for Multiclass Symbol Recognition Type Journal Article
Year 2011 Publication IEEE Transactions on Systems, Man and Cybernetics (Part B) (IEEE) Abbreviated Journal TSMCB
Volume 41 Issue 2 Pages 497-506
Keywords
Abstract In this paper, we propose a circular blurred shape model descriptor to deal with the problem of symbol detection and classification as a particular case of object recognition. The feature extraction is performed by capturing the spatial arrangement of significant object characteristics in a correlogram structure. The shape information from objects is shared among correlogram regions, where a prior blurring degree defines the level of distortion allowed in the symbol, making the descriptor tolerant to irregular deformations. Moreover, the descriptor is rotation invariant by definition. We validate the effectiveness of the proposed descriptor in both the multiclass symbol recognition and symbol detection domains. In order to perform the symbol detection, the descriptors are learned using a cascade of classifiers. In the case of multiclass categorization, the new feature space is learned using a set of binary classifiers which are embedded in an error-correcting output code design. The results over four symbol data sets show the significant improvements of the proposed descriptor compared to the state-of-the-art descriptors. In particular, the results are even more significant in those cases where the symbols suffer from elastic deformations.
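As an illustration of the kind of descriptor described above, here is a hedged Python sketch (the ring and sector counts, the blurring weight and the alignment step are assumptions, not the authors' exact formulation): contour points vote into (ring, sector) bins around the shape centroid, each vote is shared with its neighbouring sectors to tolerate deformation, and the sectors are circularly shifted so the heaviest one comes first, a crude route to rotation invariance.

    import numpy as np

    def circular_blurred_descriptor(points, n_rings=4, n_sectors=8, blur=0.5):
        pts = np.asarray(points, dtype=float)
        centred = pts - pts.mean(axis=0)               # centre on the centroid
        r = np.linalg.norm(centred, axis=1)
        r = r / (r.max() + 1e-9)                       # scale normalisation
        theta = np.arctan2(centred[:, 1], centred[:, 0]) % (2 * np.pi)
        hist = np.zeros((n_rings, n_sectors))
        ring = np.minimum((r * n_rings).astype(int), n_rings - 1)
        sector = (theta / (2 * np.pi) * n_sectors).astype(int) % n_sectors
        for ri, si in zip(ring, sector):
            hist[ri, si] += 1.0
            hist[ri, (si - 1) % n_sectors] += blur     # blur the vote towards
            hist[ri, (si + 1) % n_sectors] += blur     # both angular neighbours
        shift = int(hist.sum(axis=0).argmax())         # align heaviest sector
        hist = np.roll(hist, -shift, axis=1)
        return (hist / hist.sum()).ravel()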
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 1083-4419 ISBN Medium
Area Expedition Conference
Notes MILAB; DAG;HuPBA Approved no
Call Number Admin @ si @ EFP2011 Serial 1784
 

 
Author Javier Vazquez
Title Colour Constancy in Natural Images Through Colour Naming and Sensor Sharpening Type Book Whole
Year 2011 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Colour is derived from three physical properties: incident light, object reflectance and sensor sensitivities. Incident light varies under natural conditions; hence, recovering the scene illuminant is an important issue in computational colour. One way to deal with this problem under calibrated conditions is by following three steps: 1) building a narrow-band sensor basis to accomplish the diagonal model, 2) building a feasible set of illuminants, and 3) defining criteria to select the best illuminant. In this work we focus on colour constancy for natural images by introducing perceptual criteria in the first and third stages.
To deal with the illuminant-selection step, we hypothesise that basic colour categories can be used as anchor categories to recover the best illuminant. These colour names are related to the way the human visual system has evolved to encode relevant natural colour statistics. Therefore the recovered image provides the best representation of the scene labelled with the basic colour terms. We demonstrate with several experiments how this selection criterion achieves current state-of-the-art results in computational colour constancy. In addition to this result, we psychophysically prove that the angular error usually used in colour constancy does not correlate with human preferences, and we propose a new perceptual colour-constancy evaluation.
The implementation of this selection criterion relies strongly on the use of a diagonal model for illuminant change. Consequently, the second contribution focuses on building an appropriate narrow-band sensor basis to represent natural images. We propose to use the spectral sharpening technique to compute a unique narrow-band basis optimised to represent a large set of natural reflectances under natural illuminants, given in the basis of human cones. The proposed sensors allow predicting unique hues and the World Color Survey data independently of the illuminant by using a compact singularity function. Additionally, we studied different families of sharp sensors to minimise different perceptual measures; this study led us to extend the spherical sampling procedure from 3D to 6D.
Several research lines remain open. One natural extension would be to measure the effects of using the computed sharp sensors on the category hypothesis, while another might be to insert spatial contextual information to improve the category hypothesis. Finally, much work remains to be done in exploring how individual sensors can be adjusted to the colours in a scene.
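A minimal sketch of the selection loop described above, assuming a diagonal (von Kries) illuminant model; the scoring function here is only a grey-world-like placeholder for the thesis' colour-naming criterion, and the image and illuminant values are made up for illustration.

    import numpy as np

    def correct(img, illuminant):
        # Diagonal (von Kries) model: divide each channel by the illuminant
        return img / np.asarray(illuminant, dtype=float)

    def naming_score(img):
        # Placeholder criterion: reward corrected images whose mean
        # chromaticity is neutral. The thesis instead scores how well the
        # corrected pixels fit the basic colour-name categories.
        m = img.reshape(-1, 3).mean(axis=0)
        chrom = m / m.sum()
        return -float(np.abs(chrom - 1.0 / 3.0).sum())

    def select_illuminant(img, candidates):
        return max(candidates, key=lambda e: naming_score(correct(img, e)))

    rng = np.random.default_rng(1)
    scene = rng.random((64, 64, 3))                   # toy "reflectances"
    image = scene * np.array([1.0, 0.8, 0.6])         # apply a warm cast
    feasible = [(1.0, 1.0, 1.0), (1.0, 0.8, 0.6), (0.6, 0.8, 1.0)]
    print(select_illuminant(image, feasible))         # -> (1.0, 0.8, 0.6)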
Address
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Maria Vanrell;Graham D. Finlayson
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes CIC Approved no
Call Number Admin @ si @ Vaz2011a Serial 1785
 

 
Author Jesus Jaime Moreno Escobar
Title Perceptual Criteria on Image Compression Type Book Whole
Year 2011 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Nowadays, digital images are used in many areas of everyday life, but they tend to be large. This increasing amount of information leads to the problem of image data storage. For example, it is common to represent a color pixel as a 24-bit number, where the red, green, and blue channels employ 8 bits each. Such a pixel can consequently specify one of 2^24 ≈ 16.78 million colors, and an image at a resolution of 512 × 512 that allocates 24 bits per pixel occupies 786,432 bytes. That is why image compression is important. Image compression can be lossy or lossless. A compressed image is acceptable provided the losses of image information are not perceived by the eye, since it is possible to assume that a portion of this information is redundant. Lossless image compression is defined as mathematically decoding exactly the same image that was encoded. Lossy image compression needs to identify two features inside the image: the redundancy and the irrelevancy of information. Thus, lossy compression modifies the image data in such a way that, when they are encoded and decoded, the recovered image is similar enough to the original one. How similar the recovered image must be to the original is defined prior to the compression process, and it depends on the implementation to be performed. In lossy compression, current image compression schemes remove information considered irrelevant by using mathematical criteria. One problem with these schemes is that an image of low numerical quality can still show high visual quality, e.g. few visible artifacts, because the mathematical criteria used to remove information do not take into account whether the removed information is perceived by the Human Visual System. Therefore, the aim of an image compression scheme designed to obtain images that do not show artifacts, even though their numerical quality may be low, is to eliminate the information that is not visible to the Human Visual System. Hence, this Ph.D. thesis proposes to exploit the visual redundancy existing in an image by reducing those features that are imperceptible to the Human Visual System. First, we define an image quality assessment, which is highly correlated with the psychophysical experiments performed by human observers. The proposed CwPSNR metric weights the well-known PSNR by using a particular perceptual low-level model of the Human Visual System, the Chromatic Induction Wavelet Model (CIWaM). Second, we propose an image compression algorithm (called Hi-SET), which exploits the high correlation and self-similarity of pixels in a given area or neighborhood by means of a fractal function. Hi-SET possesses the main features that modern image compressors have: it is an embedded coder, which allows a progressive transmission. Third, we propose a perceptual quantizer (ρSQ), which is a modification of the uniform scalar quantizer. The ρSQ is applied to a pixel set in a certain wavelet sub-band, that is, a global quantization. Unlike this, the proposed modification allows performing a local pixel-by-pixel forward and inverse quantization, introducing into this process a perceptual distortion that depends on the spatial information surrounding the pixel. Combining the ρSQ method with the Hi-SET image compressor, we define a perceptual image compressor, called ΦSET.
Finally, a coding method for Region of Interest areas is presented, ρGBbBShift, which perceptually weights pixels inside these areas and maintains only the more important perceivable features in the rest of the image. Results presented in this report show that CwPSNR is the best-ranked image quality method when applied to the most common image compression distortions such as JPEG and JPEG2000. CwPSNR shows the best correlation with the judgement of human observers, based on the results of psychophysical experiments obtained for relevant image quality databases such as TID2008, LIVE, CSIQ and IVC. Furthermore, the Hi-SET coder obtains better results, both in compression ratio and perceptual image quality, than the JPEG2000 coder and other coders that use a Hilbert fractal for image compression. Hence, when the proposed perceptual quantization is introduced into the Hi-SET coder, our compressor improves its numerical and perceptual efficiency. When the ρGBbBShift method applied to Hi-SET is compared against the MaxShift method applied to the JPEG2000 standard and to Hi-SET, the images coded by our ROI method obtain the best overall image quality estimates. Both the proposed perceptual quantization and the ρGBbBShift method are generalized algorithms that can be applied to other wavelet-based image compression algorithms such as JPEG2000, SPIHT or SPECK.
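The storage arithmetic from the abstract, and the general shape of a perceptually weighted PSNR, can be sketched as follows in Python; the per-pixel weight map is a stand-in assumption, since CwPSNR's actual weighting comes from the CIWaM model.

    import numpy as np

    print(2 ** 24)                  # 16,777,216 colours for a 24-bit pixel
    print(512 * 512 * 24 // 8)      # 786,432 bytes for a 512x512 24-bit image

    def psnr(ref, dist, peak=255.0):
        mse = np.mean((ref.astype(float) - dist.astype(float)) ** 2)
        return 10.0 * np.log10(peak ** 2 / mse)

    def weighted_psnr(ref, dist, weights, peak=255.0):
        # Same structure as PSNR, but each squared error is scaled by a
        # per-pixel perceptual weight before averaging.
        err = (ref.astype(float) - dist.astype(float)) ** 2
        wmse = np.sum(weights * err) / np.sum(weights)
        return 10.0 * np.log10(peak ** 2 / wmse)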
Address
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Xavier Otazu
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-938351-3-2 Medium
Area Expedition Conference
Notes CIC Approved no
Call Number Admin @ si @ Mor2011 Serial 1786
 

 
Author Ferran Diego
Title Probabilistic Alignment of Video Sequences Recorded by Moving Cameras Type Book Whole
Year 2011 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Video alignment consists of integrating multiple video sequences recorded independently into a single video sequence. This means registering them both in time (synchronizing frames) and in space (image registration) so that the two video sequences can be fused or compared pixel-wise. In spite of being relatively unknown, many applications today may benefit from the availability of robust and efficient video alignment methods. For instance, video surveillance requires integrating video sequences recorded of the same scene at different times in order to detect changes. The problem of aligning videos has been addressed before, but in the relatively simple cases of fixed or rigidly attached cameras and simultaneous acquisition. In addition, most works rely on restrictive assumptions which reduce the difficulty, such as linear time correspondence or knowledge of the complete trajectories of corresponding scene points in the images; to some extent, these assumptions limit the practical applicability of the solutions developed until now. In this thesis, we focus on the challenging problem of aligning sequences recorded at different times from independent moving cameras following similar but not coincident trajectories. More precisely, this thesis covers four studies that advance the state of the art in video alignment. First, we focus on analyzing and developing a probabilistic framework for video alignment, that is, a principled way to integrate multiple observations and prior information. In this way, two different approaches are presented to exploit the combination of several purely visual features (image intensities, visual words and a dense motion field descriptor) and global positioning system (GPS) information. Second, we focus on reformulating the problem into a single alignment framework, since previous works on video alignment adopt a divide-and-conquer strategy, i.e., first solve the synchronization and then register corresponding frames; this also generalizes the 'classic' case of a fixed geometric transform and linear time mapping. Third, we focus on directly exploiting the time domain of the video sequences in order to avoid exhaustive cross-frame search; this provides relevant information used for learning the temporal mapping between pairs of video sequences. Finally, we focus on adapting these methods to the on-line setting for road detection and vehicle geolocation. The qualitative and quantitative results presented in this thesis on a variety of real-world pairs of video sequences show that the proposed method is robust to varying imaging conditions, different image content (e.g., incoming and outgoing vehicles), variations in camera velocity, and different scenarios (indoor and outdoor), going beyond the state of the art. Moreover, the on-line video alignment has been successfully applied to road detection and vehicle geolocation, achieving promising results.
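As an illustration of the synchronization half of the problem, here is a hedged Python sketch of a dynamic-programming (DTW-like) search for a monotonic frame-to-frame mapping between two sequences of frame descriptors; the thesis couples synchronization with spatial registration in a probabilistic model, whereas this toy handles only the time mapping.

    import numpy as np

    def synchronize(desc_a, desc_b):
        # desc_a: (na, d), desc_b: (nb, d) per-frame descriptors
        na, nb = len(desc_a), len(desc_b)
        cost = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
        acc = np.full((na, nb), np.inf)
        acc[0, 0] = cost[0, 0]
        for i in range(na):
            for j in range(nb):
                if i == 0 and j == 0:
                    continue
                prev = min(acc[i - 1, j] if i > 0 else np.inf,
                           acc[i, j - 1] if j > 0 else np.inf,
                           acc[i - 1, j - 1] if i > 0 and j > 0 else np.inf)
                acc[i, j] = cost[i, j] + prev
        # Backtrack the cheapest monotonic path = frame correspondence
        path, i, j = [(na - 1, nb - 1)], na - 1, nb - 1
        while (i, j) != (0, 0):
            candidates = [(i - 1, j), (i, j - 1), (i - 1, j - 1)]
            i, j = min((c for c in candidates if c[0] >= 0 and c[1] >= 0),
                       key=lambda c: acc[c])
            path.append((i, j))
        return path[::-1]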
Address
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Joan Serrat
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ADAS Approved no
Call Number Admin @ si @ Die2011 Serial 1787
 

 
Author Sergio Escalera
Title Multi-Modal Human Behaviour Analysis from Visual Data Sources Type Journal
Year 2013 Publication ERCIM News journal Abbreviated Journal ERCIM
Volume 95 Issue Pages 21-22
Keywords
Abstract The Human Pose Recovery and Behaviour Analysis group (HuPBA), University of Barcelona, is developing a line of research on multi-modal analysis of humans in visual data. The novel technology is being applied in several scenarios with high social impact, including sign language recognition, assistive technology and supported diagnosis for the elderly and people with mental/physical disabilities, fitness conditioning, and Human-Computer Interaction.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 0926-4981 ISBN Medium
Area Expedition Conference
Notes HuPBA;MILAB Approved no
Call Number Admin @ si @ Esc2013 Serial 2361
 

 
Author Xavier Carrillo; E Fernandez-Nofrerias; Francesco Ciompi; O. Rodriguez-Leor; Petia Radeva; Neus Salvatella; Oriol Pujol; J. Mauri; A. Bayes
Title Changes in Radial Artery Volume Assessed Using Intravascular Ultrasound: A Comparison of Two Vasodilator Regimens in Transradial Coronary Intervention Type Journal Article
Year 2011 Publication Journal of Invasive Cardiology Abbreviated Journal JOIC
Volume 23 Issue 10 Pages 401-404
Keywords radial; vasodilator treatment; percutaneous coronary intervention; IVUS; volumetric IVUS analysis
Abstract OBJECTIVES:
This study used intravascular ultrasound (IVUS) to evaluate radial artery volume changes after intra-arterial administration of nitroglycerin and/or verapamil.
BACKGROUND:
Radial artery spasm, which is associated with radial artery size, is the main limitation of the transradial approach in percutaneous coronary interventions (PCI).
METHODS:
This prospective, randomized study compared the effect of two intra-arterial vasodilator regimens on radial artery volume: 0.2 mg of nitroglycerin plus 2.5 mg of verapamil (Group 1; n = 15) versus 2.5 mg of verapamil alone (Group 2; n = 15). Radial artery lumen volume was assessed using IVUS at two time points: at baseline (5 minutes after sheath insertion) and post-vasodilator (1 minute after drug administration). The luminal volume of the radial artery was computed using ECOC Random Fields (ECOC-RF), a technique used for automatic segmentation of luminal borders in longitudinal cut images from IVUS sequences.
RESULTS:
There was a significant increase in arterial lumen volume in both groups, with an increase from 451 ± 177 mm³ to 508 ± 192 mm³ (p = 0.001) in Group 1 and from 456 ± 188 mm³ to 509 ± 170 mm³ (p = 0.001) in Group 2. There were no significant differences between the groups in terms of absolute volume increase (58 mm³ versus 53 mm³, respectively; p = 0.65) or in relative volume increase (14% versus 20%, respectively; p = 0.69).
CONCLUSIONS:
Administration of nitroglycerin plus verapamil or verapamil alone to the radial artery resulted in similar increases in arterial lumen volume according to ECOC-RF IVUS measurements.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MILAB;HuPBA Approved no
Call Number Admin @ si @ CFC2011 Serial 1797
 

 
Author Miguel Angel Bautista; Sergio Escalera; Xavier Baro; Petia Radeva; Jordi Vitria; Oriol Pujol
Title Minimal Design of Error-Correcting Output Codes Type Journal Article
Year 2011 Publication Pattern Recognition Letters Abbreviated Journal PRL
Volume 33 Issue 6 Pages 693-702
Keywords Multi-class classification; Error-correcting output codes; Ensemble of classifiers
Abstract The classification of a large number of object categories is a challenging trend in the pattern recognition field. In the literature, this is often addressed using an ensemble of classifiers. In this scope, the Error-Correcting Output Codes (ECOC) framework has demonstrated to be a powerful tool for combining classifiers. However, most state-of-the-art ECOC approaches use a linear or exponential number of classifiers, making the discrimination of a large number of classes unfeasible. In this paper, we explore and propose a minimal design of ECOC in terms of the number of classifiers. Evolutionary computation is used for tuning the parameters of the classifiers and searching for the best minimal ECOC code configuration. The results over several public UCI datasets and different multi-class computer vision problems show that the proposed methodology obtains comparable (even better) results than state-of-the-art ECOC methodologies with far fewer dichotomizers.
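To make the "minimal" idea concrete: N classes can in principle be separated with ceil(log2 N) dichotomizers, far fewer than one-versus-all (N) or one-versus-one (N(N-1)/2) designs. The Python sketch below simply uses the binary representation of the class index as the codeword; the paper instead searches for the code and tunes the classifiers with evolutionary computation.

    import numpy as np

    def minimal_code_matrix(n_classes):
        n_bits = max(1, int(np.ceil(np.log2(n_classes))))
        rows = [[(c >> b) & 1 for b in range(n_bits)] for c in range(n_classes)]
        return 2 * np.array(rows) - 1            # ECOC convention: {-1, +1}

    def decode(predictions, code_matrix):
        # predictions: outputs of the n_bits dichotomizers, each in {-1, +1}
        hamming = np.sum(code_matrix != np.asarray(predictions), axis=1)
        return int(np.argmin(hamming))           # closest codeword = class

    M = minimal_code_matrix(8)                   # 8 classes -> 3 dichotomizers
    print(M.shape)                               # (8, 3)
    print(decode([-1, 1, -1], M))                # -> 2 (codeword -1,+1,-1)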
Address
Corporate Author Thesis
Publisher Elsevier Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 0167-8655 ISBN Medium
Area Expedition Conference
Notes MILAB; OR;HuPBA;MV Approved no
Call Number Admin @ si @ BEB2011a Serial 1800
 

 
Author Sergio Escalera; Xavier Baro; Oriol Pujol; Jordi Vitria; Petia Radeva
Title Traffic-Sign Recognition Systems Type Book Whole
Year 2011 Publication SpringerBriefs in Computer Science Abbreviated Journal
Volume Issue Pages 5-13
Keywords
Abstract
Address
Corporate Author Thesis
Publisher Springer London Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-1-4471-2244-9 Medium
Area Expedition Conference
Notes MILAB; OR;HuPBA;MV Approved no
Call Number Admin @ si @ EBP2011 Serial 1801
 

 
Author Carlo Gatta; Eloi Puertas; Oriol Pujol
Title Multi-Scale Stacked Sequential Learning Type Journal Article
Year 2011 Publication Pattern Recognition Abbreviated Journal PR
Volume 44 Issue 10-11 Pages 2414-2416
Keywords Stacked sequential learning; Multiscale; Multiresolution; Contextual classification
Abstract One of the most widely used assumptions in supervised learning is that data is independent and identically distributed. This assumption does not hold true in many real cases. Sequential learning is the discipline of machine learning that deals with dependent data, such that neighboring examples exhibit some kind of relationship. In the literature, there are different approaches that try to capture and exploit this correlation by means of different methodologies. In this paper we focus on meta-learning strategies and, in particular, the stacked sequential learning approach. The main contribution of this work is two-fold: first, we generalize the stacked sequential learning framework. This generalization reflects the key role of modeling neighboring interactions. Second, we propose an effective and efficient way of capturing and exploiting sequential correlations that takes into account long-range interactions by means of a multi-scale pyramidal decomposition of the predicted labels. Additionally, this new method subsumes the standard stacked sequential learning approach. We tested the proposed method on two different classification tasks: text-line classification in a FAQ data set and image classification. Results on these tasks clearly show that our approach outperforms the standard stacked sequential learning. Moreover, we show that the proposed method allows controlling the trade-off between detail and the desired range of the interactions.
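A hedged sketch of the stacked idea on a toy 1-D task (the synthetic data, the window scales and the scikit-learn classifier are assumptions for illustration): a base classifier emits per-element scores, those scores are smoothed at several window sizes as a stand-in for the paper's pyramidal decomposition, and a second classifier is trained on the original features extended with this multi-scale context. The real scheme produces the base predictions by cross-validation to avoid label leakage; this toy reuses the training data for brevity.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def multiscale_context(scores, scales=(1, 2, 4, 8)):
        # Moving averages of predicted scores at increasing window sizes,
        # a simple stand-in for the multi-scale pyramidal decomposition.
        feats = [np.convolve(scores, np.ones(2 * s + 1) / (2 * s + 1), "same")
                 for s in scales]
        return np.stack(feats, axis=1)

    rng = np.random.default_rng(0)
    y = (np.sin(np.arange(400) / 20.0) > 0).astype(int)   # blocky labels
    X = (y + rng.normal(0, 1.2, 400)).reshape(-1, 1)      # noisy observations

    base = LogisticRegression().fit(X, y)
    scores = base.predict_proba(X)[:, 1]
    X2 = np.hstack([X, multiscale_context(scores)])       # extended features
    stacked = LogisticRegression().fit(X2, y)
    print((base.predict(X) == y).mean(), (stacked.predict(X2) == y).mean())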
Address
Corporate Author Thesis
Publisher Elsevier Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MILAB;HuPBA Approved no
Call Number Admin @ si @ GPP2011 Serial 1802
 

 
Author Palaiahnakote Shivakumara; Anjan Dutta; Chew Lim Tan; Umapada Pal
Title Multi-oriented scene text detection in video based on wavelet and angle projection boundary growing Type Journal Article
Year 2014 Publication Multimedia Tools and Applications Abbreviated Journal MTAP
Volume 72 Issue 1 Pages 515-539
Keywords
Abstract In this paper, we address two complex issues: 1) text frame classification and 2) multi-oriented text detection in video text frames. We first divide a video frame into 16 blocks and propose a combination of wavelet and median-moments with k-means clustering at the block level to identify probable text blocks. For each probable text block, the method applies the same combination of features with k-means clustering over a sliding window running through the blocks to identify potential text candidates. We introduce a new idea of symmetry on text candidates in each block, based on the observation that the pixel distribution in text exhibits a symmetric pattern. The method integrates all blocks containing text candidates in the frame, and all text candidates are then mapped onto a Sobel edge map of the original frame to obtain text representatives. To tackle the multi-orientation problem, we present a new method called Angle Projection Boundary Growing (APBG), an iterative algorithm based on a nearest-neighbor concept. APBG is then applied on the text representatives to fix the bounding box for multi-oriented text lines in the video frame. Directional information is used to eliminate false positives. Experimental results on a variety of datasets, such as non-horizontal and horizontal data, publicly available data (Hua's data) and ICDAR-03 competition data (camera images), show that the proposed method outperforms existing methods proposed for video as well as state-of-the-art methods for scene text.
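A toy version of the first stage described above, under stated assumptions: the frame is divided into 16 blocks, a simple high-frequency-energy feature stands in for the wavelet and median-moment features, and a two-centroid k-means over the block features flags the high-energy cluster as probable text.

    import numpy as np

    def probable_text_blocks(gray):
        # gray: 2-D luminance frame; returns a 4x4 boolean map of text blocks
        h, w = gray.shape
        bh, bw = h // 4, w // 4
        feats = []
        for i in range(4):
            for j in range(4):
                blk = gray[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw].astype(float)
                feats.append(np.abs(np.diff(blk, axis=0)).mean()
                             + np.abs(np.diff(blk, axis=1)).mean())
        feats = np.array(feats)
        lo, hi = feats.min(), feats.max()        # 2-means on the 1-D features
        assign = np.zeros(16, dtype=bool)
        for _ in range(10):
            assign = np.abs(feats - hi) < np.abs(feats - lo)
            if assign.all() or not assign.any():
                break                            # degenerate frame: one cluster
            lo, hi = feats[~assign].mean(), feats[assign].mean()
        return assign.reshape(4, 4)              # True = probable text block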
Address
Corporate Author Thesis
Publisher Springer US Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 1380-7501 ISBN Medium
Area Expedition Conference
Notes DAG; 600.077 Approved no
Call Number Admin @ si @ SDT2014 Serial 2357
 

 
Author Ivan Huerta; Ariel Amato; Xavier Roca; Jordi Gonzalez
Title Exploiting Multiple Cues in Motion Segmentation Based on Background Subtraction Type Journal Article
Year 2013 Publication Neurocomputing Abbreviated Journal NEUCOM
Volume 100 Issue Pages 183–196
Keywords Motion segmentation; Shadow suppression; Colour segmentation; Edge segmentation; Ghost detection; Background subtraction
Abstract This paper presents a novel algorithm for mobile-object segmentation from static background scenes, which is both robust and accurate under most of the common problems found in motion segmentation. In our first contribution, a case analysis of motion segmentation errors is presented, taking into account the inaccuracies associated with different cues, namely colour, edge and intensity. Our second contribution is a hybrid architecture which copes with the main issues observed in the case analysis by fusing the knowledge from the aforementioned three cues and a temporal difference algorithm. On one hand, we enhance the colour and edge models to solve not only global and local illumination changes (i.e. shadows and highlights) but also camouflage in intensity. In addition, local information is also exploited to solve camouflage in chroma. On the other hand, the intensity cue is applied when colour and edge cues are not available because their values are beyond the dynamic range. Additionally, a temporal difference scheme is included to segment motion where those three cues cannot be reliably computed, for example in those background regions not visible during the training period. Lastly, our approach is extended for handling ghost detection. The proposed method obtains very accurate and robust motion segmentation results in multiple indoor and outdoor scenarios, while outperforming the most-referred state-of-the-art approaches.
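One of the cue-fusion ideas above can be sketched as follows (all thresholds are illustrative assumptions): a pixel is flagged when its colour or brightness deviates from the background model, but candidates whose chromaticity matches the background and whose brightness merely dims are relabelled as shadow. The paper fuses this colour cue with edge, intensity and temporal-difference cues, which this toy omits.

    import numpy as np

    def segment(frame, background, chroma_thr=0.04, bright_thr=0.1,
                shadow_range=(0.4, 0.95)):
        f = frame.astype(float) + 1e-6
        b = background.astype(float) + 1e-6
        fc = f / f.sum(axis=2, keepdims=True)     # intensity-normalised colour
        bc = b / b.sum(axis=2, keepdims=True)
        chroma_diff = np.abs(fc - bc).sum(axis=2)
        ratio = f.sum(axis=2) / b.sum(axis=2)     # brightness relative to model
        candidate = (chroma_diff >= chroma_thr) | (np.abs(ratio - 1) > bright_thr)
        shadow = (candidate & (chroma_diff < chroma_thr)
                  & (ratio > shadow_range[0]) & (ratio < shadow_range[1]))
        return candidate & ~shadow                # foreground, shadows removed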
Address
Corporate Author Thesis
Publisher Elsevier Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ISE Approved no
Call Number Admin @ si @ HAR2013 Serial 1808
 

 
Author Bhaskar Chakraborty; Andrew Bagdanov; Jordi Gonzalez; Xavier Roca
Title Human Action Recognition Using an Ensemble of Body-Part Detectors Type Journal Article
Year 2013 Publication Expert Systems Abbreviated Journal EXSY
Volume 30 Issue 2 Pages 101-114
Keywords Human action recognition;body-part detection;hidden Markov model
Abstract This paper describes an approach to human action recognition based on a probabilistic optimization model of body parts using a hidden Markov model (HMM). Our method is able to distinguish between similar actions by considering only the body parts that contribute most to the actions, for example, legs for walking, jogging and running; arms for boxing, waving and clapping. We apply HMMs to model the stochastic movement of the body parts for action recognition. The HMM construction uses an ensemble of body-part detectors, followed by grouping of part detections, to perform human identification. Three example-based body-part detectors are trained to detect three components of the human body: the head, legs and arms. These detectors cope with viewpoint changes and self-occlusions through the use of ten sub-classifiers that detect body parts over a specific range of viewpoints. Each sub-classifier is a support vector machine trained on features selected for their discriminative power for each particular part/viewpoint combination. Grouping of these detections is performed using a simple geometric constraint model that yields a viewpoint-invariant human detector. We test our approach on three publicly available action datasets: the KTH, Weizmann and HumanEva datasets. Our results illustrate that with a simple and compact representation we can achieve robust recognition of human actions comparable to the most complex, state-of-the-art methods.
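The HMM machinery implied above can be sketched with a standard Viterbi decoder (the log-space convention and matrix shapes are the usual textbook ones; the per-action models and the body-part observation likelihoods are outside this toy): one HMM is scored per action, and the action whose model yields the highest-likelihood state path wins.

    import numpy as np

    def viterbi(log_prior, log_trans, log_emit):
        # log_prior: (S,), log_trans: (S, S), log_emit: (T, S)
        T, S = log_emit.shape
        delta = log_prior + log_emit[0]           # best score ending in each state
        back = np.zeros((T, S), dtype=int)
        for t in range(1, T):
            scores = delta[:, None] + log_trans   # (from_state, to_state)
            back[t] = scores.argmax(axis=0)
            delta = scores.max(axis=0) + log_emit[t]
        path = [int(delta.argmax())]
        for t in range(T - 1, 0, -1):             # trace the best path back
            path.append(int(back[t, path[-1]]))
        return float(delta.max()), path[::-1]

    # Action recognition then reduces to scoring one trained HMM per action
    # with viterbi() and keeping the action whose model scores highest.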
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ISE Approved no
Call Number Admin @ si @ CBG2013 Serial 1809