Publicacions CVC -- Query Results

	46–60 of 157 records found matching your query:


	Records
	Author	Jordi Esquirol; Cristina Palmero; Vanessa Bayo; Miquel Angel Cos; Sergio Escalera; David Sanchez; Maider Sanchez; Noelia Serrano; Mireia Relats
	Title	Automatic RBG-depth-pressure anthropometric analysis and individualised sleep solution prescription			Type	Journal
	Year	2017	Publication	Journal of Medical Engineering & Technology	Abbreviated Journal	JMET
	Volume	41	Issue	6	Pages	486-497
	Keywords
	Abstract	INTRODUCTION: Sleep surfaces must adapt to individual somatotypic features to maintain a comfortable, convenient and healthy sleep, preventing diseases and injuries. Individually determining the most adequate rest surface can often be a complex and subjective question. OBJECTIVES: To design and validate an automatic multimodal somatotype determination model to automatically recommend an individually designed mattress-topper-pillow combination. METHODS: Design and validation of an automated prescription model for an individualised sleep system is performed through a single-image 2D-3D analysis and body pressure distribution, to objectively determine optimal individual sleep surfaces combining five different mattress densities, three different toppers and three cervical pillows. RESULTS: A final study (n = 151) and re-analysis (n = 117) defined and validated the model, showing high correlations between calculated and real data (>85% in height and body circumferences, 89.9% in weight, 80.4% in body mass index and more than 70% in morphotype categorisation). CONCLUSIONS: The somatotype determination model can accurately prescribe an individualised sleep solution. This can be useful for healthy people and for health centres that need to adapt sleep surfaces to people with special needs. Next steps will increase the model's accuracy and analyse whether this prescribed individualised sleep solution can improve sleep quantity and quality; additionally, future studies will adapt the model to mattresses with technological improvements and tailor-made production, and will define interfaces for people with special needs.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	HUPBA; no menciona			Approved	no
	Call Number	Admin @ si @ EPB2017			Serial	3010
Permanent link to this record



	Author	Mohammad Ali Bagheri; Qigang Gao; Sergio Escalera; Huamin Ren; Thomas B. Moeslund; Elham Etemad
	Title	Locality Regularized Group Sparse Coding for Action Recognition			Type	Journal Article
	Year	2017	Publication	Computer Vision and Image Understanding	Abbreviated Journal	CVIU
	Volume	158	Issue		Pages	106-114
	Keywords	Bag of words; Feature encoding; Locality constrained coding; Group sparse coding; Alternating direction method of multipliers; Action recognition
	Abstract	Bag of visual words (BoVW) models are widely utilized in image/video representation and recognition. The cornerstone of these models is the encoding stage, in which local features are decomposed over a codebook in order to obtain a representation of features. In this paper, we propose a new encoding algorithm by jointly encoding the set of local descriptors of each sample and considering the locality structure of descriptors. The proposed method takes advantage of locality coding, such as its stability and robustness to noise in descriptors, as well as the strengths of the group coding strategy by taking into account the potential relation among descriptors of a sample. To efficiently implement our proposed method, we consider the Alternating Direction Method of Multipliers (ADMM) framework, which results in quadratic complexity in the problem size. The method is employed for a challenging classification problem: action recognition by depth cameras. Experimental results demonstrate that our methodology outperforms the state of the art on the considered datasets.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	HuPBA; no proj			Approved	no
	Call Number	Admin @ si @ BGE2017			Serial	3014
Permanent link to this record



	Author	Meysam Madadi
	Title	Human Segmentation, Pose Estimation and Applications			Type	Book Whole
	Year	2017	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Automatically analyzing humans in photographs or videos has great potential applications in computer vision, including medical diagnosis, sports, entertainment, movie editing and surveillance, just to name a few. Body, face and hand are the most studied components of humans. Body has many variabilities in shape and clothing along with high degrees of freedom in pose. Face has many muscles causing many visible deformities, besides variable shape and hair style. Hand is a small object, moving fast and has high degrees of freedom. Adding human characteristics to all aforementioned variabilities makes human analysis quite a challenging task. In this thesis, we developed human segmentation in different modalities. In a first scenario, we segmented human body and hand in depth images using example-based shape warping. We developed a shape descriptor based on shape context and class probabilities of shape regions to extract nearest neighbors. We then considered rigid affine alignment vs. nonrigid iterative shape warping. In a second scenario, we segmented face in RGB images using convolutional neural networks (CNN). We modeled conditional random field with recurrent neural networks. In our model pair-wise kernels are not fixed but learned during training. We trained the network end-to-end using adversarial networks which improved hair segmentation by a high margin. We also worked on 3D hand pose estimation in depth images. In a generative approach, we fitted a finger model separately for each finger based on our example-based rigid hand segmentation. We minimized an energy function based on overlapping area, depth discrepancy and finger collisions. We also applied linear models in joint trajectory space to refine occluded joints based on visible joints error and invisible joints trajectory smoothness. In a CNN-based approach, we developed a tree-structure network to train specific features for each finger and fused them for global pose consistency. We also formulated physical and appearance constraints as loss functions. Finally, we developed a number of applications consisting of human soft biometrics measurement and garment retexturing. We also generated some datasets in this thesis consisting of human segmentation, synthetic hand pose, garment retexturing and Italian gestures.
	Address	October 2017
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Sergio Escalera;Jordi Gonzalez
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-945373-3-2	Medium
	Area		Expedition		Conference
	Notes	HUPBA			Approved	no
	Call Number	Admin @ si @ Mad2017			Serial	3017
Permanent link to this record



	Author	Onur Ferhat
	Title	Analysis of Head-Pose Invariant, Natural Light Gaze Estimation Methods			Type	Book Whole
	Year	2017	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Eye tracker devices have traditionally been used only inside laboratories, requiring trained professionals and elaborate setup mechanisms. However, in recent years the scientific work on easier-to-use eye trackers which require no special hardware (other than the omnipresent front-facing cameras in computers, tablets, and mobiles) is aiming at making this technology commonplace. These types of trackers have several extra challenges that make the problem harder, such as low resolution images provided by a regular webcam, the changing ambient lighting conditions, personal appearance differences, changes in head pose, and so on. Recent research in the field has focused on all these challenges in order to provide better gaze estimation performances in a real world setup. In this work, we aim at tackling the gaze tracking problem in a single camera setup. We first analyze all the previous work in the field, identifying the strengths and weaknesses of each tried idea. We start our work on the gaze tracker with an appearance-based gaze estimation method, which is the simplest idea that creates a direct mapping between a rectangular image patch extracted around the eye in a camera image, and the gaze point (or gaze direction). Here, we do an extensive analysis of the factors that affect the performance of this tracker in several experimental setups, in order to address these problems in future works. In the second part of our work, we propose a feature-based gaze estimation method, which encodes the eye region image into a compact representation. We argue that this type of representation is better suited to dealing with head pose and lighting condition changes, as it both reduces the dimensionality of the input (i.e. eye image) and breaks the direct connection between image pixel intensities and the gaze estimation. Lastly, we use a face alignment algorithm to have robust face pose estimation, using a 3D model customized to the subject using the tracker. We combine this with a convolutional neural network trained on a large dataset of images to build a face pose invariant gaze tracker.
	Address	September 2017
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Fernando Vilariño
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-945373-5-6	Medium
	Area		Expedition		Conference
	Notes	MV			Approved	no
	Call Number	Admin @ si @ Fer2017			Serial	3018
Permanent link to this record



	Author	Arash Akbarinia
	Title	Computational Model of Visual Perception: From Colour to Form			Type	Book Whole
	Year	2017	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	The original idea of this project was to study the role of colour in the challenging task of object recognition. We started by extending previous research on colour naming showing that it is feasible to capture colour terms through parsimonious ellipsoids. Although the results of our model exceeded state-of-the-art in two benchmark datasets, we realised that the two phenomena of metameric lights and colour constancy must be addressed prior to any further colour processing. Our investigation of metameric pairs reached the conclusion that they are infrequent in real world scenarios. Contrary to that, the illumination of a scene often changes dramatically. We addressed this issue by proposing a colour constancy model inspired by the dynamical centre-surround adaptation of neurons in the visual cortex. This was implemented through two overlapping asymmetric Gaussians whose variances and heights are adjusted according to the local contrast of pixels. We complemented this model with a generic contrast-variant pooling mechanism that inversely connects the percentage of pooled signal to the local contrast of a region. The results of our experiments on four benchmark datasets were indeed promising: the proposed model, although simple, outperformed even learning-based approaches in many cases. Encouraged by the success of our contrast-variant surround modulation, we extended this approach to detect boundaries of objects. We proposed an edge detection model based on the first derivative of the Gaussian kernel. We incorporated four types of surround: full, far, iso- and orthogonal-orientation. Furthermore, we accounted for the pooling mechanism at higher cortical areas and the shape feedback sent to lower areas. Our results in three benchmark datasets showed significant improvement over non-learning algorithms. To summarise, we demonstrated that biologically-inspired models offer promising solutions to computer vision problems, such as colour naming, colour constancy and edge detection. We believe that the greatest contribution of this Ph.D. dissertation is modelling the concept of dynamic surround modulation that shows the significance of contrast-variant surround integration. The models proposed here are grounded on only a portion of what we know about the human visual system. Therefore, it is only natural to complement them accordingly in future works.
	Address	October 2017
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	C. Alejandro Parraga
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-945373-4-9	Medium
	Area		Expedition		Conference
	Notes	NEUROBIT			Approved	no
	Call Number	Admin @ si @ Akb2017			Serial	3019
Permanent link to this record



	Author	Cristhian Aguilera
	Title	Local feature description in cross-spectral imagery			Type	Book Whole
	Year	2017	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Over the last few years, the number of consumer computer vision applications has increased dramatically. Today, computer vision solutions can be found in video game consoles, smartphone applications and driving assistance, to name just a few. Ideally, we require the performance of those applications, particularly those that are safety critical, to remain constant under any external environment factors, such as changes in illumination or weather conditions. However, this is not always possible, or is very difficult to obtain, by only using visible imagery, due to the inherent limitations of the images from that spectral band. For that reason, the use of images from different or multiple spectral bands is becoming more appealing. The aforementioned possible advantages of using images from multiple spectral bands on various vision applications make multi-spectral image processing a relevant topic for research and development. Like in visible image processing, multi-spectral image processing needs tools and algorithms to handle information from various spectral bands. Furthermore, traditional tools such as local feature detection, which is the basis of many vision tasks such as visual odometry, image registration, or structure from motion, must be adjusted or reformulated to operate under new conditions. Traditional feature detection, description, and matching methods tend to underperform in multi-spectral settings, in comparison to mono-spectral settings, due to the natural differences between each spectral band. The work in this thesis is focused on the local feature description problem when cross-spectral images are considered. In this context, this dissertation has three main contributions. Firstly, the work starts by proposing the usage of a combination of frequency and spatial information, in a multi-scale scheme, as feature description. Evaluations of this proposal, based on classical hand-made feature descriptors, and comparisons with state of the art cross-spectral approaches help to find and understand limitations of such strategy. Secondly, different convolutional neural network (CNN) based architectures are evaluated when used to describe cross-spectral image patches. Results showed that CNN-based methods, designed to work with visible monocular images, could be successfully applied to the description of images from two different spectral bands, with just minor modifications. In this framework, a novel CNN-based network model, specifically intended to describe image patches from two different spectral bands, is proposed. This network, referred to as Q-Net, outperforms state of the art in the cross-spectral domain, including both previous hand-made solutions as well as L2 CNN-based architectures. The third contribution of this dissertation is in the cross-spectral feature description application domain. The multispectral odometry problem is tackled showing a real application of cross-spectral descriptors. In addition to the three main contributions mentioned above, in this dissertation, two different multi-spectral datasets are generated and shared with the community to be used as benchmarks for further studies.
	Address	October 2017
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Angel Sappa
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-945373-6-3	Medium
	Area		Expedition		Conference
	Notes	ADAS; 600.118			Approved	no
	Call Number	Admin @ si @ Agu2017			Serial	3020
Permanent link to this record



	Author	Laura Igual; Santiago Segui
	Title	Introduction to Data Science – A Python Approach to Concepts, Techniques and Applications. Undergraduate Topics in Computer Science			Type	Book Whole
	Year	2017	Publication		Abbreviated Journal
	Volume		Issue		Pages	1-215
	Keywords
	Abstract
	Address
	Corporate Author				Thesis
	Publisher	Springer	Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-3-319-50016-4	Medium
	Area		Expedition		Conference
	Notes	MILAB			Approved	no
	Call Number	Admin @ si @ IgS2017			Serial	3027
Permanent link to this record



	Author	Mireia Forns-Nadal; Federico Sem; Anna Mane; Laura Igual; Dani Guinart; Oscar Vilarroya
	Title	Increased Nucleus Accumbens Volume in First-Episode Psychosis			Type	Journal Article
	Year	2017	Publication	Psychiatry Research-Neuroimaging	Abbreviated Journal	PRN
	Volume	263	Issue		Pages	57-60
	Keywords
	Abstract	Nucleus accumbens has been reported as a key structure in the neurobiology of schizophrenia. Studies analyzing structural abnormalities have shown conflicting results, possibly related to confounding factors. We investigated the nucleus accumbens volume using manual delimitation in first-episode psychosis (FEP) controlling for age, cannabis use and medication. Thirty-one FEP subjects who were naive or minimally exposed to antipsychotics and a control group were MRI scanned and clinically assessed from baseline to 6 months of follow-up. FEP showed increased relative and total accumbens volumes. Clinical correlations with negative symptoms, duration of untreated psychosis and cannabis use were not significant.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	MILAB; no menciona			Approved	no
	Call Number	Admin @ si @ FSM2017			Serial	3028
Permanent link to this record



	Author	Fernando Vilariño
	Title	Bringing and keeping all the stakeholders together: creating a catalog of models of governance for innovation			Type	Miscellaneous
	Year	2017	Publication	Open Living Lab Days Report	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract
	Address	Krakow; August 2017
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	MV; no menciona;SIAI			Approved	no
	Call Number	Admin @ si @ Vil2017b			Serial	3033
Permanent link to this record



	Author	Mireia Sole; Joan Blanco; Debora Gil; G. Fonseka; Richard Frodsham; Oliver Valero; Francesca Vidal; Zaida Sarrate
	Title	Análisis 3d de la territorialidad cromosómica en células espermatogénicas: explorando la infertilidad desde un nuevo prisma			Type	Journal
	Year	2017	Publication	Revista Asociación para el Estudio de la Biología de la Reproducción	Abbreviated Journal	ASEBIR
	Volume	22	Issue	2	Pages	105
	Keywords
	Abstract
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	IAM; 600.096; 600.145			Approved	no
	Call Number	Admin @ si @ SBG2017d			Serial	3042
Permanent link to this record



	Author	Laura Lopez-Fuentes; Claudio Rossi; Harald Skinnemoen
	Title	River segmentation for flood monitoring			Type	Conference Article
	Year	2017	Publication	Data Science for Emergency Management at Big Data 2017	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Floods are major natural disasters which cause deaths and material damage every year. Monitoring these events is crucial in order to reduce both the number of people affected and the economic losses. In this work we train and test three different Deep Learning segmentation algorithms to estimate the water area from river images, and compare their performances. We discuss the implementation of a novel data chain aimed to monitor river water levels by automatically processing data collected from surveillance cameras, and to give alerts in case of high increases of the water level or flooding. We also create and openly publish the first image dataset for river water segmentation.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	LAMP; 600.084; 600.120			Approved	no
	Call Number	Admin @ si @ LRS2017			Serial	3078
Permanent link to this record



	Author	Lluis Pere de las Heras; Oriol Ramos Terrades; Josep Llados
	Title	Ontology-Based Understanding of Architectural Drawings			Type	Book Chapter
	Year	2017	Publication	International Workshop on Graphics Recognition. GREC 2015. Graphic Recognition. Current Trends and Challenges	Abbreviated Journal
	Volume	9657	Issue		Pages	75-85
	Keywords	Graphics recognition; Floor plan analysis; Domain ontology
	Abstract	In this paper we present a knowledge base of architectural documents aiming at improving existing methods of floor plan classification and understanding. It consists of an ontological definition of the domain and the inclusion of real instances coming from both automatically interpreted and manually labeled documents. The knowledge base has proven to be an effective tool to structure our knowledge and to easily maintain and upgrade it. Moreover, it is an appropriate means to automatically check the consistency of relational data and a convenient complement of hard-coded knowledge interpretation systems.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title	LNCS
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG; 600.121			Approved	no
	Call Number	Admin @ si @ HRL2017			Serial	3086
Permanent link to this record



	Author	Ivet Rafegas
	Title	Color in Visual Recognition: from flat to deep representations and some biological parallelisms			Type	Book Whole
	Year	2017	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Visual recognition is one of the main problems in computer vision that attempts to solve image understanding by deciding what objects are in images. This problem can be computationally solved by using relevant sets of visual features, such as edges, corners, color or more complex object parts. This thesis contributes to how color features have to be represented for recognition tasks. Image features can be extracted following two different approaches. A first approach is defining handcrafted descriptors of images which is then followed by a learning scheme to classify the content (named flat schemes in Kruger et al. (2013)). In this approach, perceptual considerations are habitually used to define efficient color features. Here we propose a new flat color descriptor based on the extension of color channels to boost the representation of spatio-chromatic contrast that surpasses state-of-the-art approaches. However, flat schemes present a lack of generality far away from the capabilities of biological systems. A second approach proposes evolving these flat schemes into a hierarchical process, like in the visual cortex. This includes an automatic process to learn optimal features. These deep schemes, and more specifically Convolutional Neural Networks (CNNs), have shown an impressive performance to solve various vision problems. However, there is a lack of understanding about the internal representation obtained, as a result of automatic learning. In this thesis we propose a new methodology to explore the internal representation of trained CNNs by defining the Neuron Feature as a visualization of the intrinsic features encoded in each individual neuron. Additionally, and inspired by physiological techniques, we propose to compute different neuron selectivity indexes (e.g., color, class, orientation or symmetry, amongst others) to label and classify the full CNN neuron population to understand learned representations. Finally, using the proposed methodology, we show an in-depth study on how color is represented on a specific CNN, trained for object recognition, that competes with primate representational abilities (Cadieu et al. (2014)). We found several parallelisms with biological visual systems: (a) a significant number of color selectivity neurons throughout all the layers; (b) an opponent and low frequency representation of color oriented edges and a higher sampling of frequency selectivity in brightness than in color in 1st layer like in V1; (c) a higher sampling of color hue in the second layer aligned to observed hue maps in V2; (d) a strong color and shape entanglement in all layers from basic features in shallower layers (V1 and V2) to object and background shapes in deeper layers (V4 and IT); and (e) a strong correlation between neuron color selectivities and color dataset bias.
	Address	November 2017
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Maria Vanrell
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-945373-7-0	Medium
	Area		Expedition		Conference
	Notes	CIC			Approved	no
	Call Number	Admin @ si @ Raf2017			Serial	3100
Permanent link to this record



	Author	C. Alejandro Parraga
	Title	Colours and Colour Vision: An Introductory Survey			Type	Journal Article
	Year	2017	Publication	Perception	Abbreviated Journal	PER
	Volume	46	Issue	5	Pages	640-641
	Keywords
	Abstract
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	NEUROBIT; no menciona			Approved	no
	Call Number	Par2017			Serial	3101
Permanent link to this record



	Author	Sounak Dey; Palaiahnakote Shivakumara; K.S. Raghunanda; Umapada Pal; Tong Lu; G. Hemantha Kumar; Chee Seng Chan
	Title	Script independent approach for multi-oriented text detection in scene image			Type	Journal Article
	Year	2017	Publication	Neurocomputing	Abbreviated Journal	NEUCOM
	Volume	242	Issue		Pages	96-112
	Keywords
	Abstract	Developing a text detection method which is invariant to scripts in natural scene images is a challenging task due to different geometrical structures of various scripts. Besides, the multiple orientations of text lines in natural scene images make the problem more challenging. This paper proposes to explore ring radius transform (RRT) for text detection in multi-oriented and multi-script environments. The method finds component regions based on convex hull to generate radius matrices using RRT. It is a fact that RRT provides low radius values for the pixels that are near to edges, constant radius values for the pixels that represent stroke width, and high radius values that represent holes created in background and convex hull because of the regular structures of text components. We apply k-means clustering on the radius matrices to group such spatially coherent regions into individual clusters. Then the proposed method studies the radius values of such cluster components that are close to the centroid and far from the centroid to detect text components. Furthermore, we have developed a Bangla dataset (named as ISI-UM dataset) and propose a semi-automatic system for generating its ground truth for text detection of arbitrary orientations, which can be used by the researchers for text detection and recognition in the future. The ground truth will be released to the public. Experimental results on our ISI-UM data and other standard datasets, namely, ICDAR 2013 scene, SVT and MSRA data, show that the proposed method outperforms the existing methods in terms of multi-lingual and multi-oriented text detection ability.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG; 600.121			Approved	no
	Call Number	Admin @ si @ DSR2017			Serial	3260
Permanent link to this record