Publicacions CVC -- Query Results

[1–10] << 11 12 13 14 15 16 17 18 19 20 >> [21–30]

Details

	Records
	Author	Sounak Dey; Palaiahnakote Shivakumara; K.S. Raghunanda; Umapada Pal; Tong Lu; G. Hemantha Kumar; Chee Seng Chan
	Title	Script independent approach for multi-oriented text detection in scene image			Type	Journal Article
	Year	2017	Publication	Neurocomputing	Abbreviated Journal	NEUCOM
	Volume	242	Issue		Pages	96-112
	Keywords
	Abstract	Developing a text detection method which is invariant to scripts in natural scene images is a challeng- ing task due to different geometrical structures of various scripts. Besides, multi-oriented of text lines in natural scene images make the problem more challenging. This paper proposes to explore ring radius transform (RRT) for text detection in multi-oriented and multi-script environments. The method finds component regions based on convex hull to generate radius matrices using RRT. It is a fact that RRT pro- vides low radius values for the pixels that are near to edges, constant radius values for the pixels that represent stroke width, and high radius values that represent holes created in background and convex hull because of the regular structures of text components. We apply k -means clustering on the radius matrices to group such spatially coherent regions into individual clusters. Then the proposed method studies the radius values of such cluster components that are close to the centroid and far from the cen- troid to detect text components. Furthermore, we have developed a Bangla dataset (named as ISI-UM dataset) and propose a semi-automatic system for generating its ground truth for text detection of arbi- trary orientations, which can be used by the researchers for text detection and recognition in the future. The ground truth will be released to public. Experimental results on our ISI-UM data and other standard datasets, namely, ICDAR 2013 scene, SVT and MSRA data, show that the proposed method outperforms the existing methods in terms of multi-lingual and multi-oriented text detection ability.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG; 600.121			Approved	no
	Call Number	Admin @ si @ DSR2017			Serial	3260
Permanent link to this record



	Author	Anjan Dutta; Pau Riba; Josep Llados; Alicia Fornes
	Title	Hierarchical Stochastic Graphlet Embedding for Graph-based Pattern Recognition			Type	Journal Article
	Year	2020	Publication	Neural Computing and Applications	Abbreviated Journal	NEUCOMA
	Volume	32	Issue		Pages	11579–11596
	Keywords
	Abstract	Despite being very successful within the pattern recognition and machine learning community, graph-based methods are often unusable because of the lack of mathematical operations defined in graph domain. Graph embedding, which maps graphs to a vectorial space, has been proposed as a way to tackle these difficulties enabling the use of standard machine learning techniques. However, it is well known that graph embedding functions usually suffer from the loss of structural information. In this paper, we consider the hierarchical structure of a graph as a way to mitigate this loss of information. The hierarchical structure is constructed by topologically clustering the graph nodes and considering each cluster as a node in the upper hierarchical level. Once this hierarchical structure is constructed, we consider several configurations to define the mapping into a vector space given a classical graph embedding, in particular, we propose to make use of the stochastic graphlet embedding (SGE). Broadly speaking, SGE produces a distribution of uniformly sampled low-to-high-order graphlets as a way to embed graphs into the vector space. In what follows, the coarse-to-fine structure of a graph hierarchy and the statistics fetched by the SGE complements each other and includes important structural information with varied contexts. Altogether, these two techniques substantially cope with the usual information loss involved in graph embedding techniques, obtaining a more robust graph representation. This fact has been corroborated through a detailed experimental evaluation on various benchmark graph datasets, where we outperform the state-of-the-art methods.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG; 600.140; 600.121; 600.141			Approved	no
	Call Number	Admin @ si @ DRL2020			Serial	3348
Permanent link to this record



	Author	Ernest Valveny; Oriol Ramos Terrades; Joan Mas; Marçal Rusiñol
	Title	Interactive Document Retrieval and Classification.			Type	Book Chapter
	Year	2013	Publication	Multimodal Interaction in Image and Video Applications	Abbreviated Journal
	Volume	48	Issue		Pages	17-30
	Keywords
	Abstract	In this chapter we describe a system for document retrieval and classification following the interactive-predictive framework. In particular, the system addresses two different scenarios of document analysis: document classification based on visual appearance and logo detection. These two classical problems of document analysis are formulated following the interactive-predictive model, taking the user interaction into account to make easier the process of annotating and labelling the documents. A system implementing this model in a real scenario is presented and analyzed. This system also takes advantage of active learning techniques to speed up the task of labelling the documents.
	Address
	Corporate Author				Thesis
	Publisher	Springer Berlin Heidelberg	Place of Publication		Editor	Angel Sappa; Jordi Vitria
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN	1868-4394	ISBN	978-3-642-35931-6	Medium
	Area		Expedition		Conference
	Notes	DAG			Approved	no
	Call Number	Admin @ si @ VRM2013			Serial	2341
Permanent link to this record



	Author	Palaiahnakote Shivakumara; Anjan Dutta; Chew Lim Tan; Umapada Pal
	Title	Multi-oriented scene text detection in video based on wavelet and angle projection boundary growing			Type	Journal Article
	Year	2014	Publication	Multimedia Tools and Applications	Abbreviated Journal	MTAP
	Volume	72	Issue	1	Pages	515-539
	Keywords
	Abstract	In this paper, we address two complex issues: 1) Text frame classification and 2) Multi-oriented text detection in video text frame. We first divide a video frame into 16 blocks and propose a combination of wavelet and median-moments with k-means clustering at the block level to identify probable text blocks. For each probable text block, the method applies the same combination of feature with k-means clustering over a sliding window running through the blocks to identify potential text candidates. We introduce a new idea of symmetry on text candidates in each block based on the observation that pixel distribution in text exhibits a symmetric pattern. The method integrates all blocks containing text candidates in the frame and then all text candidates are mapped on to a Sobel edge map of the original frame to obtain text representatives. To tackle the multi-orientation problem, we present a new method called Angle Projection Boundary Growing (APBG) which is an iterative algorithm and works based on a nearest neighbor concept. APBG is then applied on the text representatives to fix the bounding box for multi-oriented text lines in the video frame. Directional information is used to eliminate false positives. Experimental results on a variety of datasets such as non-horizontal, horizontal, publicly available data (Hua’s data) and ICDAR-03 competition data (camera images) show that the proposed method outperforms existing methods proposed for video and the state of the art methods for scene text as well.
	Address
	Corporate Author				Thesis
	Publisher	Springer US	Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN	1380-7501	ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG; 600.077			Approved	no
	Call Number	Admin @ si @ SDT2014			Serial	2357
Permanent link to this record



	Author	Marçal Rusiñol; J. Chazalon; Katerine Diaz
	Title	Augmented Songbook: an Augmented Reality Educational Application for Raising Music Awareness			Type	Journal Article
	Year	2018	Publication	Multimedia Tools and Applications	Abbreviated Journal	MTAP
	Volume	77	Issue	11	Pages	13773-13798
	Keywords	Augmented reality; Document image matching; Educational applications
	Abstract	This paper presents the development of an Augmented Reality mobile application which aims at sensibilizing young children to abstract concepts of music. Such concepts are, for instance, the musical notation or the idea of rhythm. Recent studies in Augmented Reality for education suggest that such technologies have multiple benefits for students, including younger ones. As mobile document image acquisition and processing gains maturity on mobile platforms, we explore how it is possible to build a markerless and real-time application to augment the physical documents with didactic animations and interactive virtual content. Given a standard image processing pipeline, we compare the performance of different local descriptors at two key stages of the process. Results suggest alternatives to the SIFT local descriptors, regarding result quality and computational efficiency, both for document model identification and perspective transform estimation. All experiments are performed on an original and public dataset we introduce here.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG; ADAS; 600.084; 600.121; 600.118; 600.129			Approved	no
	Call Number	Admin @ si @ RCD2018			Serial	2996
Permanent link to this record



	Author	Kunal Biswas; Palaiahnakote Shivakumara; Umapada Pal; Tong Lu; Michel Blumenstein; Josep Llados
	Title	Classification of aesthetic natural scene images using statistical and semantic features			Type	Journal Article
	Year	2023	Publication	Multimedia Tools and Applications	Abbreviated Journal	MTAP
	Volume	82	Issue	9	Pages	13507-13532
	Keywords
	Abstract	Aesthetic image analysis is essential for improving the performance of multimedia image retrieval systems, especially from a repository of social media and multimedia content stored on mobile devices. This paper presents a novel method for classifying aesthetic natural scene images by studying the naturalness of image content using statistical features, and reading text in the images using semantic features. Unlike existing methods that focus only on image quality with human information, the proposed approach focuses on image features as well as text-based semantic features without human intervention to reduce the gap between subjectivity and objectivity in the classification. The aesthetic classes considered in this work are (i) Very Pleasant, (ii) Pleasant, (iii) Normal and (iv) Unpleasant. The naturalness is represented by features of focus, defocus, perceived brightness, perceived contrast, blurriness and noisiness, while semantics are represented by text recognition, description of the images and labels of images, profile pictures, and banner images. Furthermore, a deep learning model is proposed in a novel way to fuse statistical and semantic features for the classification of aesthetic natural scene images. Experiments on our own dataset and the standard datasets demonstrate that the proposed approach achieves 92.74%, 88.67% and 83.22% average classification rates on our own dataset, AVA dataset and CUHKPQ dataset, respectively. Furthermore, a comparative study of the proposed model with the existing methods shows that the proposed method is effective for the classification of aesthetic social media images.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG			Approved	no
	Call Number	Admin @ si @ BSP2023			Serial	3873
Permanent link to this record



	Author	Raul Gomez; Lluis Gomez; Jaume Gibert; Dimosthenis Karatzas
	Title	Self-Supervised Learning from Web Data for Multimodal Retrieval			Type	Book Chapter
	Year	2019	Publication	Multi-Modal Scene Understanding Book	Abbreviated Journal
	Volume		Issue		Pages	279-306
	Keywords	self-supervised learning; webly supervised learning; text embeddings; multimodal retrieval; multimodal embedding
	Abstract	Self-Supervised learning from multimodal image and text data allows deep neural networks to learn powerful features with no need of human annotated data. Web and Social Media platforms provide a virtually unlimited amount of this multimodal data. In this work we propose to exploit this free available data to learn a multimodal image and text embedding, aiming to leverage the semantic knowledge learnt in the text domain and transfer it to a visual model for semantic image retrieval. We demonstrate that the proposed pipeline can learn from images with associated text without supervision and analyze the semantic structure of the learnt joint image and text embeddingspace. Weperformathoroughanalysisandperformancecomparisonofﬁvedifferentstateof the art text embeddings in three different benchmarks. We show that the embeddings learnt with Web and Social Media data have competitive performances over supervised methods in the text basedimageretrievaltask,andweclearlyoutperformstateoftheartintheMIRFlickrdatasetwhen training in the target data. Further, we demonstrate how semantic multimodal image retrieval can be performed using the learnt embeddings, going beyond classical instance-level retrieval problems. Finally, we present a new dataset, InstaCities1M, composed by Instagram images and their associated texts that can be used for fair comparison of image-text embeddings.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG; 600.129; 601.338; 601.310			Approved	no
	Call Number	Admin @ si @ GGG2019			Serial	3266
Permanent link to this record



	Author	L. Rothacker; Marçal Rusiñol; Josep Llados; G.A. Fink
	Title	A Two-stage Approach to Segmentation-Free Query-by-example Word Spotting			Type	Journal
	Year	2014	Publication	Manuscript Cultures	Abbreviated Journal
	Volume	7	Issue		Pages	47-58
	Keywords
	Abstract	With the ongoing progress in digitization, huge document collections and archives have become available to a broad audience. Scanned document images can be transmitted electronically and studied simultaneously throughout the world. While this is very beneficial, it is often impossible to perform automated searches on these document collections. Optical character recognition usually fails when it comes to handwritten or historic documents. In order to address the need for exploring document collections rapidly, researchers are working on word spotting. In query-by-example word spotting scenarios, the user selects an exemplary occurrence of the query word in a document image. The word spotting system then retrieves all regions in the collection that are visually similar to the given example of the query word. The best matching regions are presented to the user and no actual transcription is required. An important property of a word spotting system is the computational speed with which queries can be executed. In our previous work, we presented a relatively slow but high-precision method. In the present work, we will extend this baseline system to an integrated two-stage approach. In a coarse-grained first stage, we will filter document images efficiently in order to identify regions that are likely to contain the query word. In the fine-grained second stage, these regions will be analyzed with our previously presented high-precision method. Finally, we will report recognition results and query times for the well-known George Washington benchmark in our evaluation. We achieve state-of-the-art recognition results while the query times can be reduced to 50% in comparison with our baseline.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG; 600.061; 600.077			Approved	no
	Call Number	Admin @ si @			Serial	3190
Permanent link to this record



	Author	Josep Llados; Jaime Lopez-Krahe; Enric Marti
	Title	A system to understand hand-drawn floor plans using subgraph isomorphism and Hough transform			Type	Book Chapter
	Year	1997	Publication	Machine Vision and Applications	Abbreviated Journal
	Volume	10	Issue	3	Pages	150-158
	Keywords	Line drawings – Hough transform – Graph matching – CAD systems – Graphics recognition
	Abstract	Presently, man-machine interface development is a widespread research activity. A system to understand hand drawn architectural drawings in a CAD environment is presented in this paper. To understand a document, we have to identify its building elements and their structural properties. An attributed graph structure is chosen as a symbolic representation of the input document and the patterns to recognize in it. An inexact subgraph isomorphism procedure using relaxation labeling techniques is performed. In this paper we focus on how to speed up the matching. There is a building element, the walls, characterized by a hatching pattern. Using a straight line Hough transform (SLHT)-based method, we recognize this pattern, characterized by parallel straight lines, and remove from the input graph the edges belonging to this pattern. The isomorphism is then applied to the remainder of the input graph. When all the building elements have been recognized, the document is redrawn, correcting the inaccurate strokes obtained from a hand-drawn input.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG;IAM			Approved	no
	Call Number	IAM @ iam @ LLM1997a			Serial	1566
Permanent link to this record



	Author	Josep Llados; Enric Marti
	Title	A graph-edit algorithm for hand-drawn graphical document recognition and their automatic introduction into CAD systems.			Type	Miscellaneous
	Year	1999	Publication	Machine Graphics & Vision, 8(2):195–211.	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG			Approved	no
	Call Number	DAG @ dag @ LlM1999a			Serial	187
Permanent link to this record

Select All Deselect All

[1–10] << 11 12 13 14 15 16 17 18 19 20 >> [21–30]

List View

Citations

Details

All Found Records Selected Records:

Save Citations: Format:

Export Records: Format: