Publicacions CVC -- Query Results

	Publicacions CVC Home \| Show All \| Simple Search \| Advanced Search \| Add Record \| Import	Login Quick Search: Field: contains: ...
	31–45 of 155 records found matching your query (RSS \| history):

Search & Display Options

Select All Deselect All

<< 1 2 3 4 5 6 7 8 9 10 >> [11–11]

List View

Citations

Details

	Records
	Author	Sangheeta Roy; Palaiahnakote Shivakumara; Namita Jain; Vijeta Khare; Anjan Dutta; Umapada Pal; Tong Lu
	Title	Rough-Fuzzy based Scene Categorization for Text Detection and Recognition in Video			Type	Journal Article
	Year	2018	Publication	Pattern Recognition	Abbreviated Journal	PR
	Volume	80	Issue		Pages	64-82
	Keywords	Rough set; Fuzzy set; Video categorization; Scene image classification; Video text detection; Video text recognition
	Abstract	Scene image or video understanding is a challenging task especially when number of video types increases drastically with high variations in background and foreground. This paper proposes a new method for categorizing scene videos into different classes, namely, Animation, Outlet, Sports, e-Learning, Medical, Weather, Defense, Economics, Animal Planet and Technology, for the performance improvement of text detection and recognition, which is an effective approach for scene image or video understanding. For this purpose, at first, we present a new combination of rough and fuzzy concept to study irregular shapes of edge components in input scene videos, which helps to classify edge components into several groups. Next, the proposed method explores gradient direction information of each pixel in each edge component group to extract stroke based features by dividing each group into several intra and inter planes. We further extract correlation and covariance features to encode semantic features located inside planes or between planes. Features of intra and inter planes of groups are then concatenated to get a feature matrix. Finally, the feature matrix is verified with temporal frames and fed to a neural network for categorization. Experimental results show that the proposed method outperforms the existing state-of-the-art methods, at the same time, the performances of text detection and recognition methods are also improved significantly due to categorization.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG; 600.097; 600.121			Approved	no
	Call Number	Admin @ si @ RSJ2018			Serial	3096
Permanent link to this record



	Author	Lluis Gomez; Marçal Rusiñol; Ali Furkan Biten; Dimosthenis Karatzas
	Title	Subtitulació automàtica d'imatges. Estat de l'art i limitacions en el context arxivístic			Type	Conference Article
	Year	2018	Publication	Jornades Imatge i Recerca	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	JIR
	Notes	DAG; 600.084; 600.135; 601.338; 600.121; 600.129			Approved	no
	Call Number	Admin @ si @ GRB2018			Serial	3173
Permanent link to this record



	Author	Lluis Gomez; Marçal Rusiñol; Dimosthenis Karatzas
	Title	Cutting Sayre's Knot: Reading Scene Text without Segmentation. Application to Utility Meters			Type	Conference Article
	Year	2018	Publication	13th IAPR International Workshop on Document Analysis Systems	Abbreviated Journal
	Volume		Issue		Pages	97-102
	Keywords	Robust Reading; End-to-end Systems; CNN; Utility Meters
	Abstract	In this paper we present a segmentation-free system for reading text in natural scenes. A CNN architecture is trained in an end-to-end manner, and is able to directly output readings without any explicit text localization step. In order to validate our proposal, we focus on the specific case of reading utility meters. We present our results in a large dataset of images acquired by different users and devices, so text appears in any location, with different sizes, fonts and lengths, and the images present several distortions such as dirt, illumination highlights or blur.
	Address	Viena; Austria; April 2018
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	DAS
	Notes	DAG; 600.084; 600.121; 600.129			Approved	no
	Call Number	Admin @ si @ GRK2018			Serial	3102
Permanent link to this record



	Author	Dimosthenis Karatzas; Lluis Gomez; Marçal Rusiñol; Anguelos Nicolaou
	Title	The Robust Reading Competition Annotation and Evaluation Platform			Type	Conference Article
	Year	2018	Publication	13th IAPR International Workshop on Document Analysis Systems	Abbreviated Journal
	Volume		Issue		Pages	61-66
	Keywords
	Abstract	The ICDAR Robust Reading Competition (RRC), initiated in 2003 and reestablished in 2011, has become the defacto evaluation standard for the international community. Concurrent with its second incarnation in 2011, a continuous effort started to develop an online framework to facilitate the hosting and management of competitions. This short paper briefly outlines the Robust Reading Competition Annotation and Evaluation Platform, the backbone of the Robust Reading Competition, comprising a collection of tools and processes that aim to simplify the management and annotation of data, and to provide online and offline performance evaluation and analysis services.
	Address	Viena; Austria; April 2018
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	DAS
	Notes	DAG; 600.084; 600.121			Approved	no
	Call Number	KGR2018			Serial	3103
Permanent link to this record



	Author	David Aldavert; Marçal Rusiñol
	Title	Manuscript text line detection and segmentation using second-order derivatives analysis			Type	Conference Article
	Year	2018	Publication	13th IAPR International Workshop on Document Analysis Systems	Abbreviated Journal
	Volume		Issue		Pages	293 - 298
	Keywords	text line detection; text line segmentation; text region detection; second-order derivatives
	Abstract	In this paper, we explore the use of second-order derivatives to detect text lines on handwritten document images. Taking advantage that the second derivative gives a minimum response when a dark linear element over a bright background has the same orientation as the filter, we use this operator to create a map with the local orientation and strength of putative text lines in the document. Then, we detect line segments by selecting and merging the filter responses that have a similar orientation and scale. Finally, text lines are found by merging the segments that are within the same text region. The proposed segmentation algorithm, is learning-free while showing a performance similar to the state of the art methods in publicly available datasets.
	Address	Viena; Austria; April 2018
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	DAS
	Notes	DAG; 600.084; 600.129; 302.065; 600.121;ADAS			Approved	no
	Call Number	Admin @ si @ AlR2018a			Serial	3104
Permanent link to this record



	Author	David Aldavert; Marçal Rusiñol
	Title	Synthetically generated semantic codebook for Bag-of-Visual-Words based word spotting			Type	Conference Article
	Year	2018	Publication	13th IAPR International Workshop on Document Analysis Systems	Abbreviated Journal
	Volume		Issue		Pages	223 - 228
	Keywords	Word Spotting; Bag of Visual Words; Synthetic Codebook; Semantic Information
	Abstract	Word-spotting methods based on the Bag-ofVisual-Words framework have demonstrated a good retrieval performance even when used in a completely unsupervised manner. Although unsupervised approaches are suitable for large document collections due to the cost of acquiring labeled data, these methods also present some drawbacks. For instance, having to train a suitable “codebook” for a certain dataset has a high computational cost. Therefore, in this paper we present a database agnostic codebook which is trained from synthetic data. The aim of the proposed approach is to generate a codebook where the only information required is the type of script used in the document. The use of synthetic data also allows to easily incorporate semantic information in the codebook generation. So, the proposed method is able to determine which set of codewords have a semantic representation of the descriptor feature space. Experimental results show that the resulting codebook attains a state-of-the-art performance while having a more compact representation.
	Address	Viena; Austria; April 2018
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	DAS
	Notes	DAG; 600.084; 600.129; 600.121;ADAS			Approved	no
	Call Number	Admin @ si @ AlR2018b			Serial	3105
Permanent link to this record



	Author	V. Poulain d'Andecy; Emmanuel Hartmann; Marçal Rusiñol
	Title	Field Extraction by hybrid incremental and a-priori structural templates			Type	Conference Article
	Year	2018	Publication	13th IAPR International Workshop on Document Analysis Systems	Abbreviated Journal
	Volume		Issue		Pages	251 - 256
	Keywords	Layout Analysis; information extraction; incremental learning
	Abstract	In this paper, we present an incremental framework for extracting information fields from administrative documents. First, we demonstrate some limits of the existing state-of-the-art methods such as the delay of the system efficiency. This is a concern in industrial context when we have only few samples of each document class. Based on this analysis, we propose a hybrid system combining incremental learning by means of itf-df statistics and a-priori generic models. We report in the experimental section our results obtained with a dataset of real invoices.
	Address	Viena; Austria; April 2018
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	DAS
	Notes	DAG; 600.084; 600.129; 600.121			Approved	no
	Call Number	Admin @ si @ PHR2018			Serial	3106
Permanent link to this record



	Author	Fahad Shahbaz Khan; Joost Van de Weijer; Muhammad Anwer Rao; Andrew Bagdanov; Michael Felsberg; Jorma
	Title	Scale coding bag of deep features for human attribute and action recognition			Type	Journal Article
	Year	2018	Publication	Machine Vision and Applications	Abbreviated Journal	MVAP
	Volume	29	Issue	1	Pages	55-71
	Keywords	Action recognition; Attribute recognition; Bag of deep features
	Abstract	Most approaches to human attribute and action recognition in still images are based on image representation in which multi-scale local features are pooled across scale into a single, scale-invariant encoding. Both in bag-of-words and the recently popular representations based on convolutional neural networks, local features are computed at multiple scales. However, these multi-scale convolutional features are pooled into a single scale-invariant representation. We argue that entirely scale-invariant image representations are sub-optimal and investigate approaches to scale coding within a bag of deep features framework. Our approach encodes multi-scale information explicitly during the image encoding stage. We propose two strategies to encode multi-scale information explicitly in the final image representation. We validate our two scale coding techniques on five datasets: Willow, PASCAL VOC 2010, PASCAL VOC 2012, Stanford-40 and Human Attributes (HAT-27). On all datasets, the proposed scale coding approaches outperform both the scale-invariant method and the standard deep features of the same network. Further, combining our scale coding approaches with standard deep features leads to consistent improvement over the state of the art.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	LAMP; 600.068; 600.079; 600.106; 600.120;CIC			Approved	no
	Call Number	Admin @ si @ KWR2018			Serial	3107
Permanent link to this record



	Author	Felipe Codevilla; Matthias Muller; Antonio Lopez; Vladlen Koltun; Alexey Dosovitskiy
	Title	End-to-end Driving via Conditional Imitation Learning			Type	Conference Article
	Year	2018	Publication	IEEE International Conference on Robotics and Automation	Abbreviated Journal
	Volume		Issue		Pages	4693 - 4700
	Keywords
	Abstract	Deep networks trained on demonstrations of human driving have learned to follow roads and avoid obstacles. However, driving policies trained via imitation learning cannot be controlled at test time. A vehicle trained end-to-end to imitate an expert cannot be guided to take a specific turn at an upcoming intersection. This limits the utility of such systems. We propose to condition imitation learning on high-level command input. At test time, the learned driving policy functions as a chauffeur that handles sensorimotor coordination but continues to respond to navigational commands. We evaluate different architectures for conditional imitation learning in vision-based driving. We conduct experiments in realistic three-dimensional simulations of urban driving and on a 1/5 scale robotic truck that is trained to drive in a residential area. Both systems drive based on visual input yet remain responsive to high-level navigational commands. The supplementary video can be viewed at this https URL
	Address	Brisbane; Australia; May 2018
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICRA
	Notes	ADAS; 600.116; 600.124; 600.118			Approved	no
	Call Number	Admin @ si @ CML2018			Serial	3108
Permanent link to this record



	Author	Marc Bolaños; Alvaro Peris; Francisco Casacuberta; Sergi Solera; Petia Radeva
	Title	Egocentric video description based on temporally-linked sequences			Type	Journal Article
	Year	2018	Publication	Journal of Visual Communication and Image Representation	Abbreviated Journal	JVCIR
	Volume	50	Issue		Pages	205-216
	Keywords	egocentric vision; video description; deep learning; multi-modal learning
	Abstract	Egocentric vision consists in acquiring images along the day from a first person point-of-view using wearable cameras. The automatic analysis of this information allows to discover daily patterns for improving the quality of life of the user. A natural topic that arises in egocentric vision is storytelling, that is, how to understand and tell the story relying behind the pictures. In this paper, we tackle storytelling as an egocentric sequences description problem. We propose a novel methodology that exploits information from temporally neighboring events, matching precisely the nature of egocentric sequences. Furthermore, we present a new method for multimodal data fusion consisting on a multi-input attention recurrent network. We also release the EDUB-SegDesc dataset. This is the first dataset for egocentric image sequences description, consisting of 1,339 events with 3,991 descriptions, from 55 days acquired by 11 people. Finally, we prove that our proposal outperforms classical attentional encoder-decoder methods for video description.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	MILAB; no proj			Approved	no
	Call Number	Admin @ si @ BPC2018			Serial	3109
Permanent link to this record



	Author	Stefan Schurischuster; Beatriz Remeseiro; Petia Radeva; Martin Kampel
	Title	A Preliminary Study of Image Analysis for Parasite Detection on Honey Bees			Type	Conference Article
	Year	2018	Publication	15th International Conference on Image Analysis and Recognition	Abbreviated Journal
	Volume	10882	Issue		Pages	465-473
	Keywords
	Abstract	Varroa destructor is a parasite harming bee colonies. As the worldwide bee population is in danger, beekeepers as well as researchers are looking for methods to monitor the health of bee hives. In this context, we present a preliminary study to detect parasites on bee videos by means of image analysis and machine learning techniques. For this purpose, each video frame is analyzed individually to extract bee image patches, which are then processed to compute image descriptors and finally classified into mite and no mite bees. The experimental results demonstrated the adequacy of the proposed method, which will be a perfect stepping stone for a further bee monitoring system.
	Address	Povoa de Varzim; Portugal; June 2018
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title	LNCS
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICIAR
	Notes	MILAB; no proj			Approved	no
	Call Number	Admin @ si @ SRR2018a			Serial	3110
Permanent link to this record



	Author	Stefan Lonn; Petia Radeva; Mariella Dimiccoli
	Title	A picture is worth a thousand words but how to organize thousands of pictures?			Type	Miscellaneous
	Year	2018	Publication	Arxiv	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	We live in a society where the large majority of the population has a camera-equipped smartphone. In addition, hard drives and cloud storage are getting cheaper and cheaper, leading to a tremendous growth in stored personal photos. Unlike photo collections captured by a digital camera, which typically are pre-processed by the user who organizes them into event-related folders, smartphone pictures are automatically stored in the cloud. As a consequence, photo collections captured by a smartphone are highly unstructured and because smartphones are ubiquitous, they present a larger variability compared to pictures captured by a digital camera. To solve the need of organizing large smartphone photo collections automatically, we propose here a new methodology for hierarchical photo organization into topics and topic-related categories. Our approach successfully estimates latent topics in the pictures by applying probabilistic Latent Semantic Analysis, and automatically assigns a name to each topic by relying on a lexical database. Topic-related categories are then estimated by using a set of topic-specific Convolutional Neuronal Networks. To validate our approach, we ensemble and make public a large dataset of more than 8,000 smartphone pictures from 10 persons. Experimental results demonstrate better user satisfaction with respect to state of the art solutions in terms of organization.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	MILAB; no proj			Approved	no
	Call Number	Admin @ si @ LRD2018			Serial	3111
Permanent link to this record



	Author	Md. Mostafa Kamal Sarker; Hatem A. Rashwan; Farhan Akram; Syeda Furruka Banu; Adel Saleh; Vivek Kumar Singh; Forhad U. H. Chowdhury; Saddam Abdulwahab; Santiago Romani; Petia Radeva; Domenec Puig
	Title	SLSDeep: Skin Lesion Segmentation Based on Dilated Residual and Pyramid Pooling Networks.			Type	Conference Article
	Year	2018	Publication	21st International Conference on Medical Image Computing & Computer Assisted Intervention	Abbreviated Journal
	Volume	2	Issue		Pages	21-29
	Keywords
	Abstract	Skin lesion segmentation (SLS) in dermoscopic images is a crucial task for automated diagnosis of melanoma. In this paper, we present a robust deep learning SLS model, so-called SLSDeep, which is represented as an encoder-decoder network. The encoder network is constructed by dilated residual layers, in turn, a pyramid pooling network followed by three convolution layers is used for the decoder. Unlike the traditional methods employing a cross-entropy loss, we investigated a loss function by combining both Negative Log Likelihood (NLL) and End Point Error (EPE) to accurately segment the melanoma regions with sharp boundaries. The robustness of the proposed model was evaluated on two public databases: ISBI 2016 and 2017 for skin lesion analysis towards melanoma detection challenge. The proposed model outperforms the state-of-the-art methods in terms of segmentation accuracy. Moreover, it is capable to segment more than 100 images of size 384x384 per second on a recent GPU.
	Address	Granada; Espanya; September 2018
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	MICCAI
	Notes	MILAB; no proj			Approved	no
	Call Number	Admin @ si @ SRA2018			Serial	3112
Permanent link to this record



	Author	Md. Mostafa Kamal Sarker; Mohammed Jabreel; Hatem A. Rashwan; Syeda Furruka Banu; Petia Radeva; Domenec Puig
	Title	CuisineNet: Food Attributes Classification using Multi-scale Convolution Network			Type	Conference Article
	Year	2018	Publication	21st International Conference of the Catalan Association for Artificial Intelligence	Abbreviated Journal
	Volume		Issue		Pages	365-372
	Keywords
	Abstract	Diversity of food and its attributes represents the culinary habits of peoples from different countries. Thus, this paper addresses the problem of identifying food culture of people around the world and its flavor by classifying two main food attributes, cuisine and flavor. A deep learning model based on multi-scale convotuional networks is proposed for extracting more accurate features from input images. The aggregation of multi-scale convolution layers with different kernel size is also used for weighting the features results from different scales. In addition, a joint loss function based on Negative Log Likelihood (NLL) is used to fit the model probability to multi labeled classes for multi-modal classification task. Furthermore, this work provides a new dataset for food attributes, so-called Yummly48K, extracted from the popular food website, Yummly. Our model is assessed on the constructed Yummly48K dataset. The experimental results show that our proposed method yields 65% and 62% average F1 score on validation and test set which outperforming the state-of-the-art models.
	Address	Roses; catalonia; October 2018
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	CCIA
	Notes	MILAB; no menciona			Approved	no
	Call Number	Admin @ si @ SJR2018			Serial	3113
Permanent link to this record



	Author	Ivet Rafegas; Maria Vanrell
	Title	Color encoding in biologically-inspired convolutional neural networks			Type	Journal Article
	Year	2018	Publication	Vision Research	Abbreviated Journal	VR
	Volume	151	Issue		Pages	7-17
	Keywords	Color coding; Computer vision; Deep learning; Convolutional neural networks
	Abstract	Convolutional Neural Networks have been proposed as suitable frameworks to model biological vision. Some of these artificial networks showed representational properties that rival primate performances in object recognition. In this paper we explore how color is encoded in a trained artificial network. It is performed by estimating a color selectivity index for each neuron, which allows us to describe the neuron activity to a color input stimuli. The index allows us to classify whether they are color selective or not and if they are of a single or double color. We have determined that all five convolutional layers of the network have a large number of color selective neurons. Color opponency clearly emerges in the first layer, presenting 4 main axes (Black-White, Red-Cyan, Blue-Yellow and Magenta-Green), but this is reduced and rotated as we go deeper into the network. In layer 2 we find a denser hue sampling of color neurons and opponency is reduced almost to one new main axis, the Bluish-Orangish coinciding with the dataset bias. In layers 3, 4 and 5 color neurons are similar amongst themselves, presenting different type of neurons that detect specific colored objects (e.g., orangish faces), specific surrounds (e.g., blue sky) or specific colored or contrasted object-surround configurations (e.g. blue blob in a green surround). Overall, our work concludes that color and shape representation are successively entangled through all the layers of the studied network, revealing certain parallelisms with the reported evidences in primate brains that can provide useful insight into intermediate hierarchical spatio-chromatic representations.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	CIC; 600.051; 600.087			Approved	no
	Call Number	Admin @ si @RaV2018			Serial	3114
Permanent link to this record