Publicacions CVC -- Query Results

	Publicacions CVC Home \| Show All \| Simple Search \| Advanced Search \| Add Record \| Import	Login Quick Search: Field: contains: ...
	1–15 of 157 records found matching your query (RSS \| history):

Search & Display Options

Select All Deselect All

<< 1 2 3 4 5 6 7 8 9 10 >> [11–11]

List View

Citations

Details

	Records
	Author	Hugo Jair Escalante; Victor Ponce; Sergio Escalera; Xavier Baro; Alicia Morales-Reyes; Jose Martinez-Carranza
	Title	Evolving weighting schemes for the Bag of Visual Words			Type	Journal Article
	Year	2017	Publication	Neural Computing and Applications	Abbreviated Journal	Neural Computing and Applications
	Volume	28	Issue	5	Pages	925–939
	Keywords	Bag of Visual Words; Bag of features; Genetic programming; Term-weighting schemes; Computer vision
	Abstract	The Bag of Visual Words (BoVW) is an established representation in computer vision. Taking inspiration from text mining, this representation has proved to be very effective in many domains. However, in most cases, standard term-weighting schemes are adopted (e.g.,term-frequency or TF-IDF). It remains open the question of whether alternative weighting schemes could boost the performance of methods based on BoVW. More importantly, it is unknown whether it is possible to automatically learn and determine effective weighting schemes from scratch. This paper brings some light into both of these unknowns. On the one hand, we report an evaluation of the most common weighting schemes used in text mining, but rarely used in computer vision tasks. Besides, we propose an evolutionary algorithm capable of automatically learning weighting schemes for computer vision problems. We report empirical results of an extensive study in several computer vision problems. Results show the usefulness of the proposed method.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor	Springer
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	HUPBA;MV; no menciona			Approved	no
	Call Number	Admin @ si @ EPE2017			Serial	2743
Permanent link to this record



	Author	Meysam Madadi
	Title	Human Segmentation, Pose Estimation and Applications			Type	Book Whole
	Year	2017	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Automatic analyzing humans in photographs or videos has great potential applications in computer vision, including medical diagnosis, sports, entertainment, movie editing and surveillance, just to name a few. Body, face and hand are the most studied components of humans. Body has many variabilities in shape and clothing along with high degrees of freedom in pose. Face has many muscles causing many visible deformity, beside variable shape and hair style. Hand is a small object, moving fast and has high degrees of freedom. Adding human characteristics to all aforementioned variabilities makes human analysis quite a challenging task. In this thesis, we developed human segmentation in different modalities. In a first scenario, we segmented human body and hand in depth images using example-based shape warping. We developed a shape descriptor based on shape context and class probabilities of shape regions to extract nearest neighbors. We then considered rigid affine alignment vs. nonrigid iterative shape warping. In a second scenario, we segmented face in RGB images using convolutional neural networks (CNN). We modeled conditional random field with recurrent neural networks. In our model pair-wise kernels are not fixed and learned during training. We trained the network end-to-end using adversarial networks which improved hair segmentation by a high margin. We also worked on 3D hand pose estimation in depth images. In a generative approach, we fitted a finger model separately for each finger based on our example-based rigid hand segmentation. We minimized an energy function based on overlapping area, depth discrepancy and finger collisions. We also applied linear models in joint trajectory space to refine occluded joints based on visible joints error and invisible joints trajectory smoothness. In a CNN-based approach, we developed a tree-structure network to train specific features for each finger and fused them for global pose consistency. We also formulated physical and appearance constraints as loss functions. Finally, we developed a number of applications consisting of human soft biometrics measurement and garment retexturing. We also generated some datasets in this thesis consisting of human segmentation, synthetic hand pose, garment retexturing and Italian gestures.
	Address	October 2017
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Sergio Escalera;Jordi Gonzalez
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-945373-3-2	Medium
	Area		Expedition		Conference
	Notes	HUPBA			Approved	no
	Call Number	Admin @ si @ Mad2017			Serial	3017
Permanent link to this record



	Author	Pau Riba; Josep Llados; Alicia Fornes
	Title	Error-tolerant coarse-to-fine matching model for hierarchical graphs			Type	Conference Article
	Year	2017	Publication	11th IAPR-TC-15 International Workshop on Graph-Based Representations in Pattern Recognition	Abbreviated Journal
	Volume	10310	Issue		Pages	107-117
	Keywords	Graph matching; Hierarchical graph; Graph-based representation; Coarse-to-fine matching
	Abstract	Graph-based representations are effective tools to capture structural information from visual elements. However, retrieving a query graph from a large database of graphs implies a high computational complexity. Moreover, these representations are very sensitive to noise or small changes. In this work, a novel hierarchical graph representation is designed. Using graph clustering techniques adapted from graph-based social media analysis, we propose to generate a hierarchy able to deal with different levels of abstraction while keeping information about the topology. For the proposed representations, a coarse-to-fine matching method is defined. These approaches are validated using real scenarios such as classification of colour images and handwritten word spotting.
	Address	Anacapri; Italy; May 2017
	Corporate Author				Thesis
	Publisher	Springer International Publishing	Place of Publication		Editor	Pasquale Foggia; Cheng-Lin Liu; Mario Vento
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	GbRPR
	Notes	DAG; 600.097; 601.302; 600.121			Approved	no
	Call Number	Admin @ si @ RLF2017a			Serial	2951
Permanent link to this record



	Author	Ivet Rafegas
	Title	Color in Visual Recognition: from flat to deep representations and some biological parallelisms			Type	Book Whole
	Year	2017	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Visual recognition is one of the main problems in computer vision that attempts to solve image understanding by deciding what objects are in images. This problem can be computationally solved by using relevant sets of visual features, such as edges, corners, color or more complex object parts. This thesis contributes to how color features have to be represented for recognition tasks. Image features can be extracted following two different approaches. A first approach is defining handcrafted descriptors of images which is then followed by a learning scheme to classify the content (named flat schemes in Kruger et al. (2013). In this approach, perceptual considerations are habitually used to define efficient color features. Here we propose a new flat color descriptor based on the extension of color channels to boost the representation of spatio-chromatic contrast that surpasses state-of-the-art approaches. However, flat schemes present a lack of generality far away from the capabilities of biological systems. A second approach proposes evolving these flat schemes into a hierarchical process, like in the visual cortex. This includes an automatic process to learn optimal features. These deep schemes, and more specifically Convolutional Neural Networks (CNNs), have shown an impressive performance to solve various vision problems. However, there is a lack of understanding about the internal representation obtained, as a result of automatic learning. In this thesis we propose a new methodology to explore the internal representation of trained CNNs by defining the Neuron Feature as a visualization of the intrinsic features encoded in each individual neuron. Additionally, and inspired by physiological techniques, we propose to compute different neuron selectivity indexes (e.g., color, class, orientation or symmetry, amongst others) to label and classify the full CNN neuron population to understand learned representations. Finally, using the proposed methodology, we show an in-depth study on how color is represented on a specific CNN, trained for object recognition, that competes with primate representational abilities (Cadieu et al (2014)). We found several parallelisms with biological visual systems: (a) a significant number of color selectivity neurons throughout all the layers; (b) an opponent and low frequency representation of color oriented edges and a higher sampling of frequency selectivity in brightness than in color in 1st layer like in V1; (c) a higher sampling of color hue in the second layer aligned to observed hue maps in V2; (d) a strong color and shape entanglement in all layers from basic features in shallower layers (V1 and V2) to object and background shapes in deeper layers (V4 and IT); and (e) a strong correlation between neuron color selectivities and color dataset bias.
	Address	November 2017
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Maria Vanrell
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-945373-7-0	Medium
	Area		Expedition		Conference
	Notes	CIC			Approved	no
	Call Number	Admin @ si @ Raf2017			Serial	3100
Permanent link to this record



	Author	Marçal Rusiñol; Josep Llados
	Title	Flowchart Recognition in Patent Information Retrieval			Type	Book Chapter
	Year	2017	Publication	Current Challenges in Patent Information Retrieval	Abbreviated Journal
	Volume	37	Issue		Pages	351-368
	Keywords
	Abstract
	Address
	Corporate Author				Thesis
	Publisher	Springer Berlin Heidelberg	Place of Publication		Editor	M. Lupu; K. Mayer; N. Kando; A.J. Trippe
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG; 600.097; 600.121			Approved	no
	Call Number	Admin @ si @ RuL2017			Serial	2896
Permanent link to this record



	Author	Veronica Romero; Alicia Fornes; Enrique Vidal; Joan Andreu Sanchez
	Title	Information Extraction in Handwritten Marriage Licenses Books Using the MGGI Methodology			Type	Conference Article
	Year	2017	Publication	8th Iberian Conference on Pattern Recognition and Image Analysis	Abbreviated Journal
	Volume	10255	Issue		Pages	287-294
	Keywords	Handwritten Text Recognition; Information extraction; Language modeling; MGGI; Categories-based language model
	Abstract	Historical records of daily activities provide intriguing insights into the life of our ancestors, useful for demographic and genealogical research. For example, marriage license books have been used for centuries by ecclesiastical and secular institutions to register marriages. These books follow a simple structure of the text in the records with a evolutionary vocabulary, mainly composed of proper names that change along the time. This distinct vocabulary makes automatic transcription and semantic information extraction difficult tasks. In previous works we studied the use of category-based language models and how a Grammatical Inference technique known as MGGI could improve the accuracy of these tasks. In this work we analyze the main causes of the semantic errors observed in previous results and apply a better implementation of the MGGI technique to solve these problems. Using the resulting language model, transcription and information extraction experiments have been carried out, and the results support our proposed approach.
	Address	Faro; Portugal; June 2017
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor	L.A. Alexandre; J.Salvador Sanchez; Joao M. F. Rodriguez
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title	LNCS
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-3-319-58837-7	Medium
	Area		Expedition		Conference	IbPRIA
	Notes	DAG; 602.006; 600.097; 600.121			Approved	no
	Call Number	Admin @ si @ RFV2017			Serial	2952
Permanent link to this record



	Author	Antonio Lopez; Jiaolong Xu; Jose Luis Gomez; David Vazquez; German Ros
	Title	From Virtual to Real World Visual Perception using Domain Adaptation -- The DPM as Example			Type	Book Chapter
	Year	2017	Publication	Domain Adaptation in Computer Vision Applications	Abbreviated Journal
	Volume		Issue	13	Pages	243-258
	Keywords	Domain Adaptation
	Abstract	Supervised learning tends to produce more accurate classifiers than unsupervised learning in general. This implies that training data is preferred with annotations. When addressing visual perception challenges, such as localizing certain object classes within an image, the learning of the involved classifiers turns out to be a practical bottleneck. The reason is that, at least, we have to frame object examples with bounding boxes in thousands of images. A priori, the more complex the model is regarding its number of parameters, the more annotated examples are required. This annotation task is performed by human oracles, which ends up in inaccuracies and errors in the annotations (aka ground truth) since the task is inherently very cumbersome and sometimes ambiguous. As an alternative we have pioneered the use of virtual worlds for collecting such annotations automatically and with high precision. However, since the models learned with virtual data must operate in the real world, we still need to perform domain adaptation (DA). In this chapter we revisit the DA of a deformable part-based model (DPM) as an exemplifying case of virtual- to-real-world DA. As a use case, we address the challenge of vehicle detection for driver assistance, using different publicly available virtual-world data. While doing so, we investigate questions such as: how does the domain gap behave due to virtual-vs-real data with respect to dominant object appearance per domain, as well as the role of photo-realism in the virtual world.
	Address
	Corporate Author				Thesis
	Publisher	Springer	Place of Publication		Editor	Gabriela Csurka
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	ADAS; 600.085; 601.223; 600.076; 600.118			Approved	no
	Call Number	ADAS @ adas @ LXG2017			Serial	2872
Permanent link to this record



	Author	German Ros; Laura Sellart; Gabriel Villalonga; Elias Maidanik; Francisco Molero; Marc Garcia; Adriana Cedeño; Francisco Perez; Didier Ramirez; Eduardo Escobar; Jose Luis Gomez; David Vazquez; Antonio Lopez
	Title	Semantic Segmentation of Urban Scenes via Domain Adaptation of SYNTHIA			Type	Book Chapter
	Year	2017	Publication	Domain Adaptation in Computer Vision Applications	Abbreviated Journal
	Volume	12	Issue		Pages	227-241
	Keywords	SYNTHIA; Virtual worlds; Autonomous Driving
	Abstract	Vision-based semantic segmentation in urban scenarios is a key functionality for autonomous driving. Recent revolutionary results of deep convolutional neural networks (DCNNs) foreshadow the advent of reliable classifiers to perform such visual tasks. However, DCNNs require learning of many parameters from raw images; thus, having a sufficient amount of diverse images with class annotations is needed. These annotations are obtained via cumbersome, human labour which is particularly challenging for semantic segmentation since pixel-level annotations are required. In this chapter, we propose to use a combination of a virtual world to automatically generate realistic synthetic images with pixel-level annotations, and domain adaptation to transfer the models learnt to correctly operate in real scenarios. We address the question of how useful synthetic data can be for semantic segmentation – in particular, when using a DCNN paradigm. In order to answer this question we have generated a synthetic collection of diverse urban images, named SYNTHIA, with automatically generated class annotations and object identifiers. We use SYNTHIA in combination with publicly available real-world urban images with manually provided annotations. Then, we conduct experiments with DCNNs that show that combining SYNTHIA with simple domain adaptation techniques in the training stage significantly improves performance on semantic segmentation.
	Address
	Corporate Author				Thesis
	Publisher	Springer	Place of Publication		Editor	Gabriela Csurka
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	ADAS; 600.085; 600.082; 600.076; 600.118			Approved	no
	Call Number	ADAS @ adas @ RSV2017			Serial	2882
Permanent link to this record



	Author	Onur Ferhat
	Title	Analysis of Head-Pose Invariant, Natural Light Gaze Estimation Methods			Type	Book Whole
	Year	2017	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Eye tracker devices have traditionally been only used inside laboratories, requiring trained professionals and elaborate setup mechanisms. However, in the recent years the scientific work on easier–to–use eye trackers which require no special hardware—other than the omnipresent front facing cameras in computers, tablets, and mobiles—is aiming at making this technology common–place. These types of trackers have several extra challenges that make the problem harder, such as low resolution images provided by a regular webcam, the changing ambient lighting conditions, personal appearance differences, changes in head pose, and so on. Recent research in the field has focused on all these challenges in order to provide better gaze estimation performances in a real world setup. In this work, we aim at tackling the gaze tracking problem in a single camera setup. We first analyze all the previous work in the field, identifying the strengths and weaknesses of each tried idea. We start our work on the gaze tracker with an appearance–based gaze estimation method, which is the simplest idea that creates a direct mapping between a rectangular image patch extracted around the eye in a camera image, and the gaze point (or gaze direction). Here, we do an extensive analysis of the factors that affect the performance of this tracker in several experimental setups, in order to address these problems in future works. In the second part of our work, we propose a feature–based gaze estimation method, which encodes the eye region image into a compact representation. We argue that this type of representation is better suited to dealing with head pose and lighting condition changes, as it both reduces the dimensionality of the input (i.e. eye image) and breaks the direct connection between image pixel intensities and the gaze estimation. Lastly, we use a face alignment algorithm to have robust face pose estimation, using a 3D model customized to the subject using the tracker. We combine this with a convolutional neural network trained on a large dataset of images to build a face pose invariant gaze tracker.
	Address	September 2017
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Fernando Vilariño
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-945373-5-6	Medium
	Area		Expedition		Conference
	Notes	MV			Approved	no
	Call Number	Admin @ si @ Fer2017			Serial	3018
Permanent link to this record



	Author	Arash Akbarinia
	Title	Computational Model of Visual Perception: From Colour to Form			Type	Book Whole
	Year	2017	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	The original idea of this project was to study the role of colour in the challenging task of object recognition. We started by extending previous research on colour naming showing that it is feasible to capture colour terms through parsimonious ellipsoids. Although, the results of our model exceeded state-of-the-art in two benchmark datasets, we realised that the two phenomena of metameric lights and colour constancy must be addressed prior to any further colour processing. Our investigation of metameric pairs reached the conclusion that they are infrequent in real world scenarios. Contrary to that, the illumination of a scene often changes dramatically. We addressed this issue by proposing a colour constancy model inspired by the dynamical centre-surround adaptation of neurons in the visual cortex. This was implemented through two overlapping asymmetric Gaussians whose variances and heights are adjusted according to the local contrast of pixels. We complemented this model with a generic contrast-variant pooling mechanism that inversely connect the percentage of pooled signal to the local contrast of a region. The results of our experiments on four benchmark datasets were indeed promising: the proposed model, although simple, outperformed even learning-based approaches in many cases. Encouraged by the success of our contrast-variant surround modulation, we extended this approach to detect boundaries of objects. We proposed an edge detection model based on the first derivative of the Gaussian kernel. We incorporated four types of surround: full, far, iso- and orthogonal-orientation. Furthermore, we accounted for the pooling mechanism at higher cortical areas and the shape feedback sent to lower areas. Our results in three benchmark datasets showed significant improvement over non-learning algorithms. To summarise, we demonstrated that biologically-inspired models offer promising solutions to computer vision problems, such as, colour naming, colour constancy and edge detection. We believe that the greatest contribution of this Ph.D dissertation is modelling the concept of dynamic surround modulation that shows the significance of contrast-variant surround integration. The models proposed here are grounded on only a portion of what we know about the human visual system. Therefore, it is only natural to complement them accordingly in future works.
	Address	October 2017
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	C. Alejandro Parraga
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-945373-4-9	Medium
	Area		Expedition		Conference
	Notes	NEUROBIT			Approved	no
	Call Number	Admin @ si @ Akb2017			Serial	3019
Permanent link to this record



	Author	Pau Riba; Alicia Fornes; Josep Llados
	Title	Towards the Alignment of Handwritten Music Scores			Type	Book Chapter
	Year	2017	Publication	International Workshop on Graphics Recognition. GREC 2015.Graphic Recognition. Current Trends and Challenges	Abbreviated Journal
	Volume	9657	Issue		Pages	103-116
	Keywords	Optical Music Recognition; Handwritten Music Scores; Dynamic Time Warping alignment
	Abstract	It is very common to nd dierent versions of the same music work in archives of Opera Theaters. These dierences correspond to modications and annotations from the musicians. From the musicologist point of view, these variations are very interesting and deserve study. This paper explores the alignment of music scores as a tool for automatically detecting the passages that contain such dierences. Given the diculties in the recognition of handwritten music scores, our goal is to align the music scores and at the same time, avoid the recognition of music elements as much as possible. After removing the sta lines, braces and ties, the bar lines are detected. Then, the bar units are described as a whole using the Blurred Shape Model. The bar units alignment is performed by using Dynamic Time Warping. The analysis of the alignment path is used to detect the variations in the music scores. The method has been evaluated on a subset of the CVC-MUSCIMA dataset, showing encouraging results.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor	Bart Lamiroy; R Dueire Lins
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title	LNCS
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-3-319-52158-9	Medium
	Area		Expedition		Conference
	Notes	DAG; 600.097; 602.006; 600.121			Approved	no
	Call Number	Admin @ si @ RFL2017			Serial	2955
Permanent link to this record



	Author	Hana Jarraya; Muhammad Muzzamil Luqman; Jean-Yves Ramel
	Title	Improving Fuzzy Multilevel Graph Embedding Technique by Employing Topological Node Features: An Application to Graphics Recognition			Type	Book Chapter
	Year	2017	Publication	Graphics Recognition. Current Trends and Challenges	Abbreviated Journal
	Volume	9657	Issue		Pages
	Keywords
	Abstract
	Address
	Corporate Author				Thesis
	Publisher	Springer	Place of Publication		Editor	B. Lamiroy; R Dueire Lins
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title	LNCS
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	GREC
	Notes	DAG; 600.097; 600.121			Approved	no
	Call Number	Admin @ si @ JLR2017			Serial	2928
Permanent link to this record



	Author	Cristhian Aguilera
	Title	Local feature description in cross-spectral imagery			Type	Book Whole
	Year	2017	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Over the last few years, the number of consumer computer vision applications has increased dramatically. Today, computer vision solutions can be found in video game consoles, smartphone applications, driving assistance – just to name a few. Ideally, we require the performance of those applications, particularly those that are safety critical to remain constant under any external environment factors, such as changes in illumination or weather conditions. However, this is not always possible or very difficult to obtain by only using visible imagery, due to the inherent limitations of the images from that spectral band. For that reason, the use of images from different or multiple spectral bands is becoming more appealing. The aforementioned possible advantages of using images from multiples spectral bands on various vision applications make multi-spectral image processing a relevant topic for research and development. Like in visible image processing, multi-spectral image processing needs tools and algorithms to handle information from various spectral bands. Furthermore, traditional tools such as local feature detection, which is the basis of many vision tasks such as visual odometry, image registration, or structure from motion, must be adjusted or reformulated to operate under new conditions. Traditional feature detection, description, and matching methods tend to underperform in multi-spectral settings, in comparison to mono-spectral settings, due to the natural differences between each spectral band. The work in this thesis is focused on the local feature description problem when cross-spectral images are considered. In this context, this dissertation has three main contributions. Firstly, the work starts by proposing the usage of a combination of frequency and spatial information, in a multi-scale scheme, as feature description. Evaluations of this proposal, based on classical hand-made feature descriptors, and comparisons with state of the art cross-spectral approaches help to find and understand limitations of such strategy. Secondly, different convolutional neural network (CNN) based architectures are evaluated when used to describe cross-spectral image patches. Results showed that CNN-based methods, designed to work with visible monocular images, could be successfully applied to the description of images from two different spectral bands, with just minor modifications. In this framework, a novel CNN-based network model, specifically intended to describe image patches from two different spectral bands, is proposed. This network, referred to as Q-Net, outperforms state of the art in the cross-spectral domain, including both previous hand-made solutions as well as L2 CNN-based architectures. The third contribution of this dissertation is in the cross-spectral feature description application domain. The multispectral odometry problem is tackled showing a real application of cross-spectral descriptors In addition to the three main contributions mentioned above, in this dissertation, two different multi-spectral datasets are generated and shared with the community to be used as benchmarks for further studies.
	Address	October 2017
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Angel Sappa
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-945373-6-3	Medium
	Area		Expedition		Conference
	Notes	ADAS; 600.118			Approved	no
	Call Number	Admin @ si @ Agu2017			Serial	3020
Permanent link to this record



	Author	Marc Bolaños; Mariella Dimiccoli; Petia Radeva
	Title	Towards Storytelling from Visual Lifelogging: An Overview			Type	Journal Article
	Year	2017	Publication	IEEE Transactions on Human-Machine Systems	Abbreviated Journal	THMS
	Volume	47	Issue	1	Pages	77 - 90
	Keywords
	Abstract	Visual lifelogging consists of acquiring images that capture the daily experiences of the user by wearing a camera over a long period of time. The pictures taken offer considerable potential for knowledge mining concerning how people live their lives, hence, they open up new opportunities for many potential applications in fields including healthcare, security, leisure and the quantified self. However, automatically building a story from a huge collection of unstructured egocentric data presents major challenges. This paper provides a thorough review of advances made so far in egocentric data analysis, and in view of the current state of the art, indicates new lines of research to move us towards storytelling from visual lifelogging.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	MILAB; 601.235			Approved	no
	Call Number	Admin @ si @ BDR2017			Serial	2712
Permanent link to this record



	Author	Mariella Dimiccoli; Marc Bolaños; Estefania Talavera; Maedeh Aghaei; Stavri G. Nikolov; Petia Radeva
	Title	SR-Clustering: Semantic Regularized Clustering for Egocentric Photo Streams Segmentation			Type	Journal Article
	Year	2017	Publication	Computer Vision and Image Understanding	Abbreviated Journal	CVIU
	Volume	155	Issue		Pages	55-69
	Keywords
	Abstract	While wearable cameras are becoming increasingly popular, locating relevant information in large unstructured collections of egocentric images is still a tedious and time consuming processes. This paper addresses the problem of organizing egocentric photo streams acquired by a wearable camera into semantically meaningful segments. First, contextual and semantic information is extracted for each image by employing a Convolutional Neural Networks approach. Later, by integrating language processing, a vocabulary of concepts is defined in a semantic space. Finally, by exploiting the temporal coherence in photo streams, images which share contextual and semantic attributes are grouped together. The resulting temporal segmentation is particularly suited for further analysis, ranging from activity and event recognition to semantic indexing and summarization. Experiments over egocentric sets of nearly 17,000 images, show that the proposed approach outperforms state-of-the-art methods.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	MILAB; 601.235			Approved	no
	Call Number	Admin @ si @ DBT2017			Serial	2714
Permanent link to this record