Publicacions CVC -- Query Results

	Publicacions CVC Home \| Show All \| Simple Search \| Advanced Search \| Add Record \| Import	Login Quick Search: Field: contains: ...
	526–540 of 3413 records found matching your query (RSS \| history):

Search & Display Options

Select All Deselect All

[21–30] << 31 32 33 34 35 36 37 38 39 40 >> [41–50]

List View

Citations

Details

	Records
	Author	Ivet Rafegas
	Title	Color in Visual Recognition: from flat to deep representations and some biological parallelisms			Type	Book Whole
	Year	2017	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Visual recognition is one of the main problems in computer vision that attempts to solve image understanding by deciding what objects are in images. This problem can be computationally solved by using relevant sets of visual features, such as edges, corners, color or more complex object parts. This thesis contributes to how color features have to be represented for recognition tasks. Image features can be extracted following two different approaches. A first approach is defining handcrafted descriptors of images which is then followed by a learning scheme to classify the content (named flat schemes in Kruger et al. (2013). In this approach, perceptual considerations are habitually used to define efficient color features. Here we propose a new flat color descriptor based on the extension of color channels to boost the representation of spatio-chromatic contrast that surpasses state-of-the-art approaches. However, flat schemes present a lack of generality far away from the capabilities of biological systems. A second approach proposes evolving these flat schemes into a hierarchical process, like in the visual cortex. This includes an automatic process to learn optimal features. These deep schemes, and more specifically Convolutional Neural Networks (CNNs), have shown an impressive performance to solve various vision problems. However, there is a lack of understanding about the internal representation obtained, as a result of automatic learning. In this thesis we propose a new methodology to explore the internal representation of trained CNNs by defining the Neuron Feature as a visualization of the intrinsic features encoded in each individual neuron. Additionally, and inspired by physiological techniques, we propose to compute different neuron selectivity indexes (e.g., color, class, orientation or symmetry, amongst others) to label and classify the full CNN neuron population to understand learned representations. Finally, using the proposed methodology, we show an in-depth study on how color is represented on a specific CNN, trained for object recognition, that competes with primate representational abilities (Cadieu et al (2014)). We found several parallelisms with biological visual systems: (a) a significant number of color selectivity neurons throughout all the layers; (b) an opponent and low frequency representation of color oriented edges and a higher sampling of frequency selectivity in brightness than in color in 1st layer like in V1; (c) a higher sampling of color hue in the second layer aligned to observed hue maps in V2; (d) a strong color and shape entanglement in all layers from basic features in shallower layers (V1 and V2) to object and background shapes in deeper layers (V4 and IT); and (e) a strong correlation between neuron color selectivities and color dataset bias.
	Address	November 2017
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Maria Vanrell
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-945373-7-0	Medium
	Area		Expedition		Conference
	Notes	CIC			Approved	no
	Call Number	Admin @ si @ Raf2017			Serial	3100
Permanent link to this record



	Author	Cesar de Souza
	Title	Action Recognition in Videos: Data-efficient approaches for supervised learning of human action classification models for video			Type	Book Whole
	Year	2018	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	In this dissertation, we explore different ways to perform human action recognition in video clips. We focus on data efficiency, proposing new approaches that alleviate the need for laborious and time-consuming manual data annotation. In the first part of this dissertation, we start by analyzing previous state-of-the-art models, comparing their differences and similarities in order to pinpoint where their real strengths come from. Leveraging this information, we then proceed to boost the classification accuracy of shallow models to levels that rival deep neural networks. We introduce hybrid video classification architectures based on carefully designed unsupervised representations of handcrafted spatiotemporal features classified by supervised deep networks. We show in our experiments that our hybrid model combine the best of both worlds: it is data efficient (trained on 150 to 10,000 short clips) and yet improved significantly on the state of the art, including deep models trained on millions of manually labeled images and videos. In the second part of this research, we investigate the generation of synthetic training data for action recognition, as it has recently shown promising results for a variety of other computer vision tasks. We propose an interpretable parametric generative model of human action videos that relies on procedural generation and other computer graphics techniques of modern game engines. We generate a diverse, realistic, and physically plausible dataset of human action videos, called PHAV for “Procedural Human Action Videos”. It contains a total of 39,982 videos, with more than 1,000 examples for each action of 35 categories. Our approach is not limited to existing motion capture sequences, and we procedurally define 14 synthetic actions. We then introduce deep multi-task representation learning architectures to mix synthetic and real videos, even if the action categories differ. Our experiments on the UCF-101 and HMDB-51 benchmarks suggest that combining our large set of synthetic videos with small real-world datasets can boost recognition performance, outperforming fine-tuning state-of-the-art unsupervised generative models of videos.
	Address	April 2018
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Antonio Lopez;Naila Murray
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	ADAS; 600.118			Approved	no
	Call Number	Admin @ si @ Sou2018			Serial	3127
Permanent link to this record



	Author	Suman Ghosh
	Title	Word Spotting and Recognition in Images from Heterogeneous Sources A			Type	Book Whole
	Year	2018	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Text is the most common way of information sharing from ages. With recent development of personal images databases and handwritten historic manuscripts the demand for algorithms to make these databases accessible for browsing and indexing are in rise. Enabling search or understanding large collection of manuscripts or image databases needs fast and robust methods. Researchers have found different ways to represent cropped words for understanding and matching, which works well when words are already segmented. However there is no trivial way to extend these for non-segmented documents. In this thesis we explore different methods for text retrieval and recognition from unsegmented document and scene images. Two different ways of representation exist in literature, one uses a fixed length representation learned from cropped words and another a sequence of features of variable length. Throughout this thesis, we have studied both these representation for their suitability in segmentation free understanding of text. In the first part we are focused on segmentation free word spotting using a fixed length representation. We extended the use of the successful PHOC (Pyramidal Histogram of Character) representation to segmentation free retrieval. In the second part of the thesis, we explore sequence based features and finally, we propose a unified solution where the same framework can generate both kind of representations.
	Address	November 2018
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Ernest Valveny
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-948531-0-4	Medium
	Area		Expedition		Conference
	Notes	DAG; 600.121			Approved	no
	Call Number	Admin @ si @ Gho2018			Serial	3217
Permanent link to this record



	Author	Pau Rodriguez
	Title	Towards Robust Neural Models for Fine-Grained Image Recognition			Type	Book Whole
	Year	2019	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Fine-grained recognition, i.e. identifying similar subcategories of the same superclass, is central to human activity. Recognizing a friend, finding bacteria in microscopic imagery, or discovering a new kind of galaxy, are just but few examples. However, fine-grained image recognition is still a challenging computer vision task since the differences between two images of the same category can overwhelm the differences between two images of different fine-grained categories. In this regime, where the difference between two categories resides on subtle input changes, excessively invariant CNNs discard those details that help to discriminate between categories and focus on more obvious changes, yielding poor classification performance. On the other hand, CNNs with too much capacity tend to memorize instance-specific details, thus causing overfitting. In this thesis,motivated by the potential impact of automatic fine-grained image recognition, we tackle the previous challenges and demonstrate that proper alignment of the inputs, multiple levels of attention, regularization, and explicitmodeling of the output space, results inmore accurate fine-grained recognitionmodels, that generalize better, and are more robust to intra-class variation. Concretely, we study the different stages of the neural network pipeline: input pre-processing, attention to regions, feature activations, and the label space. In each stage, we address different issues that hinder the recognition performance on various fine-grained tasks, and devise solutions in each chapter: i)We deal with the sensitivity to input alignment on fine-grained human facial motion such as pain. ii) We introduce an attention mechanism to allow CNNs to choose and process in detail the most discriminate regions of the image. iii)We further extend attention mechanisms to act on the network activations, thus allowing them to correct their predictions by looking back at certain regions, at different levels of abstraction. iv) We propose a regularization loss to prevent high-capacity neural networks to memorize instance details by means of almost-identical feature detectors. v)We finally study the advantages of explicitly modeling the output space within the error-correcting framework. As a result, in this thesis we demonstrate that attention and regularization seem promising directions to overcome the problems of fine-grained image recognition, as well as proper treatment of the input and the output space.
	Address	March 2019
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Jordi Gonzalez;Josep M. Gonfaus;Xavier Roca
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-948531-3-5	Medium
	Area		Expedition		Conference
	Notes	ISE; 600.119			Approved	no
	Call Number	Admin @ si @ Rod2019			Serial	3258
Permanent link to this record



	Author	Xim Cerda-Company
	Title	Understanding color vision: from psychophysics to computational modeling			Type	Book Whole
	Year	2019	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	In this PhD we have approached the human color vision from two different points of view: psychophysics and computational modeling. First, we have evaluated 15 different tone-mapping operators (TMOs). We have conducted two experiments that consider two different criteria: the first one evaluates the local relationships among intensity levels and the second one evaluates the global appearance of the tonemapped imagesw.r.t. the physical one (presented side by side). We conclude that the rankings depend on the criterion and they are not correlated. Considering both criteria, the best TMOs are KimKautz (Kim and Kautz, 2008) and Krawczyk (Krawczyk, Myszkowski, and Seidel, 2005). Another conclusion is that a more standardized evaluation criteria is needed to do a fair comparison among TMOs. Secondly, we have conducted several psychophysical experiments to study the color induction. We have studied two different properties of the visual stimuli: temporal frequency and luminance spatial distribution. To study the temporal frequency we defined equiluminant stimuli composed by both uniform and striped surrounds and we flashed them varying the flash duration. For uniform surrounds, the results show that color induction depends on both the flash duration and inducer’s chromaticity. As expected, in all chromatic conditions color contrast was induced. In contrast, for striped surrounds, we expected to induce color assimilation, but we observed color contrast or no induction. Since similar but not equiluminant striped stimuli induce color assimilation, we concluded that luminance differences could be a key factor to induce color assimilation. Thus, in a subsequent study, we have studied the luminance differences’ effect on color assimilation. We varied the luminance difference between the target region and its inducers and we observed that color assimilation depends on both this difference and the inducer’s chromaticity. For red-green condition (where the first inducer is red and the second one is green), color assimilation occurs in almost all luminance conditions. Instead, for green-red condition, color assimilation never occurs. Purple-lime and lime-purple chromatic conditions show that luminance difference is a key factor to induce color assimilation. When the target is darker than its surround, color assimilation is stronger in purple-lime, while when the target is brighter, color assimilation is stronger in lime-purple (’mirroring’ effect). Moreover, we evaluated whether color assimilation is due to luminance or brightness differences. Similarly to equiluminance condition, when the stimuli are equibrightness no color assimilation is induced. Our results support the hypothesis that mutual-inhibition plays a major role in color perception, or at least in color induction. Finally, we have defined a new firing rate model of color processing in the V1 parvocellular pathway. We have modeled two different layers of this cortical area: layers 4Cb and 2/3. Our model is a recurrent dynamic computational model that considers both excitatory and inhibitory cells and their lateral connections. Moreover, it considers the existent laminar differences and the cells’ variety. Thus, we have modeled both single- and double-opponent simple cells and complex cells, which are a pool of double-opponent simple cells. A set of sinusoidal drifting gratings have been used to test the architecture. In these gratings we have varied several spatial properties such as temporal and spatial frequencies, grating’s area and orientation. To reproduce the electrophysiological observations, the architecture has to consider the existence of non-oriented double-opponent cells in layer 4Cb and the lack of lateral connections between single-opponent cells. Moreover, we have tested our lateral connections simulating the center-surround modulation and we have reproduced physiological measurements where for high contrast stimulus, the result of the lateral connections is inhibitory, while it is facilitatory for low contrast stimulus.
	Address	March 2019
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Xavier Otazu
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-948531-4-2	Medium
	Area		Expedition		Conference
	Notes	NEUROBIT			Approved	no
	Call Number	Admin @ si @ Cer2019			Serial	3259
Permanent link to this record



	Author	Felipe Codevilla
	Title	On Building End-to-End Driving Models Through Imitation Learning			Type	Book Whole
	Year	2019	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Autonomous vehicles are now considered as an assured asset in the future. Literally, all the relevant car-markers are now in a race to produce fully autonomous vehicles. These car-makers usually make use of modular pipelines for designing autonomous vehicles. This strategy decomposes the problem in a variety of tasks such as object detection and recognition, semantic and instance segmentation, depth estimation, SLAM and place recognition, as well as planning and control. Each module requires a separate set of expert algorithms, which are costly specially in the amount of human labor and necessity of data labelling. An alternative, that recently has driven considerable interest, is the end-to-end driving. In the end-to-end driving paradigm, perception and control are learned simultaneously using a deep network. These sensorimotor models are typically obtained by imitation learning fromhuman demonstrations. The main advantage is that this approach can directly learn from large fleets of human-driven vehicles without requiring a fixed ontology and extensive amounts of labeling. However, scaling end-to-end driving methods to behaviors more complex than simple lane keeping or lead vehicle following remains an open problem. On this thesis, in order to achieve more complex behaviours, we address some issues when creating end-to-end driving system through imitation learning. The first of themis a necessity of an environment for algorithm evaluation and collection of driving demonstrations. On this matter, we participated on the creation of the CARLA simulator, an open source platformbuilt from ground up for autonomous driving validation and prototyping. Since the end-to-end approach is purely reactive, there is also the necessity to provide an interface with a global planning system. With this, we propose the conditional imitation learning that conditions the actions produced into some high level command. Evaluation is also a concern and is commonly performed by comparing the end-to-end network output to some pre-collected driving dataset. We show that this is surprisingly weakly correlated to the actual driving and propose strategies on how to better acquire data and a better comparison strategy. Finally, we confirmwell-known generalization issues (due to dataset bias and overfitting), new ones (due to dynamic objects and the lack of a causal model), and training instability; problems requiring further research before end-to-end driving through imitation can scale to real-world driving.
	Address	May 2019
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Antonio Lopez
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	ADAS; 600.118			Approved	no
	Call Number	Admin @ si @ Cod2019			Serial	3387
Permanent link to this record



	Author	Zhijie Fang
	Title	Behavior understanding of vulnerable road users by 2D pose estimation			Type	Book Whole
	Year	2019	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Anticipating the intentions of vulnerable road users (VRUs) such as pedestrians and cyclists can be critical for performing safe and comfortable driving maneuvers. This is the case for human driving and, therefore, should be taken into account by systems providing any level of driving assistance, i.e. from advanced driver assistant systems (ADAS) to fully autonomous vehicles (AVs). In this PhD work, we show how the latest advances on monocular vision-based human pose estimation, i.e. those relying on deep Convolutional Neural Networks (CNNs), enable to recognize the intentions of such VRUs. In the case of cyclists, we assume that they follow the established traffic codes to indicate future left/right turns and stop maneuvers with arm signals. In the case of pedestrians, no indications can be assumed a priori. Instead, we hypothesize that the walking pattern of a pedestrian can allow us to determine if he/she has the intention of crossing the road in the path of the egovehicle, so that the ego-vehicle must maneuver accordingly (e.g. slowing down or stopping). In this PhD work, we show how the same methodology can be used for recognizing pedestrians and cyclists’ intentions. For pedestrians, we perform experiments on the publicly available Daimler and JAAD datasets. For cyclists, we did not found an analogous dataset, therefore, we created our own one by acquiring and annotating corresponding video-sequences which we aim to share with the research community. Overall, the proposed pipeline provides new state-of-the-art results on the intention recognition of VRUs.
	Address	May 2019
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Antonio Lopez;David Vazquez
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-948531-6-6	Medium
	Area		Expedition		Conference
	Notes	ADAS; 600.118			Approved	no
	Call Number	Admin @ si @ Fan2019			Serial	3388
Permanent link to this record



	Author	Juan Ignacio Toledo
	Title	Information Extraction from Heterogeneous Handwritten Documents			Type	Book Whole
	Year	2019	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	In this thesis we explore information Extraction from totally or partially handwritten documents. Basically we are dealing with two different application scenarios. The first scenario are modern highly structured documents like forms. In this kind of documents, the semantic information is encoded in different fields with a pre-defined location in the document, therefore, information extraction becomes roughly equivalent to transcription. The second application scenario are loosely structured totally handwritten documents, besides transcribing them, we need to assign a semantic label, from a set of known values to the handwritten words. In both scenarios, transcription is an important part of the information extraction. For that reason in this thesis we present two methods based on Neural Networks, to transcribe handwritten text.In order to tackle the challenge of loosely structured documents, we have produced a benchmark, consisting of a dataset, a defined set of tasks and a metric, that was presented to the community as an international competition. Also, we propose different models based on Convolutional and Recurrent neural networks that are able to transcribe and assign different semantic labels to each handwritten words, that is, able to perform Information Extraction.
	Address	July 2019
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Alicia Fornes;Josep Llados
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-948531-7-3	Medium
	Area		Expedition		Conference
	Notes	DAG; 600.140; 600.121			Approved	no
	Call Number	Admin @ si @ Tol2019			Serial	3389
Permanent link to this record



	Author	David Berga
	Title	Understanding Eye Movements: Psychophysics and a Model of Primary Visual Cortex			Type	Book Whole
	Year	2019	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Humansmove their eyes in order to learn visual representations of the world. These eye movements depend on distinct factors, either by the scene that we perceive or by our own decisions. To select what is relevant to attend is part of our survival mechanisms and the way we build reality, as we constantly react both consciously and unconsciously to all the stimuli that is projected into our eyes. In this thesis we try to explain (1) how we move our eyes, (2) how to build machines that understand visual information and deploy eyemovements, and (3) how to make these machines understand tasks in order to decide for eye movements. (1) We provided the analysis of eye movement behavior elicited by low-level feature distinctiveness with a dataset of 230 synthetically-generated image patterns. A total of 15 types of stimuli has been generated (e.g. orientation, brightness, color, size, etc.), with 7 feature contrasts for each feature category. Eye-tracking data was collected from 34 participants during the viewing of the dataset, using Free-Viewing and Visual Search task instructions. Results showed that saliency is predominantly and distinctively influenced by: 1. feature type, 2. feature contrast, 3. Temporality of fixations, 4. task difficulty and 5. center bias. From such dataset (SID4VAM), we have computed a benchmark of saliency models by testing performance using psychophysical patterns. Model performance has been evaluated considering model inspiration and consistency with human psychophysics. Our study reveals that state-of-the-art Deep Learning saliency models do not performwell with synthetic pattern images, instead, modelswith Spectral/Fourier inspiration outperform others in saliency metrics and are more consistent with human psychophysical experimentation. (2) Computations in the primary visual cortex (area V1 or striate cortex) have long been hypothesized to be responsible, among several visual processing mechanisms, of bottom-up visual attention (also named saliency). In order to validate this hypothesis, images from eye tracking datasets have been processed with a biologically plausible model of V1 (named Neurodynamic SaliencyWaveletModel or NSWAM). Following Li’s neurodynamic model, we define V1’s lateral connections with a network of firing rate neurons, sensitive to visual features such as brightness, color, orientation and scale. Early subcortical processes (i.e. retinal and thalamic) are functionally simulated. The resulting saliency maps are generated from the model output, representing the neuronal activity of V1 projections towards brain areas involved in eye movement control. We want to pinpoint that our unified computational architecture is able to reproduce several visual processes (i.e. brightness, chromatic induction and visual discomfort) without applying any type of training or optimization and keeping the same parametrization. The model has been extended (NSWAM-CM) with an implementation of the cortical magnification function to define the retinotopical projections towards V1, processing neuronal activity for each distinct view during scene observation. Novel computational definitions of top-down inhibition (in terms of inhibition of return and selection mechanisms), are also proposed to predict attention in Free-Viewing and Visual Search conditions. Results show that our model outperforms other biologically-inpired models of saliency prediction as well as to predict visual saccade sequences, specifically for nature and synthetic images. We also show how temporal and spatial characteristics of inhibition of return can improve prediction of saccades, as well as how distinct search strategies (in terms of feature-selective or category-specific inhibition) predict attention at distinct image contexts. (3) Although previous scanpath models have been able to efficiently predict saccades during Free-Viewing, it is well known that stimulus and task instructions can strongly affect eye movement patterns. In particular, task priming has been shown to be crucial to the deployment of eye movements, involving interactions between brain areas related to goal-directed behavior, working and long-termmemory in combination with stimulus-driven eyemovement neuronal correlates. In our latest study we proposed an extension of the Selective Tuning Attentive Reference Fixation ControllerModel based on task demands (STAR-FCT), describing novel computational definitions of Long-TermMemory, Visual Task Executive and Task Working Memory. With these modules we are able to use textual instructions in order to guide the model to attend to specific categories of objects and/or places in the scene. We have designed our memorymodel by processing a visual hierarchy of low- and high-level features. The relationship between the executive task instructions and the memory representations has been specified using a tree of semantic similarities between the learned features and the object category labels. Results reveal that by using this model, the resulting object localizationmaps and predicted saccades have a higher probability to fall inside the salient regions depending on the distinct task instructions compared to saliency.
	Address	July 2019
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Xavier Otazu
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-948531-8-0	Medium
	Area		Expedition		Conference
	Notes	NEUROBIT			Approved	no
	Call Number	Admin @ si @ Ber2019			Serial	3390
Permanent link to this record



	Author	Xavier Soria
	Title	Single sensor multi-spectral imaging			Type	Book Whole
	Year	2019	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	The image sensor, nowadays, is rolling the smartphone industry. While some phone brands explore equipping more image sensors, others, like Google, maintain their smartphones with just one sensor; but this sensor is equipped with Deep Learning to enhance the image quality. However, what all brands agree on is the need to research new image sensors; for instance, in 2015 Omnivision and PixelTeq presented new CMOS based image sensors defined as multispectral Single Sensor Camera (SSC), which are capable of capturing multispectral bands. This dissertation presents the benefits of using a multispectral SSCs that, as aforementioned, simultaneously acquires images in the visible and near-infrared (NIR) bands. The principal benefits while addressing problems related to image bands in the spectral range of 400 to 1100 nanometers, there are cost reductions in the hardware and software setup because only one SSC is needed instead of two, and the images alignment are not required any more. Concerning to the NIR spectrum, many works in literature have proven the benefits of working with NIR to enhance RGB images (e.g., image enhancement, remove shadows, dehazing, etc.). In spite of the advantage of using SSC (e.g., low latency), there are some drawback to be solved. One of this drawback corresponds to the nature of the silicon-based sensor, which in addition to capture the RGB image, when the infrared cut off filter is not installed it also acquires NIR information into the visible image. This phenomenon is called RGB and NIR crosstalking. This thesis firstly faces this problem in challenging images and then it shows the benefit of using multispectral images in the edge detection task. The RGB color restoration from RGBN image is the topic tackled in RGB and NIR crosstalking. Even though in the literature a set of processes have been proposed to face this issue, in this thesis novel approaches, based on DL, are proposed to subtract the additional NIR included in the RGB channel. More precisely, an Artificial Neural Network (NN) and two Convolutional Neural Network (CNN) models are proposed. As the DL based models need a dataset with a large collection of image pairs, a large dataset is collected to address the color restoration. The collected images are from challenging scenes where the sunlight radiation is sufficient to give absorption/reflectance properties to the considered scenes. An extensive evaluation has been conducted on the CNN models, differences from most of the restored images are almost imperceptible to the human eye. The next proposal of the thesis is the validation of the usage of SSC images in the edge detection task. Three methods based on CNN have been proposed. While the first one is based on the most used model, holistically-nested edge detection (HED) termed as multispectral HED (MS-HED), the other two have been proposed observing the drawbacks of MS-HED. These two novel architectures have been designed from scratch (training from scratch); after the first architecture is validated in the visible domain a slight redesign is proposed to tackle the multispectral domain. Again, another dataset is collected to face this problem with SSCs. Even though edge detection is confronted in the multispectral domain, its qualitative and quantitative evaluation demonstrates the generalization in other datasets used for edge detection, improving state-of-the-art results.
	Address	September 2019
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Angel Sappa
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-948531-9-7	Medium
	Area		Expedition		Conference
	Notes	MSIAU; 600.122			Approved	no
	Call Number	Admin @ si @ Sor2019			Serial	3391
Permanent link to this record



	Author	Antonio Esteban Lansaque
	Title	An Endoscopic Navigation System for Lung Cancer Biopsy			Type	Book Whole
	Year	2019	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Lung cancer is one of the most diagnosed cancers among men and women. Actually, lung cancer accounts for 13% of the total cases with a 5-year global survival rate in patients. Although Early detection increases survival rate from 38% to 67%, accurate diagnosis remains a challenge. Pathological confirmation requires extracting a sample of the lesion tissue for its biopsy. The preferred procedure for tissue biopsy is called bronchoscopy. A bronchoscopy is an endoscopic technique for the internal exploration of airways which facilitates the performance of minimal invasive interventions with low risk for the patient. Recent advances in bronchoscopic devices have increased their use for minimal invasive diagnostic and intervention procedures, like lung cancer biopsy sampling. Despite the improvement in bronchoscopic device quality, there is a lack of intelligent computational systems for supporting in-vivo clinical decision during examinations. Existing technologies fail to accurately reach the lesion due to several aspects at intervention off-line planning and poor intra-operative guidance at exploration time. Existing guiding systems radiate patients and clinical staff,might be expensive and achieve a suboptimlal 70% of yield boost. Diagnostic yield could be improved reducing radiation and costs by developing intra-operative support systems able to guide the bronchoscopist to the lesion during the intervention. The goal of this PhD thesis is to develop an image-based navigation systemfor intra-operative guidance of bronchoscopists to a target lesion across a path previously planned on a CT-scan. We propose a 3D navigation system which uses the anatomy of video bronchoscopy frames to locate the bronchoscope within the airways. Once the bronchoscope is located, our navigation system is able to indicate the bifurcation which needs to be followed to reach the lesion. In order to facilitate an off-line validation as realistic as possible, we also present a method for augmenting simulated virtual bronchoscopies with the appearance of intra-operative videos. Experiments performed on augmented and intra-operative videos, prove that our algorithm can be speeded up for an on-line implementation in the operating room.
	Address	October 2019
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Debora Gil;Carles Sanchez
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-121011-0-2	Medium
	Area		Expedition		Conference
	Notes	IAM; 600.139; 600.145			Approved	no
	Call Number	Admin @ si @ Est2019			Serial	3392
Permanent link to this record



	Author	Lichao Zhang
	Title	Towards end-to-end Networks for Visual Tracking in RGB and TIR Videos			Type	Book Whole
	Year	2019	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	In the current work, we identify several problems of current tracking systems. The lack of large-scale labeled datasets hampers the usage of deep learning, especially end-to-end training, for tracking in TIR images. Therefore, many methods for tracking on TIR data are still based on hand-crafted features. This situation also happens in multi-modal tracking, e.g. RGB-T tracking. Another reason, which hampers the development of RGB-T tracking, is that there exists little research on the fusion mechanisms for combining information from RGB and TIR modalities. One of the crucial components of most trackers is the update module. For the currently existing end-to-end tracking architecture, e.g, Siamese trackers, the online model update is still not taken into consideration at the training stage. They use no-update or a linear update strategy during the inference stage. While such a hand-crafted approach to updating has led to improved results, its simplicity limits the potential gain likely to be obtained by learning to update. To address the data-scarcity for TIR and RGB-T tracking, we use image-to-image translation to generate a large-scale synthetic TIR dataset. This dataset allows us to perform end-to-end training for TIR tracking. Furthermore, we investigate several fusion mechanisms for RGB-T tracking. The multi-modal trackers are also trained in an end-to-end manner on the synthetic data. To improve the standard online update, we pose the updating step as an optimization problem which can be solved by training a neural network. Our approach thereby reduces the hand-crafted components in the tracking pipeline and sets a further step in the direction of a complete end-to-end trained tracking network which also considers updating during optimization.
	Address	November 2019
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Joost Van de Weijer;Abel Gonzalez;Fahad Shahbaz Khan
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-1210011-1-9	Medium
	Area		Expedition		Conference
	Notes	LAMP; 600.141; 600.120			Approved	no
	Call Number	Admin @ si @ Zha2019			Serial	3393
Permanent link to this record



	Author	Lu Yu
	Title	Semantic Representation: From Color to Deep Embeddings			Type	Book Whole
	Year	2019	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	One of the fundamental problems of computer vision is to represent images with compact semantically relevant embeddings. These embeddings could then be used in a wide variety of applications, such as image retrieval, object detection, and video search. The main objective of this thesis is to study image embeddings from two aspects: color embeddings and deep embeddings. In the first part of the thesis we start from hand-crafted color embeddings. We propose a method to order the additional color names according to their complementary nature with the basic eleven color names. This allows us to compute color name representations with high discriminative power of arbitrary length. Psychophysical experiments confirm that our proposed method outperforms baseline approaches. Secondly, we learn deep color embeddings from weakly labeled data by adding an attention strategy. The attention branch is able to correctly identify the relevant regions for each class. The advantage of our approach is that it can learn color names for specific domains for which no pixel-wise labels exists. In the second part of the thesis, we focus on deep embeddings. Firstly, we address the problem of compressing large embedding networks into small networks, while maintaining similar performance. We propose to distillate the metrics from a teacher network to a student network. Two new losses are introduced to model the communication of a deep teacher network to a small student network: one based on an absolute teacher, where the student aims to produce the same embeddings as the teacher, and one based on a relative teacher, where the distances between pairs of data points is communicated from the teacher to the student. In addition, various aspects of distillation have been investigated for embeddings, including hint and attention layers, semi-supervised learning and cross quality distillation. Finally, another aspect of deep metric learning, namely lifelong learning, is studied. We observed some drift occurs during training of new tasks for metric learning. A method to estimate the semantic drift based on the drift which is experienced by data of the current task during its training is introduced. Having this estimation, previous tasks can be compensated for this drift, thereby improving their performance. Furthermore, we show that embedding networks suffer significantly less from catastrophic forgetting compared to classification networks when learning new tasks.
	Address	November 2019
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Joost Van de Weijer;Yongmei Cheng
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-121011-3-3	Medium
	Area		Expedition		Conference
	Notes	LAMP; 600.120			Approved	no
	Call Number	Admin @ si @ Yu2019			Serial	3394
Permanent link to this record



	Author	Albert Berenguel
	Title	Analysis of background textures in banknotes and identity documents for counterfeit detection			Type	Book Whole
	Year	2019	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Counterfeiting and piracy are a form of theft that has been steadily growing in recent years. A counterfeit is an unauthorized reproduction of an authentic/genuine object. Banknotes and identity documents are two common objects of counterfeiting. The former is used by organized criminal groups to finance a variety of illegal activities or even to destabilize entire countries due the inflation effect. Generally, in order to run their illicit businesses, counterfeiters establish companies and bank accounts using fraudulent identity documents. The illegal activities generated by counterfeit banknotes and identity documents has a damaging effect on business, the economy and the general population. To fight against counterfeiters, governments and authorities around the globe cooperate and develop security features to protect their security documents. Many of the security features in identity documents can also be found in banknotes. In this dissertation we focus our efforts in detecting the counterfeit banknotes and identity documents by analyzing the security features at the background printing. Background areas on secure documents contain fine-line patterns and designs that are difficult to reproduce without the manufacturers cutting-edge printing equipment. Our objective is to find the loose of resolution between the genuine security document and the printed counterfeit version with a publicly available commercial printer. We first present the most complete survey to date in identity and banknote security features. The compared algorithms and systems are based on computer vision and machine learning. Then we advance to present the banknote and identity counterfeit dataset we have built and use along all this thesis. Afterwards, we evaluate and adapt algorithms in the literature for the security background texture analysis. We study this problem from the point of view of robustness, computational efficiency and applicability into a real and non-controlled industrial scenario, proposing key insights to use these algorithms. Next, within the industrial environment of this thesis, we build a complete service oriented architecture to detect counterfeit documents. The mobile application and the server framework intends to be used even by non-expert document examiners to spot counterfeits. Later, we re-frame the problem of background texture counterfeit detection as a full-reference game of spotting the differences, by alternating glimpses between a counterfeit and a genuine background using recurrent neural networks. Finally, we deal with the lack of counterfeit samples, studying different approaches based on anomaly detection.
	Address	November 2019
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Oriol Ramos Terrades;Josep Llados
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-121011-2-6	Medium
	Area		Expedition		Conference
	Notes	DAG; 600.140; 600.121			Approved	no
	Call Number	Admin @ si @ Ber2019			Serial	3395
Permanent link to this record



	Author	Xialei Liu
	Title	Visual recognition in the wild: learning from rankings in small domains and continual learning in new domains			Type	Book Whole
	Year	2019	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Deep convolutional neural networks (CNNs) have achieved superior performance in many visual recognition application, such as image classification, detection and segmentation. In this thesis we address two limitations of CNNs. Training deep CNNs requires huge amounts of labeled data, which is expensive and labor intensive to collect. Another limitation is that training CNNs in a continual learning setting is still an open research question. Catastrophic forgetting is very likely when adapting trained models to new environments or new tasks. Therefore, in this thesis, we aim to improve CNNs for applications with limited data and to adapt CNNs continually to new tasks. Self-supervised learning leverages unlabelled data by introducing an auxiliary task for which data is abundantly available. In the first part of the thesis, we show how rankings can be used as a proxy self-supervised task for regression problems. Then we propose an efficient backpropagation technique for Siamese networks which prevents the redundant computation introduced by the multi-branch network architecture. In addition, we show that measuring network uncertainty on the self-supervised proxy task is a good measure of informativeness of unlabeled data. This can be used to drive an algorithm for active learning. We then apply our framework on two regression problems: Image Quality Assessment (IQA) and Crowd Counting. For both, we show how to automatically generate ranked image sets from unlabeled data. Our results show that networks trained to regress to the ground truth targets for labeled data and to simultaneously learn to rank unlabeled data obtain significantly better, state-of-the-art results. We further show that active learning using rankings can reduce labeling effort by up to 50\% for both IQA and crowd counting. In the second part of the thesis, we propose two approaches to avoiding catastrophic forgetting in sequential task learning scenarios. The first approach is derived from Elastic Weight Consolidation, which uses a diagonal Fisher Information Matrix (FIM) to measure the importance of the parameters of the network. However the diagonal assumption is unrealistic. Therefore, we approximately diagonalize the FIM using a set of factorized rotation parameters. This leads to significantly better performance on continual learning of sequential tasks. For the second approach, we show that forgetting manifests differently at different layers in the network and propose a hybrid approach where distillation is used in the feature extractor and replay in the classifier via feature generation. Our method addresses the limitations of generative image replay and probability distillation (i.e. learning without forgetting) and can naturally aggregate new tasks in a single, well-calibrated classifier. Experiments confirm that our proposed approach outperforms the baselines and some start-of-the-art methods.
	Address	December 2019
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Joost Van de Weijer;Andrew Bagdanov
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-121011-4-0	Medium
	Area		Expedition		Conference
	Notes	LAMP; 600.120			Approved	no
	Call Number	Admin @ si @ Liu2019			Serial	3396
Permanent link to this record

Select All Deselect All

[21–30] << 31 32 33 34 35 36 37 38 39 40 >> [41–50]

List View

Citations

Details

All Found Records Selected Records:

Save Citations: Format:

Export Records: Format:

Home

SQL Search | Library Search | Show Record | Extract Citations

Help