Publicacions CVC -- Query Results

<< 1 2 3 4 5 6 7 8 9 10 >> [11–14]

Details

Records
Author	David Fernandez
Title	Contextual Word Spotting in Historical Handwritten Documents			Type	Book Whole
Year	2014	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	There are countless collections of historical documents in archives and libraries that contain plenty of valuable information for historians and researchers. The extraction of this information has become a central task among the Document Analysis researches and practitioners. There is an increasing interest to digital preserve and provide access to these kind of documents. But only the digitalization is not enough for the researchers. The extraction and/or indexation of information of this documents has had an increased interest among researchers. In many cases, and in particular in historical manuscripts, the full transcription of these documents is extremely dicult due the inherent deciencies: poor physical preservation, dierent writing styles, obsolete languages, etc. Word spotting has become a popular an ecient alternative to full transcription. It inherently involves a high level of degradation in the images. The search of words is holistically formulated as a visual search of a given query shape in a larger image, instead of recognising the input text and searching the query word with an ascii string comparison. But the performance of classical word spotting approaches depend on the degradation level of the images being unacceptable in many cases . In this thesis we have proposed a novel paradigm called contextual word spotting method that uses the contextual/semantic information to achieve acceptable results whereas classical word spotting does not reach. The contextual word spotting framework proposed in this thesis is a segmentation-based word spotting approach, so an ecient word segmentation is needed. Historical handwritten documents present some common diculties that can increase the diculties the extraction of the words. We have proposed a line segmentation approach that formulates the problem as nding the central part path in the area between two consecutive lines. This is solved as a graph traversal problem. A path nding algorithm is used to nd the optimal path in a graph, previously computed, between the text lines. Once the text lines are extracted, words are localized inside the text lines using a word segmentation technique from the state of the art. Classical word spotting approaches can be improved using the contextual information of the documents. We have introduced a new framework, oriented to handwritten documents that present a highly structure, to extract information making use of context. The framework is an ecient tool for semi-automatic transcription that uses the contextual information to achieve better results than classical word spotting approaches. The contextual information is automatically discovered by recognizing repetitive structures and categorizing all the words according to semantic classes. The most frequent words in each semantic cluster are extracted and the same text is used to transcribe all them. The experimental results achieved in this thesis outperform classical word spotting approaches demonstrating the suitability of the proposed ensemble architecture for spotting words in historical handwritten documents using contextual information.
Address
Corporate Author				Thesis	Ph.D. thesis
Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Josep Llados;Alicia Fornes
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN	978-84-940902-7-1	Medium
Area		Expedition		Conference
Notes	DAG; 600.077			Approved	no
Call Number	Admin @ si @ Fer2014			Serial	2573
Permanent link to this record



Author	David Geronimo
Title	A Global Approach to Vision-Based Pedestrian Detection for Advanced Driver Assistance Systems			Type	Book Whole
Year	2010	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	At the beginning of the 21th century, traffic accidents have become a major problem not only for developed countries but also for emerging ones. As in other scientific areas in which Artificial Intelligence is becoming a key actor, advanced driver assistance systems, and concretely pedestrian protection systems based on Computer Vision, are becoming a strong topic of research aimed at improving the safety of pedestrians. However, the challenge is of considerable complexity due to the varying appearance of humans (e.g., clothes, size, aspect ratio, shape, etc.), the dynamic nature of on-board systems and the unstructured moving environments that urban scenarios represent. In addition, the required performance is demanding both in terms of computational time and detection rates. In this thesis, instead of focusing on improving specific tasks as it is frequent in the literature, we present a global approach to the problem. Such a global overview starts by the proposal of a generic architecture to be used as a framework both to review the literature and to organize the studied techniques along the thesis. We then focus the research on tasks such as foreground segmentation, object classification and refinement following a general viewpoint and exploring aspects that are not usually analyzed. In order to perform the experiments, we also present a novel pedestrian dataset that consists of three subsets, each one addressed to the evaluation of a different specific task in the system. The results presented in this thesis not only end with a proposal of a pedestrian detection system but also go one step beyond by pointing out new insights, formalizing existing and proposed algorithms, introducing new techniques and evaluating their performance, which we hope will provide new foundations for future research in the area.
Address	Antonio Lopez;Krystian Mikolajczyk;Jaume Amores;Dariu M. Gavrila;Oriol Pujol;Felipe Lumbreras
Corporate Author				Thesis	Ph.D. thesis
Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Antonio Lopez;Krystian Mikolajczyk;Jaume Amores;Dariu M. Gavrila;Oriol Pujol;Felipe Lumbreras
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN	978-84-936529-5-1	Medium
Area		Expedition		Conference
Notes	ADAS			Approved	no
Call Number	ADAS @ adas @ Ger2010			Serial	1279
Permanent link to this record



Author	David Geronimo; Antonio Lopez
Title	Vision-based Pedestrian Protection Systems for Intelligent Vehicles			Type	Book Whole
Year	2014	Publication	SpringerBriefs in Computer Science	Abbreviated Journal
Volume		Issue		Pages	1-114
Keywords	Computer Vision; Driver Assistance Systems; Intelligent Vehicles; Pedestrian Detection; Vulnerable Road Users
Abstract	Pedestrian Protection Systems (PPSs) are on-board systems aimed at detecting and tracking people in the surroundings of a vehicle in order to avoid potentially dangerous situations. These systems, together with other Advanced Driver Assistance Systems (ADAS) such as lane departure warning or adaptive cruise control, are one of the most promising ways to improve traffic safety. By the use of computer vision, cameras working either in the visible or infra-red spectra have been demonstrated as a reliable sensor to perform this task. Nevertheless, the variability of human’s appearance, not only in terms of clothing and sizes but also as a result of their dynamic shape, makes pedestrians one of the most complex classes even for computer vision. Moreover, the unstructured changing and unpredictable environment in which such on-board systems must work makes detection a difficult task to be carried out with the demanded robustness. In this brief, the state of the art in PPSs is introduced through the review of the most relevant papers of the last decade. A common computational architecture is presented as a framework to organize each method according to its main contribution. More than 300 papers are referenced, most of them addressing pedestrian detection and others corresponding to the descriptors (features), pedestrian models, and learning machines used. In addition, an overview of topics such as real-time aspects, systems benchmarking and future challenges of this research area are presented.
Address
Corporate Author				Thesis
Publisher	Springer Briefs in Computer Vision	Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN	978-1-4614-7986-4	Medium
Area		Expedition		Conference
Notes	ADAS; 600.076			Approved	no
Call Number	GeL2014			Serial	2325
Permanent link to this record



Author	David Guillamet
Title	Statistical Local Appearance Models for Object Recognition			Type	Book Whole
Year	2004	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract
Address	Bellaterra
Corporate Author				Thesis	Ph.D. thesis
Publisher		Place of Publication		Editor	Jordi Vitria
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes				Approved	no
Call Number	Admin @ si @ Gui2004			Serial	444
Permanent link to this record



Author	David Lloret
Title	Medical Image Registration Based on a Creaseress Measure.			Type	Book Whole
Year	2002	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract
Address
Corporate Author				Thesis	Ph.D. thesis
Publisher		Place of Publication		Editor	Joan Serrat
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes				Approved	no
Call Number	Admin @ si @ Llo2002			Serial	321
Permanent link to this record



Author	David Masip
Title	Face Classification Using Discriminative Features and Classifier Combination			Type	Book Whole
Year	2005	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract
Address	CVC (UAB)
Corporate Author				Thesis	Ph.D. thesis
Publisher		Place of Publication		Editor	Jordi Vitria
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN	84-933652-3-8	Medium
Area		Expedition		Conference
Notes	OR;MV			Approved	no
Call Number	Admin @ si @ Mas2005b			Serial	602
Permanent link to this record



Author	David Roche
Title	A Statistical Framework for Terminating Evolutionary Algorithms at their Steady State			Type	Book Whole
Year	2015	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	As any iterative technique, it is a necessary condition a stop criterion for terminating Evolutionary Algorithms (EA). In the case of optimization methods, the algorithm should stop at the time it has reached a steady state so it can not improve results anymore. Assessing the reliability of termination conditions for EAs is of prime importance. A wrong or weak stop criterion can negatively aect both the computational eort and the nal result. In this Thesis, we introduce a statistical framework for assessing whether a termination condition is able to stop EA at its steady state. In one hand a numeric approximation to steady states to detect the point in which EA population has lost its diversity has been presented for EA termination. This approximation has been applied to dierent EA paradigms based on diversity and a selection of functions covering the properties most relevant for EA convergence. Experiments show that our condition works regardless of the search space dimension and function landscape and Dierential Evolution (DE) arises as the best paradigm. On the other hand, we use a regression model in order to determine the requirements ensuring that a measure derived from EA evolving population is related to the distance to the optimum in xspace. Our theoretical framework is analyzed across several benchmark test functions and two standard termination criteria based on function improvement in f-space and EA population x-space distribution for the DE paradigm. Results validate our statistical framework as a powerful tool for determining the capability of a measure for terminating EA and select the x-space distribution as the best-suited for accurately stopping DE in real-world applications.
Address	July 2015
Corporate Author				Thesis	Ph.D. thesis
Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Debora Gil;Jesus Giraldo
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	IAM; 600.075			Approved	no
Call Number	Admin @ si @ Roc2015			Serial	2686
Permanent link to this record



Author	David Rotger
Title	Analysis and Multi-Modal Fusion of coronary Images			Type	Book Whole
Year	2009	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	The framework of this thesis is to study in detail different techniques and tools for medical image registration in order to ease the daily life of clinical experts in cardiology. The first aim of this thesis is providing computer tools for fusing IVUS and angiogram data is of high clinical interest to help the physicians locate in IVUS data and decide which lesion is observed, how long it is, how far from a bifurcation or another lesions stays, etc. This thesis proves and validates that we can segment the catheter path in angiographies using geodesic snakes (based on fast marching algorithm), a three-dimensional reconstruction of the catheter inspired in stereo vision and a new technique to fuse IVUS and angiograms that establishes exact correspondences between them. We have developed a new workstation called iFusion that has four strong advantages: registration of IVUS and angiographic images with sub-pixel precision, it works on- and off-line, it is independent on the X-ray system and there is no need of daily calibration. The second aim of the thesis is devoted to developing a computer-aided analysis of IVUS for image-guided intervention. We have designed, implemented and validated a robust algorithm for stent extraction and reconstruction from IVUS videos. We consider a very special and recent kind of stents, bioabsorbable stents that represent a great clinical challenge due to their property to be absorbed by time and thus avoiding the “danger” of neostenosis as one of the main problems of metallic stents. We present a new and very promising algorithm based on an optimized cascade of multiple classifiers to automatically detect individual stent struts of a very novel bioabsorbable drug eluting coronary stent. This problem represents a very challenging target given the variability in contrast, shape and grey levels of the regions to be detected, what is denoted by the high variability between the specialists (inter-observer variability of 0.14~$\pm$0.12). The obtained results of the automatic strut detection are within the inter-observer variability.
Address	Barcelona (Espanya)
Corporate Author				Thesis	Ph.D. thesis
Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Petia Radeva
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes				Approved	no
Call Number	Admin @ si @ Rot2009			Serial	1261
Permanent link to this record



Author	David Vazquez
Title	Domain Adaptation of Virtual and Real Worlds for Pedestrian Detection			Type	Book Whole
Year	2013	Publication	PhD Thesis, Universitat de Barcelona-CVC	Abbreviated Journal
Volume	1	Issue	1	Pages	1-105
Keywords	Pedestrian Detection; Domain Adaptation
Abstract	Pedestrian detection is of paramount interest for many applications, e.g. Advanced Driver Assistance Systems, Intelligent Video Surveillance and Multimedia systems. Most promising pedestrian detectors rely on appearance-based classifiers trained with annotated data. However, the required annotation step represents an intensive and subjective task for humans, what makes worth to minimize their intervention in this process by using computational tools like realistic virtual worlds. The reason to use these kind of tools relies in the fact that they allow the automatic generation of precise and rich annotations of visual information. Nevertheless, the use of this kind of data comes with the following question: can a pedestrian appearance model learnt with virtual-world data work successfully for pedestrian detection in real-world scenarios?. To answer this question, we conduct different experiments that suggest a positive answer. However, the pedestrian classifiers trained with virtual-world data can suffer the so called dataset shift problem as real-world based classifiers does. Accordingly, we have designed different domain adaptation techniques to face this problem, all of them integrated in a same framework (V-AYLA). We have explored different methods to train a domain adapted pedestrian classifiers by collecting a few pedestrian samples from the target domain (real world) and combining them with many samples of the source domain (virtual world). The extensive experiments we present show that pedestrian detectors developed within the V-AYLA framework do achieve domain adaptation. Ideally, we would like to adapt our system without any human intervention. Therefore, as a first proof of concept we also propose an unsupervised domain adaptation technique that avoids human intervention during the adaptation process. To the best of our knowledge, this Thesis work is the first demonstrating adaptation of virtual and real worlds for developing an object detector. Last but not least, we also assessed a different strategy to avoid the dataset shift that consists in collecting real-world samples and retrain with them in such a way that no bounding boxes of real-world pedestrians have to be provided. We show that the generated classifier is competitive with respect to the counterpart trained with samples collected by manually annotating pedestrian bounding boxes. The results presented on this Thesis not only end with a proposal for adapting a virtual-world pedestrian detector to the real world, but also it goes further by pointing out a new methodology that would allow the system to adapt to different situations, which we hope will provide the foundations for future research in this unexplored area.
Address	Barcelona
Corporate Author				Thesis	Ph.D. thesis
Publisher	Ediciones Graficas Rey	Place of Publication	Barcelona	Editor	Antonio Lopez;Daniel Ponsa
Language	English	Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN	978-84-940530-1-6	Medium
Area		Expedition		Conference
Notes	adas			Approved	yes
Call Number	ADAS @ adas @ Vaz2013			Serial	2276
Permanent link to this record



Author	Debora Gil
Title	Geometric Differential Operators for Shape Modelling			Type	Book Whole
Year	2004	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	Medical imaging feeds research in many computer vision and image processing fields: image filtering, segmentation, shape recovery, registration, retrieval and pattern matching. Because of their low contrast changes and large variety of artifacts and noise, medical imaging processing techniques relying on an analysis of the geometry of image level sets rather than on intensity values result in more robust treatment. From the starting point of treatment of intravascular images, this PhD thesis ad- dresses the design of differential image operators based on geometric principles for a robust shape modelling and restoration. Among all fields applying shape recovery, we approach filtering and segmentation of image objects. For a successful use in real images, the segmentation process should go through three stages: noise removing, shape modelling and shape recovery. This PhD addresses all three topics, but for the sake of algorithms as automated as possible, techniques for image processing will be designed to satisfy three main principles: a) convergence of the iterative schemes to non-trivial states avoiding image degeneration to a constant image and representing smooth models of the originals; b) smooth asymptotic behav- ior ensuring stabilization of the iterative process; c) fixed parameter values ensuring equal (domain free) performance of the algorithms whatever initial images/shapes. Our geometric approach to the generic equations that model the different processes approached enables defining techniques satisfying all the former requirements. First, we introduce a new curvature-based geometric flow for image filtering achieving a good compromise between noise removing and resemblance to original images. Sec- ond, we describe a new family of diffusion operators that restrict their scope to image level curves and serve to restore smooth closed models from unconnected sets of points. Finally, we design a regularization of snake (distance) maps that ensures its smooth convergence towards any closed shape. Experiments show that performance of the techniques proposed overpasses that of state-of-the-art algorithms.
Address
Corporate Author				Thesis	Ph.D. thesis
Publisher	Ediciones Graficas Rey	Place of Publication	Barcelona (Spain)	Editor	Jordi Saludes i Closa;Petia Radeva
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN	84-933652-0-3	Medium	prit
Area		Expedition		Conference
Notes	IAM;			Approved	no
Call Number	IAM @ iam @ GIL2004			Serial	1517
Permanent link to this record



Author	Debora Gil; Jordi Gonzalez; Gemma Sanchez (eds)
Title	Computer Vision: Advances in Research and Development			Type	Book Whole
Year	2007	Publication	Proceedings of the 2nd CVC International Workshop	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract
Address
Corporate Author				Thesis
Publisher	UAB	Place of Publication	Bellaterra (Spain)	Editor	Debora Gil; Jordi Gonzalez; Gemma Sanchez
Language		Summary Language		Original Title
Series Editor		Series Title	2	Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN	978-84-935251-4-9	Medium
Area		Expedition		Conference
Notes	IAM; ISE; DAG			Approved	no
Call Number	IAM @ iam @ GGS2007			Serial	1493
Permanent link to this record



Author	Dena Bazazian
Title	Fully Convolutional Networks for Text Understanding in Scene Images			Type	Book Whole
Year	2018	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	Text understanding in scene images has gained plenty of attention in the computer vision community and it is an important task in many applications as text carries semantically rich information about scene content and context. For instance, reading text in a scene can be applied to autonomous driving, scene understanding or assisting visually impaired people. The general aim of scene text understanding is to localize and recognize text in scene images. Text regions are first localized in the original image by a trained detector model and afterwards fed into a recognition module. The tasks of localization and recognition are highly correlated since an inaccurate localization can affect the recognition task. The main purpose of this thesis is to devise efficient methods for scene text understanding. We investigate how the latest results on deep learning can advance text understanding pipelines. Recently, Fully Convolutional Networks (FCNs) and derived methods have achieved a significant performance on semantic segmentation and pixel level classification tasks. Therefore, we took benefit of the strengths of FCN approaches in order to detect text in natural scenes. In this thesis we have focused on two challenging tasks of scene text understanding which are Text Detection and Word Spotting. For the task of text detection, we have proposed an efficient text proposal technique in scene images. We have considered the Text Proposals method as the baseline which is an approach to reduce the search space of possible text regions in an image. In order to improve the Text Proposals method we combined it with Fully Convolutional Networks to efficiently reduce the number of proposals while maintaining the same level of accuracy and thus gaining a significant speed up. Our experiments demonstrate that this text proposal approach yields significantly higher recall rates than the line based text localization techniques, while also producing better-quality localization. We have also applied this technique on compressed images such as videos from wearable egocentric cameras. For the task of word spotting, we have introduced a novel mid-level word representation method. We have proposed a technique to create and exploit an intermediate representation of images based on text attributes which roughly correspond to character probability maps. Our representation extends the concept of Pyramidal Histogram Of Characters (PHOC) by exploiting Fully Convolutional Networks to derive a pixel-wise mapping of the character distribution within candidate word regions. We call this representation the Soft-PHOC. Furthermore, we show how to use Soft-PHOC descriptors for word spotting tasks through an efficient text line proposal algorithm. To evaluate the detected text, we propose a novel line based evaluation along with the classic bounding box based approach. We test our method on incidental scene text images which comprises real-life scenarios such as urban scenes. The importance of incidental scene text images is due to the complexity of backgrounds, perspective, variety of script and language, short text and little linguistic context. All of these factors together makes the incidental scene text images challenging.
Address	November 2018
Corporate Author				Thesis	Ph.D. thesis
Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Dimosthenis Karatzas;Andrew Bagdanov
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN	978-84-948531-1-1	Medium
Area		Expedition		Conference
Notes	DAG; 600.121			Approved	no
Call Number	Admin @ si @ Baz2018			Serial	3220
Permanent link to this record



Author	Diego Alejandro Cheda
Title	Monocular Depth Cues in Computer Vision Applications			Type	Book Whole
Year	2012	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	Depth perception is a key aspect of human vision. It is a routine and essential visual task that the human do effortlessly in many daily activities. This has often been associated with stereo vision, but humans have an amazing ability to perceive depth relations even from a single image by using several monocular cues. In the computer vision field, if image depth information were available, many tasks could be posed from a different perspective for the sake of higher performance and robustness. Nevertheless, given a single image, this possibility is usually discarded, since obtaining depth information has frequently been performed by three-dimensional reconstruction techniques, requiring two or more images of the same scene taken from different viewpoints. Recently, some proposals have shown the feasibility of computing depth information from single images. In essence, the idea is to take advantage of a priori knowledge of the acquisition conditions and the observed scene to estimate depth from monocular pictorial cues. These approaches try to precisely estimate the scene depth maps by employing computationally demanding techniques. However, to assist many computer vision algorithms, it is not really necessary computing a costly and detailed depth map of the image. Indeed, just a rough depth description can be very valuable in many problems. In this thesis, we have demonstrated how coarse depth information can be integrated in different tasks following alternative strategies to obtain more precise and robust results. In that sense, we have proposed a simple, but reliable enough technique, whereby image scene regions are categorized into discrete depth ranges to build a coarse depth map. Based on this representation, we have explored the potential usefulness of our method in three application domains from novel viewpoints: camera rotation parameters estimation, background estimation and pedestrian candidate generation. In the first case, we have computed camera rotation mounted in a moving vehicle applying two novels methods based on distant elements in the image, where the translation component of the image flow vectors is negligible. In background estimation, we have proposed a novel method to reconstruct the background by penalizing close regions in a cost function, which integrates color, motion, and depth terms. Finally, we have benefited of geometric and depth information available on single images for pedestrian candidate generation to significantly reduce the number of generated windows to be further processed by a pedestrian classifier. In all cases, results have shown that our approaches contribute to better performances.
Address
Corporate Author				Thesis	Ph.D. thesis
Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Daniel Ponsa;Antonio Lopez
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	ADAS			Approved	no
Call Number	Admin @ si @ Che2012			Serial	2210
Permanent link to this record



Author	Diego Velazquez
Title	Towards Robustness in Computer-based Image Understanding			Type	Book Whole
Year	2023	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	This thesis embarks on an exploratory journey into robustness in deep learning, with a keen focus on the intertwining facets of generalization, explainability, and edge cases within the realm of computer vision. In deep learning, robustness epitomizes a model’s resilience and flexibility, grounded on its capacity to generalize across diverse data distributions, explain its predictions transparently, and navigate the intricacies of edge cases effectively. The challenges associated with robust generalization are multifaceted, encompassing the model’s performance on unseen data and its defense against out-of-distribution data and adversarial attacks. Bridging this gap, the potential of Embedding Propagation (EP) for improving out-of-distribution generalization is explored. EP is depicted as a powerful tool facilitating manifold smoothing, which in turn fortifies the model’s robustness against adversarial onslaughts and bolsters performance in few-shot and self-/semi-supervised learning scenarios. In the labyrinth of deep learning models, the path to robustness often intersects with explainability. As model complexity increases, so does the urgency to decipher their decision-making processes. Acknowledging this, the thesis introduces a robust framework for evaluating and comparing various counterfactual explanation methods, echoing the imperative of explanation quality over quantity and spotlighting the intricacies of diversifying explanations. Simultaneously, the deep learning landscape is fraught with edge cases – anomalies in the form of small objects or rare instances in object detection tasks that defy the norm. Confronting this, the thesis presents an extension of the DETR (DEtection TRansformer) model to enhance small object detection. The devised DETR-FP, embedding the Feature Pyramid technique, demonstrating improvement in small objects detection accuracy, albeit facing challenges like high computational costs. With emergence of foundation models in mind, the thesis unveils EarthView, the largest scale remote sensing dataset to date, built for the self-supervised learning of a robust foundational model for remote sensing. Collectively, these studies contribute to the grand narrative of robustness in deep learning, weaving together the strands of generalization, explainability, and edge case performance. Through these methodological advancements and novel datasets, the thesis calls for continued exploration, innovation, and refinement to fortify the bastion of robust computer vision.
Address
Corporate Author				Thesis	Ph.D. thesis
Publisher	IMPRIMA	Place of Publication		Editor	Jordi Gonzalez;Josep M. Gonfaus;Pau Rodriguez
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN	978-81-126409-5-3	Medium
Area		Expedition		Conference
Notes	ISE			Approved	no
Call Number	Admin @ si @ Vel2023			Serial	3965
Permanent link to this record



Author	Edgar Riba
Title	Geometric Computer Vision Techniques for Scene Reconstruction			Type	Book Whole
Year	2021	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	From the early stages of Computer Vision, scene reconstruction has been one of the most studied topics leading to a wide variety of new discoveries and applications. Object grasping and manipulation, localization and mapping, or even visual effect generation are different examples of applications in which scene reconstruction has taken an important role for industries such as robotics, factory automation, or audio visual production. However, scene reconstruction is an extensive topic that can be approached in many different ways with already existing solutions that effectively work in controlled environments. Formally, the problem of scene reconstruction can be formulated as a sequence of independent processes which compose a pipeline. In this thesis, we analyse some parts of the reconstruction pipeline from which we contribute with novel methods using Convolutional Neural Networks (CNN) proposing innovative solutions that consider the optimisation of the methods in an end-to-end fashion. First, we review the state of the art of classical local features detectors and descriptors and contribute with two novel methods that inherently improve pre-existing solutions in the scene reconstruction pipeline. It is a fact that computer science and software engineering are two fields that usually go hand in hand and evolve according to mutual needs making easier the design of complex and efficient algorithms. For this reason, we contribute with Kornia, a library specifically designed to work with classical computer vision techniques along with deep neural networks. In essence, we created a framework that eases the design of complex pipelines for computer vision algorithms so that can be included within neural networks and be used to backpropagate gradients throw a common optimisation framework. Finally, in the last chapter of this thesis we develop the aforementioned concept of designing end-to-end systems with classical projective geometry. Thus, we contribute with a solution to the problem of synthetic view generation by hallucinating novel views from high deformable cloths objects using a geometry aware end-to-end system. To summarize, in this thesis we demonstrate that with a proper design that combine classical geometric computer vision methods with deep learning techniques can lead to improve pre-existing solutions for the problem of scene reconstruction.
Address	February 2021
Corporate Author				Thesis	Ph.D. thesis
Publisher		Place of Publication		Editor	Daniel Ponsa
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	MSIAU			Approved	no
Call Number	Admin @ si @ Rib2021			Serial	3610
Permanent link to this record