toggle visibility Search & Display Options

Select All    Deselect All
 |   | 
Details
   print
  Records Links
Author Raul Gomez edit  isbn
openurl 
  Title Exploiting the Interplay between Visual and Textual Data for Scene Interpretation Type Book Whole
  Year 2020 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Machine learning experimentation under controlled scenarios and standard datasets is necessary to compare algorithms performance by evaluating all of them in the same setup. However, experimentation on how those algorithms perform on unconstrained data and applied tasks to solve real world problems is also a must to ascertain how that research can contribute to our society.
In this dissertation we experiment with the latest computer vision and natural language processing algorithms applying them to multimodal scene interpretation. Particularly, we research on how image and text understanding can be jointly exploited to address real world problems, focusing on learning from Social Media data.
We address several tasks that involve image and textual information, discuss their characteristics and offer our experimentation conclusions. First, we work on detection of scene text in images. Then, we work with Social Media posts, exploiting the captions associated to images as supervision to learn visual features, which we apply to multimodal semantic image retrieval. Subsequently, we work with geolocated Social Media images with associated tags, experimenting on how to use the tags as supervision, on location sensitive image retrieval and on exploiting location information for image tagging. Finally, we work on a specific classification problem of Social Media publications consisting on an image and a text: Multimodal hate speech classification.
 
  Address  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Dimosthenis Karatzas;Lluis Gomez;Jaume Gibert  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN (down) 978-84-121011-7-1 Medium  
  Area Expedition Conference  
  Notes DAG; 600.121 Approved no  
  Call Number Admin @ si @ Gom20 Serial 3479  
Permanent link to this record
 

 
Author Pau Riba edit  isbn
openurl 
  Title Distilling Structure from Imagery: Graph-based Models for the Interpretation of Document Images Type Book Whole
  Year 2020 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract From its early stages, the community of Pattern Recognition and Computer Vision has considered the importance of leveraging the structural information when understanding images. Usually, graphs have been proposed as a suitable model to represent this kind of information due to their flexibility and representational power able to codify both, the components, objects, or entities and their pairwise relationship. Even though graphs have been successfully applied to a huge variety of tasks, as a result of their symbolic and relational nature, graphs have always suffered from some limitations compared to statistical approaches. Indeed, some trivial mathematical operations do not have an equivalence in the graph domain. For instance, in the core of many pattern recognition applications, there is a need to compare two objects. This operation, which is trivial when considering feature vectors defined in \(\mathbb{R}^n\), is not properly defined for graphs.


In this thesis, we have investigated the importance of the structural information from two perspectives, the traditional graph-based methods and the new advances on Geometric Deep Learning. On the one hand, we explore the problem of defining a graph representation and how to deal with it on a large scale and noisy scenario. On the other hand, Graph Neural Networks are proposed to first redefine a Graph Edit Distance methodologies as a metric learning problem, and second, to apply them in a real use case scenario for the detection of repetitive patterns which define tables in invoice documents. As experimental framework, we have validated the different methodological contributions in the domain of Document Image Analysis and Recognition.
 
  Address  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Josep Llados;Alicia Fornes  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN (down) 978-84-121011-6-4 Medium  
  Area Expedition Conference  
  Notes DAG; 600.121 Approved no  
  Call Number Admin @ si @ Rib20 Serial 3478  
Permanent link to this record
 

 
Author Yaxing Wang edit  isbn
openurl 
  Title Transferring and Learning Representations for Image Generation and Translation Type Book Whole
  Year 2020 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Image generation is arguably one of the most attractive, compelling, and challenging tasks in computer vision. Among the methods which perform image generation, generative adversarial networks (GANs) play a key role. The most common image generation models based on GANs can be divided into two main approaches. The first one, called simply image generation takes random noise as an input and synthesizes an image which follows the same distribution as the images in the training set. The second class, which is called image-to-image translation, aims to map an image from a source domain to one that is indistinguishable from those in the target domain. Image-to-image translation methods can further be divided into paired and unpaired image-to-image translation based on whether they require paired data or not. In this thesis, we aim to address some challenges of both image generation and image-to-image generation.GANs highly rely upon having access to vast quantities of data, and fail to generate realistic images from random noise when applied to domains with few images. To address this problem, we aim to transfer knowledge from a model trained on a large dataset (source domain) to the one learned on limited data (target domain). We find that both GANs andconditional GANs can benefit from models trained on large datasets. Our experiments show that transferring the discriminator is more important than the generator. Using both the generator and discriminator results in the best performance. We found, however, that this method suffers from overfitting, since we update all parameters to adapt to the target data. We propose a novel architecture, which is tailored to address knowledge transfer to very small target domains. Our approach effectively exploreswhich part of the latent space is more related to the target domain. Additionally, the proposed method is able to transfer knowledge from multiple pretrained GANs. Although image-to-image translation has achieved outstanding performance, it still facesseveral problems. First, for translation between complex domains (such as translations between different modalities) image-to-image translation methods require paired data. We show that when only some of the pairwise translations have been seen (i.e. during training), we can infer the remaining unseen translations (where training pairs are not available). We propose a new approach where we align multiple encoders and decoders in such a way that the desired translation can be obtained by simply cascadingthe source encoder and the target decoder, even when they have not interacted during the training stage (i.e. unseen). Second, we address the issue of bias in image-to-image translation. Biased datasets unavoidably contain undesired changes, which are dueto the fact that the target dataset has a particular underlying visual distribution. We use carefully designed semantic constraints to reduce the effects of the bias. The semantic constraint aims to enforce the preservation of desired image properties. Finally, current approaches fail to generate diverse outputs or perform scalable image transfer in a single model. To alleviate this problem, we propose a scalable and diverse image-to-image translation. We employ random noise to control the diversity. The scalabitlity is determined by conditioning the domain label.computer vision, deep learning, imitation learning, adversarial generative networks, image generation, image-to-image translation.  
  Address January 2020  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Joost Van de Weijer;Abel Gonzalez;Luis Herranz  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN (down) 978-84-121011-5-7 Medium  
  Area Expedition Conference  
  Notes LAMP; 600.141; 600.120 Approved no  
  Call Number Admin @ si @ Wan2020 Serial 3397  
Permanent link to this record
 

 
Author Xialei Liu edit  isbn
openurl 
  Title Visual recognition in the wild: learning from rankings in small domains and continual learning in new domains Type Book Whole
  Year 2019 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Deep convolutional neural networks (CNNs) have achieved superior performance in many visual recognition application, such as image classification, detection and segmentation. In this thesis we address two limitations of CNNs. Training deep CNNs requires huge amounts of labeled data, which is expensive and labor intensive to collect. Another limitation is that training CNNs in a continual learning setting is still an open research question. Catastrophic forgetting is very likely when adapting trained models to new environments or new tasks. Therefore, in this thesis, we aim to improve CNNs for applications with limited data and to adapt CNNs continually to new tasks.
Self-supervised learning leverages unlabelled data by introducing an auxiliary task for which data is abundantly available. In the first part of the thesis, we show how rankings can be used as a proxy self-supervised task for regression problems. Then we propose an efficient backpropagation technique for Siamese networks which prevents the redundant computation introduced by the multi-branch network architecture. In addition, we show that measuring network uncertainty on the self-supervised proxy task is a good measure of informativeness of unlabeled data. This can be used to drive an algorithm for active learning. We then apply our framework on two regression problems: Image Quality Assessment (IQA) and Crowd Counting. For both, we show how to automatically generate ranked image sets from unlabeled data. Our results show that networks trained to regress to the ground truth targets for labeled data and to simultaneously learn to rank unlabeled data obtain significantly better, state-of-the-art results. We further show that active learning using rankings can reduce labeling effort by up to 50\% for both IQA and crowd counting.
In the second part of the thesis, we propose two approaches to avoiding catastrophic forgetting in sequential task learning scenarios. The first approach is derived from Elastic Weight Consolidation, which uses a diagonal Fisher Information Matrix (FIM) to measure the importance of the parameters of the network. However the diagonal assumption is unrealistic. Therefore, we approximately diagonalize the FIM using a set of factorized rotation parameters. This leads to significantly better performance on continual learning of sequential tasks. For the second approach, we show that forgetting manifests differently at different layers in the network and propose a hybrid approach where distillation is used in the feature extractor and replay in the classifier via feature generation. Our method addresses the limitations of generative image replay and probability distillation (i.e. learning without forgetting) and can naturally aggregate new tasks in a single, well-calibrated classifier. Experiments confirm that our proposed approach outperforms the baselines and some start-of-the-art methods.
 
  Address December 2019  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Joost Van de Weijer;Andrew Bagdanov  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN (down) 978-84-121011-4-0 Medium  
  Area Expedition Conference  
  Notes LAMP; 600.120 Approved no  
  Call Number Admin @ si @ Liu2019 Serial 3396  
Permanent link to this record
 

 
Author Lu Yu edit  isbn
openurl 
  Title Semantic Representation: From Color to Deep Embeddings Type Book Whole
  Year 2019 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract One of the fundamental problems of computer vision is to represent images with compact semantically relevant embeddings. These embeddings could then be used in a wide variety of applications, such as image retrieval, object detection, and video search. The main objective of this thesis is to study image embeddings from two aspects: color embeddings and deep embeddings.
In the first part of the thesis we start from hand-crafted color embeddings. We propose a method to order the additional color names according to their complementary nature with the basic eleven color names. This allows us to compute color name representations with high discriminative power of arbitrary length. Psychophysical experiments confirm that our proposed method outperforms baseline approaches. Secondly, we learn deep color embeddings from weakly labeled data by adding an attention strategy. The attention branch is able to correctly identify the relevant regions for each class. The advantage of our approach is that it can learn color names for specific domains for which no pixel-wise labels exists.
In the second part of the thesis, we focus on deep embeddings. Firstly, we address the problem of compressing large embedding networks into small networks, while maintaining similar performance. We propose to distillate the metrics from a teacher network to a student network. Two new losses are introduced to model the communication of a deep teacher network to a small student network: one based on an absolute teacher, where the student aims to produce the same embeddings as the teacher, and one based on a relative teacher, where the distances between pairs of data points is communicated from the teacher to the student. In addition, various aspects of distillation have been investigated for embeddings, including hint and attention layers, semi-supervised learning and cross quality distillation. Finally, another aspect of deep metric learning, namely lifelong learning, is studied. We observed some drift occurs during training of new tasks for metric learning. A method to estimate the semantic drift based on the drift which is experienced by data of the current task during its training is introduced. Having this estimation, previous tasks can be compensated for this drift, thereby improving their performance. Furthermore, we show that embedding networks suffer significantly less from catastrophic forgetting compared to classification networks when learning new tasks.
 
  Address November 2019  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Joost Van de Weijer;Yongmei Cheng  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN (down) 978-84-121011-3-3 Medium  
  Area Expedition Conference  
  Notes LAMP; 600.120 Approved no  
  Call Number Admin @ si @ Yu2019 Serial 3394  
Permanent link to this record
 

 
Author Albert Berenguel edit  isbn
openurl 
  Title Analysis of background textures in banknotes and identity documents for counterfeit detection Type Book Whole
  Year 2019 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Counterfeiting and piracy are a form of theft that has been steadily growing in recent years. A counterfeit is an unauthorized reproduction of an authentic/genuine object. Banknotes and identity documents are two common objects of counterfeiting. The former is used by organized criminal groups to finance a variety of illegal activities or even to destabilize entire countries due the inflation effect. Generally, in order to run their illicit businesses, counterfeiters establish companies and bank accounts using fraudulent identity documents. The illegal activities generated by counterfeit banknotes and identity documents has a damaging effect on business, the economy and the general population. To fight against counterfeiters, governments and authorities around the globe cooperate and develop security features to protect their security documents. Many of the security features in identity documents can also be found in banknotes. In this dissertation we focus our efforts in detecting the counterfeit banknotes and identity documents by analyzing the security features at the background printing. Background areas on secure documents contain fine-line patterns and designs that are difficult to reproduce without the manufacturers cutting-edge printing equipment. Our objective is to find the loose of resolution between the genuine security document and the printed counterfeit version with a publicly available commercial printer. We first present the most complete survey to date in identity and banknote security features. The compared algorithms and systems are based on computer vision and machine learning. Then we advance to present the banknote and identity counterfeit dataset we have built and use along all this thesis. Afterwards, we evaluate and adapt algorithms in the literature for the security background texture analysis. We study this problem from the point of view of robustness, computational efficiency and applicability into a real and non-controlled industrial scenario, proposing key insights to use these algorithms. Next, within the industrial environment of this thesis, we build a complete service oriented architecture to detect counterfeit documents. The mobile application and the server framework intends to be used even by non-expert document examiners to spot counterfeits. Later, we re-frame the problem of background texture counterfeit detection as a full-reference game of spotting the differences, by alternating glimpses between a counterfeit and a genuine background using recurrent neural networks. Finally, we deal with the lack of counterfeit samples, studying different approaches based on anomaly detection.  
  Address November 2019  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Oriol Ramos Terrades;Josep Llados  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN (down) 978-84-121011-2-6 Medium  
  Area Expedition Conference  
  Notes DAG; 600.140; 600.121 Approved no  
  Call Number Admin @ si @ Ber2019 Serial 3395  
Permanent link to this record
 

 
Author Antonio Esteban Lansaque edit  isbn
openurl 
  Title An Endoscopic Navigation System for Lung Cancer Biopsy Type Book Whole
  Year 2019 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Lung cancer is one of the most diagnosed cancers among men and women. Actually,
lung cancer accounts for 13% of the total cases with a 5-year global survival
rate in patients. Although Early detection increases survival rate from 38% to 67%, accurate diagnosis remains a challenge. Pathological confirmation requires extracting a sample of the lesion tissue for its biopsy. The preferred procedure for tissue biopsy is called bronchoscopy. A bronchoscopy is an endoscopic technique for the internal exploration of airways which facilitates the performance of minimal invasive interventions with low risk for the patient. Recent advances in bronchoscopic devices have increased their use for minimal invasive diagnostic and intervention procedures, like lung cancer biopsy sampling. Despite the improvement in bronchoscopic device quality, there is a lack of intelligent computational systems for supporting in-vivo clinical decision during examinations. Existing technologies fail to accurately reach the lesion due to several aspects at intervention off-line planning and poor intra-operative guidance at exploration time. Existing guiding systems radiate patients and clinical staff,might be expensive and achieve a suboptimlal 70% of yield boost. Diagnostic yield could be improved reducing radiation and costs by developing intra-operative support systems able to guide the bronchoscopist to the lesion during the intervention. The goal of this PhD thesis is to develop an image-based navigation systemfor intra-operative guidance of bronchoscopists to a target lesion across a path previously planned on a CT-scan. We propose a 3D navigation system which uses the anatomy of video bronchoscopy frames to locate the bronchoscope within the airways. Once the bronchoscope is located, our navigation system is able to indicate the bifurcation which needs to be followed to reach the lesion. In order to facilitate an off-line validation
as realistic as possible, we also present a method for augmenting simulated virtual bronchoscopies with the appearance of intra-operative videos. Experiments performed on augmented and intra-operative videos, prove that our algorithm can be speeded up for an on-line implementation in the operating room.
 
  Address October 2019  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Debora Gil;Carles Sanchez  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN (down) 978-84-121011-0-2 Medium  
  Area Expedition Conference  
  Notes IAM; 600.139; 600.145 Approved no  
  Call Number Admin @ si @ Est2019 Serial 3392  
Permanent link to this record
 

 
Author Lichao Zhang edit  isbn
openurl 
  Title Towards end-to-end Networks for Visual Tracking in RGB and TIR Videos Type Book Whole
  Year 2019 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract In the current work, we identify several problems of current tracking systems. The lack of large-scale labeled datasets hampers the usage of deep learning, especially end-to-end training, for tracking in TIR images. Therefore, many methods for tracking on TIR data are still based on hand-crafted features. This situation also happens in multi-modal tracking, e.g. RGB-T tracking. Another reason, which hampers the development of RGB-T tracking, is that there exists little research on the fusion mechanisms for combining information from RGB and TIR modalities. One of the crucial components of most trackers is the update module. For the currently existing end-to-end tracking architecture, e.g, Siamese trackers, the online model update is still not taken into consideration at the training stage. They use no-update or a linear update strategy during the inference stage. While such a hand-crafted approach to updating has led to improved results, its simplicity limits the potential gain likely to be obtained by learning to update.

To address the data-scarcity for TIR and RGB-T tracking, we use image-to-image translation to generate a large-scale synthetic TIR dataset. This dataset allows us to perform end-to-end training for TIR tracking. Furthermore, we investigate several fusion mechanisms for RGB-T tracking. The multi-modal trackers are also trained in an end-to-end manner on the synthetic data. To improve the standard online update, we pose the updating step as an optimization problem which can be solved by training a neural network. Our approach thereby reduces the hand-crafted components in the tracking pipeline and sets a further step in the direction of a complete end-to-end trained tracking network which also considers updating during optimization.
 
  Address November 2019  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Joost Van de Weijer;Abel Gonzalez;Fahad Shahbaz Khan  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN (down) 978-84-1210011-1-9 Medium  
  Area Expedition Conference  
  Notes LAMP; 600.141; 600.120 Approved no  
  Call Number Admin @ si @ Zha2019 Serial 3393  
Permanent link to this record
 

 
Author Naveen Onkarappa; Sujay M. Veerabhadrappa; Angel Sappa edit  doi
isbn  openurl
  Title Optical Flow in Onboard Applications: A Study on the Relationship Between Accuracy and Scene Texture Type Conference Article
  Year 2012 Publication 4th International Conference on Signal and Image Processing Abbreviated Journal  
  Volume 221 Issue Pages 257-267  
  Keywords  
  Abstract Optical flow has got a major role in making advanced driver assistance systems (ADAS) a reality. ADAS applications are expected to perform efficiently in all kinds of environments, those are highly probable, that one can drive the vehicle in different kinds of roads, times and seasons. In this work, we study the relationship of optical flow with different roads, that is by analyzing optical flow accuracy on different road textures. Texture measures such as TeX , TeX and TeX are evaluated for this purpose. Further, the relation of regularization weight to the flow accuracy in the presence of different textures is also analyzed. Additionally, we present a framework to generate synthetic sequences of different textures in ADAS scenarios with ground-truth optical flow.  
  Address Coimbatore, India  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN 1876-1100 ISBN (down) 978-81-322-0996-6 Medium  
  Area Expedition Conference ICSIP  
  Notes ADAS Approved no  
  Call Number Admin @ si @ OVS2012 Serial 2356  
Permanent link to this record
 

 
Author Monica Piñol; Angel Sappa; Ricardo Toledo edit   pdf
doi  isbn
openurl 
  Title MultiTable Reinforcement for Visual Object Recognition Type Conference Article
  Year 2012 Publication 4th International Conference on Signal and Image Processing Abbreviated Journal  
  Volume 221 Issue Pages 469-480  
  Keywords  
  Abstract This paper presents a bag of feature based method for visual object recognition. Our contribution is focussed on the selection of the best feature descriptor. It is implemented by using a novel multi-table reinforcement learning method that selects among five of classical descriptors (i.e., Spin, SIFT, SURF, C-SIFT and PHOW) the one that best describes each image. Experimental results and comparisons are provided showing the improvements achieved with the proposed approach.  
  Address Coimbatore, India  
  Corporate Author Thesis  
  Publisher Springer India Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN 1876-1100 ISBN (down) 978-81-322-0996-6 Medium  
  Area Expedition Conference ICSIP  
  Notes ADAS Approved no  
  Call Number Admin @ si @ PST2012 Serial 2157  
Permanent link to this record
 

 
Author Diego Velazquez edit  isbn
openurl 
  Title Towards Robustness in Computer-based Image Understanding Type Book Whole
  Year 2023 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract This thesis embarks on an exploratory journey into robustness in deep learning,
with a keen focus on the intertwining facets of generalization, explainability, and
edge cases within the realm of computer vision. In deep learning, robustness
epitomizes a model’s resilience and flexibility, grounded on its capacity to generalize across diverse data distributions, explain its predictions transparently, and navigate the intricacies of edge cases effectively. The challenges associated with robust generalization are multifaceted, encompassing the model’s performance on unseen data and its defense against out-of-distribution data and adversarial attacks. Bridging this gap, the potential of Embedding Propagation (EP) for improving out-of-distribution generalization is explored. EP is depicted as a powerful tool facilitating manifold smoothing, which in turn fortifies the model’s robustness against adversarial onslaughts and bolsters performance in few-shot and self-/semi-supervised learning scenarios. In the labyrinth of deep learning models, the path to robustness often intersects with explainability. As model complexity increases, so does the urgency to decipher their decision-making
processes. Acknowledging this, the thesis introduces a robust framework for
evaluating and comparing various counterfactual explanation methods, echoing
the imperative of explanation quality over quantity and spotlighting the intricacies of diversifying explanations. Simultaneously, the deep learning landscape is fraught with edge cases – anomalies in the form of small objects or rare instances in object detection tasks that defy the norm. Confronting this, the
thesis presents an extension of the DETR (DEtection TRansformer) model to enhance small object detection. The devised DETR-FP, embedding the Feature Pyramid technique, demonstrating improvement in small objects detection accuracy, albeit facing challenges like high computational costs. With emergence of foundation models in mind, the thesis unveils EarthView, the largest scale remote sensing dataset to date, built for the self-supervised learning of a robust foundational model for remote sensing. Collectively, these studies contribute to the grand narrative of robustness in deep learning, weaving together the strands of generalization, explainability, and edge case performance. Through these methodological advancements and novel datasets, the thesis calls for continued exploration, innovation, and refinement to fortify the bastion of robust computer vision.
 
  Address  
  Corporate Author Thesis Ph.D. thesis  
  Publisher IMPRIMA Place of Publication Editor Jordi Gonzalez;Josep M. Gonfaus;Pau Rodriguez  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN (down) 978-81-126409-5-3 Medium  
  Area Expedition Conference  
  Notes ISE Approved no  
  Call Number Admin @ si @ Vel2023 Serial 3965  
Permanent link to this record
 

 
Author S.Grau; Anna Puig; Sergio Escalera; Maria Salamo; Oscar Amoros edit  isbn
openurl 
  Title Efficient complementary viewpoint selection in volume rendering Type Conference Article
  Year 2013 Publication 21st WSCG Conference on Computer Graphics, Abbreviated Journal  
  Volume Issue Pages  
  Keywords Dual camera; Visualization; Interactive Interfaces; Dynamic Time Warping.  
  Abstract A major goal of visualization is to appropriately express knowledge of scientific data. Generally, gathering visual information contained in the volume data often requires a lot of expertise from the final user to setup the parameters of the visualization. One way of alleviating this problem is to provide the position of inner structures with different viewpoint locations to enhance the perception and construction of the mental image. To this end, traditional illustrations use two or three different views of the regions of interest. Similarly, with the aim of assisting the users to easily place a good viewpoint location, this paper proposes an automatic and interactive method that locates different complementary viewpoints from a reference camera in volume datasets. Specifically, the proposed method combines the quantity of information each camera provides for each structure and the shape similarity of the projections of the remaining viewpoints based on Dynamic Time Warping. The selected complementary viewpoints allow a better understanding of the focused structure in several applications. Thus, the user interactively receives feedback based on several viewpoints that helps him to understand the visual information. A live-user evaluation on different data sets show a good convergence to useful complementary viewpoints.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN (down) 978-808694374-9 Medium  
  Area Expedition Conference WSCG  
  Notes HuPBA; 600.046;MILAB Approved no  
  Call Number Admin @ si @ GPE2013a Serial 2255  
Permanent link to this record
 

 
Author Joana Maria Pujadas-Mora; Alicia Fornes; Josep Llados; Gabriel Brea-Martinez; Miquel Valls-Figols edit  url
doi  isbn
openurl 
  Title The Baix Llobregat (BALL) Demographic Database, between Historical Demography and Computer Vision (nineteenth–twentieth centuries Type Book Chapter
  Year 2019 Publication Nominative Data in Demographic Research in the East and the West: monograph Abbreviated Journal  
  Volume Issue Pages 29-61  
  Keywords  
  Abstract The Baix Llobregat (BALL) Demographic Database is an ongoing database project containing individual census data from the Catalan region of Baix Llobregat (Spain) during the nineteenth and twentieth centuries. The BALL Database is built within the project ‘NETWORKS: Technology and citizen innovation for building historical social networks to understand the demographic past’ directed by Alícia Fornés from the Center for Computer Vision and Joana Maria Pujadas-Mora from the Center for Demographic Studies, both at the Universitat Autònoma de Barcelona, funded by the Recercaixa program (2017–2019).
Its webpage is http://dag.cvc.uab.es/xarxes/.The aim of the project is to develop technologies facilitating massive digitalization of demographic sources, and more specifically the padrones (local censuses), in order to reconstruct historical ‘social’ networks employing computer vision technology. Such virtual networks can be created thanks to the linkage of nominative records compiled in the local censuses across time and space. Thus, digitized versions of individual and family lifespans are established, and individuals and families can be located spatially.
 
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN (down) 978-5-7996-2656-3 Medium  
  Area Expedition Conference  
  Notes DAG; 600.121 Approved no  
  Call Number Admin @ si @ PFL2019 Serial 3351  
Permanent link to this record
 

 
Author Mikhail Mozerov; Ariel Amato; Xavier Roca edit  isbn
openurl 
  Title Occlusion Handling in Trinocular Stereo using Composite Disparity Space Image Type Conference Article
  Year 2009 Publication 19th International Conference on Computer Graphics and Vision Abbreviated Journal  
  Volume Issue Pages 69–73  
  Keywords  
  Abstract In this paper we propose a method that smartly improves occlusion handling in stereo matching using trinocular stereo. The main idea is based on the assumption that any occluded region in a matched stereo pair (middle-left images) in general is not occluded in the opposite matched pair (middle-right images). Then two disparity space images (DSI) can be merged in one composite DSI. The proposed integration differs from the known approach that uses a cumulative cost. A dense disparity map is obtained with a global optimization algorithm using the proposed composite DSI. The experimental results are evaluated on the Middlebury data set, showing high performance of the proposed algorithm especially in the occluded regions. One of the top positions in the rank of the Middlebury website confirms the performance of our method to be competitive with the best stereo matching.  
  Address Moscow (Russia)  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN (down) 978-5-317-02975-3 Medium  
  Area Expedition Conference GRAPHICON  
  Notes ISE Approved no  
  Call Number ISE @ ise @ MAR2009b Serial 1207  
Permanent link to this record
 

 
Author S.Grau; Ana Puig; Sergio Escalera; Maria Salamo edit   pdf
url  doi
isbn  openurl
  Title Intelligent Interactive Volume Classification Type Conference Article
  Year 2013 Publication Pacific Graphics Abbreviated Journal  
  Volume 32 Issue 7 Pages 23-28  
  Keywords  
  Abstract This paper defines an intelligent and interactive framework to classify multiple regions of interest from the original data on demand, without requiring any preprocessing or previous segmentation. The proposed intelligent and interactive approach is divided in three stages: visualize, training and testing. First, users visualize and label some samples directly on slices of the volume. Training and testing are based on a framework of Error Correcting Output Codes and Adaboost classifiers that learn to classify each region the user has painted. Later, at the testing stage, each classifier is directly applied on the rest of samples and combined to perform multi-class labeling, being used in the final rendering. We also parallelized the training stage using a GPU-based implementation for
obtaining a rapid interaction and classification.
 
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN (down) 978-3-905674-50-7 Medium  
  Area Expedition Conference PG  
  Notes HuPBA; 600.046;MILAB Approved no  
  Call Number Admin @ si @ GPE2013b Serial 2355  
Permanent link to this record
Select All    Deselect All
 |   | 
Details
   print

Save Citations:
Export Records: