Records | |||||
---|---|---|---|---|---|
Author | Aitor Alvarez-Gila | ||||
Title | Self-supervised learning for image-to-image translation in the small data regime | Type | Book Whole | ||
Year | 2022 | Publication | PhD Thesis, Universitat Autonoma de Barcelona-CVC | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | Computer vision; Neural networks; Self-supervised learning; Image-to-image mapping; Probabilistic programming | ||||
Abstract | The massive irruption of Deep Convolutional Neural Networks (CNNs) in computer vision since 2012 led to the dominance of an image understanding paradigm consisting of an end-to-end, fully supervised learning workflow over large-scale annotated datasets. This approach proved extremely useful for solving a myriad of classic and new computer vision tasks with unprecedented performance (often surpassing that of humans), at the expense of vast amounts of human-labeled data, extensive computational resources and the discarding of all of our prior knowledge of the task at hand. Even though simple transfer learning methods, such as fine-tuning, have achieved remarkable impact, their success is limited when the amount of labeled data in the target domain is small. Furthermore, the non-static nature of data generation sources will often result in data distribution shifts that degrade the performance of deployed models. As a consequence, there is a growing demand for methods that can exploit elements of prior knowledge and sources of information other than the manually generated ground truth annotations of the images during the network training process, so that they can adapt to new domains that constitute, if not a small data regime, at least a small labeled data regime. This thesis targets such scenarios with few or no labeled data in three distinct image-to-image mapping learning problems. It contributes various approaches that leverage our prior knowledge of different elements of the image formation process. We first present a data-efficient framework for both defocus and motion blur detection, based on a model able to produce realistic synthetic local degradations. The framework comprises a self-supervised, a weakly-supervised and a semi-supervised instantiation, depending on the absence or availability and the nature of human annotations, and outperforms fully-supervised counterparts in a variety of settings.
Our knowledge of color image formation is then used to gather input and target ground truth image pairs for the RGB to hyperspectral image reconstruction task. We make use of a CNN to tackle this problem, which, for the first time, allows us to exploit spatial context and achieve state-of-the-art results given a limited hyperspectral image set. In our last contribution to the subfield of data-efficient image-to-image transformation problems, we present the novel semi-supervised task of zero-pair cross-view semantic segmentation: we consider the case of relocation of the camera in an end-to-end trained and deployed monocular, fixed-view semantic segmentation system often found in industry. Under the assumption that we are allowed to obtain an additional set of synchronized but unlabeled image pairs of new scenes from both original and new camera poses, we present ZPCVNet, a model and training procedure that enables the production of dense semantic predictions in either source or target views at inference time. The lack of existing suitable public datasets to develop this approach led us to the creation of MVMO, a large-scale Multi-View, Multi-Object path-traced dataset with per-view semantic segmentation annotations. We expect MVMO to propel future research in the exciting under-developed fields of cross-view and multi-view semantic segmentation. Last, in a piece of applied research directly relevant to process monitoring of an Electric Arc Furnace (EAF) in a steelmaking plant, we also consider the problem of simultaneously estimating the temperature and spectral emissivity of distant hot emissive samples. To that end, we design our own capturing device, which integrates three point spectrometers covering a wide range of the Ultra-Violet, visible, and Infra-Red spectra and is capable of registering the radiance signal incoming from an 8 cm diameter spot located up to 20 m away.
We then define a physically accurate radiative transfer model that comprises the effects of atmospheric absorbance, of the optical system transfer function, and of the sample temperature and spectral emissivity themselves. We solve this inverse problem without the need for annotated data using a probabilistic programming-based Bayesian approach, which yields full posterior distribution estimates of the involved variables that are consistent with laboratory-grade measurements. | ||||
Address | July, 2019 | ||||
Corporate Author | Thesis | Ph.D. thesis | |||
Publisher | Place of Publication | Editor | Joost Van de Weijer; Estibaliz Garrote | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | LAMP | Approved | no | ||
Call Number | Admin @ si @ Alv2022 | Serial | 3716 | ||
Author | Raul Gomez; Lluis Gomez; Jaume Gibert; Dimosthenis Karatzas | ||||
Title | Self-Supervised Learning from Web Data for Multimodal Retrieval | Type | Book Chapter | ||
Year | 2019 | Publication | Multi-Modal Scene Understanding Book | Abbreviated Journal | |
Volume | Issue | Pages | 279-306 | ||
Keywords | self-supervised learning; webly supervised learning; text embeddings; multimodal retrieval; multimodal embedding | ||||
Abstract | Self-supervised learning from multimodal image and text data allows deep neural networks to learn powerful features with no need of human annotated data. Web and Social Media platforms provide a virtually unlimited amount of this multimodal data. In this work we propose to exploit this freely available data to learn a multimodal image and text embedding, aiming to leverage the semantic knowledge learnt in the text domain and transfer it to a visual model for semantic image retrieval. We demonstrate that the proposed pipeline can learn from images with associated text without supervision and analyze the semantic structure of the learnt joint image and text embedding space. We perform a thorough analysis and performance comparison of five different state of the art text embeddings in three different benchmarks. We show that the embeddings learnt with Web and Social Media data have competitive performances over supervised methods in the text based image retrieval task, and we clearly outperform state of the art in the MIRFlickr dataset when training in the target data. Further, we demonstrate how semantic multimodal image retrieval can be performed using the learnt embeddings, going beyond classical instance-level retrieval problems. Finally, we present a new dataset, InstaCities1M, composed of Instagram images and their associated texts that can be used for fair comparison of image-text embeddings. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | DAG; 600.129; 601.338; 601.310 | Approved | no | ||
Call Number | Admin @ si @ GGG2019 | Serial | 3266 | ||
Author | Y. Patel; Lluis Gomez; Marçal Rusiñol; Dimosthenis Karatzas; C.V. Jawahar | ||||
Title | Self-Supervised Visual Representations for Cross-Modal Retrieval | Type | Conference Article | ||
Year | 2019 | Publication | ACM International Conference on Multimedia Retrieval | Abbreviated Journal | |
Volume | Issue | Pages | 182–186 | ||
Keywords | |||||
Abstract | Cross-modal retrieval methods have improved significantly in recent years with the use of deep neural networks and large-scale annotated datasets such as ImageNet and Places. However, collecting and annotating such datasets requires a tremendous amount of human effort and, besides, their annotations are limited to discrete sets of popular visual classes that may not be representative of the richer semantics found in large-scale cross-modal retrieval datasets. In this paper, we present a self-supervised cross-modal retrieval framework that leverages as training data the correlations between images and text on the entire set of Wikipedia articles. Our method consists of training a CNN to predict: (1) the semantic context of the article in which an image is more probable to appear as an illustration, and (2) the semantic context of its caption. Our experiments demonstrate that the proposed method is not only capable of learning discriminative visual representations for solving vision tasks like classification, but that the learned representations are better for cross-modal retrieval when compared to supervised pre-training of the network on the ImageNet dataset. | ||||
Address | Ottawa; Canada; June 2019 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICMR | ||
Notes | DAG; 600.121; 600.129 | Approved | no | ||
Call Number | Admin @ si @ PGR2019 | Serial | 3288 | ||
Author | Lu Yu; Xialei Liu; Joost Van de Weijer | ||||
Title | Self-Training for Class-Incremental Semantic Segmentation | Type | Journal Article | ||
Year | 2022 | Publication | IEEE Transactions on Neural Networks and Learning Systems | Abbreviated Journal | TNNLS |
Volume | Issue | Pages | |||
Keywords | Class-incremental learning; Self-training; Semantic segmentation. | ||||
Abstract | In class-incremental semantic segmentation, we have no access to the labeled data of previous tasks. Therefore, when incrementally learning new classes, deep neural networks suffer from catastrophic forgetting of previously learned knowledge. To address this problem, we propose to apply a self-training approach that leverages unlabeled data, which is used for rehearsal of previous knowledge. Specifically, we first learn a temporary model for the current task, and then pseudo labels for the unlabeled data are computed by fusing information from the old model of the previous task and the current temporary model. In addition, conflict reduction is proposed to resolve the conflicts of pseudo labels generated from both the old and temporary models. We show that maximizing self-entropy can further improve results by smoothing the overconfident predictions. Interestingly, in the experiments, we show that the auxiliary data can be different from the training data and that even general-purpose but diverse auxiliary data can lead to large performance gains. The experiments demonstrate state-of-the-art results: a relative gain of up to 114% on Pascal-VOC 2012 and 8.5% on the more challenging ADE20K compared to previous state-of-the-art methods. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | LAMP; 600.147; 611.008; | Approved | no | ||
Call Number | Admin @ si @ YLW2022 | Serial | 3745 | ||
Author | Xose M. Pardo; Petia Radeva; Juan J. Villanueva | ||||
Title | Self-Training Statistic Snake for Image Segmentation and Tracking. | Type | Miscellaneous | ||
Year | 1999 | Publication | Abbreviated Journal | ||
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | |||||
Address | Venice | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | MILAB | Approved | no | ||
Call Number | BCNPCL @ bcnpcl @ PRV1999 | Serial | 26 | ||
Author | Lluis Gomez; Y. Patel; Marçal Rusiñol; C.V. Jawahar; Dimosthenis Karatzas | ||||
Title | Self‐supervised learning of visual features through embedding images into text topic spaces | Type | Conference Article | ||
Year | 2017 | Publication | 30th IEEE Conference on Computer Vision and Pattern Recognition | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | End-to-end training from scratch of current deep architectures for new computer vision problems would require Imagenet-scale datasets, and this is not always possible. In this paper we present a method that is able to take advantage of freely available multi-modal content to train computer vision algorithms without human supervision. We put forward the idea of performing self-supervised learning of visual features by mining a large scale corpus of multi-modal (text and image) documents. We show that discriminative visual features can be learnt efficiently by training a CNN to predict the semantic context in which a particular image is more probable to appear as an illustration. For this we leverage the hidden semantic structures discovered in the text corpus with a well-known topic modeling technique. Our experiments demonstrate state of the art performance in image classification, object detection, and multi-modal retrieval compared to recent self-supervised or natural-supervised approaches. | ||||
Address | Honolulu; Hawaii; July 2017 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | CVPR | ||
Notes | DAG; 600.084; 600.121 | Approved | no | ||
Call Number | Admin @ si @ GPR2017 | Serial | 2889 | ||
Author | Subhajit Maity; Sanket Biswas; Siladittya Manna; Ayan Banerjee; Josep Llados; Saumik Bhattacharya; Umapada Pal | ||||
Title | SelfDocSeg: A Self-Supervised vision-based Approach towards Document Segmentation | Type | Conference Article | ||
Year | 2023 | Publication | 17th International Conference on Document Analysis and Recognition | Abbreviated Journal | |
Volume | 14187 | Issue | Pages | 342–360 | |
Keywords | |||||
Abstract | Document layout analysis is a well-known problem in the document research community and has been extensively explored, yielding a multitude of solutions ranging from text mining and recognition to graph-based representation, visual feature extraction, etc. However, most of the existing works have ignored the crucial fact regarding the scarcity of labeled data. With growing internet connectivity in personal life, an enormous number of documents has become available in the public domain, making data annotation a tedious task. We address this challenge using self-supervision and, unlike the few existing self-supervised document segmentation approaches, which use text mining and textual labels, we use a completely vision-based approach in pre-training without any ground-truth label or its derivative. Instead, we generate pseudo-layouts from the document images to pre-train an image encoder to learn the document object representation and localization in a self-supervised framework before fine-tuning it with an object detection model. We show that our pipeline sets a new benchmark in this context and performs on par with, if not better than, the existing methods and their supervised counterparts. The code is made publicly available at: this https URL | ||||
Address | Document Layout Analysis; Document | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICDAR | ||
Notes | DAG | Approved | no | ||
Call Number | Admin @ si @ MBM2023 | Serial | 3990 | ||
Author | Carles Fernandez; Pau Baiget; Xavier Roca; Jordi Gonzalez | ||||
Title | Semantic Annotation of Complex Human Scenes for Multimedia Surveillance | Type | Conference Article | ||
Year | 2007 | Publication | AI* Artificial Intelligence and Human–Oriented Computing. 10th Congress of the Italian Association for Artificial Intelligence, | Abbreviated Journal | |
Volume | 4733 | Issue | Pages | 698–709 | |
Keywords | |||||
Abstract | |||||
Address | Roma (Italy) | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | LNCS | ||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | AI | ||
Notes | ISE | Approved | no | ||
Call Number | ISE @ ise @ FBR2007a | Serial | 920 | ||
Author | Lu Yu; Bartlomiej Twardowski; Xialei Liu; Luis Herranz; Kai Wang; Yongmei Cheng; Shangling Jui; Joost Van de Weijer | ||||
Title | Semantic Drift Compensation for Class-Incremental Learning of Embeddings | Type | Conference Article | ||
Year | 2020 | Publication | 33rd IEEE Conference on Computer Vision and Pattern Recognition | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | Class-incremental learning of deep networks sequentially increases the number of classes to be classified. During training, the network only has access to data of one task at a time, where each task contains several classes. In this setting, networks suffer from catastrophic forgetting, which refers to the drastic drop in performance on previous tasks. The vast majority of methods have studied this scenario for classification networks, where for each new task the classification layer of the network must be augmented with additional weights to make room for the newly added classes. Embedding networks have the advantage that new classes can be naturally included into the network without adding new weights. Therefore, we study incremental learning for embedding networks. In addition, we propose a new method, called semantic drift compensation (SDC), to estimate the drift of features and compensate for it without the need of any exemplars. We approximate the drift of previous tasks based on the drift that is experienced by current task data. We perform experiments on fine-grained datasets, CIFAR100 and ImageNet-Subset. We demonstrate that embedding networks suffer significantly less from catastrophic forgetting. We outperform existing methods which do not require exemplars and obtain competitive results compared to methods which store exemplars. Furthermore, we show that our proposed SDC, when combined with existing methods to prevent forgetting, consistently improves results. | ||||
Address | Virtual CVPR | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | CVPR | ||
Notes | LAMP; 600.141; 601.309; 602.200; 600.120 | Approved | no | ||
Call Number | Admin @ si @ YTL2020 | Serial | 3422 | ||
Author | Akhil Gurram; Onay Urfalioglu; Ibrahim Halfaoui; Fahd Bouzaraa; Antonio Lopez | ||||
Title | Semantic Monocular Depth Estimation Based on Artificial Intelligence | Type | Journal Article | ||
Year | 2020 | Publication | IEEE Intelligent Transportation Systems Magazine | Abbreviated Journal | ITSM |
Volume | 13 | Issue | 4 | Pages | 99-103 |
Keywords | |||||
Abstract | Depth estimation provides essential information to perform autonomous driving and driver assistance. A promising line of work consists of introducing additional semantic information about the traffic scene when training CNNs for depth estimation. In practice, this means that the depth data used for CNN training is complemented with images having pixel-wise semantic labels where the same raw training data is associated with both types of ground truth, i.e., depth and semantic labels. The main contribution of this paper is to show that this hard constraint can be circumvented, i.e., that we can train CNNs for depth estimation by leveraging the depth and semantic information coming from heterogeneous datasets. In order to illustrate the benefits of our approach, we combine KITTI depth and Cityscapes semantic segmentation datasets, outperforming state-of-the-art results on monocular depth estimation. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | ADAS; 600.124; 600.118 | Approved | no | ||
Call Number | Admin @ si @ GUH2019 | Serial | 3306 | ||
Author | Mingyi Yang; Luis Herranz; Fei Yang; Luka Murn; Marc Gorriz Blanch; Shuai Wan; Fuzheng Yang; Marta Mrak | ||||
Title | Semantic Preprocessor for Image Compression for Machines | Type | Conference Article | ||
Year | 2023 | Publication | IEEE International Conference on Acoustics, Speech and Signal Processing | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | Visual content is being increasingly transmitted and consumed by machines rather than humans to perform automated content analysis tasks. In this paper, we propose an image preprocessor that optimizes the input image for machine consumption prior to encoding by an off-the-shelf codec designed for human consumption. To achieve a better trade-off between the accuracy of the machine analysis task and bitrate, we propose leveraging pre-extracted semantic information to improve the preprocessor’s ability to accurately identify and filter out task-irrelevant information. Furthermore, we propose a two-part loss function to optimize the preprocessor, consisting of a rate-task performance loss and a semantic distillation loss, which helps the reconstructed image retain more information that contributes to the accuracy of the task. Experiments show that the proposed preprocessor can save up to 48.83% bitrate compared with the method without the preprocessor, and up to 36.24% bitrate compared to existing preprocessors for machine vision. | ||||
Address | Rhodes Island; Greece; June 2023 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICASSP | ||
Notes | MACO; LAMP | Approved | no | ||
Call Number | Admin @ si @ YHY2023 | Serial | 3912 | ||
Author | Fahad Shahbaz Khan; Joost Van de Weijer; Muhammad Anwer Rao; Michael Felsberg; Carlo Gatta | ||||
Title | Semantic Pyramids for Gender and Action Recognition | Type | Journal Article | ||
Year | 2014 | Publication | IEEE Transactions on Image Processing | Abbreviated Journal | TIP |
Volume | 23 | Issue | 8 | Pages | 3633-3645 |
Keywords | |||||
Abstract | Person description is a challenging problem in computer vision. We investigated two major aspects of person description: 1) gender and 2) action recognition in still images. Most state-of-the-art approaches for gender and action recognition rely on the description of a single body part, such as face or full-body. However, relying on a single body part is suboptimal due to significant variations in scale, viewpoint, and pose in real-world images. This paper proposes a semantic pyramid approach for pose normalization. Our approach is fully automatic and based on combining information from full-body, upper-body, and face regions for gender and action recognition in still images. The proposed approach does not require any annotations for upper-body and face of a person. Instead, we rely on pretrained state-of-the-art upper-body and face detectors to automatically extract semantic information of a person. Given multiple bounding boxes from each body part detector, we then propose a simple method to select the best candidate bounding box, which is used for feature extraction. Finally, the extracted features from the full-body, upper-body, and face regions are combined into a single representation for classification. To validate the proposed approach for gender recognition, experiments are performed on three large data sets namely: 1) human attribute; 2) head-shoulder; and 3) proxemics. For action recognition, we perform experiments on four data sets most used for benchmarking action recognition in still images: 1) Sports; 2) Willow; 3) PASCAL VOC 2010; and 4) Stanford-40. Our experiments clearly demonstrate that the proposed approach, despite its simplicity, outperforms state-of-the-art methods for gender and action recognition. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 1057-7149 | ISBN | Medium | ||
Area | Expedition | Conference | |||
Notes | CIC; LAMP; 601.160; 600.074; 600.079; MILAB | Approved | no | ||
Call Number | Admin @ si @ KWR2014 | Serial | 2507 | ||
Author | Lu Yu | ||||
Title | Semantic Representation: From Color to Deep Embeddings | Type | Book Whole | ||
Year | 2019 | Publication | PhD Thesis, Universitat Autonoma de Barcelona-CVC | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | One of the fundamental problems of computer vision is to represent images with compact semantically relevant embeddings. These embeddings could then be used in a wide variety of applications, such as image retrieval, object detection, and video search. The main objective of this thesis is to study image embeddings from two aspects: color embeddings and deep embeddings.
In the first part of the thesis we start from hand-crafted color embeddings. We propose a method to order the additional color names according to their complementary nature with the basic eleven color names. This allows us to compute color name representations of arbitrary length with high discriminative power. Psychophysical experiments confirm that our proposed method outperforms baseline approaches. Secondly, we learn deep color embeddings from weakly labeled data by adding an attention strategy. The attention branch is able to correctly identify the relevant regions for each class. The advantage of our approach is that it can learn color names for specific domains for which no pixel-wise labels exist. In the second part of the thesis, we focus on deep embeddings. Firstly, we address the problem of compressing large embedding networks into small networks, while maintaining similar performance. We propose to distill the metrics from a teacher network to a student network. Two new losses are introduced to model the communication of a deep teacher network to a small student network: one based on an absolute teacher, where the student aims to produce the same embeddings as the teacher, and one based on a relative teacher, where the distances between pairs of data points are communicated from the teacher to the student. In addition, various aspects of distillation have been investigated for embeddings, including hint and attention layers, semi-supervised learning and cross quality distillation. Finally, another aspect of deep metric learning, namely lifelong learning, is studied. We observed that some drift occurs during training of new tasks for metric learning. We introduce a method to estimate the semantic drift based on the drift experienced by data of the current task during its training. With this estimate, previous tasks can be compensated for this drift, thereby improving their performance.
Furthermore, we show that embedding networks suffer significantly less from catastrophic forgetting compared to classification networks when learning new tasks. | ||||
Address | November 2019 | ||||
Corporate Author | Thesis | Ph.D. thesis | |||
Publisher | Ediciones Graficas Rey | Place of Publication | Editor | Joost Van de Weijer; Yongmei Cheng | |
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | 978-84-121011-3-3 | Medium | ||
Area | Expedition | Conference | |||
Notes | LAMP; 600.120 | Approved | no | ||
Call Number | Admin @ si @ Yu2019 | Serial | 3394 | ||
Author | J.M. Sanchez | ||||
Title | Semantic retrieval from digital video libraries in the TV commercials domain | Type | Report | ||
Year | 1999 | Publication | CVC Technical Report #29 | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | |||||
Address | CVC (UAB) | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | Approved | no | |||
Call Number | Admin @ si @ San1999 | Serial | 191 | ||
Author | Oriol Martinez | ||||
Title | Semantic Retrieval of Memory Color Content | Type | Report | ||
Year | 2004 | Publication | CVC Technical Report #80 | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | |||||
Address | CVC (UAB) | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | Approved | no | |||
Call Number | Admin @ si @ 38047 | Serial | 508 | ||