Record
Author Lluis Gomez; Y. Patel; Marçal Rusiñol; C.V. Jawahar; Dimosthenis Karatzas
Title Self‐supervised learning of visual features through embedding images into text topic spaces Type Conference Article
Year 2017 Publication 30th IEEE Conference on Computer Vision and Pattern Recognition Abbreviated Journal
Abstract End-to-end training from scratch of current deep architectures for new computer vision problems would require ImageNet-scale datasets, and this is not always possible. In this paper we present a method that is able to take advantage of freely available multi-modal content to train computer vision algorithms without human supervision. We put forward the idea of performing self-supervised learning of visual features by mining a large-scale corpus of multi-modal (text and image) documents. We show that discriminative visual features can be learnt efficiently by training a CNN to predict the semantic context in which a particular image is most likely to appear as an illustration. For this we leverage the hidden semantic structures discovered in the text corpus with a well-known topic modeling technique. Our experiments demonstrate state-of-the-art performance in image classification, object detection, and multi-modal retrieval compared to recent self-supervised or natural-supervised approaches.
Address Honolulu; Hawaii; July 2017
Area Expedition Conference CVPR
Notes DAG; 600.084; 600.121 Approved no
Call Number Admin @ si @ GPR2017 Serial 2889