Publicacions CVC -- Query Results

[51–60] << 61 62 63 64 65 66 67 68 69 70 >> [71–80]

Details

Records
Author	W.Win; B.Bao; Q.Xu; Luis Herranz; Shuqiang Jiang
Title	Editorial Note: Efficient Multimedia Processing Methods and Applications			Type	Miscellaneous
Year	2019	Publication	Multimedia Tools and Applications	Abbreviated Journal	MTAP
Volume	78	Issue	1	Pages
Keywords
Abstract
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	LAMP; 600.141; 600.120			Approved	no
Call Number	Admin @ si @ WBX2019			Serial	3257
Permanent link to this record



Author	Lichao Zhang
Title	Towards end-to-end Networks for Visual Tracking in RGB and TIR Videos			Type	Book Whole
Year	2019	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	In the current work, we identify several problems of current tracking systems. The lack of large-scale labeled datasets hampers the usage of deep learning, especially end-to-end training, for tracking in TIR images. Therefore, many methods for tracking on TIR data are still based on hand-crafted features. This situation also happens in multi-modal tracking, e.g. RGB-T tracking. Another reason, which hampers the development of RGB-T tracking, is that there exists little research on the fusion mechanisms for combining information from RGB and TIR modalities. One of the crucial components of most trackers is the update module. For the currently existing end-to-end tracking architecture, e.g, Siamese trackers, the online model update is still not taken into consideration at the training stage. They use no-update or a linear update strategy during the inference stage. While such a hand-crafted approach to updating has led to improved results, its simplicity limits the potential gain likely to be obtained by learning to update. To address the data-scarcity for TIR and RGB-T tracking, we use image-to-image translation to generate a large-scale synthetic TIR dataset. This dataset allows us to perform end-to-end training for TIR tracking. Furthermore, we investigate several fusion mechanisms for RGB-T tracking. The multi-modal trackers are also trained in an end-to-end manner on the synthetic data. To improve the standard online update, we pose the updating step as an optimization problem which can be solved by training a neural network. Our approach thereby reduces the hand-crafted components in the tracking pipeline and sets a further step in the direction of a complete end-to-end trained tracking network which also considers updating during optimization.
Address	November 2019
Corporate Author				Thesis	Ph.D. thesis
Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Joost Van de Weijer;Abel Gonzalez;Fahad Shahbaz Khan
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN	978-84-1210011-1-9	Medium
Area		Expedition		Conference
Notes	LAMP; 600.141; 600.120			Approved	no
Call Number	Admin @ si @ Zha2019			Serial	3393
Permanent link to this record



Author	Yaxing Wang
Title	Transferring and Learning Representations for Image Generation and Translation			Type	Book Whole
Year	2020	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	Image generation is arguably one of the most attractive, compelling, and challenging tasks in computer vision. Among the methods which perform image generation, generative adversarial networks (GANs) play a key role. The most common image generation models based on GANs can be divided into two main approaches. The first one, called simply image generation takes random noise as an input and synthesizes an image which follows the same distribution as the images in the training set. The second class, which is called image-to-image translation, aims to map an image from a source domain to one that is indistinguishable from those in the target domain. Image-to-image translation methods can further be divided into paired and unpaired image-to-image translation based on whether they require paired data or not. In this thesis, we aim to address some challenges of both image generation and image-to-image generation.GANs highly rely upon having access to vast quantities of data, and fail to generate realistic images from random noise when applied to domains with few images. To address this problem, we aim to transfer knowledge from a model trained on a large dataset (source domain) to the one learned on limited data (target domain). We find that both GANs andconditional GANs can benefit from models trained on large datasets. Our experiments show that transferring the discriminator is more important than the generator. Using both the generator and discriminator results in the best performance. We found, however, that this method suffers from overfitting, since we update all parameters to adapt to the target data. We propose a novel architecture, which is tailored to address knowledge transfer to very small target domains. Our approach effectively exploreswhich part of the latent space is more related to the target domain. Additionally, the proposed method is able to transfer knowledge from multiple pretrained GANs. Although image-to-image translation has achieved outstanding performance, it still facesseveral problems. First, for translation between complex domains (such as translations between different modalities) image-to-image translation methods require paired data. We show that when only some of the pairwise translations have been seen (i.e. during training), we can infer the remaining unseen translations (where training pairs are not available). We propose a new approach where we align multiple encoders and decoders in such a way that the desired translation can be obtained by simply cascadingthe source encoder and the target decoder, even when they have not interacted during the training stage (i.e. unseen). Second, we address the issue of bias in image-to-image translation. Biased datasets unavoidably contain undesired changes, which are dueto the fact that the target dataset has a particular underlying visual distribution. We use carefully designed semantic constraints to reduce the effects of the bias. The semantic constraint aims to enforce the preservation of desired image properties. Finally, current approaches fail to generate diverse outputs or perform scalable image transfer in a single model. To alleviate this problem, we propose a scalable and diverse image-to-image translation. We employ random noise to control the diversity. The scalabitlity is determined by conditioning the domain label.computer vision, deep learning, imitation learning, adversarial generative networks, image generation, image-to-image translation.
Address	January 2020
Corporate Author				Thesis	Ph.D. thesis
Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Joost Van de Weijer;Abel Gonzalez;Luis Herranz
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN	978-84-121011-5-7	Medium
Area		Expedition		Conference
Notes	LAMP; 600.141; 600.120			Approved	no
Call Number	Admin @ si @ Wan2020			Serial	3397
Permanent link to this record



Author	Xiangyang Li; Luis Herranz; Shuqiang Jiang
Title	Multifaceted Analysis of Fine-Tuning in Deep Model for Visual Recognition			Type	Journal
Year	2020	Publication	ACM Transactions on Data Science	Abbreviated Journal	ACM
Volume		Issue		Pages
Keywords
Abstract	In recent years, convolutional neural networks (CNNs) have achieved impressive performance for various visual recognition scenarios. CNNs trained on large labeled datasets can not only obtain significant performance on most challenging benchmarks but also provide powerful representations, which can be used to a wide range of other tasks. However, the requirement of massive amounts of data to train deep neural networks is a major drawback of these models, as the data available is usually limited or imbalanced. Fine-tuning (FT) is an effective way to transfer knowledge learned in a source dataset to a target task. In this paper, we introduce and systematically investigate several factors that influence the performance of fine-tuning for visual recognition. These factors include parameters for the retraining procedure (e.g., the initial learning rate of fine-tuning), the distribution of the source and target data (e.g., the number of categories in the source dataset, the distance between the source and target datasets) and so on. We quantitatively and qualitatively analyze these factors, evaluate their influence, and present many empirical observations. The results reveal insights into what fine-tuning changes CNN parameters and provide useful and evidence-backed intuitions about how to implement fine-tuning for computer vision tasks.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	LAMP; 600.141; 600.120			Approved	no
Call Number	Admin @ si @ LHJ2020			Serial	3423
Permanent link to this record



Author	Xinhang Song; Haitao Zeng; Sixian Zhang; Luis Herranz; Shuqiang Jiang
Title	Generalized Zero-shot Learning with Multi-source Semantic Embeddings for Scene Recognition			Type	Conference Article
Year	2020	Publication	28th ACM International Conference on Multimedia	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	Recognizing visual categories from semantic descriptions is a promising way to extend the capability of a visual classifier beyond the concepts represented in the training data (i.e. seen categories). This problem is addressed by (generalized) zero-shot learning methods (GZSL), which leverage semantic descriptions that connect them to seen categories (e.g. label embedding, attributes). Conventional GZSL are designed mostly for object recognition. In this paper we focus on zero-shot scene recognition, a more challenging setting with hundreds of categories where their differences can be subtle and often localized in certain objects or regions. Conventional GZSL representations are not rich enough to capture these local discriminative differences. Addressing these limitations, we propose a feature generation framework with two novel components: 1) multiple sources of semantic information (i.e. attributes, word embeddings and descriptions), 2) region descriptions that can enhance scene discrimination. To generate synthetic visual features we propose a two-step generative approach, where local descriptions are sampled and used as conditions to generate visual features. The generated features are then aggregated and used together with real features to train a joint classifier. In order to evaluate the proposed method, we introduce a new dataset for zero-shot scene recognition with multi-semantic annotations. Experimental results on the proposed dataset and SUN Attribute dataset illustrate the effectiveness of the proposed method.
Address	Virtual; October 2020
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ACM
Notes	LAMP; 600.141; 600.120			Approved	no
Call Number	Admin @ si @ SZZ2020			Serial	3465
Permanent link to this record



Author	Kai Wang; Luis Herranz; Anjan Dutta; Joost Van de Weijer
Title	Bookworm continual learning: beyond zero-shot learning and continual learning			Type	Conference Article
Year	2020	Publication	Workshop TASK-CV 2020	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	We propose bookworm continual learning(BCL), a flexible setting where unseen classes can be inferred via a semantic model, and the visual model can be updated continually. Thus BCL generalizes both continual learning (CL) and zero-shot learning (ZSL). We also propose the bidirectional imagination (BImag) framework to address BCL where features of both past and future classes are generated. We observe that conditioning the feature generator on attributes can actually harm the continual learning ability, and propose two variants (joint class-attribute conditioning and asymmetric generation) to alleviate this problem.
Address	Virtual; August 2020
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ECCVW
Notes	LAMP; 600.141; 600.120			Approved	no
Call Number	Admin @ si @ WHD2020			Serial	3466
Permanent link to this record



Author	Yaxing Wang; Abel Gonzalez-Garcia; Luis Herranz; Joost Van de Weijer
Title	Controlling biases and diversity in diverse image-to-image translation			Type	Journal Article
Year	2021	Publication	Computer Vision and Image Understanding	Abbreviated Journal	CVIU
Volume	202	Issue		Pages	103082
Keywords
Abstract	JCR 2019 Q2, IF=3.121 The task of unpaired image-to-image translation is highly challenging due to the lack of explicit cross-domain pairs of instances. We consider here diverse image translation (DIT), an even more challenging setting in which an image can have multiple plausible translations. This is normally achieved by explicitly disentangling content and style in the latent representation and sampling different styles codes while maintaining the image content. Despite the success of current DIT models, they are prone to suffer from bias. In this paper, we study the problem of bias in image-to-image translation. Biased datasets may add undesired changes (e.g. change gender or race in face images) to the output translations as a consequence of the particular underlying visual distribution in the target domain. In order to alleviate the effects of this problem we propose the use of semantic constraints that enforce the preservation of desired image properties. Our proposed model is a step towards unbiased diverse image-to-image translation (UDIT), and results in less unwanted changes in the translated images while still performing the wanted transformation. Experiments on several heavily biased datasets show the effectiveness of the proposed techniques in different domains such as faces, objects, and scenes.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	LAMP; 600.141; 600.109; 600.147			Approved	no
Call Number	Admin @ si @ WGH2021			Serial	3464
Permanent link to this record



Author	Shiqi Yang; Yaxing Wang; Joost Van de Weijer; Luis Herranz; Shangling Jui
Title	Generalized Source-free Domain Adaptation			Type	Conference Article
Year	2021	Publication	19th IEEE International Conference on Computer Vision	Abbreviated Journal
Volume		Issue		Pages	8958-8967
Keywords
Abstract	Domain adaptation (DA) aims to transfer the knowledge learned from a source domain to an unlabeled target domain. Some recent works tackle source-free domain adaptation (SFDA) where only a source pre-trained model is available for adaptation to the target domain. However, those methods do not consider keeping source performance which is of high practical value in real world applications. In this paper, we propose a new domain adaptation paradigm called Generalized Source-free Domain Adaptation (G-SFDA), where the learned model needs to perform well on both the target and source domains, with only access to current unlabeled target data during adaptation. First, we propose local structure clustering (LSC), aiming to cluster the target features with its semantically similar neighbors, which successfully adapts the model to the target domain in the absence of source data. Second, we propose sparse domain attention (SDA), it produces a binary domain specific attention to activate different feature channels for different domains, meanwhile the domain attention will be utilized to regularize the gradient during adaptation to keep source information. In the experiments, for target performance our method is on par with or better than existing DA and SFDA methods, specifically it achieves state-of-the-art performance (85.4%) on VisDA, and our method works well for all domains after adapting to single or multiple target domains.
Address	Virtual; October 2021
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	LAMP; 600.120; 600.147			Approved	no
Call Number	Admin @ si @ YWW2021			Serial	3605
Permanent link to this record



Author	Sudeep Katakol; Luis Herranz; Fei Yang; Marta Mrak
Title	DANICE: Domain adaptation without forgetting in neural image compression			Type	Conference Article
Year	2021	Publication	Conference on Computer Vision and Pattern Recognition Workshops	Abbreviated Journal
Volume		Issue		Pages	1921-1925
Keywords
Abstract	Neural image compression (NIC) is a new coding paradigm where coding capabilities are captured by deep models learned from data. This data-driven nature enables new potential functionalities. In this paper, we study the adaptability of codecs to custom domains of interest. We show that NIC codecs are transferable and that they can be adapted with relatively few target domain images. However, naive adaptation interferes with the solution optimized for the original source domain, resulting in forgetting the original coding capabilities in that domain, and may even break the compatibility with previously encoded bitstreams. Addressing these problems, we propose Codec Adaptation without Forgetting (CAwF), a framework that can avoid these problems by adding a small amount of custom parameters, where the source codec remains embedded and unchanged during the adaptation process. Experiments demonstrate its effectiveness and provide useful insights on the characteristics of catastrophic interference in NIC.
Address	Virtual; June 2021
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	CVPRW
Notes	LAMP; 600.120; 600.141; 601.379			Approved	no
Call Number	Admin @ si @ KHY2021			Serial	3568
Permanent link to this record



Author	Kai Wang; Luis Herranz; Joost Van de Weijer
Title	Continual learning in cross-modal retrieval			Type	Conference Article
Year	2021	Publication	2nd CLVISION workshop	Abbreviated Journal
Volume		Issue		Pages	3628-3638
Keywords
Abstract	Multimodal representations and continual learning are two areas closely related to human intelligence. The former considers the learning of shared representation spaces where information from different modalities can be compared and integrated (we focus on cross-modal retrieval between language and visual representations). The latter studies how to prevent forgetting a previously learned task when learning a new one. While humans excel in these two aspects, deep neural networks are still quite limited. In this paper, we propose a combination of both problems into a continual cross-modal retrieval setting, where we study how the catastrophic interference caused by new tasks impacts the embedding spaces and their cross-modal alignment required for effective retrieval. We propose a general framework that decouples the training, indexing and querying stages. We also identify and study different factors that may lead to forgetting, and propose tools to alleviate it. We found that the indexing stage pays an important role and that simply avoiding reindexing the database with updated embedding networks can lead to significant gains. We evaluated our methods in two image-text retrieval datasets, obtaining significant gains with respect to the fine tuning baseline.
Address	Virtual; June 2021
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	CVPRW
Notes	LAMP; 600.120; 600.141; 600.147; 601.379			Approved	no
Call Number	Admin @ si @ WHW2021			Serial	3566
Permanent link to this record



Author	Aymen Azaza; Joost Van de Weijer; Ali Douik; Javad Zolfaghari Bengar; Marc Masana
Title	Saliency from High-Level Semantic Image Features			Type	Journal
Year	2020	Publication	SN Computer Science	Abbreviated Journal	SN
Volume	1	Issue	4	Pages	1-12
Keywords
Abstract	Top-down semantic information is known to play an important role in assigning saliency. Recently, large strides have been made in improving state-of-the-art semantic image understanding in the fields of object detection and semantic segmentation. Therefore, since these methods have now reached a high-level of maturity, evaluation of the impact of high-level image understanding on saliency estimation is now feasible. We propose several saliency features which are computed from object detection and semantic segmentation results. We combine these features with a standard baseline method for saliency detection to evaluate their importance. Experiments demonstrate that the proposed features derived from object detection and semantic segmentation improve saliency estimation significantly. Moreover, they show that our method obtains state-of-the-art results on (FT, ImgSal, and SOD datasets) and obtains competitive results on four other datasets (ECSSD, PASCAL-S, MSRA-B, and HKU-IS).
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	LAMP; 600.120; 600.109; 600.106			Approved	no
Call Number	Admin @ si @ AWD2020			Serial	3503
Permanent link to this record



Author	Aymen Azaza
Title	Context, Motion and Semantic Information for Computational Saliency			Type	Book Whole
Year	2018	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	The main objective of this thesis is to highlight the salient object in an image or in a video sequence. We address three important—but in our opinion insufficiently investigated—aspects of saliency detection. Firstly, we start by extending previous research on saliency which explicitly models the information provided from the context. Then, we show the importance of explicit context modelling for saliency estimation. Several important works in saliency are based on the usage of object proposals. However, these methods focus on the saliency of the object proposal itself and ignore the context. To introduce context in such saliency approaches, we couple every object proposal with its direct context. This allows us to evaluate the importance of the immediate surround (context) for its saliency. We propose several saliency features which are computed from the context proposals including features based on omni-directional and horizontal context continuity. Secondly, we investigate the usage of top-downmethods (high-level semantic information) for the task of saliency prediction since most computational methods are bottom-up or only include few semantic classes. We propose to consider a wider group of object classes. These objects represent important semantic information which we will exploit in our saliency prediction approach. Thirdly, we develop a method to detect video saliency by computing saliency from supervoxels and optical flow. In addition, we apply the context features developed in this thesis for video saliency detection. The method combines shape and motion features with our proposed context features. To summarize, we prove that extending object proposals with their direct context improves the task of saliency detection in both image and video data. Also the importance of the semantic information in saliency estimation is evaluated. Finally, we propose a newmotion feature to detect saliency in video data. The three proposed novelties are evaluated on standard saliency benchmark datasets and are shown to improve with respect to state-of-the-art.
Address	October 2018
Corporate Author				Thesis	Ph.D. thesis
Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Joost Van de Weijer;Ali Douik
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN	978-84-945373-9-4	Medium
Area		Expedition		Conference
Notes	LAMP; 600.120			Approved	no
Call Number	Admin @ si @ Aza2018			Serial	3218
Permanent link to this record



Author	Xinhang Song; Shuqiang Jiang; Luis Herranz
Title	Multi-Scale Multi-Feature Context Modeling for Scene Recognition in the Semantic Manifold			Type	Journal Article
Year	2017	Publication	IEEE Transactions on Image Processing	Abbreviated Journal	TIP
Volume	26	Issue	6	Pages	2721-2735
Keywords
Abstract	Before the big data era, scene recognition was often approached with two-step inference using localized intermediate representations (objects, topics, and so on). One of such approaches is the semantic manifold (SM), in which patches and images are modeled as points in a semantic probability simplex. Patch models are learned resorting to weak supervision via image labels, which leads to the problem of scene categories co-occurring in this semantic space. Fortunately, each category has its own co-occurrence patterns that are consistent across the images in that category. Thus, discovering and modeling these patterns are critical to improve the recognition performance in this representation. Since the emergence of large data sets, such as ImageNet and Places, these approaches have been relegated in favor of the much more powerful convolutional neural networks (CNNs), which can automatically learn multi-layered representations from the data. In this paper, we address many limitations of the original SM approach and related works. We propose discriminative patch representations using neural networks and further propose a hybrid architecture in which the semantic manifold is built on top of multiscale CNNs. Both representations can be computed significantly faster than the Gaussian mixture models of the original SM. To combine multiple scales, spatial relations, and multiple features, we formulate rich context models using Markov random fields. To solve the optimization problem, we analyze global and local approaches, where a top-down hierarchical algorithm has the best performance. Experimental results show that exploiting different types of contextual relations jointly consistently improves the recognition accuracy.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	LAMP; 600.120			Approved	no
Call Number	Admin @ si @ SJH2017a			Serial	2963
Permanent link to this record



Author	Weiqing Min; Shuqiang Jiang; Jitao Sang; Huayang Wang; Xinda Liu; Luis Herranz
Title	Being a Supercook: Joint Food Attributes and Multimodal Content Modeling for Recipe Retrieval and Exploration			Type	Journal Article
Year	2017	Publication	IEEE Transactions on Multimedia	Abbreviated Journal	TMM
Volume	19	Issue	5	Pages	1100 - 1113
Keywords
Abstract	This paper considers the problem of recipe-oriented image-ingredient correlation learning with multi-attributes for recipe retrieval and exploration. Existing methods mainly focus on food visual information for recognition while we model visual information, textual content (e.g., ingredients), and attributes (e.g., cuisine and course) together to solve extended recipe-oriented problems, such as multimodal cuisine classification and attribute-enhanced food image retrieval. As a solution, we propose a multimodal multitask deep belief network (M3TDBN) to learn joint image-ingredient representation regularized by different attributes. By grouping ingredients into visible ingredients (which are visible in the food image, e.g., “chicken” and “mushroom”) and nonvisible ingredients (e.g., “salt” and “oil”), M3TDBN is capable of learning both midlevel visual representation between images and visible ingredients and nonvisual representation. Furthermore, in order to utilize different attributes to improve the intermodality correlation, M3TDBN incorporates multitask learning to make different attributes collaborate each other. Based on the proposed M3TDBN, we exploit the derived deep features and the discovered correlations for three extended novel applications: 1) multimodal cuisine classification; 2) attribute-augmented cross-modal recipe image retrieval; and 3) ingredient and attribute inference from food images. The proposed approach is evaluated on the constructed Yummly dataset and the evaluation results have validated the effectiveness of the proposed approach.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	LAMP; 600.120			Approved	no
Call Number	Admin @ si @ MJS2017			Serial	2964
Permanent link to this record



Author	Luis Herranz; Shuqiang Jiang; Ruihan Xu
Title	Modeling Restaurant Context for Food Recognition			Type	Journal Article
Year	2017	Publication	IEEE Transactions on Multimedia	Abbreviated Journal	TMM
Volume	19	Issue	2	Pages	430 - 440
Keywords
Abstract	Food photos are widely used in food logs for diet monitoring and in social networks to share social and gastronomic experiences. A large number of these images are taken in restaurants. Dish recognition in general is very challenging, due to different cuisines, cooking styles, and the intrinsic difficulty of modeling food from its visual appearance. However, contextual knowledge can be crucial to improve recognition in such scenario. In particular, geocontext has been widely exploited for outdoor landmark recognition. Similarly, we exploit knowledge about menus and location of restaurants and test images. We first adapt a framework based on discarding unlikely categories located far from the test image. Then, we reformulate the problem using a probabilistic model connecting dishes, restaurants, and locations. We apply that model in three different tasks: dish recognition, restaurant recognition, and location refinement. Experiments on six datasets show that by integrating multiple evidences (visual, location, and external knowledge) our system can boost the performance in all tasks.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	LAMP; 600.120			Approved	no
Call Number	Admin @ si @ HJX2017			Serial	2965
Permanent link to this record