|
Md. Mostafa Kamal Sarker, Hatem A. Rashwan, Estefania Talavera, Syeda Furruka Banu, Petia Radeva, et al. (2018). MACNet: Multi-scale Atrous Convolution Networks for Food Places Classification in Egocentric Photo-streams. In European Conference on Computer Vision workshops (pp. 423–433). LNCS.
Abstract: A first-person (wearable) camera continually captures unscripted interactions of the camera user with objects, people, and scenes, reflecting the user's personal and relational tendencies. One such preference is people's interaction with food events. Regulating food intake and its duration is of great importance in protecting against disease. Consequently, this work aims to develop a smart model that can determine how often a person visits food places during a day. The model is a deep end-to-end network for automatic food place recognition that analyzes egocentric photo-streams. In this paper, we apply multi-scale atrous convolution networks to extract the key features related to food places from the input images. The proposed model is evaluated on an in-house private dataset called “EgoFoodPlaces”. Experimental results show promising performance for food place classification in egocentric photo-streams.
|
|
|
Xavier Soria, & Angel Sappa. (2018). Improving Edge Detection in RGB Images by Adding NIR Channel. In 14th IEEE International Conference on Signal Image Technology & Internet Based Systems.
Abstract: Edge detection is still a critical problem in many computer vision and image processing tasks. The manuscript presents a Holistically-Nested Edge Detection based approach to study the inclusion of Near-Infrared in the Visible spectrum images. To do so, a Single Sensor based dataset has been acquired in the 400nm to 1100nm wavelength spectral band. Prominent results have been obtained even when the ground truth (annotated edge-map) is based on the visible wavelength spectrum.
Keywords: Edge detection; Contour detection; VGG; CNN; RGB-NIR; Near infrared images
|
|
|
Patricia Suarez, Angel Sappa, & Boris X. Vintimilla. (2018). Cross-spectral image dehaze through a dense stacked conditional GAN based approach. In 14th IEEE International Conference on Signal Image Technology & Internet Based Systems.
Abstract: This paper proposes a novel approach to remove haze from RGB images using near-infrared images, based on a dense stacked conditional Generative Adversarial Network (CGAN). The architecture of the deep network implemented receives, besides the images with haze, their corresponding images in the near-infrared spectrum, which serve to accelerate the learning of the details of the image characteristics. The model uses a triplet layer that allows each channel of the visible spectrum image to be learned independently, so that haze is removed from each color channel separately. A multiple loss function scheme is proposed, which ensures balanced learning between the colors and the structure of the images. Experimental results have shown that the proposed method effectively removes the haze from the images. Additionally, the proposed approach is compared with a state-of-the-art approach, showing better results.
Keywords: Infrared imaging; Dense; Stacked CGAN; Cross-spectral; Convolutional networks
|
|
|
Jorge Charco, Boris X. Vintimilla, & Angel Sappa. (2018). Deep learning based camera pose estimation in multi-view environment. In 14th IEEE International Conference on Signal Image Technology & Internet Based Systems.
Abstract: This paper proposes a deep learning network architecture for relative camera pose estimation in a multi-view environment. The proposed network is a variant of the AlexNet architecture, used as a regressor to predict the relative translation and rotation as output. The proposed approach is trained from scratch on a large data set and takes as input a pair of images from the same scene. This new architecture is compared with a previous approach using standard metrics, obtaining better results on the relative camera pose.
Keywords: Deep learning; Camera pose estimation; Multiview environment; Siamese architecture
|
|
|
Cristina Palmero, Javier Selva, Mohammad Ali Bagheri, & Sergio Escalera. (2018). Recurrent CNN for 3D Gaze Estimation using Appearance and Shape Cues. In 29th British Machine Vision Conference.
Abstract: Gaze behavior is an important non-verbal cue in social signal processing and human-computer interaction. In this paper, we tackle the problem of person- and head pose-independent 3D gaze estimation from remote cameras, using a multi-modal recurrent convolutional neural network (CNN). We propose to combine face, eyes region, and face landmarks as individual streams in a CNN to estimate gaze in still images. Then, we exploit the dynamic nature of gaze by feeding the learned features of all the frames in a sequence to a many-to-one recurrent module that predicts the 3D gaze vector of the last frame. Our multi-modal static solution is evaluated on a wide range of head poses and gaze directions, achieving a significant improvement of 14.6% over the state of the art on the EYEDIAP dataset, further improved by 4% when the temporal modality is included.
|
|
|
Gabriela Ramirez, Esau Villatoro, Bogdan Ionescu, Hugo Jair Escalante, Sergio Escalera, Martha Larson, et al. (2018). Overview of the Multimedia Information Processing for Personality & Social Networks Analysis Contest. In Multimedia Information Processing for Personality and Social Networks Analysis (MIPPSNA 2018).
|
|
|
Ester Fornells, Manuel De Armas, Maria Teresa Anguera, Sergio Escalera, Marcos Antonio Catalán, & Josep Moya. (2018). Desarrollo del proyecto del Consell Comarcal del Baix Llobregat “Buen Trato a las personas mayores y aquellas en situación de fragilidad con sufrimiento emocional: Hacia un envejecimiento saludable”. Informaciones Psiquiatricas, 47–59.
|
|
|
Suman Ghosh. (2018). Word Spotting and Recognition in Images from Heterogeneous Sources (Ernest Valveny, Ed.). Ph.D. thesis, Ediciones Graficas Rey.
Abstract: Text has been the most common way of sharing information for ages. With the recent development of personal image databases and collections of handwritten historic manuscripts, the demand for algorithms that make these databases accessible for browsing and indexing is on the rise. Enabling search in, or understanding of, large collections of manuscripts or image databases requires fast and robust methods. Researchers have found different ways to represent cropped words for understanding and matching, which work well when words are already segmented. However, there is no trivial way to extend these to non-segmented documents. In this thesis we explore different methods for text retrieval and recognition from unsegmented document and scene images. Two different kinds of representation exist in the literature: one uses a fixed-length representation learned from cropped words, and the other a variable-length sequence of features. Throughout this thesis, we study both of these representations for their suitability in segmentation-free understanding of text. In the first part we focus on segmentation-free word spotting using a fixed-length representation. We extend the use of the successful PHOC (Pyramidal Histogram of Character) representation to segmentation-free retrieval. In the second part of the thesis, we explore sequence-based features and, finally, we propose a unified solution where the same framework can generate both kinds of representations.
|
|
|
Arnau Baro, Pau Riba, & Alicia Fornes. (2018). A Starting Point for Handwritten Music Recognition. In 1st International Workshop on Reading Music Systems (pp. 5–6).
Abstract: In recent years, interest in Optical Music Recognition (OMR) has reawakened, especially since the appearance of deep learning. However, there are very few works addressing handwritten scores. In this work we describe a full OMR pipeline for handwritten music scores using Convolutional and Recurrent Neural Networks that could serve as a baseline for the research community.
Keywords: Optical Music Recognition; Long Short-Term Memory; Convolutional Neural Networks; MUSCIMA++; CVC-MUSCIMA
|
|
|
Laura Lopez-Fuentes, Alessandro Farasin, Harald Skinnemoen, & Paolo Garza. (2018). Deep Learning models for passability detection of flooded roads. In MediaEval 2018 Multimedia Benchmark Workshop (Vol. 2283).
Abstract: In this paper we study and compare several approaches to detect floods and evidence for passability of roads by conventional means in Twitter. We focus on tweets containing both visual information (a picture shared by the user) and metadata, a combination of text and related extra information intrinsic to the Twitter API. This work has been done in the context of the MediaEval 2018 Multimedia Satellite Task.
|
|
|
Md. Mostafa Kamal Sarker, Mohammed Jabreel, Hatem A. Rashwan, Syeda Furruka Banu, Antonio Moreno, Petia Radeva, et al. (2018). CuisineNet: Food Attributes Classification using Multi-scale Convolution Network.
Abstract: The diversity of food and its attributes reflects the culinary habits of peoples from different countries. Thus, this paper addresses the problem of identifying the food culture of people around the world, and its flavor, by classifying two main food attributes: cuisine and flavor. A deep learning model based on multi-scale convolutional networks is proposed for extracting more accurate features from input images. The aggregation of multi-scale convolution layers with different kernel sizes is also used for weighting the features resulting from different scales. In addition, a joint loss function based on Negative Log Likelihood (NLL) is used to fit the model probability to multi-labeled classes for the multi-modal classification task. Furthermore, this work provides a new dataset for food attributes, called Yummly48K, extracted from the popular food website Yummly. Our model is assessed on the constructed Yummly48K dataset. The experimental results show that the proposed method yields 65% and 62% average F1 scores on the validation and test sets, respectively, outperforming state-of-the-art models.
|
|
|
Marçal Rusiñol, & Lluis Gomez. (2018). Avances en clasificación de imágenes en los últimos diez años. Perspectivas y limitaciones en el ámbito de archivos fotográficos históricos. Revista anual de la Asociación de Archiveros de Castilla y León, 161–174.
|
|
|
Ozan Caglayan, Adrien Bardet, Fethi Bougares, Loic Barrault, Kai Wang, Marc Masana, et al. (2018). LIUM-CVC Submissions for WMT18 Multimodal Translation Task. In 3rd Conference on Machine Translation.
Abstract: This paper describes the multimodal Neural Machine Translation systems developed by LIUM and CVC for the WMT18 Shared Task on Multimodal Translation. This year we propose several modifications to our previous multimodal attention architecture in order to better integrate convolutional features and refine them using encoder-side information. Our final constrained submissions ranked first for English→French and second for English→German language pairs among the constrained submissions according to the automatic evaluation metric METEOR.
|
|
|
Hugo Prol, Vincent Dumoulin, & Luis Herranz. (2018). Cross-Modulation Networks for Few-Shot Learning.
Abstract: A family of recent successful approaches to few-shot learning relies on learning an embedding space in which predictions are made by computing similarities between examples. This corresponds to combining information between support and query examples at a very late stage of the prediction pipeline. Inspired by this observation, we hypothesize that there may be benefits to combining the information at various levels of abstraction along the pipeline. We present an architecture called Cross-Modulation Networks which allows support and query examples to interact throughout the feature extraction process via a feature-wise modulation mechanism. We adapt the Matching Networks architecture to take advantage of these interactions and show encouraging initial results on miniImageNet in the 5-way, 1-shot setting, where we close the gap with state-of-the-art.
|
|
|
Chenshen Wu, Luis Herranz, Xialei Liu, Joost Van de Weijer, & Bogdan Raducanu. (2018). Memory Replay GANs: Learning to Generate New Categories without Forgetting. In 32nd Annual Conference on Neural Information Processing Systems (pp. 5966–5976).
Abstract: Previous works on sequential learning address the problem of forgetting in discriminative models. In this paper we consider the case of generative models. In particular, we investigate generative adversarial networks (GANs) in the task of learning new categories in a sequential fashion. We first show that sequential fine-tuning renders the network unable to properly generate images from previous categories (i.e., forgetting). Addressing this problem, we propose Memory Replay GANs (MeRGANs), a conditional GAN framework that integrates a memory replay generator. We study two methods to prevent forgetting by leveraging these replays, namely joint training with replay and replay alignment. Qualitative and quantitative experimental results on the MNIST, SVHN and LSUN datasets show that our memory replay approach can generate competitive images while significantly mitigating the forgetting of previous categories.
|
|