|   | 
Details
   web
Record
Author Alejandro Cartas; Jordi Luque; Petia Radeva; Carlos Segura; Mariella Dimiccoli
Title Seeing and Hearing Egocentric Actions: How Much Can We Learn? Type Conference Article
Year 2019 Publication IEEE International Conference on Computer Vision Workshops Abbreviated Journal
Volume Issue Pages 4470-4480
Keywords (up)
Abstract Our interaction with the world is an inherently multimodal experience. However, the understanding of human-to-object interactions has historically been addressed focusing on a single modality. In particular, a limited number of works have considered to integrate the visual and audio modalities for this purpose. In this work, we propose a multimodal approach for egocentric action recognition in a kitchen environment that relies on audio and visual information. Our model combines a sparse temporal sampling strategy with a late fusion of audio, spatial, and temporal streams. Experimental results on the EPIC-Kitchens dataset show that multimodal integration leads to better performance than unimodal approaches. In particular, we achieved a 5.18% improvement over the state of the art on verb classification.
Address Seul; Korea; October 2019
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICCVW
Notes MILAB; no proj Approved no
Call Number Admin @ si @ CLR2019b Serial 3385
Permanent link to this record