Home | << 1 2 3 4 5 6 7 8 9 10 >> [11–11] |
Records | |||||
---|---|---|---|---|---|
Author | Yagmur Gucluturk; Umut Guclu; Xavier Baro; Hugo Jair Escalante; Isabelle Guyon; Sergio Escalera; Marcel A. J. van Gerven; Rob van Lier | ||||
Title | Multimodal First Impression Analysis with Deep Residual Networks | Type | Journal Article | ||
Year | 2018 | Publication | IEEE Transactions on Affective Computing | Abbreviated Journal | TAC |
Volume | 8 | Issue | 3 | Pages | 316-329 |
Keywords | |||||
Abstract | People form first impressions about the personalities of unfamiliar individuals even after very brief interactions with them. In this study we present and evaluate several models that mimic this automatic social behavior. Specifically, we present several models trained on a large dataset of short YouTube video blog posts for predicting apparent Big Five personality traits of people and whether they seem suitable to be recommended to a job interview. Along with presenting our audiovisual approach and results that won the third place in the ChaLearn First Impressions Challenge, we investigate modeling in different modalities including audio only, visual only, language only, audiovisual, and combination of audiovisual and language. Our results demonstrate that the best performance could be obtained using a fusion of all data modalities. Finally, in order to promote explainability in machine learning and to provide an example for the upcoming ChaLearn challenges, we present a simple approach for explaining the predictions for job interview recommendations | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | HUPBA; no proj | Approved | no | ||
Call Number | Admin @ si @ GGB2018 | Serial | 3210 | ||
Permanent link to this record | |||||
Author | Raul Gomez; Lluis Gomez; Jaume Gibert; Dimosthenis Karatzas | ||||
Title | Learning to Learn from Web Data through Deep Semantic Embeddings | Type | Conference Article | ||
Year | 2018 | Publication | 15th European Conference on Computer Vision Workshops | Abbreviated Journal | |
Volume | 11134 | Issue | Pages | 514-529 | |
Keywords | |||||
Abstract | In this paper we propose to learn a multimodal image and text embedding from Web and Social Media data, aiming to leverage the semantic knowledge learnt in the text domain and transfer it to a visual model for semantic image retrieval. We demonstrate that the pipeline can learn from images with associated text without supervision and perform a thourough analysis of five different text embeddings in three different benchmarks. We show that the embeddings learnt with Web and Social Media data have competitive performances over supervised methods in the text based image retrieval task, and we clearly outperform state of the art in the MIRFlickr dataset when training in the target data. Further we demonstrate how semantic multimodal image retrieval can be performed using the learnt embeddings, going beyond classical instance-level retrieval problems. Finally, we present a new dataset, InstaCities1M, composed by Instagram images and their associated texts that can be used for fair comparison of image-text embeddings. | ||||
Address | Munich; Alemanya; September 2018 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | LNCS | ||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ECCVW | ||
Notes | DAG; 600.129; 601.338; 600.121 | Approved | no | ||
Call Number | Admin @ si @ GGG2018a | Serial | 3175 | ||
Permanent link to this record | |||||
Author | Raul Gomez; Lluis Gomez; Jaume Gibert; Dimosthenis Karatzas | ||||
Title | Learning from# Barcelona Instagram data what Locals and Tourists post about its Neighbourhoods | Type | Conference Article | ||
Year | 2018 | Publication | 15th European Conference on Computer Vision Workshops | Abbreviated Journal | |
Volume | 11134 | Issue | Pages | 530-544 | |
Keywords | |||||
Abstract | Massive tourism is becoming a big problem for some cities, such as Barcelona, due to its concentration in some neighborhoods. In this work we gather Instagram data related to Barcelona consisting on images-captions pairs and, using the text as a supervisory signal, we learn relations between images, words and neighborhoods. Our goal is to learn which visual elements appear in photos when people is posting about each neighborhood. We perform a language separate treatment of the data and show that it can be extrapolated to a tourists and locals separate analysis, and that tourism is reflected in Social Media at a neighborhood level. The presented pipeline allows analyzing the differences between the images that tourists and locals associate to the different neighborhoods. The proposed method, which can be extended to other cities or subjects, proves that Instagram data can be used to train multi-modal (image and text) machine learning models that are useful to analyze publications about a city at a neighborhood level. We publish the collected dataset, InstaBarcelona and the code used in the analysis. | ||||
Address | Munich; Alemanya; September 2018 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | LNCS | ||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ECCVW | ||
Notes | DAG; 600.129; 601.338; 600.121 | Approved | no | ||
Call Number | Admin @ si @ GGG2018b | Serial | 3176 | ||
Permanent link to this record | |||||
Author | Suman Ghosh | ||||
Title | Word Spotting and Recognition in Images from Heterogeneous Sources A | Type | Book Whole | ||
Year | 2018 | Publication | PhD Thesis, Universitat Autonoma de Barcelona-CVC | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | Text is the most common way of information sharing from ages. With recent development of personal images databases and handwritten historic manuscripts the demand for algorithms to make these databases accessible for browsing and indexing are in rise. Enabling search or understanding large collection of manuscripts or image databases needs fast and robust methods. Researchers have found different ways to represent cropped words for understanding and matching, which works well when words are already segmented. However there is no trivial way to extend these for non-segmented documents. In this thesis we explore different methods for text retrieval and recognition from unsegmented document and scene images. Two different ways of representation exist in literature, one uses a fixed length representation learned from cropped words and another a sequence of features of variable length. Throughout this thesis, we have studied both these representation for their suitability in segmentation free understanding of text. In the first part we are focused on segmentation free word spotting using a fixed length representation. We extended the use of the successful PHOC (Pyramidal Histogram of Character) representation to segmentation free retrieval. In the second part of the thesis, we explore sequence based features and finally, we propose a unified solution where the same framework can generate both kind of representations. | ||||
Address | November 2018 | ||||
Corporate Author | Thesis | Ph.D. thesis | |||
Publisher | Ediciones Graficas Rey | Place of Publication | Editor | Ernest Valveny | |
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | 978-84-948531-0-4 | Medium | ||
Area | Expedition | Conference | |||
Notes | DAG; 600.121 | Approved | no | ||
Call Number | Admin @ si @ Gho2018 | Serial | 3217 | ||
Permanent link to this record | |||||
Author | Adrien Gaidon; Antonio Lopez; Florent Perronnin | ||||
Title | The Reasonable Effectiveness of Synthetic Visual Data | Type | Journal Article | ||
Year | 2018 | Publication | International Journal of Computer Vision | Abbreviated Journal | IJCV |
Volume | 126 | Issue | 9 | Pages | 899–901 |
Keywords | |||||
Abstract | |||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | ADAS; 600.118 | Approved | no | ||
Call Number | Admin @ si @ GLP2018 | Serial | 3180 | ||
Permanent link to this record | |||||
Author | Jianzhy Guo; Zhen Lei; Jun Wan; Egils Avots; Noushin Hajarolasvadi; Boris Knyazev; Artem Kuharenko; Julio C. S. Jacques Junior; Xavier Baro; Hasan Demirel; Sergio Escalera; Juri Allik; Gholamreza Anbarjafari | ||||
Title | Dominant and Complementary Emotion Recognition from Still Images of Faces | Type | Journal Article | ||
Year | 2018 | Publication | IEEE Access | Abbreviated Journal | ACCESS |
Volume | 6 | Issue | Pages | 26391 - 26403 | |
Keywords | |||||
Abstract | Emotion recognition has a key role in affective computing. Recently, fine-grained emotion analysis, such as compound facial expression of emotions, has attracted high interest of researchers working on affective computing. A compound facial emotion includes dominant and complementary emotions (e.g., happily-disgusted and sadly-fearful), which is more detailed than the seven classical facial emotions (e.g., happy, disgust, and so on). Current studies on compound emotions are limited to use data sets with limited number of categories and unbalanced data distributions, with labels obtained automatically by machine learning-based algorithms which could lead to inaccuracies. To address these problems, we released the iCV-MEFED data set, which includes 50 classes of compound emotions and labels assessed by psychologists. The task is challenging due to high similarities of compound facial emotions from different categories. In addition, we have organized a challenge based on the proposed iCV-MEFED data set, held at FG workshop 2017. In this paper, we analyze the top three winner methods and perform further detailed experiments on the proposed data set. Experiments indicate that pairs of compound emotion (e.g., surprisingly-happy vs happily-surprised) are more difficult to be recognized if compared with the seven basic emotions. However, we hope the proposed data set can help to pave the way for further research on compound facial emotion recognition. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | HUPBA; no proj | Approved | no | ||
Call Number | Admin @ si @ GLW2018 | Serial | 3122 | ||
Permanent link to this record | |||||
Author | Abel Gonzalez-Garcia; Davide Modolo; Vittorio Ferrari | ||||
Title | Objects as context for detecting their semantic parts | Type | Conference Article | ||
Year | 2018 | Publication | 31st IEEE Conference on Computer Vision and Pattern Recognition | Abbreviated Journal | |
Volume | Issue | Pages | 6907 - 6916 | ||
Keywords | Proposals; Semantics; Wheels; Automobiles; Context modeling; Task analysis; Object detection | ||||
Abstract | We present a semantic part detection approach that effectively leverages object information. We use the object appearance and its class as indicators of what parts to expect. We also model the expected relative location of parts inside the objects based on their appearance. We achieve this with a new network module, called OffsetNet, that efficiently predicts a variable number of part locations within a given object. Our model incorporates all these cues to
detect parts in the context of their objects. This leads to considerably higher performance for the challenging task of part detection compared to using part appearance alone (+5 mAP on the PASCAL-Part dataset). We also compare to other part detection methods on both PASCAL-Part and CUB200-2011 datasets. |
||||
Address | Salt Lake City; USA; June 2018 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | CVPR | ||
Notes | LAMP; 600.109; 600.120 | Approved | no | ||
Call Number | Admin @ si @ GMF2018 | Serial | 3229 | ||
Permanent link to this record | |||||
Author | Lluis Gomez; Andres Mafla; Marçal Rusiñol; Dimosthenis Karatzas | ||||
Title | Single Shot Scene Text Retrieval | Type | Conference Article | ||
Year | 2018 | Publication | 15th European Conference on Computer Vision | Abbreviated Journal | |
Volume | 11218 | Issue | Pages | 728-744 | |
Keywords | Image retrieval; Scene text; Word spotting; Convolutional Neural Networks; Region Proposals Networks; PHOC | ||||
Abstract | Textual information found in scene images provides high level semantic information about the image and its context and it can be leveraged for better scene understanding. In this paper we address the problem of scene text retrieval: given a text query, the system must return all images containing the queried text. The novelty of the proposed model consists in the usage of a single shot CNN architecture that predicts at the same time bounding boxes and a compact text representation of the words in them. In this way, the text based image retrieval task can be casted as a simple nearest neighbor search of the query text representation over the outputs of the CNN over the entire image
database. Our experiments demonstrate that the proposed architecture outperforms previous state-of-the-art while it offers a significant increase in processing speed. |
||||
Address | Munich; September 2018 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | LNCS | ||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ECCV | ||
Notes | DAG; 600.084; 601.338; 600.121; 600.129 | Approved | no | ||
Call Number | Admin @ si @ GMR2018 | Serial | 3143 | ||
Permanent link to this record | |||||
Author | Debora Gil; Rosa Maria Ortiz; Carles Sanchez; Antoni Rosell | ||||
Title | Objective endoscopic measurements of central airway stenosis. A pilot study | Type | Journal Article | ||
Year | 2018 | Publication | Respiration | Abbreviated Journal | RES |
Volume | 95 | Issue | Pages | 63–69 | |
Keywords | Bronchoscopy; Tracheal stenosis; Airway stenosis; Computer-assisted analysis | ||||
Abstract | Endoscopic estimation of the degree of stenosis in central airway obstruction is subjective and highly variable. Objective: To determine the benefits of using SENSA (System for Endoscopic Stenosis Assessment), an image-based computational software, for obtaining objective stenosis index (SI) measurements among a group of expert bronchoscopists and general pulmonologists. Methods: A total of 7 expert bronchoscopists and 7 general pulmonologists were enrolled to validate SENSA usage. The SI obtained by the physicians and by SENSA were compared with a reference SI to set their precision in SI computation. We used SENSA to efficiently obtain this reference SI in 11 selected cases of benign stenosis. A Web platform with three user-friendly microtasks was designed to gather the data. The users had to visually estimate the SI from videos with and without contours of the normal and the obstructed area provided by SENSA. The users were able to modify the SENSA contours to define the reference SI using morphometric bronchoscopy. Results: Visual SI estimation accuracy was associated with neither bronchoscopic experience (p = 0.71) nor the contours of the normal and the obstructed area provided by the system (p = 0.13). The precision of the SI by SENSA was 97.7% (95% CI: 92.4-103.7), which is significantly better than the precision of the SI by visual estimation (p < 0.001), with an improvement by at least 15%. Conclusion: SENSA provides objective SI measurements with a precision of up to 99.5%, which can be calculated from any bronchoscope using an affordable scalable interface. Providing normal and obstructed contours on bronchoscopic videos does not improve physicians' visual estimation of the SI. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | IAM; 600.075; 600.096; 600.145 | Approved | no | ||
Call Number | Admin @ si @ GOS2018 | Serial | 3043 | ||
Permanent link to this record | |||||
Author | Lluis Gomez; Marçal Rusiñol; Ali Furkan Biten; Dimosthenis Karatzas | ||||
Title | Subtitulació automàtica d'imatges. Estat de l'art i limitacions en el context arxivístic | Type | Conference Article | ||
Year | 2018 | Publication | Jornades Imatge i Recerca | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | |||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | JIR | ||
Notes | DAG; 600.084; 600.135; 601.338; 600.121; 600.129 | Approved | no | ||
Call Number | Admin @ si @ GRB2018 | Serial | 3173 | ||
Permanent link to this record | |||||
Author | Lluis Gomez; Marçal Rusiñol; Dimosthenis Karatzas | ||||
Title | Cutting Sayre's Knot: Reading Scene Text without Segmentation. Application to Utility Meters | Type | Conference Article | ||
Year | 2018 | Publication | 13th IAPR International Workshop on Document Analysis Systems | Abbreviated Journal | |
Volume | Issue | Pages | 97-102 | ||
Keywords | Robust Reading; End-to-end Systems; CNN; Utility Meters | ||||
Abstract | In this paper we present a segmentation-free system for reading text in natural scenes. A CNN architecture is trained in an end-to-end manner, and is able to directly output readings without any explicit text localization step. In order to validate our proposal, we focus on the specific case of reading utility meters. We present our results in a large dataset of images acquired by different users and devices, so text appears in any location, with different sizes, fonts and lengths, and the images present several distortions such as
dirt, illumination highlights or blur. |
||||
Address | Viena; Austria; April 2018 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | DAS | ||
Notes | DAG; 600.084; 600.121; 600.129 | Approved | no | ||
Call Number | Admin @ si @ GRK2018 | Serial | 3102 | ||
Permanent link to this record | |||||
Author | Akhil Gurram; Onay Urfalioglu; Ibrahim Halfaoui; Fahd Bouzaraa; Antonio Lopez | ||||
Title | Monocular Depth Estimation by Learning from Heterogeneous Datasets | Type | Conference Article | ||
Year | 2018 | Publication | IEEE Intelligent Vehicles Symposium | Abbreviated Journal | |
Volume | Issue | Pages | 2176 - 2181 | ||
Keywords | |||||
Abstract | Depth estimation provides essential information to perform autonomous driving and driver assistance. Especially, Monocular Depth Estimation is interesting from a practical point of view, since using a single camera is cheaper than many other options and avoids the need for continuous calibration strategies as required by stereo-vision approaches. State-of-the-art methods for Monocular Depth Estimation are based on Convolutional Neural Networks (CNNs). A promising line of work consists of introducing additional semantic information about the traffic scene when training CNNs for depth estimation. In practice, this means that the depth data used for CNN training is complemented with images having pixel-wise semantic labels, which usually are difficult to annotate (eg crowded urban images). Moreover, so far it is common practice to assume that the same raw training data is associated with both types of ground truth, ie, depth and semantic labels. The main contribution of this paper is to show that this hard constraint can be circumvented, ie, that we can train CNNs for depth estimation by leveraging the depth and semantic information coming from heterogeneous datasets. In order to illustrate the benefits of our approach, we combine KITTI depth and Cityscapes semantic segmentation datasets, outperforming state-of-the-art results on Monocular Depth Estimation. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | IV | ||
Notes | ADAS; 600.124; 600.116; 600.118 | Approved | no | ||
Call Number | Admin @ si @ GUH2018 | Serial | 3183 | ||
Permanent link to this record | |||||
Author | Abel Gonzalez-Garcia; Joost Van de Weijer; Yoshua Bengio | ||||
Title | Image-to-image translation for cross-domain disentanglement | Type | Conference Article | ||
Year | 2018 | Publication | 32nd Annual Conference on Neural Information Processing Systems | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | |||||
Address | Montreal; Canada; December 2018 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | NIPS | ||
Notes | LAMP; 600.120 | Approved | no | ||
Call Number | Admin @ si @ GWB2018 | Serial | 3155 | ||
Permanent link to this record | |||||
Author | Mohammad A. Haque; Ruben B. Bautista; Kamal Nasrollahi; Sergio Escalera; Christian B. Laursen; Ramin Irani; Ole K. Andersen; Erika G. Spaich; Kaustubh Kulkarni; Thomas B. Moeslund; Marco Bellantonio; Golamreza Anbarjafari; Fatemeh Noroozi | ||||
Title | Deep Multimodal Pain Recognition: A Database and Comparision of Spatio-Temporal Visual Modalities, Faces and Gestures | Type | Conference Article | ||
Year | 2018 | Publication | 13th IEEE Conference on Automatic Face and Gesture Recognition | Abbreviated Journal | |
Volume | Issue | Pages | 250 - 257 | ||
Keywords | |||||
Abstract | Pain is a symptom of many disorders associated with actual or potential tissue damage in human body. Managing pain is not only a duty but also highly cost prone. The most primitive state of pain management is the assessment of pain. Traditionally it was accomplished by self-report or visual inspection by experts. However, automatic pain assessment systems from facial videos are also rapidly evolving due to the need of managing pain in a robust and cost effective way. Among different challenges of automatic pain assessment from facial video data two issues are increasingly prevalent: first, exploiting both spatial and temporal information of the face to assess pain level, and second, incorporating multiple visual modalities to capture complementary face information related to pain. Most works in the literature focus on merely exploiting spatial information on chromatic (RGB) video data on shallow learning scenarios. However, employing deep learning techniques for spatio-temporal analysis considering Depth (D) and Thermal (T) along with RGB has high potential in this area. In this paper, we present the first state-of-the-art publicly available database, 'Multimodal Intensity Pain (MIntPAIN)' database, for RGBDT pain level recognition in sequences. We provide a first baseline results including 5 pain levels recognition by analyzing independent visual modalities and their fusion with CNN and LSTM models. From the experimental evaluation we observe that fusion of modalities helps to enhance recognition performance of pain levels in comparison to isolated ones. In particular, the combination of RGB, D, and T in an early fusion fashion achieved the best recognition rate. | ||||
Address | Xian; China; May 2018 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | FG | ||
Notes | HUPBA; no proj | Approved | no | ||
Call Number | Admin @ si @ HBN2018 | Serial | 3117 | ||
Permanent link to this record | |||||
Author | Rain Eric Haamer; Kaustubh Kulkarni; Nasrin Imanpour; Mohammad Ahsanul Haque; Egils Avots; Michelle Breisch; Kamal Nasrollahi; Sergio Escalera; Cagri Ozcinar; Xavier Baro; Ahmad R. Naghsh-Nilchi; Thomas B. Moeslund; Gholamreza Anbarjafari | ||||
Title | Changes in Facial Expression as Biometric: A Database and Benchmarks of Identification | Type | Conference Article | ||
Year | 2018 | Publication | 8th International Workshop on Human Behavior Understanding | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | Facial dynamics can be considered as unique signatures for discrimination between people. These have started to become important topic since many devices have the possibility of unlocking using face recognition or verification. In this work, we evaluate the efficacy of the transition frames of video in emotion as compared to the peak emotion frames for identification. For experiments with transition frames we extract features from each frame of the video from a fine-tuned VGG-Face Convolutional Neural Network (CNN) and geometric features from facial landmark points. To model the temporal context of the transition frames we train a Long-Short Term Memory (LSTM) on the geometric and the CNN features. Furthermore, we employ two fusion strategies: first, an early fusion, in which the geometric and the CNN features are stacked and fed to the LSTM. Second, a late fusion, in which the prediction of the LSTMs, trained independently on the two features, are stacked and used with a Support Vector Machine (SVM). Experimental results show that the late fusion strategy gives the best results and the transition frames give better identification results as compared to the peak emotion frames. | ||||
Address | Xian; China; May 2018 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | FGW | ||
Notes | HUPBA; no proj | Approved | no | ||
Call Number | Admin @ si @ HKI2018 | Serial | 3118 | ||
Permanent link to this record |