Publicacions CVC -- Query Results

[111–120] << 121 122 123 124 125 126 127 128 129 130 >> [131–140]

Details

Records
Author	Daniel Sanchez; Miguel Angel Bautista; Sergio Escalera
Title	HuPBA 8k+: Dataset and ECOC-GraphCut based Segmentation of Human Limbs			Type	Journal Article
Year	2015	Publication	Neurocomputing	Abbreviated Journal	NEUCOM
Volume	150	Issue	A	Pages	173–188
Keywords	Human limb segmentation; ECOC; Graph-Cuts
Abstract	Human multi-limb segmentation in RGB images has attracted a lot of interest in the research community because of the huge amount of possible applications in fields like Human-Computer Interaction, Surveillance, eHealth, or Gaming. Nevertheless, human multi-limb segmentation is a very hard task because of the changes in appearance produced by different points of view, clothing, lighting conditions, occlusions, and number of articulations of the human body. Furthermore, this huge pose variability makes the availability of large annotated datasets difficult. In this paper, we introduce the HuPBA8k+ dataset. The dataset contains more than 8000 labeled frames at pixel precision, including more than 120000 manually labeled samples of 14 different limbs. For completeness, the dataset is also labeled at frame-level with action annotations drawn from an 11 action dictionary which includes both single person actions and person-person interactive actions. Furthermore, we also propose a two-stage approach for the segmentation of human limbs. In a first stage, human limbs are trained using cascades of classifiers to be split in a tree-structure way, which is included in an Error-Correcting Output Codes (ECOC) framework to define a body-like probability map. This map is used to obtain a binary mask of the subject by means of GMM color modelling and GraphCuts theory. In a second stage, we embed a similar tree-structure in an ECOC framework to build a more accurate set of limb-like probability maps within the segmented user mask, that are fed to a multi-label GraphCut procedure to obtain final multi-limb segmentation. The methodology is tested on the novel HuPBA8k+ dataset, showing performance improvements in comparison to state-of-the-art approaches. In addition, a baseline of standard action recognition methods for the 11 actions categories of the novel dataset is also provided.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	HuPBA;MILAB			Approved	no
Call Number	Admin @ si @ SBE2015			Serial	2552
Permanent link to this record



Author	Wenjuan Gong; Xuena Zhang; Jordi Gonzalez; Andrews Sobral; Thierry Bouwmans; Changhe Tu; El-hadi Zahzah
Title	Human Pose Estimation from Monocular Images: A Comprehensive Survey			Type	Journal Article
Year	2016	Publication	Sensors	Abbreviated Journal	SENS
Volume	16	Issue	12	Pages	1966
Keywords	human pose estimation; human bodymodels; generativemethods; discriminativemethods; top-down methods; bottom-up methods
Abstract	Human pose estimation refers to the estimation of the location of body parts and how they are connected in an image. Human pose estimation from monocular images has wide applications (e.g., image indexing). Several surveys on human pose estimation can be found in the literature, but they focus on a certain category; for example, model-based approaches or human motion analysis, etc. As far as we know, an overall review of this problem domain has yet to be provided. Furthermore, recent advancements based on deep learning have brought novel algorithms for this problem. In this paper, a comprehensive survey of human pose estimation from monocular images is carried out including milestone works and recent advancements. Based on one standard pipeline for the solution of computer vision problems, this survey splits the problem into several modules: feature extraction and description, human body models, and modeling methods. Problem modeling methods are approached based on two means of categorization in this survey. One way to categorize includes top-down and bottom-up methods, and another way includes generative and discriminative methods. Considering the fact that one direct application of human pose estimation is to provide initialization for automatic video surveillance, there are additional sections for motion-related methods in all modules: motion features, motion models, and motion-based methods. Finally, the paper also collects 26 publicly available data sets for validation and provides error measurement methods that are frequently used.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	ISE; 600.098; 600.119			Approved	no
Call Number	Admin @ si @ GZG2016			Serial	2933
Permanent link to this record



Author	Xavier Perez Sala; Sergio Escalera; Cecilio Angulo; Jordi Gonzalez
Title	A survey on model based approaches for 2D and 3D visual human pose recovery			Type	Journal Article
Year	2014	Publication	Sensors	Abbreviated Journal	SENS
Volume	14	Issue	3	Pages	4189-4210
Keywords	human pose recovery; human body modelling; behavior analysis; computer vision
Abstract	Human Pose Recovery has been studied in the field of Computer Vision for the last 40 years. Several approaches have been reported, and significant improvements have been obtained in both data representation and model design. However, the problem of Human Pose Recovery in uncontrolled environments is far from being solved. In this paper, we define a general taxonomy to group model based approaches for Human Pose Recovery, which is composed of five main modules: appearance, viewpoint, spatial relations, temporal consistence, and behavior. Subsequently, a methodological comparison is performed following the proposed taxonomy, evaluating current SoA approaches in the aforementioned five group categories. As a result of this comparison, we discuss the main advantages and drawbacks of the reviewed literature.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	HuPBA; ISE; 600.046; 600.063; 600.078;MILAB			Approved	no
Call Number	Admin @ si @ PEA2014			Serial	2443
Permanent link to this record



Author	Eloi Puertas; Miguel Angel Bautista; Daniel Sanchez; Sergio Escalera; Oriol Pujol
Title	Learning to Segment Humans by Stacking their Body Parts,			Type	Conference Article
Year	2014	Publication	ECCV Workshop on ChaLearn Looking at People	Abbreviated Journal
Volume	8925	Issue		Pages	685-697
Keywords	Human body segmentation; Stacked Sequential Learning
Abstract	Human segmentation in still images is a complex task due to the wide range of body poses and drastic changes in environmental conditions. Usually, human body segmentation is treated in a two-stage fashion. First, a human body part detection step is performed, and then, human part detections are used as prior knowledge to be optimized by segmentation strategies. In this paper, we present a two-stage scheme based on Multi-Scale Stacked Sequential Learning (MSSL). We define an extended feature set by stacking a multi-scale decomposition of body part likelihood maps. These likelihood maps are obtained in a first stage by means of a ECOC ensemble of soft body part detectors. In a second stage, contextual relations of part predictions are learnt by a binary classifier, obtaining an accurate body confidence map. The obtained confidence map is fed to a graph cut optimization procedure to obtain the final segmentation. Results show improved segmentation when MSSL is included in the human segmentation pipeline.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title	LNCS
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ECCVW
Notes	HuPBA;MILAB			Approved	no
Call Number	Admin @ si @ PBS2014			Serial	2553
Permanent link to this record



Author	Armin Mehri
Title	Deep learning based architectures for cross-domain image processing			Type	Book Whole
Year	2023	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	Human vision is restricted to the visual-optical spectrum. Machine vision is not. Cameras sensitive to diverse infrared spectral bands can improve the capacities of autonomous systems and provide a comprehensive view. Relevant scene content can be made visible, particularly in situations when sensors of other modalities, such as a visual-optical camera, require a source of illumination. As a result, increasing the level of automation not only avoids human errors but also reduces machine-induced errors. Furthermore, multi-spectral sensor systems with infrared imagery as one modality are a rich source of information and can conceivably increase the robustness of many autonomous systems. Robotics, automobiles, biometrics, security, surveillance, and the military are some examples of fields that can profit from the use of infrared imagery in their respective applications. Although multimodal spectral sensors have come a long way, there are still several bottlenecks that prevent us from combining their output information and using them as comprehensive images. The primary issue with infrared imaging is the lack of potential benefits due to their cost influence on sensor resolution, which grows exponentially with greater resolution. Due to the more costly sensor technology required for their development, their resolutions are substantially lower than thoseof regular digital cameras. This thesis aims to improve beyond-visible-spectrum machine vision by integrating multi-modal spectral sensors. The emphasis is on transforming the produced images to enhance their resolution to match expected human perception, bring the color representation close to human understanding of natural color, and improve machine vision application performance. This research focuses mainly on two tasks, image Colorization and Image Super resolution for both single- and cross-domain problems. We first start with an extensive review of the state of the art in both tasks, point out the shortcomings of existing approaches, and then present our solutions to address their limitations. Our solutions demonstrate that low-cost channel information (i.e., visible image) can be used to improve expensive channel information (i.e., infrared image), resulting in images with higher quality and closer to human perception at a lower cost than a high-cost infrared camera.
Address
Corporate Author				Thesis	Ph.D. thesis
Publisher	IMPRIMA	Place of Publication		Editor	Angel Sappa
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN	978-84-126409-1-5	Medium
Area		Expedition		Conference
Notes	MSIAU			Approved	no
Call Number	Admin @ si @ Meh2023			Serial	3959
Permanent link to this record



Author	Michael Teutsch; Angel Sappa; Riad I. Hammoud
Title	Computer Vision in the Infrared Spectrum: Challenges and Approaches			Type	Book Whole
Year	2021	Publication	Synthesis Lectures on Computer Vision	Abbreviated Journal
Volume	10	Issue	2	Pages	1-138
Keywords
Abstract	Human visual perception is limited to the visual-optical spectrum. Machine vision is not. Cameras sensitive to the different infrared spectra can enhance the abilities of autonomous systems and visually perceive the environment in a holistic way. Relevant scene content can be made visible especially in situations, where sensors of other modalities face issues like a visual-optical camera that needs a source of illumination. As a consequence, not only human mistakes can be avoided by increasing the level of automation, but also machine-induced errors can be reduced that, for example, could make a self-driving car crash into a pedestrian under difficult illumination conditions. Furthermore, multi-spectral sensor systems with infrared imagery as one modality are a rich source of information and can provably increase the robustness of many autonomous systems. Applications that can benefit from utilizing infrared imagery range from robotics to automotive and from biometrics to surveillance. In this book, we provide a brief yet concise introduction to the current state-of-the-art of computer vision and machine learning in the infrared spectrum. Based on various popular computer vision tasks such as image enhancement, object detection, or object tracking, we first motivate each task starting from established literature in the visual-optical spectrum. Then, we discuss the differences between processing images and videos in the visual-optical spectrum and the various infrared spectra. An overview of the current literature is provided together with an outlook for each task. Furthermore, available and annotated public datasets and common evaluation methods and metrics are presented. In a separate chapter, popular applications that can greatly benefit from the use of infrared imagery as a data source are presented and discussed. Among them are automatic target recognition, video surveillance, or biometrics including face recognition. Finally, we conclude with recommendations for well-fitting sensor setups and data processing algorithms for certain computer vision tasks. We address this book to prospective researchers and engineers new to the field but also to anyone who wants to get introduced to the challenges and the approaches of computer vision using infrared images or videos. Readers will be able to start their work directly after reading the book supported by a highly comprehensive backlog of recent and relevant literature as well as related infrared datasets including existing evaluation frameworks. Together with consistently decreasing costs for infrared cameras, new fields of application appear and make computer vision in the infrared spectrum a great opportunity to face nowadays scientific and engineering challenges.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN	978-1636392431	Medium
Area		Expedition		Conference
Notes	MSIAU			Approved	no
Call Number	Admin @ si @ TSH2021			Serial	3666
Permanent link to this record



Author	Bogdan Raducanu; Fadi Dornaika
Title	Pose-Invariant Face Recognition in Videos for Human-Machine Interaction			Type	Conference Article
Year	2012	Publication	12th European Conference on Computer Vision	Abbreviated Journal
Volume	7584	Issue		Pages	566.575
Keywords
Abstract	Human-machine interaction is a hot topic nowadays in the communities of computer vision and robotics. In this context, face recognition algorithms (used as primary cue for a person’s identity assessment) work well under controlled conditions but degrade significantly when tested in real-world environments. This is mostly due to the difficulty of simultaneously handling variations in illumination, pose, and occlusions. In this paper, we propose a novel approach for robust pose-invariant face recognition for human-robot interaction based on the real-time fitting of a 3D deformable model to input images taken from video sequences. More concrete, our approach generates a rectified face image irrespective with the actual head-pose orientation. Experimental results performed on Honda video database, using several manifold learning techniques, show a distinct advantage of the proposed method over the standard 2D appearance-based snapshot approach.
Address
Corporate Author				Thesis
Publisher	Springer Berlin Heidelberg	Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title	LNCS
Series Volume		Series Issue		Edition
ISSN	0302-9743	ISBN	978-3-642-33867-0	Medium
Area		Expedition		Conference	ECCVW
Notes	OR;MV			Approved	no
Call Number	Admin @ si @ RaD2012e			Serial	2182
Permanent link to this record



Author	Adam Fodor; Rachid R. Saboundji; Julio C. S. Jacques Junior; Sergio Escalera; David Gallardo Pujol; Andras Lorincz
Title	Multimodal Sentiment and Personality Perception Under Speech: A Comparison of Transformer-based Architectures			Type	Conference Article
Year	2022	Publication	Understanding Social Behavior in Dyadic and Small Group Interactions	Abbreviated Journal
Volume	173	Issue		Pages	218-241
Keywords
Abstract	Human-machine, human-robot interaction, and collaboration appear in diverse fields, from homecare to Cyber-Physical Systems. Technological development is fast, whereas real-time methods for social communication analysis that can measure small changes in sentiment and personality states, including visual, acoustic and language modalities are lagging, particularly when the goal is to build robust, appearance invariant, and fair methods. We study and compare methods capable of fusing modalities while satisfying real-time and invariant appearance conditions. We compare state-of-the-art transformer architectures in sentiment estimation and introduce them in the much less explored field of personality perception. We show that the architectures perform differently on automatic sentiment and personality perception, suggesting that each task may be better captured/modeled by a particular method. Our work calls attention to the attractive properties of the linear versions of the transformer architectures. In particular, we show that the best results are achieved by fusing the different architectures{’} preprocessing methods. However, they pose extreme conditions in computation power and energy consumption for real-time computations for quadratic transformers due to their memory requirements. In turn, linear transformers pave the way for quantifying small changes in sentiment estimation and personality perception for real-time social communications for machines and robots.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	PMLR
Notes	HuPBA; no menciona			Approved	no
Call Number	Admin @ si @ FSJ2022			Serial	3769
Permanent link to this record



Author	Fadi Dornaika; Bogdan Raducanu; Alireza Bosaghzadeh
Title	Facial expression recognition based on multi observations with application to social robotics			Type	Book Chapter
Year	2015	Publication	Emotional and Facial Expressions: Recognition, Developmental Differences and Social Importance	Abbreviated Journal
Volume		Issue		Pages	153-166
Keywords
Abstract	Human-robot interaction is a hot topic nowadays in the social robotics community. One crucial aspect is represented by the affective communication which comes encoded through the facial expressions. In this chapter, we propose a novel approach for facial expression recognition, which exploits an efficient and adaptive graph-based label propagation (semi-supervised mode) in a multi-observation framework. The facial features are extracted using an appearance-based 3D face tracker, viewand texture independent. Our method has been extensively tested on the CMU dataset, and has been conveniently compared with other methods for graph construction. With the proposed approach, we developed an application for an AIBO robot, in which it mirrors the recognized facial expression.
Address
Corporate Author				Thesis
Publisher	Nova Science publishers	Place of Publication		Editor	Bruce Flores
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	OR;MV			Approved	no
Call Number	Admin @ si @ DRB2015			Serial	2720
Permanent link to this record



Author	Bogdan Raducanu; Alireza Bosaghzadeh; Fadi Dornaika
Title	Facial Expression Recognition based on Multi-view Observations with Application to Social Robotics			Type	Conference Article
Year	2014	Publication	1st Workshop on Computer Vision for Affective Computing	Abbreviated Journal
Volume		Issue		Pages	1-8
Keywords
Abstract	Human-robot interaction is a hot topic nowadays in the social robotics community. One crucial aspect is represented by the affective communication which comes encoded through the facial expressions. In this paper, we propose a novel approach for facial expression recognition, which exploits an efficient and adaptive graph-based label propagation (semi-supervised mode) in a multi-observation framework. The facial features are extracted using an appearance-based 3D face tracker, view- and texture independent. Our method has been extensively tested on the CMU dataset, and has been conveniently compared with other methods for graph construction. With the proposed approach, we developed an application for an AIBO robot, in which it mirrors the recognized facial expression.
Address	Singapore; November 2014
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ACCV
Notes	OR;MV			Approved	no
Call Number	Admin @ si @ RBD2014			Serial	2599
Permanent link to this record



Author	Xialei Liu; Chenshen Wu; Mikel Menta; Luis Herranz; Bogdan Raducanu; Andrew Bagdanov; Shangling Jui; Joost Van de Weijer
Title	Generative Feature Replay for Class-Incremental Learning			Type	Conference Article
Year	2020	Publication	CLVISION – Workshop on Continual Learning in Computer Vision	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	Humans are capable of learning new tasks without forgetting previous ones, while neural networks fail due to catastrophic forgetting between new and previously-learned tasks. We consider a class-incremental setting which means that the task-ID is unknown at inference time. The imbalance between old and new classes typically results in a bias of the network towards the newest ones. This imbalance problem can either be addressed by storing exemplars from previous tasks, or by using image replay methods. However, the latter can only be applied to toy datasets since image generation for complex datasets is a hard problem. We propose a solution to the imbalance problem based on generative feature replay which does not require any exemplars. To do this, we split the network into two parts: a feature extractor and a classifier. To prevent forgetting, we combine generative feature replay in the classifier with feature distillation in the feature extractor. Through feature generation, our method reduces the complexity of generative replay and prevents the imbalance problem. Our approach is computationally efficient and scalable to large datasets. Experiments confirm that our approach achieves state-of-the-art results on CIFAR-100 and ImageNet, while requiring only a fraction of the storage needed for exemplar-based continual learning
Address	Virtual CVPR
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	CVPRW
Notes	LAMP; 601.309; 602.200; 600.141; 600.120			Approved	no
Call Number	Admin @ si @ LWM2020			Serial	3419
Permanent link to this record



Author	Murad Al Haj
Title	Looking at Faces: Detection, Tracking and Pose Estimation			Type	Book Whole
Year	2013	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	Humans can effortlessly perceive faces, follow them over space and time, and decode their rich content, such as pose, identity and expression. However, despite many decades of research on automatic facial perception in areas like face detection, expression recognition, pose estimation and face recognition, and despite many successes, a complete solution remains elusive. This thesis is dedicated to three problems in automatic face perception, namely face detection, face tracking and pose estimation. In face detection, an initial simple model is presented that uses pixel-based heuristics to segment skin locations and hand-crafted rules to determine the locations of the faces present in an image. Different colorspaces are studied to judge whether a colorspace transformation can aid skin color detection. The output of this study is used in the design of a more complex face detector that is able to successfully generalize to different scenarios. In face tracking, a framework that combines estimation and control in a joint scheme is presented to track a face with a single pan-tilt-zoom camera. While this work is mainly motivated by tracking faces, it can be easily applied atop of any detector to track different objects. The applicability of this method is demonstrated on simulated as well as real-life scenarios. The last and most important part of this thesis is dedicate to monocular head pose estimation. In this part, a method based on partial least squares (PLS) regression is proposed to estimate pose and solve the alignment problem simultaneously. The contributions of this work are two-fold: 1) demonstrating that the proposed method achieves better than state-of-the-art results on the estimation problem and 2) developing a technique to reduce misalignment based on the learned PLS factors that outperform multiple instance learning (MIL) without the need for any re-training or the inclusion of misaligned samples in the training process, as normally done in MIL.
Address	Barcelona
Corporate Author				Thesis	Ph.D. thesis
Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Jordi Gonzalez;Xavier Roca
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	ISE			Approved	no
Call Number	Admin @ si @ Haj2013			Serial	2278
Permanent link to this record



Author	Mario Rojas; David Masip; A. Todorov; Jordi Vitria
Title	Automatic Point-based Facial Trait Judgments Evaluation			Type	Conference Article
Year	2010	Publication	23rd IEEE Conference on Computer Vision and Pattern Recognition	Abbreviated Journal
Volume		Issue		Pages	2715–2720
Keywords
Abstract	Humans constantly evaluate the personalities of other people using their faces. Facial trait judgments have been studied in the psychological field, and have been determined to influence important social outcomes of our lives, such as elections outcomes and social relationships. Recent work on textual descriptions of faces has shown that trait judgments are highly correlated. Further, behavioral studies suggest that two orthogonal dimensions, valence and dominance, can describe the basis of the human judgments from faces. In this paper, we used a corpus of behavioral data of judgments on different trait dimensions to automatically learn a trait predictor from facial pixel images. We study whether trait evaluations performed by humans can be learned using machine learning classifiers, and used later in automatic evaluations of new facial images. The experiments performed using local point-based descriptors show promising results in the evaluation of the main traits.
Address	San Francisco CA, USA
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN	1063-6919	ISBN	978-1-4244-6984-0	Medium
Area		Expedition		Conference	CVPR
Notes	OR;MV			Approved	no
Call Number	BCNPCL @ bcnpcl @ RMT2010			Serial	1282
Permanent link to this record



Author	Khanh Nguyen; Ali Furkan Biten; Andres Mafla; Lluis Gomez; Dimosthenis Karatzas
Title	Show, Interpret and Tell: Entity-Aware Contextualised Image Captioning in Wikipedia			Type	Conference Article
Year	2023	Publication	Proceedings of the 37th AAAI Conference on Artificial Intelligence	Abbreviated Journal
Volume	37	Issue	2	Pages	1940-1948
Keywords
Abstract	Humans exploit prior knowledge to describe images, and are able to adapt their explanation to specific contextual information given, even to the extent of inventing plausible explanations when contextual information and images do not match. In this work, we propose the novel task of captioning Wikipedia images by integrating contextual knowledge. Specifically, we produce models that jointly reason over Wikipedia articles, Wikimedia images and their associated descriptions to produce contextualized captions. The same Wikimedia image can be used to illustrate different articles, and the produced caption needs to be adapted to the specific context allowing us to explore the limits of the model to adjust captions to different contextual information. Dealing with out-of-dictionary words and Named Entities is a challenging task in this domain. To address this, we propose a pre-training objective, Masked Named Entity Modeling (MNEM), and show that this pretext task results to significantly improved models. Furthermore, we verify that a model pre-trained in Wikipedia generalizes well to News Captioning datasets. We further define two different test splits according to the difficulty of the captioning task. We offer insights on the role and the importance of each modality and highlight the limitations of our model.
Address	Washington; USA; February 2023
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	AAAI
Notes	DAG			Approved	no
Call Number	Admin @ si @ NBM2023			Serial	3860
Permanent link to this record



Author	Kaustubh Kulkarni; Ciprian Corneanu; Ikechukwu Ofodile; Sergio Escalera; Xavier Baro; Sylwia Hyniewska; Juri Allik; Gholamreza Anbarjafari
Title	Automatic Recognition of Facial Displays of Unfelt Emotions			Type	Journal Article
Year	2021	Publication	IEEE Transactions on Affective Computing	Abbreviated Journal	TAC
Volume	12	Issue	2	Pages	377 - 390
Keywords
Abstract	Humans modify their facial expressions in order to communicate their internal states and sometimes to mislead observers regarding their true emotional states. Evidence in experimental psychology shows that discriminative facial responses are short and subtle. This suggests that such behavior would be easier to distinguish when captured in high resolution at an increased frame rate. We are proposing SASE-FE, the first dataset of facial expressions that are either congruent or incongruent with underlying emotion states. We show that overall the problem of recognizing whether facial movements are expressions of authentic emotions or not can be successfully addressed by learning spatio-temporal representations of the data. For this purpose, we propose a method that aggregates features along fiducial trajectories in a deeply learnt space. Performance of the proposed model shows that on average, it is easier to distinguish among genuine facial expressions of emotion than among unfelt facial expressions of emotion and that certain emotion pairs such as contempt and disgust are more difficult to distinguish than the rest. Furthermore, the proposed methodology improves state of the art results on CK+ and OULU-CASIA datasets for video emotion recognition, and achieves competitive results when classifying facial action units on BP4D datase.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	HUPBA; no proj			Approved	no
Call Number	Admin @ si @ KCO2021			Serial	3658
Permanent link to this record