Publicacions CVC -- Query Results

	Publicacions CVC Home \| Show All \| Simple Search \| Advanced Search \| Add Record \| Import	Login Quick Search: Field: contains: ...
	1486–1500 of 3413 records found matching your query (RSS):

Search & Display Options

Select All Deselect All

[81–90] << 91 92 93 94 95 96 97 98 99 100 >> [101–110]

List View

Citations

Details

	Records
	Author	Alejandro Gonzalez Alzate; David Vazquez; Antonio Lopez; Jaume Amores
	Title	On-Board Object Detection: Multicue, Multimodal, and Multiview Random Forest of Local Experts			Type	Journal Article
	Year	2017	Publication	IEEE Transactions on cybernetics	Abbreviated Journal	Cyber
	Volume	47	Issue	11	Pages	3980 - 3990
	Keywords	Multicue; multimodal; multiview; object detection
	Abstract	Despite recent significant advances, object detection continues to be an extremely challenging problem in real scenarios. In order to develop a detector that successfully operates under these conditions, it becomes critical to leverage upon multiple cues, multiple imaging modalities, and a strong multiview (MV) classifier that accounts for different object views and poses. In this paper, we provide an extensive evaluation that gives insight into how each of these aspects (multicue, multimodality, and strong MV classifier) affect accuracy both individually and when integrated together. In the multimodality component, we explore the fusion of RGB and depth maps obtained by high-definition light detection and ranging, a type of modality that is starting to receive increasing attention. As our analysis reveals, although all the aforementioned aspects significantly help in improving the accuracy, the fusion of visible spectrum and depth information allows to boost the accuracy by a much larger margin. The resulting detector not only ranks among the top best performers in the challenging KITTI benchmark, but it is built upon very simple blocks that are easy to implement and computationally efficient.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN	2168-2267	ISBN		Medium
	Area		Expedition		Conference
	Notes	ADAS; 600.085; 600.082; 600.076; 600.118			Approved	no
	Call Number	Admin @ si @			Serial	2810
Permanent link to this record



	Author	Pau Rodriguez; Guillem Cucurull; Jordi Gonzalez; Josep M. Gonfaus; Kamal Nasrollahi; Thomas B. Moeslund; Xavier Roca
	Title	Deep Pain: Exploiting Long Short-Term Memory Networks for Facial Expression Classification			Type	Journal Article
	Year	2017	Publication	IEEE Transactions on cybernetics	Abbreviated Journal	Cyber
	Volume		Issue		Pages	1-11
	Keywords
	Abstract	Pain is an unpleasant feeling that has been shown to be an important factor for the recovery of patients. Since this is costly in human resources and difficult to do objectively, there is the need for automatic systems to measure it. In this paper, contrary to current state-of-the-art techniques in pain assessment, which are based on facial features only, we suggest that the performance can be enhanced by feeding the raw frames to deep learning models, outperforming the latest state-of-the-art results while also directly facing the problem of imbalanced data. As a baseline, our approach first uses convolutional neural networks (CNNs) to learn facial features from VGG_Faces, which are then linked to a long short-term memory to exploit the temporal relation between video frames. We further compare the performances of using the so popular schema based on the canonically normalized appearance versus taking into account the whole image. As a result, we outperform current state-of-the-art area under the curve performance in the UNBC-McMaster Shoulder Pain Expression Archive Database. In addition, to evaluate the generalization properties of our proposed methodology on facial motion recognition, we also report competitive results in the Cohn Kanade+ facial expression database.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	ISE; 600.119; 600.098			Approved	no
	Call Number	Admin @ si @ RCG2017a			Serial	2926
Permanent link to this record



	Author	Jun Wan; Chi Lin; Longyin Wen; Yunan Li; Qiguang Miao; Sergio Escalera; Gholamreza Anbarjafari; Isabelle Guyon; Guodong Guo; Stan Z. Li
	Title	ChaLearn Looking at People: IsoGD and ConGD Large-scale RGB-D Gesture Recognition			Type	Journal Article
	Year	2022	Publication	IEEE Transactions on Cybernetics	Abbreviated Journal	TCIBERN
	Volume	52	Issue	5	Pages	3422-3433
	Keywords
	Abstract	The ChaLearn large-scale gesture recognition challenge has been run twice in two workshops in conjunction with the International Conference on Pattern Recognition (ICPR) 2016 and International Conference on Computer Vision (ICCV) 2017, attracting more than 200 teams round the world. This challenge has two tracks, focusing on isolated and continuous gesture recognition, respectively. This paper describes the creation of both benchmark datasets and analyzes the advances in large-scale gesture recognition based on these two datasets. We discuss the challenges of collecting large-scale ground-truth annotations of gesture recognition, and provide a detailed analysis of the current state-of-the-art methods for large-scale isolated and continuous gesture recognition based on RGB-D video sequences. In addition to recognition rate and mean jaccard index (MJI) as evaluation metrics used in our previous challenges, we also introduce the corrected segmentation rate (CSR) metric to evaluate the performance of temporal segmentation for continuous gesture recognition. Furthermore, we propose a bidirectional long short-term memory (Bi-LSTM) baseline method, determining the video division points based on the skeleton points extracted by convolutional pose machine (CPM). Experiments demonstrate that the proposed Bi-LSTM outperforms the state-of-the-art methods with an absolute improvement of 8.1% (from 0.8917 to 0.9639) of CSR.
	Address	May 2022
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	HUPBA; no menciona			Approved	no
	Call Number	Admin @ si @ WLW2022			Serial	3522
Permanent link to this record



	Author	Fadi Dornaika; Franck Davoine
	Title	On appearance based face and facial action tracking			Type	Journal
	Year	2006	Publication	IEEE Transactions on Circuits and Systems for Video Technology, 16(9): 1838–1853	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes				Approved	no
	Call Number	Admin @ si @ DoD2006b			Serial	732
Permanent link to this record



	Author	Reza Azad; Maryam Asadi-Aghbolaghi; Shohreh Kasaei; Sergio Escalera
	Title	Dynamic 3D Hand Gesture Recognition by Learning Weighted Depth Motion Maps			Type	Journal Article
	Year	2019	Publication	IEEE Transactions on Circuits and Systems for Video Technology	Abbreviated Journal	TCSVT
	Volume	29	Issue	6	Pages	1729-1740
	Keywords	Hand gesture recognition; Multilevel temporal sampling; Weighted depth motion map; Spatio-temporal description; VLAD encoding
	Abstract	Hand gesture recognition from sequences of depth maps is a challenging computer vision task because of the low inter-class and high intra-class variability, different execution rates of each gesture, and the high articulated nature of human hand. In this paper, a multilevel temporal sampling (MTS) method is first proposed that is based on the motion energy of key-frames of depth sequences. As a result, long, middle, and short sequences are generated that contain the relevant gesture information. The MTS results in increasing the intra-class similarity while raising the inter-class dissimilarities. The weighted depth motion map (WDMM) is then proposed to extract the spatio-temporal information from generated summarized sequences by an accumulated weighted absolute difference of consecutive frames. The histogram of gradient (HOG) and local binary pattern (LBP) are exploited to extract features from WDMM. The obtained results define the current state-of-the-art on three public benchmark datasets of: MSR Gesture 3D, SKIG, and MSR Action 3D, for 3D hand gesture recognition. We also achieve competitive results on NTU action dataset.
	Address	June 2019,
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	HUPBA; no proj			Approved	no
	Call Number	Admin @ si @ AAK2018			Serial	3213
Permanent link to this record



	Author	Chengyi Zou; Shuai Wan; Tiannan Ji; Marc Gorriz Blanch; Marta Mrak; Luis Herranz
	Title	Chroma Intra Prediction with Lightweight Attention-Based Neural Networks			Type	Journal Article
	Year	2023	Publication	IEEE Transactions on Circuits and Systems for Video Technology	Abbreviated Journal	TCSVT
	Volume	34	Issue	1	Pages	549 - 560
	Keywords
	Abstract	Neural networks can be successfully used for cross-component prediction in video coding. In particular, attention-based architectures are suitable for chroma intra prediction using luma information because of their capability to model relations between difierent channels. However, the complexity of such methods is still very high and should be further reduced, especially for decoding. In this paper, a cost-effective attention-based neural network is designed for chroma intra prediction. Moreover, with the goal of further improving coding performance, a novel approach is introduced to utilize more boundary information effectively. In addition to improving prediction, a simplification methodology is also proposed to reduce inference complexity by simplifying convolutions. The proposed schemes are integrated into H.266/Versatile Video Coding (VVC) pipeline, and only one additional binary block-level syntax flag is introduced to indicate whether a given block makes use of the proposed method. Experimental results demonstrate that the proposed scheme achieves up to −0.46%/−2.29%/−2.17% BD-rate reduction on Y/Cb/Cr components, respectively, compared with H.266/VVC anchor. Reductions in the encoding and decoding complexity of up to 22% and 61%, respectively, are achieved by the proposed scheme with respect to the previous attention-based chroma intra prediction method while maintaining coding performance.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	MACO; LAMP			Approved	no
	Call Number	Admin @ si @ ZWJ2023			Serial	3875
Permanent link to this record



	Author	Mingyi Yang; Fei Yang; Luka Murn; Marc Gorriz Blanch; Juil Sock; Shuai Wan; Fuzheng Yang; Luis Herranz
	Title	Task-Switchable Pre-Processor for Image Compression for Multiple Machine Vision Tasks			Type	Journal Article
	Year	2024	Publication	IEEE Transactions on Circuits and Systems for Video Technology	Abbreviated Journal
	Volume		Issue		Pages
	Keywords	M Yang, F Yang, L Murn, MG Blanch, J Sock, S Wan, F Yang, L Herranz
	Abstract	Visual content is increasingly being processed by machines for various automated content analysis tasks instead of being consumed by humans. Despite the existence of several compression methods tailored for machine tasks, few consider real-world scenarios with multiple tasks. In this paper, we aim to address this gap by proposing a task-switchable pre-processor that optimizes input images specifically for machine consumption prior to encoding by an off-the-shelf codec designed for human consumption. The proposed task-switchable pre-processor adeptly maintains relevant semantic information based on the specific characteristics of different downstream tasks, while effectively suppressing irrelevant information to reduce bitrate. To enhance the processing of semantic information for diverse tasks, we leverage pre-extracted semantic features to modulate the pixel-to-pixel mapping within the pre-processor. By switching between different modulations, multiple tasks can be seamlessly incorporated into the system. Extensive experiments demonstrate the practicality and simplicity of our approach. It significantly reduces the number of parameters required for handling multiple tasks while still delivering impressive performance. Our method showcases the potential to achieve efficient and effective compression for machine vision tasks, supporting the evolving demands of real-world applications.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	xxx			Approved	no
	Call Number	Admin @ si @ YYM2024			Serial	4007
Permanent link to this record



	Author	Shifeng Zhang; Ajian Liu; Jun Wan; Yanyan Liang; Guogong Guo; Sergio Escalera; Hugo Jair Escalante; Stan Z. Li
	Title	CASIA-SURF: A Dataset and Benchmark for Large-scale Multi-modal Face Anti-spoofing			Type	Journal
	Year	2020	Publication	IEEE Transactions on Biometrics, Behavior, and Identity Science	Abbreviated Journal	TTBIS
	Volume	2	Issue	2	Pages	182 - 193
	Keywords
	Abstract	Face anti-spoofing is essential to prevent face recognition systems from a security breach. Much of the progresses have been made by the availability of face anti-spoofing benchmark datasets in recent years. However, existing face anti-spoofing benchmarks have limited number of subjects (≤170) and modalities (≤2), which hinder the further development of the academic community. To facilitate face anti-spoofing research, we introduce a large-scale multi-modal dataset, namely CASIA-SURF, which is the largest publicly available dataset for face anti-spoofing in terms of both subjects and modalities. Specifically, it consists of 1,000 subjects with 21,000 videos and each sample has 3 modalities ( i.e. , RGB, Depth and IR). We also provide comprehensive evaluation metrics, diverse evaluation protocols, training/validation/testing subsets and a measurement tool, developing a new benchmark for face anti-spoofing. Moreover, we present a novel multi-modal multi-scale fusion method as a strong baseline, which performs feature re-weighting to select the more informative channel features while suppressing the less useful ones for each modality across different scales. Extensive experiments have been conducted on the proposed dataset to verify its significance and generalization capability. The dataset is available at https://sites.google.com/qq.com/face-anti-spoofing/welcome/challengecvpr2019?authuser=0
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	HuPBA; no proj			Approved	no
	Call Number	Admin @ si @ ZLW2020			Serial	3412
Permanent link to this record



	Author	Jose Seabra; Francesco Ciompi; Oriol Pujol; J. Mauri; Petia Radeva; Joao Sanchez
	Title	Rayleigh Mixture Model for Plaque Characterization in Intravascular Ultrasound			Type	Journal Article
	Year	2011	Publication	IEEE Transactions on Biomedical Engineering	Abbreviated Journal	TBME
	Volume	58	Issue	5	Pages	1314-1324
	Keywords
	Abstract	Vulnerable plaques are the major cause of carotid and coronary vascular problems, such as heart attack or stroke. A correct modeling of plaque echomorphology and composition can help the identification of such lesions. The Rayleigh distribution is widely used to describe (nearly) homogeneous areas in ultrasound images. Since plaques may contain tissues with heterogeneous regions, more complex distributions depending on multiple parameters are usually needed, such as Rice, K or Nakagami distributions. In such cases, the problem formulation becomes more complex, and the optimization procedure to estimate the plaque echomorphology is more difficult. Here, we propose to model the tissue echomorphology by means of a mixture of Rayleigh distributions, known as the Rayleigh mixture model (RMM). The problem formulation is still simple, but its ability to describe complex textural patterns is very powerful. In this paper, we present a method for the automatic estimation of the RMM mixture parameters by means of the expectation maximization algorithm, which aims at characterizing tissue echomorphology in ultrasound (US). The performance of the proposed model is evaluated with a database of in vitro intravascular US cases. We show that the mixture coefficients and Rayleigh parameters explicitly derived from the mixture model are able to accurately describe different plaque types and to significantly improve the characterization performance of an already existing methodology.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	MILAB;HuPBA			Approved	no
	Call Number	Admin @ si @ SCP2011			Serial	1712
Permanent link to this record



	Author	Marina Alberti; Simone Balocco; Carlo Gatta; Francesco Ciompi; Oriol Pujol; Joana Silva; Xavier Carrillo; Petia Radeva
	Title	Automatic Bifurcation Detection in Coronary IVUS Sequences			Type	Journal Article
	Year	2012	Publication	IEEE Transactions on Biomedical Engineering	Abbreviated Journal	TBME
	Volume	59	Issue	4	Pages	1022-2031
	Keywords
	Abstract	In this paper, we present a fully automatic method which identifies every bifurcation in an intravascular ultrasound (IVUS) sequence, the corresponding frames, the angular orientation with respect to the IVUS acquisition, and the extension. This goal is reached using a two-level classification scheme: first, a classifier is applied to a set of textural features extracted from each image of a sequence. A comparison among three state-of-the-art discriminative classifiers (AdaBoost, random forest, and support vector machine) is performed to identify the most suitable method for the branching detection task. Second, the results are improved by exploiting contextual information using a multiscale stacked sequential learning scheme. The results are then successively refined using a-priori information about branching dimensions and geometry. The proposed approach provides a robust tool for the quick review of pullback sequences, facilitating the evaluation of the lesion at bifurcation sites. The proposed method reaches an F-Measure score of 86.35%, while the F-Measure scores for inter- and intraobserver variability are 71.63% and 76.18%, respectively. The obtained results are positive. Especially, considering the branching detection task is very challenging, due to high variability in bifurcation dimensions and appearance.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN	0018-9294	ISBN		Medium
	Area		Expedition		Conference
	Notes	MILAB;HuPBA			Approved	no
	Call Number	Admin @ si @ ABG2012			Serial	1996
Permanent link to this record



	Author	Maria Elena Meza-de-Luna; Juan Ramon Terven Salinas; Bogdan Raducanu; Joaquin Salas
	Title	Assessing the Influence of Mirroring on the Perception of Professional Competence using Wearable Technology			Type	Journal Article
	Year	2016	Publication	IEEE Transactions on Affective Computing	Abbreviated Journal	TAC
	Volume	9	Issue	2	Pages	161-175
	Keywords	Mirroring; Nodding; Competence; Perception; Wearable Technology
	Abstract	Nonverbal communication is an intrinsic part in daily face-to-face meetings. A frequently observed behavior during social interactions is mirroring, in which one person tends to mimic the attitude of the counterpart. This paper shows that a computer vision system could be used to predict the perception of competence in dyadic interactions through the automatic detection of mirroring events. To prove our hypothesis, we developed: (1) A social assistant for mirroring detection, using a wearable device which includes a video camera and (2) an automatic classifier for the perception of competence, using the number of nodding gestures and mirroring events as predictors. For our study, we used a mixed-method approach in an experimental design where 48 participants acting as customers interacted with a confederated psychologist. We found that the number of nods or mirroring events has a significant influence on the perception of competence. Our results suggest that: (1) Customer mirroring is a better predictor than psychologist mirroring; (2) the number of psychologist’s nods is a better predictor than the number of customer’s nods; (3) except for the psychologist mirroring, the computer vision algorithm we used worked about equally well whether it was acquiring images from wearable smartglasses or fixed cameras.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	LAMP; 600.072;			Approved	no
	Call Number	Admin @ si @ MTR2016			Serial	2826
Permanent link to this record



	Author	Fatemeh Noroozi; Marina Marjanovic; Angelina Njegus; Sergio Escalera; Gholamreza Anbarjafari
	Title	Audio-Visual Emotion Recognition in Video Clips			Type	Journal Article
	Year	2019	Publication	IEEE Transactions on Affective Computing	Abbreviated Journal	TAC
	Volume	10	Issue	1	Pages	60-75
	Keywords
	Abstract	This paper presents a multimodal emotion recognition system, which is based on the analysis of audio and visual cues. From the audio channel, Mel-Frequency Cepstral Coefficients, Filter Bank Energies and prosodic features are extracted. For the visual part, two strategies are considered. First, facial landmarks’ geometric relations, i.e. distances and angles, are computed. Second, we summarize each emotional video into a reduced set of key-frames, which are taught to visually discriminate between the emotions. In order to do so, a convolutional neural network is applied to key-frames summarizing videos. Finally, confidence outputs of all the classifiers from all the modalities are used to define a new feature space to be learned for final emotion label prediction, in a late fusion/stacking fashion. The experiments conducted on the SAVEE, eNTERFACE’05, and RML databases show significant performance improvements by our proposed system in comparison to current alternatives, defining the current state-of-the-art in all three databases.
	Address	1 Jan.-March 2019
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	HUPBA; 602.143; 602.133			Approved	no
	Call Number	Admin @ si @ NMN2017			Serial	3011
Permanent link to this record



	Author	Yagmur Gucluturk; Umut Guclu; Xavier Baro; Hugo Jair Escalante; Isabelle Guyon; Sergio Escalera; Marcel A. J. van Gerven; Rob van Lier
	Title	Multimodal First Impression Analysis with Deep Residual Networks			Type	Journal Article
	Year	2018	Publication	IEEE Transactions on Affective Computing	Abbreviated Journal	TAC
	Volume	8	Issue	3	Pages	316-329
	Keywords
	Abstract	People form first impressions about the personalities of unfamiliar individuals even after very brief interactions with them. In this study we present and evaluate several models that mimic this automatic social behavior. Specifically, we present several models trained on a large dataset of short YouTube video blog posts for predicting apparent Big Five personality traits of people and whether they seem suitable to be recommended to a job interview. Along with presenting our audiovisual approach and results that won the third place in the ChaLearn First Impressions Challenge, we investigate modeling in different modalities including audio only, visual only, language only, audiovisual, and combination of audiovisual and language. Our results demonstrate that the best performance could be obtained using a fusion of all data modalities. Finally, in order to promote explainability in machine learning and to provide an example for the upcoming ChaLearn challenges, we present a simple approach for explaining the predictions for job interview recommendations
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	HUPBA; no proj			Approved	no
	Call Number	Admin @ si @ GGB2018			Serial	3210
Permanent link to this record



	Author	Ricardo Dario Perez Principi; Cristina Palmero; Julio C. S. Jacques Junior; Sergio Escalera
	Title	On the Effect of Observed Subject Biases in Apparent Personality Analysis from Audio-visual Signals			Type	Journal Article
	Year	2021	Publication	IEEE Transactions on Affective Computing	Abbreviated Journal	TAC
	Volume	12	Issue	3	Pages	607-621
	Keywords
	Abstract	Personality perception is implicitly biased due to many subjective factors, such as cultural, social, contextual, gender and appearance. Approaches developed for automatic personality perception are not expected to predict the real personality of the target, but the personality external observers attributed to it. Hence, they have to deal with human bias, inherently transferred to the training data. However, bias analysis in personality computing is an almost unexplored area. In this work, we study different possible sources of bias affecting personality perception, including emotions from facial expressions, attractiveness, age, gender, and ethnicity, as well as their influence on prediction ability for apparent personality estimation. To this end, we propose a multi-modal deep neural network that combines raw audio and visual information alongside predictions of attribute-specific models to regress apparent personality. We also analyse spatio-temporal aggregation schemes and the effect of different time intervals on first impressions. We base our study on the ChaLearn First Impressions dataset, consisting of one-person conversational videos. Our model shows state-of-the-art results regressing apparent personality based on the Big-Five model. Furthermore, given the interpretability nature of our network design, we provide an incremental analysis on the impact of each possible source of bias on final network predictions.
	Address	1 July-Sept. 2021
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	HuPBA; no proj			Approved	no
	Call Number	Admin @ si @ PPJ2019			Serial	3312
Permanent link to this record



	Author	Hugo Jair Escalante; Heysem Kaya; Albert Ali Salah; Sergio Escalera; Yagmur Gucluturk; Umut Guçlu; Xavier Baro; Isabelle Guyon; Julio C. S. Jacques Junior; Meysam Madadi; Stephane Ayache; Evelyne Viegas; Furkan Gurpinar; Achmadnoer Sukma Wicaksana; Cynthia Liem; Marcel A. J. Van Gerven; Rob Van Lier
	Title	Modeling, Recognizing, and Explaining Apparent Personality from Videos			Type	Journal Article
	Year	2022	Publication	IEEE Transactions on Affective Computing	Abbreviated Journal	TAC
	Volume	13	Issue	2	Pages	894-911
	Keywords
	Abstract	Explainability and interpretability are two critical aspects of decision support systems. Despite their importance, it is only recently that researchers are starting to explore these aspects. This paper provides an introduction to explainability and interpretability in the context of apparent personality recognition. To the best of our knowledge, this is the first effort in this direction. We describe a challenge we organized on explainability in first impressions analysis from video. We analyze in detail the newly introduced data set, evaluation protocol, proposed solutions and summarize the results of the challenge. We investigate the issue of bias in detail. Finally, derived from our study, we outline research opportunities that we foresee will be relevant in this area in the near future.
	Address	1 April-June 2022
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	HuPBA; no menciona			Approved	no
	Call Number	Admin @ si @ EKS2022			Serial	3406
Permanent link to this record