Jun Wan, Guodong Guo, Sergio Escalera, Hugo Jair Escalante, & Stan Z. Li. (2020). Multi-modal Face Presentation Attach Detection (Vol. 13).
|
Tomas Sixta, Julio C. S. Jacques Junior, Pau Buch Cardona, Eduard Vazquez, & Sergio Escalera. (2020). FairFace Challenge at ECCV 2020: Analyzing Bias in Face Recognition. In ECCV Workshops (Vol. 12540, pp. 463–481). LNCS.
Abstract: This work summarizes the 2020 ChaLearn Looking at People Fair Face Recognition and Analysis Challenge and provides a description of the top-winning solutions and analysis of the results. The aim of the challenge was to evaluate accuracy and bias in gender and skin colour of submitted algorithms on the task of 1:1 face verification in the presence of other confounding attributes. Participants were evaluated using an in-the-wild dataset based on reannotated IJB-C, further enriched 12.5K new images and additional labels. The dataset is not balanced, which simulates a real world scenario where AI-based models supposed to present fair outcomes are trained and evaluated on imbalanced data. The challenge attracted 151 participants, who made more 1.8K submissions in total. The final phase of the challenge attracted 36 active teams out of which 10 exceeded 0.999 AUC-ROC while achieving very low scores in the proposed bias metrics. Common strategies by the participants were face pre-processing, homogenization of data distributions, the use of bias aware loss functions and ensemble models. The analysis of top-10 teams shows higher false positive rates (and lower false negative rates) for females with dark skin tone as well as the potential of eyeglasses and young age to increase the false positive rates too.
|
Zhengying Liu, Zhen Xu, Shangeth Rajaa, Meysam Madadi, Julio C. S. Jacques Junior, Sergio Escalera, et al. (2020). Towards Automated Deep Learning: Analysis of the AutoDL challenge series 2019. In Proceedings of Machine Learning Research (Vol. 123, pp. 242–252).
Abstract: We present the design and results of recent competitions in Automated Deep Learning (AutoDL). In the AutoDL challenge series 2019, we organized 5 machine learning challenges: AutoCV, AutoCV2, AutoNLP, AutoSpeech and AutoDL. The first 4 challenges concern each a specific application domain, such as computer vision, natural language processing and speech recognition. At the time of March 2020, the last challenge AutoDL is still on-going and we only present its design. Some highlights of this work include: (1) a benchmark suite of baseline AutoML solutions, with emphasis on domains for which Deep Learning methods have had prior success (image, video, text, speech, etc); (2) a novel any-time learning framework, which opens doors for further theoretical consideration; (3) a repository of around 100 datasets (from all above domains) over half of which are released as public datasets to enable research on meta-learning; (4) analyses revealing that winning solutions generalize to new unseen datasets, validating progress towards universal AutoML solution; (5) open-sourcing of the challenge platform, the starting kit, the dataset formatting toolkit, and all winning solutions (All information available at {autodl.chalearn.org}).
|
Albert Clapes, Julio C. S. Jacques Junior, Carla Morral, & Sergio Escalera. (2020). ChaLearn LAP 2020 Challenge on Identity-preserved Human Detection: Dataset and Results. In 15th IEEE International Conference on Automatic Face and Gesture Recognition (pp. 801–808).
Abstract: This paper summarizes the ChaLearn Looking at People 2020 Challenge on Identity-preserved Human Detection (IPHD). For the purpose, we released a large novel dataset containing more than 112K pairs of spatiotemporally aligned depth and thermal frames (and 175K instances of humans) sampled from 780 sequences. The sequences contain hundreds of non-identifiable people appearing in a mix of in-the-wild and scripted scenarios recorded in public and private places. The competition was divided into three tracks depending on the modalities exploited for the detection: (1) depth, (2) thermal, and (3) depth-thermal fusion. Color was also captured but only used to facilitate the groundtruth annotation. Still the temporal synchronization of three sensory devices is challenging, so bad temporal matches across modalities can occur. Hence, the labels provided should considered “weak”, although test frames were carefully selected to minimize this effect and ensure the fairest comparison of the participants’ results. Despite this added difficulty, the results got by the participants demonstrate current fully-supervised methods can deal with that and achieve outstanding detection performance when measured in terms of AP@0.50.
|
Zhengying Liu, Adrien Pavao, Zhen Xu, Sergio Escalera, Isabelle Guyon, Julio C. S. Jacques Junior, et al. (2020). How far are we from true AutoML: reflection from winning solutions and results of AutoDL challenge. In 7th ICML Workshop on Automated Machine Learning.
Abstract: Following the completion of the AutoDL challenge (the final challenge in the ChaLearn
AutoDL challenge series 2019), we investigate winning solutions and challenge results to
answer an important motivational question: how far are we from achieving true AutoML?
On one hand, the winning solutions achieve good (accurate and fast) classification performance on unseen datasets. On the other hand, all winning solutions still contain a
considerable amount of hard-coded knowledge on the domain (or modality) such as image,
video, text, speech and tabular. This form of ad-hoc meta-learning could be replaced by
more automated forms of meta-learning in the future. Organizing a meta-learning challenge could help forging AutoML solutions that generalize to new unseen domains (e.g.
new types of sensor data) as well as gaining insights on the AutoML problem from a more
fundamental point of view. The datasets of the AutoDL challenge are a resource that can
be used for further benchmarks and the code of the winners has been outsourced, which is
a big step towards “democratizing” Deep Learning.
|
Anna Esposito, Terry Amorese, Nelson Maldonato, Alessandro Vinciarelli, Maria Ines Torres, Sergio Escalera, et al. (2020). Seniors’ ability to decode differently aged facial emotional expressions. In Faces and Gestures in E-health and welfare workshop (pp. 716–722).
|
Anna Esposito, Italia Cirillo, Antonietta Esposito, Leopoldina Fortunati, Gian Luca Foresti, Sergio Escalera, et al. (2020). Impairments in decoding facial and vocal emotional expressions in high functioning autistic adults and adolescents. In Faces and Gestures in E-health and welfare workshop (pp. 667–674).
|
Josep Famadas, Meysam Madadi, Cristina Palmero, & Sergio Escalera. (2020). Generative Video Face Reenactment by AUs and Gaze Regularization. In 15th IEEE International Conference on Automatic Face and Gesture Recognition (pp. 444–451).
Abstract: In this work, we propose an encoder-decoder-like architecture to perform face reenactment in image sequences. Our goal is to transfer the training subject identity to a given test subject. We regularize face reenactment by facial action unit intensity and 3D gaze vector regression. This way, we enforce the network to transfer subtle facial expressions and eye dynamics, providing a more lifelike result. The proposed encoder-decoder receives as input the previous sequence frame stacked to the current frame image of facial landmarks. Thus, the generated frames benefit from appearance and geometry, while keeping temporal coherence for the generated sequence. At test stage, a new target subject with the facial performance of the source subject and the appearance of the training subject is reenacted. Principal component analysis is applied to project the test subject geometry to the closest training subject geometry before reenactment. Evaluation of our proposal shows faster convergence, and more accurate and realistic results in comparison to other architectures without action units and gaze regularization.
|
Carlos Martin-Isla, Maryam Asadi-Aghbolaghi, Polyxeni Gkontra, Victor M. Campello, Sergio Escalera, & Karim Lekadir. (2020). Stacked BCDU-net with semantic CMR synthesis: application to Myocardial Pathology Segmentation challenge. In MYOPS challenge and workshop.
|
Hugo Bertiche, Meysam Madadi, & Sergio Escalera. (2020). CLOTH3D: Clothed 3D Humans. In 16th European Conference on Computer Vision.
Abstract: This work presents CLOTH3D, the first big scale synthetic dataset of 3D clothed human sequences. CLOTH3D contains a large variability on garment type, topology, shape, size, tightness and fabric. Clothes are simulated on top of thousands of different pose sequences and body shapes, generating realistic cloth dynamics. We provide the dataset with a generative model for cloth generation. We propose a Conditional Variational Auto-Encoder (CVAE) based on graph convolutions (GCVAE) to learn garment latent spaces. This allows for realistic generation of 3D garments on top of SMPL model for any pose and shape.
|
Reza Azad, Maryam Asadi-Aghbolaghi, Mahmood Fathy, & Sergio Escalera. (2020). Attention Deeplabv3+: Multi-level Context Attention Mechanism for Skin Lesion Segmentation. In Bioimage computation workshop.
|
Cristina Palmero, Javier Selva, Sorina Smeureanu, Julio C. S. Jacques Junior, Albert Clapes, Alexa Mosegui, et al. (2021). Context-Aware Personality Inference in Dyadic Scenarios: Introducing the UDIVA Dataset. In IEEE Winter Conference on Applications of Computer Vision (pp. 1–12).
Abstract: This paper introduces UDIVA, a new non-acted dataset of face-to-face dyadic interactions, where interlocutors perform competitive and collaborative tasks with different behavior elicitation and cognitive workload. The dataset consists of 90.5 hours of dyadic interactions among 147 participants distributed in 188 sessions, recorded using multiple audiovisual and physiological sensors. Currently, it includes sociodemographic, self- and peer-reported personality, internal state, and relationship profiling from participants. As an initial analysis on UDIVA, we propose a
transformer-based method for self-reported personality inference in dyadic scenarios, which uses audiovisual data and different sources of context from both interlocutors to
regress a target person’s personality traits. Preliminary results from an incremental study show consistent improvements when using all available context information.
|
Julio C. S. Jacques Junior, Agata Lapedriza, Cristina Palmero, Xavier Baro, & Sergio Escalera. (2021). Person Perception Biases Exposed: Revisiting the First Impressions Dataset. In IEEE Winter Conference on Applications of Computer Vision (pp. 13–21).
Abstract: This work revisits the ChaLearn First Impressions database, annotated for personality perception using pairwise comparisons via crowdsourcing. We analyse for the first time the original pairwise annotations, and reveal existing person perception biases associated to perceived attributes like gender, ethnicity, age and face attractiveness.
We show how person perception bias can influence data labelling of a subjective task, which has received little attention from the computer vision and machine learning communities by now. We further show that the mechanism used to convert pairwise annotations to continuous values may magnify the biases if no special treatment is considered. The findings of this study are relevant for the computer vision community that is still creating new datasets on subjective tasks, and using them for practical applications, ignoring these perceptual biases.
|
Reuben Dorent, Aaron Kujawa, Marina Ivory, Spyridon Bakas, Nikola Rieke, Samuel Joutard, et al. (2023). CrossMoDA 2021 challenge: Benchmark of Cross-Modality Domain Adaptation techniques for Vestibular Schwannoma and Cochlea Segmentation. MIA - Medical Image Analysis, 83, 102628.
Abstract: Domain Adaptation (DA) has recently raised strong interests in the medical imaging community. While a large variety of DA techniques has been proposed for image segmentation, most of these techniques have been validated either on private datasets or on small publicly available datasets. Moreover, these datasets mostly addressed single-class problems. To tackle these limitations, the Cross-Modality Domain Adaptation (crossMoDA) challenge was organised in conjunction with the 24th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2021). CrossMoDA is the first large and multi-class benchmark for unsupervised cross-modality DA. The challenge's goal is to segment two key brain structures involved in the follow-up and treatment planning of vestibular schwannoma (VS): the VS and the cochleas. Currently, the diagnosis and surveillance in patients with VS are performed using contrast-enhanced T1 (ceT1) MRI. However, there is growing interest in using non-contrast sequences such as high-resolution T2 (hrT2) MRI. Therefore, we created an unsupervised cross-modality segmentation benchmark. The training set provides annotated ceT1 (N=105) and unpaired non-annotated hrT2 (N=105). The aim was to automatically perform unilateral VS and bilateral cochlea segmentation on hrT2 as provided in the testing set (N=137). A total of 16 teams submitted their algorithm for the evaluation phase. The level of performance reached by the top-performing teams is strikingly high (best median Dice – VS:88.4%; Cochleas:85.7%) and close to full supervision (median Dice – VS:92.5%; Cochleas:87.7%). All top-performing methods made use of an image-to-image translation approach to transform the source-domain images into pseudo-target-domain images. A segmentation network was then trained using these generated images and the manual annotations provided for the source image.
Keywords: Domain Adaptation; Segmen tation; Vestibular Schwnannoma
|
Julio C. S. Jacques Junior, Yagmur Gucluturk, Marc Perez, Umut Guçlu, Carlos Andujar, Xavier Baro, et al. (2022). First Impressions: A Survey on Vision-Based Apparent Personality Trait Analysis. TAC - IEEE Transactions on Affective Computing, 13(1), 75–95.
Abstract: Personality analysis has been widely studied in psychology, neuropsychology, and signal processing fields, among others. From the past few years, it also became an attractive research area in visual computing. From the computational point of view, by far speech and text have been the most considered cues of information for analyzing personality. However, recently there has been an increasing interest from the computer vision community in analyzing personality from visual data. Recent computer vision approaches are able to accurately analyze human faces, body postures and behaviors, and use these information to infer apparent personality traits. Because of the overwhelming research interest in this topic, and of the potential impact that this sort of methods could have in society, we present in this paper an up-to-date review of existing vision-based approaches for apparent personality trait recognition. We describe seminal and cutting edge works on the subject, discussing and comparing their distinctive features and limitations. Future venues of research in the field are identified and discussed. Furthermore, aspects on the subjectivity in data labeling/evaluation, as well as current datasets and challenges organized to push the research on the field are reviewed.
Keywords: Personality computing; first impressions; person perception; big-five; subjective bias; computer vision; machine learning; nonverbal signals; facial expression; gesture; speech analysis; multi-modal recognition
|