Neelu Madan, Arya Farkhondeh, Kamal Nasrollahi, Sergio Escalera, & Thomas B. Moeslund. (2021). Temporal Cues From Socially Unacceptable Trajectories for Anomaly Detection. In IEEE/CVF International Conference on Computer Vision Workshops (pp. 2150–2158).
Abstract: State-of-the-Art (SoTA) deep learning-based approaches to detect anomalies in surveillance videos utilize limited temporal information, including basic information from motion, e.g., optical flow computed between consecutive frames. In this paper, we complement the SoTA methods by including long-range dependencies from trajectories for anomaly detection. To achieve that, we first created trajectories by running a tracker on two SoTA datasets, namely Avenue and Shanghai-Tech. We propose a prediction-based anomaly detection method using trajectories based on Social GANs, referred to in this paper as temporal-based anomaly detection. Then, we hypothesize that late fusion of the result of this temporal-based anomaly detection system with spatial-based anomaly detection systems produces SoTA results. We verify this hypothesis on two spatial-based anomaly detection systems. We show that both cases produce results better than baseline spatial-based systems, indicating the usefulness of the temporal information coming from the trajectories for anomaly detection. We observe that the proposed approach achieves a maximum improvement in micro-level Area-Under-the-Curve (AUC) of 4.1% on CUHK Avenue and 3.4% on Shanghai-Tech over one of the baseline methods. We also show high performance on cross-data evaluation, where we learn the weights to combine spatial and temporal information on Shanghai-Tech and perform evaluation on CUHK Avenue and vice versa.
|
Clementine Decamps, Alexis Arnaud, Florent Petitprez, Mira Ayadi, Aurelia Baures, Lucile Armenoult, et al. (2021). DECONbench: a benchmarking platform dedicated to deconvolution methods for tumor heterogeneity quantification. BMC Bioinformatics, 22, 473.
Abstract: Quantification of tumor heterogeneity is essential to better understand cancer progression and to adapt therapeutic treatments to patient specificities. Bioinformatic tools to assess the different cell populations from single-omic datasets, such as bulk transcriptome or methylome samples, have been recently developed, including reference-based and reference-free methods. Improved methods using multi-omic datasets are yet to be developed, and the community would need systematic tools to perform a comparative evaluation of these algorithms on controlled data.
|
Javier Marin, & Sergio Escalera. (2021). SSSGAN: Satellite Style and Structure Generative Adversarial Networks. Remote Sensing, 13(19), 3984.
Abstract: This work presents Satellite Style and Structure Generative Adversarial Network (SSSGAN), a generative model of high resolution satellite imagery to support image segmentation. Based on spatially adaptive denormalization modules (SPADE) that modulate the activations with respect to segmentation map structure, in addition to global descriptor vectors that capture the semantic information in a vector with respect to Open Street Maps (OSM) classes, this model is able to produce consistent aerial imagery. By decoupling the generation of aerial images into a structure map and a carefully defined style vector, we were able to improve the realism and geodiversity of the synthesis with respect to the state-of-the-art baseline. Therefore, the proposed model allows us to control the generation not only with respect to the desired structure, but also with respect to a geographic area.
|
Victor M. Campello, Polyxeni Gkontra, Cristian Izquierdo, Carlos Martin-Isla, Alireza Sojoudi, Peter M. Full, et al. (2021). Multi-Centre, Multi-Vendor and Multi-Disease Cardiac Segmentation: The M&Ms Challenge. TMI - IEEE Transactions on Medical Imaging, 40(12), 3543–3554.
Abstract: The emergence of deep learning has considerably advanced the state-of-the-art in cardiac magnetic resonance (CMR) segmentation. Many techniques have been proposed over the last few years, bringing the accuracy of automated segmentation close to human performance. However, these models have been all too often trained and validated using cardiac imaging samples from single clinical centres or homogeneous imaging protocols. This has prevented the development and validation of models that are generalizable across different clinical centres, imaging conditions or scanner vendors. To promote further research and scientific benchmarking in the field of generalizable deep learning for cardiac segmentation, this paper presents the results of the Multi-Centre, Multi-Vendor and Multi-Disease Cardiac Segmentation (M&Ms) Challenge, which was recently organized as part of the MICCAI 2020 Conference. A total of 14 teams submitted different solutions to the problem, combining various baseline models, data augmentation strategies, and domain adaptation techniques. The obtained results indicate the importance of intensity-driven data augmentation, as well as the need for further research to improve generalizability towards unseen scanner vendors or new imaging protocols. Furthermore, we present a new resource of 375 heterogeneous CMR datasets acquired by using four different scanner vendors in six hospitals and three different countries (Spain, Canada and Germany), which we provide as open-access for the community to enable future research in the field.
|
Meysam Madadi, Hugo Bertiche, & Sergio Escalera. (2021). Deep unsupervised 3D human body reconstruction from a sparse set of landmarks. IJCV - International Journal of Computer Vision, 129, 2499–2512.
Abstract: In this paper we propose the first deep unsupervised approach in human body reconstruction to estimate body surface from a sparse set of landmarks, called DeepMurf. We apply a denoising autoencoder to estimate missing landmarks. Then we apply an attention model to estimate body joints from landmarks. Finally, a cascading network is applied to regress parameters of a statistical generative model that reconstructs the body. Our set of proposed loss functions allows us to train the network in an unsupervised way. Results on four public datasets show that our approach accurately reconstructs the human body from real world mocap data.
|
Meysam Madadi, Hugo Bertiche, Wafa Bouzouita, Isabelle Guyon, & Sergio Escalera. (2021). Learning Cloth Dynamics: 3D+Texture Garment Reconstruction Benchmark. In Proceedings of Machine Learning Research (Vol. 133, pp. 57–76).
Abstract: Human avatars are important targets in many computer applications. Accurately tracking, capturing, reconstructing and animating the human body, face and garments in 3D are critical for human-computer interaction, gaming, special effects and virtual reality. In the past, this has required extensive manual animation. Despite the advances in human body and face reconstruction, modeling, learning and analyzing human dynamics still need further attention. In this paper we plan to push the research in this direction, e.g. understanding human dynamics in 2D and 3D, with special attention to garments. We provide a large-scale dataset (more than 2M frames) of animated garments with variable topology and type, called CLOTH3D++. The dataset contains RGBA video sequences paired with their corresponding 3D data. We pay special care to garment dynamics and realistic rendering of RGB data, including lighting, fabric type and texture. With this dataset, we hold a competition at NeurIPS 2020. We design three tracks so participants can compete to develop the best method to perform 3D garment reconstruction in a sequence from (1) 3D-to-3D garments, (2) RGB-to-3D garments, and (3) RGB-to-3D garments plus texture. We also provide a baseline method, based on graph convolutional networks, for each track. Baseline results show that there is a lot of room for improvement. However, due to the challenging nature of the problem, no participant could outperform the baselines.
|
Swathikiran Sudhakaran, Sergio Escalera, & Oswald Lanz. (2021). Learning to Recognize Actions on Objects in Egocentric Video with Attention Dictionaries. TPAMI - IEEE Transactions on Pattern Analysis and Machine Intelligence.
Abstract: We present EgoACO, a deep neural architecture for video action recognition that learns to pool action-context-object descriptors from frame level features by leveraging the verb-noun structure of action labels in egocentric video datasets. The core component of EgoACO is class activation pooling (CAP), a differentiable pooling operation that combines ideas from bilinear pooling for fine-grained recognition and from feature learning for discriminative localization. CAP uses self-attention with a dictionary of learnable weights to pool from the most relevant feature regions. Through CAP, EgoACO learns to decode object and scene context descriptors from video frame features. For temporal modeling in EgoACO, we design a recurrent version of class activation pooling termed Long Short-Term Attention (LSTA). LSTA extends convolutional gated LSTM with built-in spatial attention and a re-designed output gate. Action, object and context descriptors are fused by a multi-head prediction that accounts for the inter-dependencies between noun-verb-action structured labels in egocentric video datasets. EgoACO features built-in visual explanations, helping learning and interpretation. Results on the two largest egocentric action recognition datasets currently available, EPIC-KITCHENS and EGTEA, show that by explicitly decoding action-context-object descriptors, EgoACO achieves state-of-the-art recognition performance.
|
Fatemeh Noroozi, Ciprian Corneanu, Dorota Kamińska, Tomasz Sapiński, Sergio Escalera, & Gholamreza Anbarjafari. (2021). Survey on Emotional Body Gesture Recognition. TAC - IEEE Transactions on Affective Computing, 12(2), 505–523.
Abstract: Automatic emotion recognition has become a trending research topic in the past decade. While works based on facial expressions or speech abound, recognizing affect from body gestures remains a less explored topic. We present a new comprehensive survey hoping to boost research in the field. We first introduce emotional body gestures as a component of what is commonly known as “body language” and comment on general aspects such as gender differences and culture dependence. We then define a complete framework for automatic emotional body gesture recognition. We introduce person detection and comment on static and dynamic body pose estimation methods both in RGB and 3D. We then review the recent literature related to representation learning and emotion recognition from images of emotionally expressive gestures. We also discuss multi-modal approaches that combine speech or face with body gestures for improved emotion recognition. While pre-processing methodologies (e.g. human detection and pose estimation) are nowadays mature technologies fully developed for robust large scale analysis, we show that for emotion recognition the quantity of labelled data is scarce, there is no agreement on clearly defined output spaces and the representations are shallow and largely based on naive geometrical representations.
|
Kaustubh Kulkarni, Ciprian Corneanu, Ikechukwu Ofodile, Sergio Escalera, Xavier Baro, Sylwia Hyniewska, et al. (2021). Automatic Recognition of Facial Displays of Unfelt Emotions. TAC - IEEE Transactions on Affective Computing, 12(2), 377–390.
Abstract: Humans modify their facial expressions in order to communicate their internal states and sometimes to mislead observers regarding their true emotional states. Evidence in experimental psychology shows that discriminative facial responses are short and subtle. This suggests that such behavior would be easier to distinguish when captured in high resolution at an increased frame rate. We are proposing SASE-FE, the first dataset of facial expressions that are either congruent or incongruent with underlying emotion states. We show that overall the problem of recognizing whether facial movements are expressions of authentic emotions or not can be successfully addressed by learning spatio-temporal representations of the data. For this purpose, we propose a method that aggregates features along fiducial trajectories in a deeply learnt space. Performance of the proposed model shows that on average, it is easier to distinguish among genuine facial expressions of emotion than among unfelt facial expressions of emotion and that certain emotion pairs such as contempt and disgust are more difficult to distinguish than the rest. Furthermore, the proposed methodology improves state-of-the-art results on the CK+ and OULU-CASIA datasets for video emotion recognition, and achieves competitive results when classifying facial action units on the BP4D dataset.
|
Joan Codina-Filba, Sergio Escalera, Joan Escudero, Coen Antens, Pau Buch-Cardona, & Mireia Farrus. (2021). Mobile eHealth Platform for Home Monitoring of Bipolar Disorder. In 27th ACM International Conference on Multimedia Modeling (Vol. 12573, pp. 330–341). LNCS.
Abstract: People suffering from Bipolar Disorder (BD) experience changes in mood, with depressive or manic episodes and normal periods in between. BD is a chronic disease with a high level of non-adherence to medication that requires continuous monitoring of patients to detect when they relapse into an episode, so that physicians can take care of them. Here we present MoodRecord, an easy-to-use, non-intrusive, multilingual, robust and scalable platform suitable for home monitoring of patients with BD, which allows physicians and relatives to track the patient's state and receive alarms when abnormalities occur.
MoodRecord takes advantage of the capabilities of smartphones as communication and recording devices to continuously monitor patients. It automatically records user activity, and asks the user to answer some questions or to record themselves on video, according to a predefined plan designed by physicians. The video is analysed to recognise mood status from images, and bipolar assessment scores are extracted from speech parameters. The data obtained from the different sources are merged periodically to observe whether a relapse may be starting and, if so, raise the corresponding alarm. The application received a positive evaluation in a pilot with users from three different countries. During the pilot, the predictions of the voice and image modules showed a coherent correlation with the diagnosis performed by clinicians.
|
Ajian Liu, Zichang Tan, Jun Wan, Sergio Escalera, Guodong Guo, & Stan Z. Li. (2021). CASIA-SURF CeFA: A Benchmark for Multi-modal Cross-Ethnicity Face Anti-Spoofing. In IEEE Winter Conference on Applications of Computer Vision (pp. 1178–1186).
Abstract: The issue of ethnic bias has been shown to affect the performance of face recognition in previous works, but it remains unexplored in face anti-spoofing. Therefore, in order to study ethnic bias in face anti-spoofing, we introduce the largest CASIA-SURF Cross-ethnicity Face Anti-spoofing (CeFA) dataset, covering 3 ethnicities, 3 modalities, 1,607 subjects, and 2D plus 3D attack types. Five protocols are introduced to measure the effect under varied evaluation conditions, such as cross-ethnicity, unknown spoofs or both of them. To our knowledge, CASIA-SURF CeFA is the first released dataset to include explicit ethnic labels. Then, we propose a novel multi-modal fusion method as a strong baseline to alleviate the ethnic bias, which employs a partially shared fusion strategy to learn complementary information from multiple modalities. Extensive experiments have been conducted on the proposed dataset to verify its significance and generalization capability for other existing datasets, i.e., CASIA-SURF, OULU-NPU and SiW datasets. The dataset is available at https://sites.google.com/qq.com/face-anti-spoofing/welcome/challengecvpr2020?authuser=0.
|
Jose Elias Yauri, Aura Hernandez-Sabate, Pau Folch, & Debora Gil. (2021). Mental Workload Detection Based on EEG Analysis. In Artificial Intelligence Research and Development. Proceedings of the 23rd International Conference of the Catalan Association for Artificial Intelligence (Vol. 339, pp. 268–277).
Abstract: The study of mental workload is essential for human work efficiency, for health, and for avoiding accidents, since workload compromises both performance and awareness. Although workload has been widely studied using several physiological measures, minimising the sensor network as much as possible remains both a challenge and a requirement.
Electroencephalogram (EEG) signals have shown a high correlation to specific cognitive and mental states like workload. However, there is not enough evidence in the literature to validate how well models generalize to new subjects performing tasks of a workload similar to the ones included during the model’s training.
In this paper we propose a binary neural network to classify EEG features across different mental workloads. Two workloads, low and medium, are induced using two variants of the N-Back Test. The proposed model was validated on a dataset collected from 16 subjects and showed a high level of generalization capability: the model reported an average recall of 81.81% in a leave-one-subject-out evaluation.
Keywords: Cognitive states; Mental workload; EEG analysis; Neural Networks.
|
Sonia Baeza, R. Domingo, M. Salcedo, G. Moragas, J. Deportos, I. Garcia Olive, et al. (2021). Artificial Intelligence to Optimize Pulmonary Embolism Diagnosis During Covid-19 Pandemic by Perfusion SPECT/CT, a Pilot Study. American Journal of Respiratory and Critical Care Medicine.
|
Mireia Sole, Joan Blanco, Debora Gil, Oliver Valero, Alvaro Pascual, B. Cardenas, et al. (2021). Chromosomal positioning in spermatogenic cells is influenced by chromosomal factors associated with gene activity, bouquet formation, and meiotic sex-chromosome inactivation. Chromosoma, 130, 163–175.
Abstract: Chromosome territoriality is not random along the cell cycle and it is mainly governed by intrinsic chromosome factors and gene expression patterns. Conversely, very few studies have explored the factors that determine chromosome territoriality and its influencing factors during meiosis. In this study, we analysed chromosome positioning in murine spermatogenic cells using a three-dimensional fluorescence in situ hybridization-based methodology, which allows the analysis of the entire karyotype. The main objective of the study was to decipher chromosome positioning in a radial axis (all analysed germ-cell nuclei) and longitudinal axis (only spermatozoa) and to identify the chromosomal factors that regulate such an arrangement. Results demonstrated that the radial positioning of chromosomes during spermatogenesis was cell-type specific and influenced by chromosomal factors associated with gene activity. Chromosomes with specific features that enhance transcription (high GC content, high gene density and high numbers of predicted expressed genes) were preferentially observed in the inner part of the nucleus in virtually all cell types. Moreover, the position of the sex chromosomes was influenced by their transcriptional status, from the periphery of the nucleus when their activity was repressed (pachytene) to a more internal position when they were partially activated (spermatid). At pachytene, chromosome positioning was also influenced by chromosome size due to the bouquet formation. Longitudinal chromosome positioning in the sperm nucleus was not random either, suggesting the importance of ordered longitudinal positioning for the release and activation of the paternal genome after fertilisation.
|
Marta Ligero, Alonso Garcia Ruiz, Cristina Viaplana, Guillermo Villacampa, Maria V Raciti, Jaid Landa, et al. (2021). A CT-based radiomics signature is associated with response to immune checkpoint inhibitors in advanced solid tumors. Radiology, 299(1), 109–119.
Abstract: Background Reliable predictive imaging markers of response to immune checkpoint inhibitors are needed. Purpose To develop and validate a pretreatment CT-based radiomics signature to predict response to immune checkpoint inhibitors in advanced solid tumors. Materials and Methods In this retrospective study, a radiomics signature was developed in patients with advanced solid tumors (including breast, cervix, gastrointestinal) treated with anti-programmed cell death-1 or programmed cell death ligand-1 monotherapy from August 2012 to May 2018 (cohort 1). This was tested in patients with bladder and lung cancer (cohorts 2 and 3). Radiomics variables were extracted from all metastases delineated at pretreatment CT and selected by using an elastic-net model. A regression model combined radiomics and clinical variables with response as the end point. Biologic validation of the radiomics score with RNA profiling of cytotoxic cells (cohort 4) was assessed with Mann-Whitney analysis. Results The radiomics signature was developed in 85 patients (cohort 1: mean age, 58 years ± 13 [standard deviation]; 43 men) and tested on 46 patients (cohort 2: mean age, 70 years ± 12; 37 men) and 47 patients (cohort 3: mean age, 64 years ± 11; 40 men). Biologic validation was performed in a further cohort of 20 patients (cohort 4: mean age, 60 years ± 13; 14 men). The radiomics signature was associated with clinical response to immune checkpoint inhibitors (area under the curve [AUC], 0.70; 95% CI: 0.64, 0.77; P < .001). In cohorts 2 and 3, the AUC was 0.67 (95% CI: 0.58, 0.76) and 0.67 (95% CI: 0.56, 0.77; P < .001), respectively. A radiomics-clinical signature (including baseline albumin level and lymphocyte count) improved on radiomics-only performance (AUC, 0.74 [95% CI: 0.63, 0.84; P < .001]; Akaike information criterion, 107.00 and 109.90, respectively). 
Conclusion A pretreatment CT-based radiomics signature is associated with response to immune checkpoint inhibitors, likely reflecting the tumor immunophenotype. © RSNA, 2021
|