|
Clementine Decamps, Alexis Arnaud, Florent Petitprez, Mira Ayadi, Aurelia Baures, Lucile Armenoult, et al. (2021). DECONbench: a benchmarking platform dedicated to deconvolution methods for tumor heterogeneity quantification. BMC Bioinformatics, 22, 473.
Abstract: Quantification of tumor heterogeneity is essential to better understand cancer progression and to adapt therapeutic treatments to patient specificities. Bioinformatic tools to assess the different cell populations from single-omic datasets as bulk transcriptome or methylome samples have been recently developed, including reference-based and reference-free methods. Improved methods using multi-omic datasets are yet to be developed in the future and the community would need systematic tools to perform a comparative evaluation of these algorithms on controlled data.
|
|
|
Victor M. Campello, Polyxeni Gkontra, Cristian Izquierdo, Carlos Martin-Isla, Alireza Sojoudi, Peter M. Full, et al. (2021). Multi-Centre, Multi-Vendor and Multi-Disease Cardiac Segmentation: The M&Ms Challenge. TMI - IEEE Transactions on Medical Imaging, 40(12), 3543–3554.
Abstract: The emergence of deep learning has considerably advanced the state-of-the-art in cardiac magnetic resonance (CMR) segmentation. Many techniques have been proposed over the last few years, bringing the accuracy of automated segmentation close to human performance. However, these models have been all too often trained and validated using cardiac imaging samples from single clinical centres or homogeneous imaging protocols. This has prevented the development and validation of models that are generalizable across different clinical centres, imaging conditions or scanner vendors. To promote further research and scientific benchmarking in the field of generalizable deep learning for cardiac segmentation, this paper presents the results of the Multi-Centre, Multi-Vendor and Multi-Disease Cardiac Segmentation (M&Ms) Challenge, which was recently organized as part of the MICCAI 2020 Conference. A total of 14 teams submitted different solutions to the problem, combining various baseline models, data augmentation strategies, and domain adaptation techniques. The obtained results indicate the importance of intensity-driven data augmentation, as well as the need for further research to improve generalizability towards unseen scanner vendors or new imaging protocols. Furthermore, we present a new resource of 375 heterogeneous CMR datasets acquired by using four different scanner vendors in six hospitals and three different countries (Spain, Canada and Germany), which we provide as open-access for the community to enable future research in the field.
|
|
|
Meysam Madadi, Hugo Bertiche, & Sergio Escalera. (2021). Deep unsupervised 3D human body reconstruction from a sparse set of landmarks. IJCV - International Journal of Computer Vision, 129, 2499–2512.
Abstract: In this paper we propose the first deep unsupervised approach in human body reconstruction to estimate body surface from a sparse set of landmarks, so called DeepMurf. We apply a denoising autoencoder to estimate missing landmarks. Then we apply an attention model to estimate body joints from landmarks. Finally, a cascading network is applied to regress parameters of a statistical generative model that reconstructs body. Our set of proposed loss functions allows us to train the network in an unsupervised way. Results on four public datasets show that our approach accurately reconstructs the human body from real world mocap data.
|
|
|
Swathikiran Sudhakaran, Sergio Escalera, & Oswald Lanz. (2021). Learning to Recognize Actions on Objects in Egocentric Video with Attention Dictionaries. TPAMI - IEEE Transactions on Pattern Analysis and Machine Intelligence, .
Abstract: We present EgoACO, a deep neural architecture for video action recognition that learns to pool action-context-object descriptors from frame level features by leveraging the verb-noun structure of action labels in egocentric video datasets. The core component of EgoACO is class activation pooling (CAP), a differentiable pooling operation that combines ideas from bilinear pooling for fine-grained recognition and from feature learning for discriminative localization. CAP uses self-attention with a dictionary of learnable weights to pool from the most relevant feature regions. Through CAP, EgoACO learns to decode object and scene context descriptors from video frame features. For temporal modeling in EgoACO, we design a recurrent version of class activation pooling termed Long Short-Term Attention (LSTA). LSTA extends convolutional gated LSTM with built-in spatial attention and a re-designed output gate. Action, object and context descriptors are fused by a multi-head prediction that accounts for the inter-dependencies between noun-verb-action structured labels in egocentric video datasets. EgoACO features built-in visual explanations, helping learning and interpretation. Results on the two largest egocentric action recognition datasets currently available, EPIC-KITCHENS and EGTEA, show that by explicitly decoding action-context-object descriptors, EgoACO achieves state-of-the-art recognition performance.
|
|
|
Fatemeh Noroozi, Ciprian Corneanu, Dorota Kamińska, Tomasz Sapiński, Sergio Escalera, & Gholamreza Anbarjafari. (2021). Survey on Emotional Body Gesture Recognition. TAC - IEEE Transactions on Affective Computing, 12(2), 505–523.
Abstract: Automatic emotion recognition has become a trending research topic in the past decade. While works based on facial expressions or speech abound, recognizing affect from body gestures remains a less explored topic. We present a new comprehensive survey hoping to boost research in the field. We first introduce emotional body gestures as a component of what is commonly known as “body language” and comment general aspects as gender differences and culture dependence. We then define a complete framework for automatic emotional body gesture recognition. We introduce person detection and comment static and dynamic body pose estimation methods both in RGB and 3D. We then comment the recent literature related to representation learning and emotion recognition from images of emotionally expressive gestures. We also discuss multi-modal approaches that combine speech or face with body gestures for improved emotion recognition. While pre-processing methodologies (e.g. human detection and pose estimation) are nowadays mature technologies fully developed for robust large scale analysis, we show that for emotion recognition the quantity of labelled data is scarce, there is no agreement on clearly defined output spaces and the representations are shallow and largely based on naive geometrical representations.
|
|