|
Juan Borrego-Carazo, Carles Sanchez, David Castells, Jordi Carrabina, & Debora Gil. (2022). A benchmark for the evaluation of computational methods for bronchoscopic navigation. IJCARS - International Journal of Computer Assisted Radiology and Surgery, 17(1).
|
|
|
Antoni Rosell, Sonia Baeza, S. Garcia-Reina, JL. Mate, Ignasi Guasch, I. Nogueira, et al. (2022). EP01.05-001 Radiomics to Increase the Effectiveness of Lung Cancer Screening Programs. Radiolung Preliminary Results. JTO - Journal of Thoracic Oncology, 17(9), S182.
|
|
|
Antoni Rosell, Sonia Baeza, S. Garcia-Reina, JL. Mate, Ignasi Guasch, I. Nogueira, et al. (2022). Radiomics to increase the effectiveness of lung cancer screening programs. Radiolung preliminary results. ERJ - European Respiratory Journal, 60(66).
|
|
|
Julio C. S. Jacques Junior, Yagmur Gucluturk, Marc Perez, Umut Guçlu, Carlos Andujar, Xavier Baro, et al. (2022). First Impressions: A Survey on Vision-Based Apparent Personality Trait Analysis. TAC - IEEE Transactions on Affective Computing, 13(1), 75–95.
Abstract: Personality analysis has been widely studied in psychology, neuropsychology, and signal processing fields, among others. From the past few years, it also became an attractive research area in visual computing. From the computational point of view, by far speech and text have been the most considered cues of information for analyzing personality. However, recently there has been an increasing interest from the computer vision community in analyzing personality from visual data. Recent computer vision approaches are able to accurately analyze human faces, body postures and behaviors, and use these information to infer apparent personality traits. Because of the overwhelming research interest in this topic, and of the potential impact that this sort of methods could have in society, we present in this paper an up-to-date review of existing vision-based approaches for apparent personality trait recognition. We describe seminal and cutting edge works on the subject, discussing and comparing their distinctive features and limitations. Future venues of research in the field are identified and discussed. Furthermore, aspects on the subjectivity in data labeling/evaluation, as well as current datasets and challenges organized to push the research on the field are reviewed.
Keywords: Personality computing; first impressions; person perception; big-five; subjective bias; computer vision; machine learning; nonverbal signals; facial expression; gesture; speech analysis; multi-modal recognition
|
|
|
Penny Tarling, Mauricio Cantor, Albert Clapes, & Sergio Escalera. (2022). Deep learning with self-supervision and uncertainty regularization to count fish in underwater images. Plos - PloS One, 17(5), e0267759.
Abstract: Effective conservation actions require effective population monitoring. However, accurately counting animals in the wild to inform conservation decision-making is difficult. Monitoring populations through image sampling has made data collection cheaper, wide-reaching and less intrusive but created a need to process and analyse this data efficiently. Counting animals from such data is challenging, particularly when densely packed in noisy images. Attempting this manually is slow and expensive, while traditional computer vision methods are limited in their generalisability. Deep learning is the state-of-the-art method for many computer vision tasks, but it has yet to be properly explored to count animals. To this end, we employ deep learning, with a density-based regression approach, to count fish in low-resolution sonar images. We introduce a large dataset of sonar videos, deployed to record wild Lebranche mullet schools (Mugil liza), with a subset of 500 labelled images. We utilise abundant unlabelled data in a self-supervised task to improve the supervised counting task. For the first time in this context, by introducing uncertainty quantification, we improve model training and provide an accompanying measure of prediction uncertainty for more informed biological decision-making. Finally, we demonstrate the generalisability of our proposed counting framework through testing it on a recent benchmark dataset of high-resolution annotated underwater images from varying habitats (DeepFish). From experiments on both contrasting datasets, we demonstrate our network outperforms the few other deep learning models implemented for solving this task. By providing an open-source framework along with training data, our study puts forth an efficient deep learning template for crowd counting aquatic animals thereby contributing effective methods to assess natural populations from the ever-increasing visual data.
|
|
|
Victor M. Campello, Carlos Martin-Isla, Cristian Izquierdo, Andrea Guala, Jose F. Rodriguez Palomares, David Vilades, et al. (2022). Minimising multi-centre radiomics variability through image normalisation: a pilot study. ScR - Scientific Reports, 12(1), 12532.
Abstract: Radiomics is an emerging technique for the quantification of imaging data that has recently shown great promise for deeper phenotyping of cardiovascular disease. Thus far, the technique has been mostly applied in single-centre studies. However, one of the main difficulties in multi-centre imaging studies is the inherent variability of image characteristics due to centre differences. In this paper, a comprehensive analysis of radiomics variability under several image- and feature-based normalisation techniques was conducted using a multi-centre cardiovascular magnetic resonance dataset. 218 subjects divided into healthy (n = 112) and hypertrophic cardiomyopathy (n = 106, HCM) groups from five different centres were considered. First and second order texture radiomic features were extracted from three regions of interest, namely the left and right ventricular cavities and the left ventricular myocardium. Two methods were used to assess features’ variability. First, feature distributions were compared across centres to obtain a distribution similarity index. Second, two classification tasks were proposed to assess: (1) the amount of centre-related information encoded in normalised features (centre identification) and (2) the generalisation ability for a classification model when trained on these features (healthy versus HCM classification). The results showed that the feature-based harmonisation technique ComBat is able to remove the variability introduced by centre information from radiomic features, at the expense of slightly degrading classification performance. Piecewise linear histogram matching normalisation gave features with greater generalisation ability for classification ( balanced accuracy in between 0.78 ± 0.08 and 0.79 ± 0.09). Models trained with features from images without normalisation showed the worst performance overall ( balanced accuracy in between 0.45 ± 0.28 and 0.60 ± 0.22). In conclusion, centre-related information removal did not imply good generalisation ability for classification.
|
|
|
Zhen Xu, Sergio Escalera, Adrien Pavao, Magali Richard, Wei-Wei Tu, Quanming Yao, et al. (2022). Codabench: Flexible, easy-to-use, and reproducible meta-benchmark platform. PATTERNS - Patterns, 3(7), 100543.
Abstract: Obtaining a standardized benchmark of computational methods is a major issue in data-science communities. Dedicated frameworks enabling fair benchmarking in a unified environment are yet to be developed. Here, we introduce Codabench, a meta-benchmark platform that is open sourced and community driven for benchmarking algorithms or software agents versus datasets or tasks. A public instance of Codabench is open to everyone free of charge and allows benchmark organizers to fairly compare submissions under the same setting (software, hardware, data, algorithms), with custom protocols and data formats. Codabench has unique features facilitating easy organization of flexible and reproducible benchmarks, such as the possibility of reusing templates of benchmarks and supplying compute resources on demand. Codabench has been used internally and externally on various applications, receiving more than 130 users and 2,500 submissions. As illustrative use cases, we introduce four diverse benchmarks covering graph machine learning, cancer heterogeneity, clinical diagnosis, and reinforcement learning.
Keywords: Machine learning; data science; benchmark platform; reproducibility; competitions
|
|
|
Ajian Liu, Chenxu Zhao, Zitong Yu, Jun Wan, Anyang Su, Xing Liu, et al. (2022). Contrastive Context-Aware Learning for 3D High-Fidelity Mask Face Presentation Attack Detection. TIForensicSEC - IEEE Transactions on Information Forensics and Security, 17, 2497–2507.
Abstract: Face presentation attack detection (PAD) is essential to secure face recognition systems primarily from high-fidelity mask attacks. Most existing 3D mask PAD benchmarks suffer from several drawbacks: 1) a limited number of mask identities, types of sensors, and a total number of videos; 2) low-fidelity quality of facial masks. Basic deep models and remote photoplethysmography (rPPG) methods achieved acceptable performance on these benchmarks but still far from the needs of practical scenarios. To bridge the gap to real-world applications, we introduce a large-scale Hi gh- Fi delity Mask dataset, namely HiFiMask . Specifically, a total amount of 54,600 videos are recorded from 75 subjects with 225 realistic masks by 7 new kinds of sensors. Along with the dataset, we propose a novel C ontrastive C ontext-aware L earning (CCL) framework. CCL is a new training methodology for supervised PAD tasks, which is able to learn by leveraging rich contexts accurately (e.g., subjects, mask material and lighting) among pairs of live faces and high-fidelity mask attacks. Extensive experimental evaluations on HiFiMask and three additional 3D mask datasets demonstrate the effectiveness of our method. The codes and dataset will be released soon.
|
|
|
Joakim Bruslund Haurum, Meysam Madadi, Sergio Escalera, & Thomas B. Moeslund. (2022). Multi-scale hybrid vision transformer and Sinkhorn tokenizer for sewer defect classification. AC - Automation in Construction, 144, 104614.
Abstract: A crucial part of image classification consists of capturing non-local spatial semantics of image content. This paper describes the multi-scale hybrid vision transformer (MSHViT), an extension of the classical convolutional neural network (CNN) backbone, for multi-label sewer defect classification. To better model spatial semantics in the images, features are aggregated at different scales non-locally through the use of a lightweight vision transformer, and a smaller set of tokens was produced through a novel Sinkhorn clustering-based tokenizer using distinct cluster centers. The proposed MSHViT and Sinkhorn tokenizer were evaluated on the Sewer-ML multi-label sewer defect classification dataset, showing consistent performance improvements of up to 2.53 percentage points.
Keywords: Sewer Defect Classification; Vision Transformers; Sinkhorn-Knopp; Convolutional Neural Networks; Closed-Circuit Television; Sewer Inspection
|
|
|
Joakim Bruslund Haurum, Meysam Madadi, Sergio Escalera, & Thomas B. Moeslund. (2022). Multi-Task Classification of Sewer Pipe Defects and Properties Using a Cross-Task Graph Neural Network Decoder. In Winter Conference on Applications of Computer Vision (pp. 2806–2817).
Abstract: The sewerage infrastructure is one of the most important and expensive infrastructures in modern society. In order to efficiently manage the sewerage infrastructure, automated sewer inspection has to be utilized. However, while sewer
defect classification has been investigated for decades, little attention has been given to classifying sewer pipe properties such as water level, pipe material, and pipe shape, which are needed to evaluate the level of sewer pipe deterioration.
In this work we classify sewer pipe defects and properties concurrently and present a novel decoder-focused multi-task classification architecture Cross-Task Graph Neural Network (CT-GNN), which refines the disjointed per-task predictions using cross-task information. The CT-GNN architecture extends the traditional disjointed task-heads decoder, by utilizing a cross-task graph and unique class node embeddings. The cross-task graph can either be determined a priori based on the conditional probability between the task classes or determined dynamically using self-attention.
CT-GNN can be added to any backbone and trained end-toend at a small increase in the parameter count. We achieve state-of-the-art performance on all four classification tasks in the Sewer-ML dataset, improving defect classification and
water level classification by 5.3 and 8.0 percentage points, respectively. We also outperform the single task methods as well as other multi-task classification approaches while introducing 50 times fewer parameters than previous modelfocused approaches.
Keywords: Vision Systems; Applications Multi-Task Classification
|
|
|
Razieh Rastgoo, Kourosh Kiani, & Sergio Escalera. (2022). Real-time Isolated Hand Sign Language RecognitioN Using Deep Networks and SVD. Journal of Ambient Intelligence and Humanized Computing, 591–611.
Abstract: One of the challenges in computer vision models, especially sign language, is real-time recognition. In this work, we present a simple yet low-complex and efficient model, comprising single shot detector, 2D convolutional neural network, singular value decomposition (SVD), and long short term memory, to real-time isolated hand sign language recognition (IHSLR) from RGB video. We employ the SVD method as an efficient, compact, and discriminative feature extractor from the estimated 3D hand keypoints coordinators. Despite the previous works that employ the estimated 3D hand keypoints coordinates as raw features, we propose a novel and revolutionary way to apply the SVD to the estimated 3D hand keypoints coordinates to get more discriminative features. SVD method is also applied to the geometric relations between the consecutive segments of each finger in each hand and also the angles between these sections. We perform a detailed analysis of recognition time and accuracy. One of our contributions is that this is the first time that the SVD method is applied to the hand pose parameters. Results on four datasets, RKS-PERSIANSIGN (99.5±0.04), First-Person (91±0.06), ASVID (93±0.05), and isoGD (86.1±0.04), confirm the efficiency of our method in both accuracy (mean+std) and time recognition. Furthermore, our model outperforms or gets competitive results with the state-of-the-art alternatives in IHSLR and hand action recognition.
|
|
|
German Barquero, Johnny Nuñez, Sergio Escalera, Zhen Xu, Wei-Wei Tu, & Isabelle Guyon. (2022). Didn’t see that coming: a survey on non-verbal social human behavior forecasting. In Understanding Social Behavior in Dyadic and Small Group Interactions (Vol. 173, pp. 139–178).
Abstract: Non-verbal social human behavior forecasting has increasingly attracted the interest of the research community in recent years. Its direct applications to human-robot interaction and socially-aware human motion generation make it a very attractive field. In this survey, we define the behavior forecasting problem for multiple interactive agents in a generic way that aims at unifying the fields of social signals prediction and human motion forecasting, traditionally separated. We hold that both problem formulations refer to the same conceptual problem, and identify many shared fundamental challenges: future stochasticity, context awareness, history exploitation, etc. We also propose a taxonomy that comprises
methods published in the last 5 years in a very informative way and describes the current main concerns of the community with regard to this problem. In order to promote further research on this field, we also provide a summarized and friendly overview of audiovisual datasets featuring non-acted social interactions. Finally, we describe the most common metrics used in this task and their particular issues.
|
|
|
Hugo Jair Escalante, Heysem Kaya, Albert Ali Salah, Sergio Escalera, Yagmur Gucluturk, Umut Guçlu, et al. (2022). Modeling, Recognizing, and Explaining Apparent Personality from Videos. TAC - IEEE Transactions on Affective Computing, 13(2), 894–911.
Abstract: Explainability and interpretability are two critical aspects of decision support systems. Despite their importance, it is only recently that researchers are starting to explore these aspects. This paper provides an introduction to explainability and interpretability in the context of apparent personality recognition. To the best of our knowledge, this is the first effort in this direction. We describe a challenge we organized on explainability in first impressions analysis from video. We analyze in detail the newly introduced data set, evaluation protocol, proposed solutions and summarize the results of the challenge. We investigate the issue of bias in detail. Finally, derived from our study, we outline research opportunities that we foresee will be relevant in this area in the near future.
|
|
|
Marc Oliu, Sarah Adel Bargal, Stan Sclaroff, Xavier Baro, & Sergio Escalera. (2022). Multi-varied Cumulative Alignment for Domain Adaptation. In 6th International Conference on Image Analysis and Processing (Vol. 13232, 324–334). LNCS.
Abstract: Domain Adaptation methods can be classified into two basic families of approaches: non-parametric and parametric. Non-parametric approaches depend on statistical indicators such as feature covariances to minimize the domain shift. Non-parametric approaches tend to be fast to compute and require no additional parameters, but they are unable to leverage probability density functions with complex internal structures. Parametric approaches, on the other hand, use models of the probability distributions as surrogates in minimizing the domain shift, but they require additional trainable parameters to model these distributions. In this work, we propose a new statistical approach to minimizing the domain shift based on stochastically projecting and evaluating the cumulative density function in both domains. As with non-parametric approaches, there are no additional trainable parameters. As with parametric approaches, the internal structure of both domains’ probability distributions is considered, thus leveraging a higher amount of information when reducing the domain shift. Evaluation on standard datasets used for Domain Adaptation shows better performance of the proposed model compared to non-parametric approaches while being competitive with parametric ones. (Code available at: https://github.com/moliusimon/mca).
Keywords: Domain Adaptation; Computer vision; Neural networks
|
|
|
Jun Wan, Chi Lin, Longyin Wen, Yunan Li, Qiguang Miao, Sergio Escalera, et al. (2022). ChaLearn Looking at People: IsoGD and ConGD Large-scale RGB-D Gesture Recognition. TCIBERN - IEEE Transactions on Cybernetics, 52(5), 3422–3433.
Abstract: The ChaLearn large-scale gesture recognition challenge has been run twice in two workshops in conjunction with the International Conference on Pattern Recognition (ICPR) 2016 and International Conference on Computer Vision (ICCV) 2017, attracting more than 200 teams round the world. This challenge has two tracks, focusing on isolated and continuous gesture recognition, respectively. This paper describes the creation of both benchmark datasets and analyzes the advances in large-scale gesture recognition based on these two datasets. We discuss the challenges of collecting large-scale ground-truth annotations of gesture recognition, and provide a detailed analysis of the current state-of-the-art methods for large-scale isolated and continuous gesture recognition based on RGB-D video sequences. In addition to recognition rate and mean jaccard index (MJI) as evaluation metrics used in our previous challenges, we also introduce the corrected segmentation rate (CSR) metric to evaluate the performance of temporal segmentation for continuous gesture recognition. Furthermore, we propose a bidirectional long short-term memory (Bi-LSTM) baseline method, determining the video division points based on the skeleton points extracted by convolutional pose machine (CPM). Experiments demonstrate that the proposed Bi-LSTM outperforms the state-of-the-art methods with an absolute improvement of 8.1% (from 0.8917 to 0.9639) of CSR.
|
|