|
Zahra Raisi-Estabragh, Carlos Martin-Isla, Louise Nissen, Liliana Szabo, Victor M. Campello, Sergio Escalera, et al. (2023). Radiomics analysis enhances the diagnostic performance of CMR stress perfusion: a proof-of-concept study using the Dan-NICAD dataset. FCM - Frontiers in Cardiovascular Medicine.
|
|
|
Joakim Bruslund Haurum, Sergio Escalera, Graham W. Taylor, & Thomas B. (2023). Which Tokens to Use? Investigating Token Reduction in Vision Transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops.
Abstract: Since the introduction of the Vision Transformer (ViT), researchers have sought to make ViTs more efficient by removing redundant information in the processed tokens. While different methods have been explored to achieve this goal, we still lack understanding of the resulting reduction patterns and how those patterns differ across token reduction methods and datasets. To close this gap, we set out to understand the reduction patterns of 10 different token reduction methods using four image classification datasets. By systematically comparing these methods on the different classification tasks, we find that the Top-K pruning method is a surprisingly strong baseline. Through in-depth analysis of the different methods, we determine that: the reduction patterns are generally not consistent when varying the capacity of the backbone model, the reduction patterns of pruning-based methods significantly differ from fixed radial patterns, and the reduction patterns of pruning-based methods are correlated across classification datasets. Finally, we report that the similarity of reduction patterns is a moderate-to-strong proxy for model performance. Project page at https://vap.aau.dk/tokens.
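Illustrative note (not part of the abstract): the Top-K pruning baseline named above can be pictured as keeping only the K highest-scoring tokens. The sketch below is a minimal, framework-free illustration under stated assumptions: the function name `top_k_prune`, the use of plain Python lists, and the idea that each token already comes with a scalar importance score are all hypothetical choices made here, and the abstract does not say how the scores are computed.

```python
def top_k_prune(tokens, scores, k):
    """Keep the k tokens with the highest scores, preserving their original order.

    tokens: list of token representations (any objects).
    scores: list of floats, one importance score per token (assumed given).
    k:      number of tokens to keep.
    Returns (kept_tokens, kept_indices).
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not 0 <= k <= len(tokens):
        raise ValueError("k must be between 0 and the number of tokens")

    # Rank token indices from highest to lowest score.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    # Keep the first k, then restore the original sequence order.
    kept_indices = sorted(ranked[:k])
    return [tokens[i] for i in kept_indices], kept_indices


if __name__ == "__main__":
    kept, idx = top_k_prune(["a", "b", "c", "d"], [0.1, 0.7, 0.2, 0.5], k=2)
    print(kept, idx)
```

Restoring the original order after selection is a design choice of this sketch, so that any positional information attached to the surviving tokens stays meaningful.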
|
|
|
Jun Wan, Guodong Guo, Sergio Escalera, Hugo Jair Escalante, & Stan Z Li. (2023). Advances in Face Presentation Attack Detection.
|
|
|
Jun Wan, Guodong Guo, Sergio Escalera, Hugo Jair Escalante, & Stan Z Li. (2023). Face Presentation Attack Detection (PAD) Challenges. In Advances in Face Presentation Attack Detection (17–35). SLCV.
Abstract: In recent years, the security of face recognition systems has been increasingly threatened. Face Anti-spoofing (FAS) is essential for securing face recognition systems against various attacks. In order to attract researchers and push forward the state of the art in Face Presentation Attack Detection (PAD), we organized three editions of the Face Anti-spoofing Workshop and Competition at CVPR 2019, CVPR 2020, and ICCV 2021, which attracted more than 800 teams from academia and industry and greatly advanced algorithms for overcoming many challenging problems. In this chapter, we introduce the detailed competition process, including the challenge phases, timeline, and evaluation metrics. Along with the workshop, we also introduce the corresponding dataset for each competition, including data acquisition details, data processing, statistics, and evaluation protocol. Finally, we provide the available link to download the datasets used in the challenges.
|
|
|
Jun Wan, Guodong Guo, Sergio Escalera, Hugo Jair Escalante, & Stan Z Li. (2023). Best Solutions Proposed in the Context of the Face Anti-spoofing Challenge Series. In Advances in Face Presentation Attack Detection (37–78).
Abstract: The PAD competitions we organized attracted more than 835 teams from home and abroad, most of them from industry, which shows that the topic of face anti-spoofing is closely related to daily life and that there is an urgent need for advanced algorithms to meet its application needs. Specifically, the Chalearn LAP multi-modal face anti-spoofing attack detection challenge attracted more than 300 teams for the development phase, with a total of 13 teams qualifying for the final round; the Chalearn Face Anti-spoofing Attack Detection Challenge attracted 340 teams in the development stage, and finally 11 and 8 teams submitted their code in the single-modal and multi-modal face anti-spoofing recognition challenges, respectively; the 3D High-Fidelity Mask Face Presentation Attack Detection Challenge attracted 195 teams for the development phase, with a total of 18 teams qualifying for the final round. All the results were verified and re-run by the organizing team, and the results were used for the final ranking. In this chapter, we briefly describe the methods developed by the teams participating in each competition, and present the algorithms of the top-three ranked teams in detail.
|
|
|
Jun Wan, Guodong Guo, Sergio Escalera, Hugo Jair Escalante, & Stan Z Li. (2023). Face Anti-spoofing Progress Driven by Academic Challenges. In Advances in Face Presentation Attack Detection (1–15). SLCV.
Abstract: With the ubiquity of facial authentication systems and the prevalence of security cameras around the world, the impact that facial presentation attack techniques may have is huge. However, research progress in this field has been slowed by a number of factors, including the lack of appropriate and realistic datasets, ethical and privacy issues that prevent the recording and distribution of facial images, and the little attention that the community has given to potential ethnic biases, among others. This chapter provides an overview of contributions derived from the organization of academic challenges in the context of face anti-spoofing detection. Specifically, we discuss the limitations of benchmarks and summarize our efforts to boost research by the community via participation in academic challenges.
|
|
|
Artur Xarles, Sergio Escalera, Thomas B. Moeslund, & Albert Clapes. (2023). ASTRA: An Action Spotting TRAnsformer for Soccer Videos. In Proceedings of the 6th International Workshop on Multimedia Content Analysis in Sports (93–102).
Abstract: In this paper, we introduce ASTRA, a Transformer-based model designed for the task of Action Spotting in soccer matches. ASTRA addresses several challenges inherent in the task and dataset, including the requirement for precise action localization, the presence of a long-tail data distribution, non-visibility in certain actions, and inherent label noise. To do so, ASTRA incorporates (a) a Transformer encoder-decoder architecture to achieve the desired output temporal resolution and to produce precise predictions, (b) a balanced mixup strategy to handle the long-tail distribution of the data, (c) an uncertainty-aware displacement head to capture the label variability, and (d) input audio signal to enhance detection of non-visible actions. Results demonstrate the effectiveness of ASTRA, achieving a tight Average-mAP of 66.82 on the test set. Moreover, in the SoccerNet 2023 Action Spotting challenge, we secure the 3rd position with an Average-mAP of 70.21 on the challenge set.
|
|
|
Adrien Pavao, Isabelle Guyon, Anne-Catherine Letournel, Dinh-Tuan Tran, Xavier Baro, Hugo Jair Escalante, et al. (2023). CodaLab Competitions: An Open Source Platform to Organize Scientific Challenges. JMLR - Journal of Machine Learning Research.
Abstract: CodaLab Competitions is an open source web platform designed to help data scientists and research teams to crowd-source the resolution of machine learning problems through the organization of competitions, also called challenges or contests. CodaLab Competitions provides useful features such as multiple phases, results and code submissions, multi-score leaderboards, and jobs running inside Docker containers. The platform is very flexible and can handle large scale experiments, by allowing organizers to upload large datasets and provide their own CPU or GPU compute workers.
|
|
|
Ruben Ballester, Carles Casacuberta, & Sergio Escalera. (2023). Decorrelating neurons using persistence.
Abstract: We propose a novel way to improve the generalisation capacity of deep learning models by reducing high correlations between neurons. For this, we present two regularisation terms computed from the weights of a minimum spanning tree of the clique whose vertices are the neurons of a given network (or a sample of those), where weights on edges are correlation dissimilarities. We provide an extensive set of experiments to validate the effectiveness of our terms, showing that they outperform popular ones. Also, we demonstrate that naive minimisation of all correlations between neurons obtains lower accuracies than our regularisation terms, suggesting that redundancies play a significant role in artificial neural networks, as evidenced by some studies in neuroscience for real networks. We include a proof of differentiability of our regularisers, thus developing the first effective topological persistence-based regularisation terms that consider the whole set of neurons and that can be applied to a feedforward architecture in any deep learning task such as classification, data generation, or regression.
|
|
|
Anders Skaarup Johansen, Kamal Nasrollahi, Sergio Escalera, & Thomas B. Moeslund. (2023). Who Cares about the Weather? Inferring Weather Conditions for Weather-Aware Object Detection in Thermal Images. AS - Applied Sciences, 13(18).
Abstract: Deployments of real-world object detection systems often experience a degradation in performance over time due to concept drift. Systems that leverage thermal cameras are especially susceptible because the respective thermal signatures of objects and their surroundings are highly sensitive to environmental changes. In this study, two types of weather-aware latent conditioning methods are investigated. The proposed methods aim to guide two object detectors (YOLOv5 and Deformable DETR) to become weather-aware. This is achieved by leveraging an auxiliary branch that predicts weather-related information while conditioning intermediate layers of the object detector. While the conditioning methods proposed do not directly improve the accuracy of baseline detectors, it can be observed that conditioned networks manage to extract a weather-related signal from the thermal images, thus resulting in a decreased miss rate at the cost of increased false positives. The extracted signal appears noisy and is thus challenging to regress accurately. This is most likely a result of the qualitative nature of the thermal sensor; thus, further work is needed to identify an ideal method for optimizing the conditioning branch, as well as to further improve the accuracy of the system.
Keywords: thermal; object detection; concept drift; conditioning; weather recognition
|
|
|
Anthony Cioppa, Silvio Giancola, Vladimir Somers, Floriane Magera, Xin Zhou, Hassan Mkhallati, et al. (2023). SoccerNet 2023 Challenges Results.
Abstract: The SoccerNet 2023 challenges were the third annual video understanding challenges organized by the SoccerNet team. For this third edition, the challenges were composed of seven vision-based tasks split into three main themes. The first theme, broadcast video understanding, is composed of three high-level tasks related to describing events occurring in the video broadcasts: (1) action spotting, focusing on retrieving all timestamps related to global actions in soccer, (2) ball action spotting, focusing on retrieving all timestamps related to the soccer ball change of state, and (3) dense video captioning, focusing on describing the broadcast with natural language and anchored timestamps. The second theme, field understanding, relates to the single task of (4) camera calibration, focusing on retrieving the intrinsic and extrinsic camera parameters from images. The third and last theme, player understanding, is composed of three low-level tasks related to extracting information about the players: (5) re-identification, focusing on retrieving the same players across multiple views, (6) multiple object tracking, focusing on tracking players and the ball through unedited video streams, and (7) jersey number recognition, focusing on recognizing the jersey number of players from tracklets. Compared to the previous editions of the SoccerNet challenges, tasks (2-3-7) are novel, including new annotations and data, task (4) was enhanced with more data and annotations, and task (6) now focuses on end-to-end approaches. More information on the tasks, challenges, and leaderboards is available on this https URL. Baselines and development kits can be found on this https URL.
|
|
|
Razieh Rastgoo, Kourosh Kiani, & Sergio Escalera. (2024). A transformer model for boundary detection in continuous sign language. MTAP - Multimedia Tools and Applications.
Abstract: Sign Language Recognition (SLR) has garnered significant attention from researchers in recent years, particularly the intricate domain of Continuous Sign Language Recognition (CSLR), which presents heightened complexity compared to Isolated Sign Language Recognition (ISLR). One of the prominent challenges in CSLR pertains to accurately detecting the boundaries of isolated signs within a continuous video stream. Additionally, the reliance on handcrafted features in existing models poses a challenge to achieving optimal accuracy. To surmount these challenges, we propose a novel approach utilizing a Transformer-based model. Unlike traditional models, our approach focuses on enhancing accuracy while eliminating the need for handcrafted features. The Transformer model is employed for both ISLR and CSLR. The training process involves using isolated sign videos, where hand keypoint features extracted from the input video are enriched using the Transformer model. Subsequently, these enriched features are forwarded to the final classification layer. The trained model, coupled with a post-processing method, is then applied to detect isolated sign boundaries within continuous sign videos. The evaluation of our model, conducted on two distinct datasets that include both continuous signs and their corresponding isolated signs, demonstrates promising results.
|
|
|
Mustafa Hajij, Mathilde Papillon, Florian Frantzen, Jens Agerberg, Ibrahem AlJabea, Ruben Ballester, et al. (2024). TopoX: A Suite of Python Packages for Machine Learning on Topological Domains.
Abstract: We introduce TopoX, a Python software suite that provides reliable and user-friendly building blocks for computing and machine learning on topological domains that extend graphs: hypergraphs, simplicial, cellular, path and combinatorial complexes. TopoX consists of three packages: TopoNetX facilitates constructing and computing on these domains, including working with nodes, edges and higher-order cells; TopoEmbedX provides methods to embed topological domains into vector spaces, akin to popular graph-based embedding algorithms such as node2vec; TopoModelX is built on top of PyTorch and offers a comprehensive toolbox of higher-order message passing functions for neural networks on topological domains. The extensively documented and unit-tested source code of TopoX is available under MIT license at this https URL.
|
|
|
German Barquero, Sergio Escalera, & Cristina Palmero. (2024). Seamless Human Motion Composition with Blended Positional Encodings.
Abstract: Conditional human motion generation is an important topic with many applications in virtual reality, gaming, and robotics. While prior works have focused on generating motion guided by text, music, or scenes, these typically result in isolated motions confined to short durations. Instead, we address the generation of long, continuous sequences guided by a series of varying textual descriptions. In this context, we introduce FlowMDM, the first diffusion-based model that generates seamless Human Motion Compositions (HMC) without any postprocessing or redundant denoising steps. For this, we introduce the Blended Positional Encodings, a technique that leverages both absolute and relative positional encodings in the denoising chain. More specifically, global motion coherence is recovered at the absolute stage, whereas smooth and realistic transitions are built at the relative stage. As a result, we achieve state-of-the-art results in terms of accuracy, realism, and smoothness on the Babel and HumanML3D datasets. FlowMDM excels when trained with only a single description per motion sequence thanks to its Pose-Centric Cross-ATtention, which makes it robust against varying text descriptions at inference time. Finally, to address the limitations of existing HMC metrics, we propose two new metrics: the Peak Jerk and the Area Under the Jerk, to detect abrupt transitions.
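Illustrative note (not part of the abstract): jerk is the third time derivative of position, so a peak-jerk style measure can be approximated from sampled motion with a third-order finite difference. The sketch below is a generic illustration only. It assumes a single one-dimensional trajectory sampled at a fixed time step `dt`; the function names `jerk_series` and `peak_jerk` are hypothetical, and the exact definitions of Peak Jerk and Area Under the Jerk used by the authors (joints, normalisation, aggregation) are not given in the abstract and may differ.

```python
def jerk_series(positions, dt):
    """Approximate jerk with a third-order forward finite difference.

    positions: list of floats, one coordinate sampled every dt seconds.
    Returns a list with len(positions) - 3 jerk estimates.
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    if len(positions) < 4:
        raise ValueError("at least four samples are needed to estimate jerk")
    return [
        (positions[t + 3] - 3 * positions[t + 2] + 3 * positions[t + 1] - positions[t]) / dt**3
        for t in range(len(positions) - 3)
    ]


def peak_jerk(positions, dt):
    """Largest absolute jerk over the trajectory (hypothetical simplification)."""
    return max(abs(j) for j in jerk_series(positions, dt))


if __name__ == "__main__":
    # x(t) = t^3 has a constant third derivative of 6.
    cubic = [float(t**3) for t in range(5)]
    print(peak_jerk(cubic, dt=1.0))
```

An abrupt transition between two motion segments shows up as a spike in this series, which is the intuition behind using jerk to detect discontinuities.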
|
|
|
Iiris Lusi, Sergio Escalera, & Gholamreza Anbarjafari. (2016). Human Head Pose Estimation on SASE database using Random Hough Regression Forests. In 23rd International Conference on Pattern Recognition Workshops (Vol. 10165). LNCS.
Abstract: In recent years head pose estimation has become an important task in face analysis scenarios. Given the availability of high resolution 3D sensors, the design of a high resolution head pose database would be beneficial for the community. In this paper, Random Hough Forests are used to estimate 3D head pose and location on a new 3D head database, SASE, which represents the baseline performance on the new data for an upcoming international head pose estimation competition. The data in SASE is acquired with a Microsoft Kinect 2 camera, including the RGB and depth information of 50 subjects with a large sample of head poses, allowing us to test methods for real-life scenarios. We briefly review the database while showing baseline head pose estimation results based on Random Hough Forests.
|
|