Author
Mustafa Hajij; Mathilde Papillon; Florian Frantzen; Jens Agerberg; Ibrahem AlJabea; Ruben Ballester; Claudio Battiloro; Guillermo Bernardez; Tolga Birdal; Aiden Brent; Peter Chin; Sergio Escalera; Simone Fiorellino; Odin Hoff Gardaa; Gurusankar Gopalakrishnan; Devendra Govil; Josef Hoppe; Maneel Reddy Karri; Jude Khouja; Manuel Lecha; Neal Livesay; Jan Meißner; Soham Mukherjee; Alexander Nikitin; Theodore Papamarkou; Jaro Prilepok; Karthikeyan Natesan Ramamurthy; Paul Rosen; Aldo Guzman-Saenz; Alessandro Salatiello; Shreyas N. Samaga; Simone Scardapane; Michael T. Schaub; Luca Scofano; Indro Spinelli; Lev Telyatnikov; Quang Truong; Robin Walters; Maosheng Yang; Olga Zaghen; Ghada Zamzmi; Ali Zia; Nina Miolane
Title |
TopoX: A Suite of Python Packages for Machine Learning on Topological Domains |
Type |
Miscellaneous |
Year |
2024 |
Publication |
Arxiv |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
|
Keywords |
|
Abstract |
We introduce TopoX, a Python software suite that provides reliable and user-friendly building blocks for computing and machine learning on topological domains that extend graphs: hypergraphs, simplicial, cellular, path and combinatorial complexes. TopoX consists of three packages: TopoNetX facilitates constructing and computing on these domains, including working with nodes, edges and higher-order cells; TopoEmbedX provides methods to embed topological domains into vector spaces, akin to popular graph-based embedding algorithms such as node2vec; TopoModelX is built on top of PyTorch and offers a comprehensive toolbox of higher-order message passing functions for neural networks on topological domains. The extensively documented and unit-tested source code of TopoX is available under MIT license at this https URL.
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
HUPBA |
Approved |
no |
Call Number |
Admin @ si @ HPF2024 |
Serial |
4021 |
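For the TopoX record above, computing on a simplicial complex typically relies on signed incidence (boundary) matrices, and higher-order message passing moves features between cells of different rank through them. The sketch below is library-independent: it does not use TopoNetX or TopoModelX, and every function name in it is hypothetical. It builds the node-to-edge matrix B1 and edge-to-triangle matrix B2 for one filled triangle, checks the boundary-of-a-boundary identity B1·B2 = 0, and performs one unnormalised node-to-edge message-passing step.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def node_edge_incidence(nodes: List[int], edges: List[Edge]) -> List[List[int]]:
    """Signed node-to-edge incidence matrix B1 (rows: nodes, columns: edges).

    Each edge is oriented from its smaller to its larger node label,
    so the column of edge (u, v) has -1 at u and +1 at v.
    """
    row = {n: i for i, n in enumerate(nodes)}
    b1 = [[0] * len(edges) for _ in nodes]
    for col, (u, v) in enumerate(edges):
        tail, head = sorted((u, v))
        b1[row[tail]][col] = -1
        b1[row[head]][col] = 1
    return b1


def edge_triangle_incidence(edges: List[Edge], triangles: List[Tuple[int, int, int]]) -> List[List[int]]:
    """Signed edge-to-triangle incidence matrix B2 (rows: edges, columns: triangles).

    For a triangle [a, b, c] with a < b < c, its boundary is [b, c] - [a, c] + [a, b].
    """
    row: Dict[Edge, int] = {tuple(sorted(e)): i for i, e in enumerate(edges)}
    b2 = [[0] * len(triangles) for _ in edges]
    for col, tri in enumerate(triangles):
        a, b, c = sorted(tri)
        b2[row[(b, c)]][col] = 1
        b2[row[(a, c)]][col] = -1
        b2[row[(a, b)]][col] = 1
    return b2


def matmul(x: List[List[int]], y: List[List[int]]) -> List[List[int]]:
    """Plain-Python matrix product, used to check that B1 @ B2 is the zero matrix."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def nodes_to_edges(b1: List[List[int]], node_features: List[float]) -> List[float]:
    """One unnormalised message-passing step from nodes to edges: x_E = |B1|^T x_V."""
    n_edges = len(b1[0]) if b1 else 0
    return [sum(abs(b1[i][e]) * node_features[i] for i in range(len(b1))) for e in range(n_edges)]


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (0, 2)]
    b1 = node_edge_incidence([0, 1, 2], edges)
    b2 = edge_triangle_incidence(edges, [(0, 1, 2)])
    print(matmul(b1, b2))                       # boundary of a boundary
    print(nodes_to_edges(b1, [1.0, 2.0, 3.0]))  # each edge sums its two endpoint features
```

Taking absolute values of B1 in the message-passing step discards orientation, so each edge simply aggregates its endpoints; oriented (signed) variants are equally common and would use B1 directly.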
|
|
|
Author |
German Barquero; Sergio Escalera; Cristina Palmero |
Title |
Seamless Human Motion Composition with Blended Positional Encodings |
Type |
Miscellaneous |
Year |
2024 |
Publication |
Arxiv |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
|
Keywords |
|
Abstract |
Conditional human motion generation is an important topic with many applications in virtual reality, gaming, and robotics. While prior works have focused on generating motion guided by text, music, or scenes, these typically result in isolated motions confined to short durations. Instead, we address the generation of long, continuous sequences guided by a series of varying textual descriptions. In this context, we introduce FlowMDM, the first diffusion-based model that generates seamless Human Motion Compositions (HMC) without any postprocessing or redundant denoising steps. For this, we introduce the Blended Positional Encodings, a technique that leverages both absolute and relative positional encodings in the denoising chain. More specifically, global motion coherence is recovered at the absolute stage, whereas smooth and realistic transitions are built at the relative stage. As a result, we achieve state-of-the-art results in terms of accuracy, realism, and smoothness on the Babel and HumanML3D datasets. FlowMDM excels when trained with only a single description per motion sequence thanks to its Pose-Centric Cross-ATtention, which makes it robust against varying text descriptions at inference time. Finally, to address the limitations of existing HMC metrics, we propose two new metrics: the Peak Jerk and the Area Under the Jerk, to detect abrupt transitions. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
HUPBA |
Approved |
no |
Call Number |
Admin @ si @ BEP2024 |
Serial |
4022 |
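The FlowMDM record above proposes Peak Jerk and Area Under the Jerk to detect abrupt transitions. Jerk is the third time derivative of position, so a sudden change in motion shows up as a spike. The exact definitions (which joints, which window around each transition, any normalisation) are in the paper; this sketch assumes a single 1-D joint coordinate sampled every `dt` seconds, takes the maximum absolute jerk as "peak jerk", and integrates the absolute jerk over the whole sequence as "area under the jerk". All function names are hypothetical.

```python
from typing import List, Sequence


def jerk(positions: Sequence[float], dt: float) -> List[float]:
    """Third-order forward finite difference of a 1-D trajectory, an estimate of d^3x/dt^3."""
    return [
        (positions[t + 3] - 3 * positions[t + 2] + 3 * positions[t + 1] - positions[t]) / dt ** 3
        for t in range(len(positions) - 3)
    ]


def peak_jerk(positions: Sequence[float], dt: float) -> float:
    """Largest absolute jerk along the trajectory (assumed simplification of Peak Jerk)."""
    return max(abs(j) for j in jerk(positions, dt))


def area_under_jerk(positions: Sequence[float], dt: float) -> float:
    """Rectangle-rule integral of |jerk| over time (assumed simplification of Area Under the Jerk)."""
    return sum(abs(j) for j in jerk(positions, dt)) * dt


if __name__ == "__main__":
    smooth = [0, 1, 2, 3, 4, 5]   # constant velocity: no jerk
    abrupt = [0, 0, 0, 1, 1, 1]   # sudden jump: large jerk around the transition
    print(peak_jerk(smooth, 1.0), peak_jerk(abrupt, 1.0))
    print(area_under_jerk(smooth, 1.0), area_under_jerk(abrupt, 1.0))
```

A constant-velocity segment yields zero jerk, while the step produces a spike, which is why jerk-based statistics separate smooth transitions from abrupt ones.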
|
|
|
Author |
Ayan Banerjee; Sanket Biswas; Josep Llados; Umapada Pal |
Title |
GraphKD: Exploring Knowledge Distillation Towards Document Object Detection with Structured Graph Creation |
Type |
Miscellaneous |
Year |
2024 |
Publication |
Arxiv |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
|
Keywords |
|
Abstract |
Object detection in documents is a key step in automating the identification of structural elements in a digital or scanned document, by understanding the hierarchical structure of, and relationships between, different elements. Large and complex models, while achieving high accuracy, can be computationally expensive and memory-intensive, making them impractical for deployment on resource-constrained devices. Knowledge distillation allows us to create smaller, more efficient models that retain much of the performance of their larger counterparts. Here we present a graph-based knowledge distillation framework to correctly identify and localize the document objects in a document image. We design a structured graph whose nodes contain proposal-level features and whose edges represent the relationships between different proposal regions. To reduce text bias, we also design an adaptive node sampling strategy that prunes the weight distribution and puts more weight on non-text nodes. We encode the complete graph as a knowledge representation and transfer it from the teacher to the student through the proposed distillation loss, which captures both local and global information concurrently. Extensive experiments on competitive benchmarks demonstrate that the proposed framework outperforms current state-of-the-art approaches. The code will be available at: this https URL.
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
DAG |
Approved |
no |
Call Number |
Admin @ si @ BBL2024b |
Serial |
4023 |
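The GraphKD record above transfers a graph of proposal features from teacher to student. As a rough illustration of graph-based distillation in general, not the paper's proposed loss, the sketch below treats each proposal as a node, uses pairwise cosine similarity as edge weights, and penalises both node-feature and edge-weight mismatches. It assumes the student features have already been projected to the teacher's dimension and that both models share the same proposals; the adaptive node sampling step is not reproduced. Function names and the weighting scheme are hypothetical.

```python
import numpy as np


def cosine_adjacency(features: np.ndarray) -> np.ndarray:
    """Edge weights as pairwise cosine similarity between node (proposal) features."""
    unit = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
    return unit @ unit.T


def graph_distillation_loss(teacher: np.ndarray, student: np.ndarray,
                            node_weight: float = 1.0, edge_weight: float = 1.0) -> float:
    """Node term: MSE between features (local). Edge term: MSE between adjacencies (global).

    Both arrays are assumed to have shape (num_proposals, dim).
    """
    node_term = np.mean((teacher - student) ** 2)
    edge_term = np.mean((cosine_adjacency(teacher) - cosine_adjacency(student)) ** 2)
    return float(node_weight * node_term + edge_weight * edge_term)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.normal(size=(5, 8))
    student = teacher + 0.1 * rng.normal(size=(5, 8))
    print(graph_distillation_loss(teacher, student))
```

The edge term is invariant to rescaling of the student features, so it captures relational (global) structure, while the node term ties individual proposals (local information) to the teacher.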
|
|
|
Author |
G. Gasbarri; Matias Bilkis; E. Roda Salichs; J. Calsamiglia |
Title |
Sequential hypothesis testing for continuously-monitored quantum systems |
Type |
Journal Article |
Year |
2024 |
Publication |
Quantum |
Abbreviated Journal |
|
Volume |
8 |
Issue |
1289 |
Pages |
|
Keywords |
|
Abstract |
We consider a quantum system that is being continuously monitored, giving rise to a measurement signal. From such a stream of data, information needs to be inferred about the underlying system's dynamics. Here we focus on hypothesis testing problems and put forward the usage of sequential strategies where the signal is analyzed in real time, allowing the experiment to be concluded as soon as the underlying hypothesis can be identified with a certified prescribed success probability. We analyze the performance of sequential tests by studying the stopping-time behavior, showing a considerable advantage over currently-used strategies based on a fixed predetermined measurement time. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
xxxx |
Approved |
no |
Call Number |
Admin @ si @ GBR2024 |
Serial |
3847 |
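The record above contrasts sequential tests with fixed-duration measurements. The textbook instance of a sequential test is Wald's sequential probability ratio test (SPRT): accumulate the log-likelihood ratio sample by sample and stop as soon as it crosses one of two thresholds set by the tolerated error rates. The sketch below is the classical SPRT, not the paper's analysis of continuously monitored quantum systems; it assumes the measurement signal has been discretised into i.i.d. Gaussian samples with known noise level `sigma`, and the function name is hypothetical.

```python
import math
import random
from typing import Iterable, Optional, Tuple


def sprt(samples: Iterable[float], mu0: float, mu1: float, sigma: float,
         alpha: float = 0.01, beta: float = 0.01) -> Tuple[Optional[str], int]:
    """Wald's SPRT for H0: mean mu0 versus H1: mean mu1 (Gaussian noise, known sigma).

    alpha and beta are the target false-positive and false-negative rates.
    Returns the accepted hypothesis ("H0", "H1", or None if the data ran out)
    and the number of samples consumed, i.e. the stopping time.
    """
    upper = math.log((1 - beta) / alpha)   # accept H1 at or above this log-likelihood ratio
    lower = math.log(beta / (1 - alpha))   # accept H0 at or below this log-likelihood ratio
    llr = 0.0
    n = 0
    for n, y in enumerate(samples, start=1):
        # Increment: log N(y; mu1, sigma) - log N(y; mu0, sigma)
        llr += ((y - mu0) ** 2 - (y - mu1) ** 2) / (2 * sigma ** 2)
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return None, n


if __name__ == "__main__":
    rng = random.Random(0)
    signal = (rng.gauss(1.0, 1.0) for _ in range(10_000))  # simulated record generated under H1
    print(sprt(signal, mu0=0.0, mu1=1.0, sigma=1.0))
```

The stopping time is random: noisy records take longer to reach a threshold, while clear ones stop early, which is the source of the average-time advantage over a fixed, predetermined measurement duration.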
|
|
|
Author |
M. Altillawi; S. Li; S.M. Prakhya; Z. Liu; Joan Serrat |
Title |
Implicit Learning of Scene Geometry From Poses for Global Localization |
Type |
Journal Article |
Year |
2024 |
Publication |
IEEE Robotics and Automation Letters |
Abbreviated Journal |
ROBOTAUTOMLET |
Volume |
9 |
Issue |
2 |
Pages |
955-962 |
Keywords |
Localization; Localization and mapping; Deep learning for visual perception; Visual learning |
Abstract |
Global visual localization estimates the absolute pose of a camera from a single image in a previously mapped area. Obtaining the pose from a single image enables many robotics and augmented/virtual reality applications. Inspired by the latest advances in deep learning, many existing approaches directly learn and regress the 6 DoF pose from an input image. However, these methods do not fully exploit the underlying scene geometry for pose regression. The challenge in monocular relocalization is the minimal availability of supervised training data, which consists only of the corresponding 6 DoF poses of the images. In this letter, we propose to use these minimal available labels (i.e., poses) to learn the underlying 3D geometry of the scene and use that geometry to estimate the 6 DoF camera pose. We present a learning method that uses these pose labels and rigid alignment to learn two 3D geometric representations (X, Y, Z coordinates) of the scene, one in the camera coordinate frame and the other in the global coordinate frame. Given a single image, it estimates these two 3D scene representations, which are then aligned to estimate a pose that matches the pose label. This formulation allows additional learning constraints to be actively included to minimize 3D alignment errors between the two 3D scene representations and 2D re-projection errors between the 3D global scene representation and the 2D image pixels, resulting in improved localization accuracy. During inference, our model estimates the 3D scene geometry in the camera and global frames and aligns them rigidly to obtain the pose in real time. We evaluate our work on three common visual localization datasets, conduct ablation studies, and show that our method exceeds the pose accuracy of state-of-the-art regression methods on all datasets.
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
2377-3766 |
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
ADAS |
Approved |
no |
Call Number |
Admin @ si @ |
Serial |
3857 |
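The record above obtains the camera pose by rigidly aligning two 3D scene representations, one in camera coordinates and one in global coordinates. A standard closed-form solution for least-squares rigid alignment of corresponding points is the Kabsch algorithm (SVD of the cross-covariance matrix). The sketch below assumes the two representations are given as N×3 arrays of corresponding points; whether the paper uses this exact solver, or a weighted or differentiable variant, is not stated in the abstract, and the function name is hypothetical.

```python
import numpy as np


def rigid_align(p: np.ndarray, q: np.ndarray):
    """Least-squares rotation r and translation t such that r @ p_i + t ≈ q_i (Kabsch).

    p: (N, 3) points in the camera frame; q: (N, 3) corresponding points in the global frame.
    The returned (r, t) maps camera coordinates to global coordinates.
    """
    p_mean = p.mean(axis=0)
    q_mean = q.mean(axis=0)
    h = (p - p_mean).T @ (q - q_mean)          # 3x3 cross-covariance of centred points
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))     # -1 would indicate a reflection; flip it away
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = q_mean - r @ p_mean
    return r, t


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    camera_points = rng.normal(size=(20, 3))
    angle = 0.4
    r_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                       [np.sin(angle), np.cos(angle), 0.0],
                       [0.0, 0.0, 1.0]])
    global_points = camera_points @ r_true.T + np.array([0.5, -1.0, 2.0])
    r, t = rigid_align(camera_points, global_points)
    print(np.round(r, 6), np.round(t, 6))
```

Because every step (mean, matrix product, SVD) is differentiable almost everywhere, this kind of alignment can sit inside a training loop and be supervised with the ground-truth pose, which matches the spirit of learning geometry from pose labels alone.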
|
|
|
Author |
Hao Fang; Ajian Liu; Jun Wan; Sergio Escalera; Chenxu Zhao; Xu Zhang; Stan Z Li; Zhen Lei |
Title |
Surveillance Face Anti-spoofing |
Type |
Journal Article |
Year |
2024 |
Publication |
IEEE Transactions on Information Forensics and Security |
Abbreviated Journal |
TIFS |
Volume |
19 |
Issue |
|
Pages |
1535-1546 |
Keywords |
|
Abstract |
Face anti-spoofing (FAS) is essential to secure face recognition systems against various physical attacks. However, recent research generally focuses on short-distance applications (e.g., phone unlocking) and lacks consideration of long-distance scenes (e.g., surveillance security checks). To promote relevant research and fill this gap in the community, we collect a large-scale Surveillance High-Fidelity Mask (SuHiFiMask) dataset captured under 40 surveillance scenes, with 101 subjects from different age groups, 232 3D attacks (high-fidelity masks), 200 2D attacks (posters, portraits, and screens), and 2 adversarial attacks. In this setting, low image resolution and noise interference are new challenges for surveillance FAS. Together with the SuHiFiMask dataset, we propose a Contrastive Quality-Invariance Learning (CQIL) network that alleviates the performance degradation caused by image quality in three ways: (1) an Image Quality Variable (IQV) module recovers image information relevant to discrimination by incorporating a super-resolution network; (2) generated sample pairs simulate quality variance distributions, helping the contrastive learning strategy obtain robust feature representations under quality variation; and (3) a Separate Quality Network (SQN) learns discriminative features independent of image quality. Finally, extensive experiments verify the quality of the SuHiFiMask dataset and the superiority of the proposed CQIL.
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
HUPBA |
Approved |
no |
Call Number |
Admin @ si @ FLW2024 |
Serial |
3869 |
|
|
|
Author |
Aura Hernandez-Sabate; Jose Elias Yauri; Pau Folch; Daniel Alvarez; Debora Gil |
Title |
EEG Dataset Collection for Mental Workload Predictions in Flight-Deck Environment |
Type |
Journal Article |
Year |
2024 |
Publication |
Sensors |
Abbreviated Journal |
SENS |
Volume |
24 |
Issue |
4 |
Pages |
1174 |
Keywords |
|
Abstract |
High mental workload reduces human performance and the ability to correctly carry out complex tasks. In particular, aircraft pilots enduring high mental workloads are at high risk of failure, with potentially catastrophic outcomes. Despite progress, there is still a lack of knowledge about the interrelationship between mental workload and brain functionality, and data on flight-deck scenarios remain limited. Although emerging deep learning (DL) methods using physiological data have opened new ways to find physiological markers to detect and assess cognitive states, they require large, properly annotated datasets to achieve good performance. We present a new dataset of electroencephalogram (EEG) recordings specifically collected for the recognition of different levels of mental workload. The data were recorded from three experiments in which participants were induced to different levels of workload through tasks of increasing cognitive demand. The first involved the N-back test, which combines memory recall with arithmetical skills. The second involved playing Heat-the-Chair, a serious game specifically designed to stress and monitor subjects under controlled concurrent tasks. The third involved flying an Airbus A320 simulator and solving several critical situations. The design of the dataset has been validated at three different levels: (1) correlation of the theoretical difficulty of each scenario with the self-perceived difficulty and performance of subjects; (2) significant differences in EEG temporal patterns across the theoretical difficulty levels; and (3) usefulness for training and evaluating AI models.
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
IAM |
Approved |
no |
Call Number |
Admin @ si @ HYF2024 |
Serial |
4019 |
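In the EEG record above, workload is graded with an N-back test, where a stimulus is a target when it matches the one shown N steps earlier, so larger N demands more working memory. The study's version combines memory recall with arithmetic; the sketch below shows only the classic letter variant to illustrate how targets and accuracy are defined. The alphabet, target rate and all function names are hypothetical choices.

```python
import random
from typing import List, Optional, Sequence


def nback_targets(stimuli: Sequence[str], n: int) -> List[bool]:
    """True at position i when stimuli[i] equals the stimulus shown n steps earlier."""
    return [i >= n and stimuli[i] == stimuli[i - n] for i in range(len(stimuli))]


def make_sequence(length: int, n: int, alphabet: str = "ABCDEFGH",
                  target_rate: float = 0.3, seed: Optional[int] = None) -> List[str]:
    """Random stimulus stream in which roughly target_rate of eligible positions repeat item i - n."""
    rng = random.Random(seed)
    seq: List[str] = []
    for i in range(length):
        if i >= n and rng.random() < target_rate:
            seq.append(seq[i - n])          # deliberately planted target
        else:
            seq.append(rng.choice(alphabet))  # may still match by chance
    return seq


def score(responses: Sequence[bool], targets: Sequence[bool]) -> float:
    """Fraction of positions where the participant's yes/no response matches the ground truth."""
    return sum(r == t for r, t in zip(responses, targets)) / len(targets)


if __name__ == "__main__":
    for n in (1, 2, 3):                     # higher n: higher intended workload
        seq = make_sequence(20, n, seed=n)
        print(n, "".join(seq), sum(nback_targets(seq, n)))
```

Targets are recomputed from the final sequence rather than recorded while generating it, because a randomly chosen letter can also coincide with the one n steps back.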
|
|
|
Author |
Javier Vazquez; Graham D. Finlayson; Luis Herranz |
Title |
Improving the perception of low-light enhanced images |
Type |
Journal Article |
Year |
2024 |
Publication |
Optics Express |
Abbreviated Journal |
|
Volume |
32 |
Issue |
4 |
Pages |
5174-5190 |
Keywords |
|
Abstract |
Improving images captured under low-light conditions has become an important topic in computational color imaging, with a wide range of applications. Most current methods are based either on handcrafted features or on end-to-end training of deep neural networks that mostly focus on minimizing a distortion metric, such as PSNR or SSIM, on a set of training images. However, minimizing distortion metrics does not imply that the results are optimal in terms of perception (i.e., perceptual quality). For example, the perception-distortion trade-off states that, close to the optimal results, improving distortion worsens perception. This means that current low-light image enhancement methods, which focus on distortion minimization, cannot be optimal in terms of perceptual quality. In this paper, we propose a post-processing approach that, given the original low-light image and the result of a specific method, obtains a result that resembles that of the original method as closely as possible while improving the perception of the final image. More specifically, our method follows the hypothesis that, to minimally modify the perception of an input image, any modification should be a combination of a local change in the shading across a scene and a global change in illumination color. We demonstrate the ability of our method quantitatively using blind perceptual image quality metrics such as BRISQUE, NIQE, and UNIQUE, and through user preference tests.
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
MACO |
Approved |
no |
Call Number |
Admin @ si @ VFH2024 |
Serial |
4018 |
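The low-light record above hypothesises that a perceptually minimal edit is a local shading change times a global illumination-colour change, i.e. out_c(x) ≈ s(x) · g_c · in_c(x). One simple way to fit that model to a given method's output, assumed here and not taken from the paper, is least squares in the log domain: the product becomes the additive two-way model log s(x) + log g_c, whose least-squares fit is given in closed form by per-pixel and per-channel means. The shading here is unconstrained per pixel; a version faithful to the "local shading" hypothesis would presumably restrict it, for example by smoothing. The function name is hypothetical.

```python
import numpy as np


def fit_shading_and_illuminant(src: np.ndarray, target: np.ndarray, eps: float = 1e-6):
    """Fit target ≈ src * s(x) * g_c for images of shape (H, W, 3).

    src is the original low-light image, target the result of an enhancement method.
    Returns per-pixel shading s (H, W), global channel gains g (3,) normalised to
    geometric mean 1, and the reconstruction src * s * g.
    """
    d = np.log(target + eps) - np.log(src + eps)   # log ratio, shape (H, W, 3)
    grand = d.mean()
    log_g = d.mean(axis=(0, 1)) - grand            # column effect: global illuminant change
    log_s = d.mean(axis=2)                         # row effect: per-pixel shading change
    s = np.exp(log_s)
    g = np.exp(log_g)
    return s, g, src * s[..., None] * g


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    low_light = rng.uniform(0.05, 0.5, size=(4, 5, 3))
    enhanced = low_light * rng.uniform(0.5, 2.0, size=(4, 5))[..., None] * np.array([1.2, 1.0, 0.7])
    s, g, out = fit_shading_and_illuminant(low_light, enhanced)
    print(np.round(g, 4), float(np.abs(out - enhanced).max()))
```

The split between s and g is only defined up to a common scale factor, which is why g is normalised to a geometric mean of one and the overall brightness is carried by s.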
|
|
|
Author |
Vacit Oguz Yazici; Longlong Yu; Arnau Ramisa; Luis Herranz; Joost Van de Weijer |
Title |
Main product detection with graph networks for fashion |
Type |
Journal Article |
Year |
2024 |
Publication |
Multimedia Tools and Applications |
Abbreviated Journal |
MTAP |
Volume |
83 |
Issue |
|
Pages |
3215-3231
Keywords |
|
Abstract |
Computer vision has established a foothold in the online fashion retail industry. Main product detection is a crucial step in vision-based fashion product feed parsing pipelines; it focuses on identifying the bounding boxes that contain the product being sold in the gallery of images on the product page. The current state-of-the-art approach does not leverage the relations between regions in the image and treats images of the same product independently, and therefore does not fully exploit visual and product contextual information. In this paper, we propose a model that incorporates Graph Convolutional Networks (GCN) to jointly represent all detected bounding boxes in the gallery as nodes. We show that the proposed method outperforms the state of the art; in particular, when the title input is missing at inference time and in cross-dataset evaluation, our method outperforms previous approaches by a large margin.
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
LAMP; MACO; 600.147; 600.167; 600.164; 600.161; 600.141; 601.309 |
Approved |
no |
Call Number |
Admin @ si @ YYR2024 |
Serial |
4017 |
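The main product detection record above represents every detected bounding box in a product gallery as a node of a graph processed by a GCN. The sketch below shows one graph-convolution layer of a common variant (separate weights for a node's own features and for the mean of its neighbours) followed by a per-box sigmoid score. The fully connected gallery graph, random weights, layer sizes and function names are all assumptions for illustration, not the paper's architecture.

```python
import numpy as np


def graph_conv(h: np.ndarray, adj: np.ndarray, w_self: np.ndarray, w_nbr: np.ndarray) -> np.ndarray:
    """One graph-convolution layer: each node mixes its own features with its neighbours' mean."""
    deg = adj.sum(axis=1, keepdims=True)
    nbr_mean = (adj @ h) / np.maximum(deg, 1.0)            # row-normalised neighbour aggregation
    return np.maximum(0.0, h @ w_self + nbr_mean @ w_nbr)  # ReLU


def main_product_scores(boxes: np.ndarray, w_self: np.ndarray, w_nbr: np.ndarray,
                        w_out: np.ndarray) -> np.ndarray:
    """Probability that each detected box (one row of `boxes`) contains the product being sold.

    Assumption: every box in the gallery is connected to every other box.
    """
    n = boxes.shape[0]
    adj = np.ones((n, n)) - np.eye(n)
    hidden = graph_conv(boxes, adj, w_self, w_nbr)
    logits = hidden @ w_out                                 # one logit per box
    return 1.0 / (1.0 + np.exp(-logits))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    boxes = rng.normal(size=(6, 4))   # 6 boxes across the gallery, 4-D features each
    w_self, w_nbr = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
    w_out = rng.normal(size=8)
    print(np.round(main_product_scores(boxes, w_self, w_nbr, w_out), 3))
```

Keeping a separate self term matters here: with a plain self-loop GCN on a complete graph, every node would receive the same averaged feature and all boxes would get identical scores.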
|
|
|
Author |
Yaxing Wang; Abel Gonzalez-Garcia; Chenshen Wu; Luis Herranz; Fahad Shahbaz Khan; Shangling Jui; Jian Yang; Joost Van de Weijer |
Title |
MineGAN++: Mining Generative Models for Efficient Knowledge Transfer to Limited Data Domains |
Type |
Journal Article |
Year |
2024 |
Publication |
International Journal of Computer Vision |
Abbreviated Journal |
IJCV |
Volume |
132 |
Issue |
|
Pages |
490-514
Keywords |
|
Abstract |
Given the often enormous effort required to train GANs, both computationally and in dataset collection, the reuse of pretrained GANs greatly increases the potential impact of generative models. We therefore propose a novel knowledge transfer method for generative models based on mining the knowledge that is most beneficial to a specific target domain, from either a single pretrained GAN or multiple pretrained GANs. This is done using a miner network that identifies which part of the generative distribution of each pretrained GAN outputs samples closest to the target domain. Mining effectively steers GAN sampling towards suitable regions of the latent space, which facilitates the subsequent fine-tuning and avoids pathologies of other methods, such as mode collapse and lack of flexibility. Furthermore, to prevent overfitting on small target domains, we introduce sparse subnetwork selection, which restricts the set of trainable neurons to those that are relevant for the target dataset. We perform comprehensive experiments on several challenging datasets using various GAN architectures (BigGAN, Progressive GAN, and StyleGAN) and show that the proposed method, called MineGAN, effectively transfers knowledge to domains with few target images, outperforming existing methods. In addition, MineGAN can successfully transfer knowledge from multiple pretrained GANs.
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
LAMP; MACO |
Approved |
no |
Call Number |
Admin @ si @ WGW2024 |
Serial |
3888 |
|
|
|
Author |
Tao Wu; Kai Wang; Chuanming Tang; Jianlin Zhang |
Title |
Diffusion-based network for unsupervised landmark detection |
Type |
Journal Article |
Year |
2024 |
Publication |
Knowledge-Based Systems |
Abbreviated Journal |
|
Volume |
292 |
Issue |
|
Pages |
111627 |
Keywords |
|
Abstract |
Landmark detection is a fundamental task that aims to identify specific landmarks representing distinct object features within an image. However, current landmark detection algorithms often adopt complex architectures and are trained in a supervised manner on large datasets to achieve satisfactory performance. When data are limited, their accuracy tends to decline notably. To address these drawbacks, we propose a novel diffusion-based network (DBN) for unsupervised landmark detection, which leverages the generative ability of diffusion models to detect landmark locations. In particular, we introduce a dual-branch encoder (DualE) for extracting visual features and predicting landmarks. Additionally, we lighten the decoder structure for faster inference; we refer to this decoder as LightD. In this way, we avoid relying on extensive data comparison and the need to design complex architectures as in previous methods. Experiments on the CelebA, AFLW, 300W and DeepFashion benchmarks show that DBN achieves state-of-the-art performance compared with existing methods. Furthermore, DBN remains robust when data are limited.
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
LAMP |
Approved |
no |
Call Number |
Admin @ si @ WWT2024 |
Serial |
4024 |
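The DBN record above builds on the generative ability of diffusion models. Those models are trained against a fixed forward process that gradually noises the data; in the standard DDPM formulation, x_t = sqrt(ᾱ_t) x_0 + sqrt(1 - ᾱ_t) ε with ε ~ N(0, I) and ᾱ_t the running product of (1 - β_t). The sketch below implements only that generic forward process with the linear β schedule commonly used for DDPM (1e-4 to 0.02 over 1000 steps); it does not reproduce DBN's DualE encoder or LightD decoder, and the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def linear_alpha_bar(T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule (T > 1)."""
    alpha_bar, prod = [], 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        prod *= 1.0 - beta
        alpha_bar.append(prod)
    return alpha_bar


def q_sample(x0: Sequence[float], t: int, alpha_bar: Sequence[float], rng: random.Random) -> List[float]:
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)."""
    a = alpha_bar[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]


if __name__ == "__main__":
    ab = linear_alpha_bar()
    rng = random.Random(0)
    x0 = [1.0, -2.0, 0.5]            # e.g. a flattened feature or coordinate vector
    for t in (0, 250, 999):
        print(t, round(ab[t], 5), [round(v, 3) for v in q_sample(x0, t, ab, rng)])
```

Early steps leave x_0 nearly intact, while by the last step almost all signal is replaced by noise; a denoising network is trained to invert this process.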
|
|
|
Author |
Patricia Suarez; Dario Carpio; Angel Sappa |
Title |
Enhancement of guided thermal image super-resolution approaches |
Type |
Journal Article |
Year |
2024 |
Publication |
Neurocomputing |
Abbreviated Journal |
NEUCOM |
Volume |
573 |
Issue |
127197 |
Pages |
1-17 |
Keywords |
|
Abstract |
Guided image processing techniques are widely used to extract meaningful information from a guiding image and facilitate the enhancement of the guided one. This paper specifically addresses the challenge of guided thermal image super-resolution, where a low-resolution thermal image is enhanced using a high-resolution visible spectrum image. We propose a new strategy that enhances the outcomes of current guided super-resolution methods. This is achieved by transforming the initial guiding data into a thermal-like representation, which is more closely aligned with the intended output. Experimental results with upscaling factors of 8 and 16 demonstrate the outstanding performance of our approach in guided thermal image super-resolution, obtained by mapping the original guiding information to a thermal-like image representation.
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
MSIAU |
Approved |
no |
Call Number |
Admin @ si @ SCS2024 |
Serial |
3998 |