Records |
Author |
Patricia Suarez; Dario Carpio; Angel Sappa |
Title |
Enhancement of guided thermal image super-resolution approaches |
Type |
Journal Article |
Year |
2024 |
Publication |
Neurocomputing |
Abbreviated Journal |
NEUCOM |
Volume |
573 |
Issue |
127197 |
Pages |
1-17 |
Keywords |
|
Abstract |
Guided image processing techniques are widely used to extract meaningful information from a guiding image and facilitate the enhancement of the guided one. This paper specifically addresses the challenge of guided thermal image super-resolution, where a low-resolution thermal image is enhanced using a high-resolution visible spectrum image. We propose a new strategy that enhances outcomes from current guided super-resolution methods. This is achieved by transforming the initial guiding data into a representation resembling a thermal-like image, which is more closely in sync with the intended output. Experimental results with upscale factors of 8 and 16, demonstrate the outstanding performance of our approach in guided thermal image super-resolution obtained by mapping the original guiding information to a thermal-like image representation. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
MSIAU |
Approved |
no |
Call Number |
Admin @ si @ SCS2024 |
Serial |
3998 |
Permanent link to this record |
|
|
|
Author |
Justine Giroux; Mohammad Reza Karimi Dastjerdi; Yannick Hold-Geoffroy; Javier Vazquez; Jean François Lalonde |
Title |
Towards a Perceptual Evaluation Framework for Lighting Estimation |
Type |
Conference Article |
Year |
2024 |
Publication |
Arxiv |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
|
Keywords |
|
Abstract |
rogress in lighting estimation is tracked by computing existing image quality assessment (IQA) metrics on images from standard datasets. While this may appear to be a reasonable approach, we demonstrate that doing so does not correlate to human preference when the estimated lighting is used to relight a virtual scene into a real photograph. To study this, we design a controlled psychophysical experiment where human observers must choose their preference amongst rendered scenes lit using a set of lighting estimation algorithms selected from the recent literature, and use it to analyse how these algorithms perform according to human perception. Then, we demonstrate that none of the most popular IQA metrics from the literature, taken individually, correctly represent human perception. Finally, we show that by learning a combination of existing IQA metrics, we can more accurately represent human preference. This provides a new perceptual framework to help evaluate future lighting estimation algorithms. |
Address |
Seattle; USA; June 2024 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
CVPR |
Notes |
MACO; CIC |
Approved |
no |
Call Number |
Admin @ si @ GDH2024 |
Serial |
3999 |
Permanent link to this record |
|
|
|
Author |
Mohamed Ramzy Ibrahim; Robert Benavente; Daniel Ponsa; Felipe Lumbreras |
Title |
SWViT-RRDB: Shifted Window Vision Transformer Integrating Residual in Residual Dense Block for Remote Sensing Super-Resolution |
Type |
Conference Article |
Year |
2024 |
Publication |
19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
|
Keywords |
|
Abstract |
Remote sensing applications, impacted by acquisition season and sensor variety, require high-resolution images. Transformer-based models improve satellite image super-resolution but are less effective than convolutional neural networks (CNNs) at extracting local details, crucial for image clarity. This paper introduces SWViT-RRDB, a new deep learning model for satellite imagery super-resolution. The SWViT-RRDB, combining transformer with convolution and attention blocks, overcomes the limitations of existing models by better representing small objects in satellite images. In this model, a pipeline of residual fusion group (RFG) blocks is used to combine the multi-headed self-attention (MSA) with residual in residual dense block (RRDB). This combines global and local image data for better super-resolution. Additionally, an overlapping cross-attention block (OCAB) is used to enhance fusion and allow interaction between neighboring pixels to maintain long-range pixel dependencies across the image. The SWViT-RRDB model and its larger variants outperform state-of-the-art (SoTA) models on two different satellite datasets in terms of PSNR and SSIM. |
Address |
Roma; Italia; February 2024 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
MSIAU;CIC |
Approved |
no |
Call Number |
Admin @ si @ RBP2024 |
Serial |
4004 |
Permanent link to this record |
|
|
|
Author |
Mingyi Yang; Fei Yang; Luka Murn; Marc Gorriz Blanch; Juil Sock; Shuai Wan; Fuzheng Yang; Luis Herranz |
Title |
Task-Switchable Pre-Processor for Image Compression for Multiple Machine Vision Tasks |
Type |
Journal Article |
Year |
2024 |
Publication |
IEEE Transactions on Circuits and Systems for Video Technology |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
|
Keywords |
M Yang, F Yang, L Murn, MG Blanch, J Sock, S Wan, F Yang, L Herranz |
Abstract |
Visual content is increasingly being processed by machines for various automated content analysis tasks instead of being consumed by humans. Despite the existence of several compression methods tailored for machine tasks, few consider real-world scenarios with multiple tasks. In this paper, we aim to address this gap by proposing a task-switchable pre-processor that optimizes input images specifically for machine consumption prior to encoding by an off-the-shelf codec designed for human consumption. The proposed task-switchable pre-processor adeptly maintains relevant semantic information based on the specific characteristics of different downstream tasks, while effectively suppressing irrelevant information to reduce bitrate. To enhance the processing of semantic information for diverse tasks, we leverage pre-extracted semantic features to modulate the pixel-to-pixel mapping within the pre-processor. By switching between different modulations, multiple tasks can be seamlessly incorporated into the system. Extensive experiments demonstrate the practicality and simplicity of our approach. It significantly reduces the number of parameters required for handling multiple tasks while still delivering impressive performance. Our method showcases the potential to achieve efficient and effective compression for machine vision tasks, supporting the evolving demands of real-world applications. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
xxx |
Approved |
no |
Call Number |
Admin @ si @ YYM2024 |
Serial |
4007 |
Permanent link to this record |
|
|
|
Author |
Razieh Rastgoo; Kourosh Kiani; Sergio Escalera |
Title |
A transformer model for boundary detection in continuous sign language |
Type |
Journal Article |
Year |
2024 |
Publication |
Multimedia Tools and Applications |
Abbreviated Journal |
MTAP |
Volume |
|
Issue |
|
Pages |
|
Keywords |
|
Abstract |
Sign Language Recognition (SLR) has garnered significant attention from researchers in recent years, particularly the intricate domain of Continuous Sign Language Recognition (CSLR), which presents heightened complexity compared to Isolated Sign Language Recognition (ISLR). One of the prominent challenges in CSLR pertains to accurately detecting the boundaries of isolated signs within a continuous video stream. Additionally, the reliance on handcrafted features in existing models poses a challenge to achieving optimal accuracy. To surmount these challenges, we propose a novel approach utilizing a Transformer-based model. Unlike traditional models, our approach focuses on enhancing accuracy while eliminating the need for handcrafted features. The Transformer model is employed for both ISLR and CSLR. The training process involves using isolated sign videos, where hand keypoint features extracted from the input video are enriched using the Transformer model. Subsequently, these enriched features are forwarded to the final classification layer. The trained model, coupled with a post-processing method, is then applied to detect isolated sign boundaries within continuous sign videos. The evaluation of our model is conducted on two distinct datasets, including both continuous signs and their corresponding isolated signs, demonstrates promising results. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
HUPBA;MILAB |
Approved |
no |
Call Number |
Admin @ si @ RKE2024 |
Serial |
4016 |
Permanent link to this record |
|
|
|
Author |
Vacit Oguz Yazici; Longlong Yu; Arnau Ramisa; Luis Herranz; Joost Van de Weijer |
Title |
Main product detection with graph networks for fashion |
Type |
Journal Article |
Year |
2024 |
Publication |
Multimedia Tools and Applications |
Abbreviated Journal |
MTAP |
Volume |
83 |
Issue |
|
Pages |
3215–3231 |
Keywords |
|
Abstract |
Computer vision has established a foothold in the online fashion retail industry. Main product detection is a crucial step of vision-based fashion product feed parsing pipelines, focused on identifying the bounding boxes that contain the product being sold in the gallery of images of the product page. The current state-of-the-art approach does not leverage the relations between regions in the image, and treats images of the same product independently, therefore not fully exploiting visual and product contextual information. In this paper, we propose a model that incorporates Graph Convolutional Networks (GCN) that jointly represent all detected bounding boxes in the gallery as nodes. We show that the proposed method is better than the state-of-the-art, especially, when we consider the scenario where title-input is missing at inference time and for cross-dataset evaluation, our method outperforms previous approaches by a large margin. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
LAMP; MACO; 600.147; 600.167; 600.164; 600.161; 600.141; 601.309;CIC |
Approved |
no |
Call Number |
Admin @ si @ YYR2024 |
Serial |
4017 |
Permanent link to this record |
|
|
|
Author |
Javier Vazquez; Graham D. Finlayson; Luis Herranz |
Title |
Improving the perception of low-light enhanced images |
Type |
Journal Article |
Year |
2024 |
Publication |
Optics Express |
Abbreviated Journal |
|
Volume |
32 |
Issue |
4 |
Pages |
5174-5190 |
Keywords |
|
Abstract |
Improving images captured under low-light conditions has become an important topic in computational color imaging, as it has a wide range of applications. Most current methods are either based on handcrafted features or on end-to-end training of deep neural networks that mostly focus on minimizing some distortion metric —such as PSNR or SSIM— on a set of training images. However, the minimization of distortion metrics does not mean that the results are optimal in terms of perception (i.e. perceptual quality). As an example, the perception-distortion trade-off states that, close to the optimal results, improving distortion results in worsening perception. This means that current low-light image enhancement methods —that focus on distortion minimization— cannot be optimal in the sense of obtaining a good image in terms of perception errors. In this paper, we propose a post-processing approach in which, given the original low-light image and the result of a specific method, we are able to obtain a result that resembles as much as possible that of the original method, but, at the same time, giving an improvement in the perception of the final image. More in detail, our method follows the hypothesis that in order to minimally modify the perception of an input image, any modification should be a combination of a local change in the shading across a scene and a global change in illumination color. We demonstrate the ability of our method quantitatively using perceptual blind image metrics such as BRISQUE, NIQE, or UNIQUE, and through user preference tests. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
MACO;CIC |
Approved |
no |
Call Number |
Admin @ si @ VFH2024 |
Serial |
4018 |
Permanent link to this record |
|
|
|
Author |
Beata Megyesi; Alicia Fornes; Nils Kopal; Benedek Lang |
Title |
Historical Cryptology |
Type |
Book Chapter |
Year |
2024 |
Publication |
Learning and Experiencing Cryptography with CrypTool and SageMath |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
|
Keywords |
|
Abstract |
Historical cryptology studies (original) encrypted manuscripts, often handwritten sources, produced in our history. These historical sources can be found in archives, often hidden without any indexing and therefore hard to locate. Once found they need to be digitized and turned into a machine-readable text format before they can be deciphered with computational methods. The focus of historical cryptology is not primarily the development of sophisticated algorithms for decipherment, but rather the entire process of analysis of the encrypted source from collection and digitization to transcription and decryption. The process also includes the interpretation and contextualization of the message set in its historical context. There are many challenges on the way, such as mistakes made by the scribe, errors made by the transcriber, damaged pages, handwriting styles that are difficult to interpret, historical languages from various time periods, and hidden underlying language of the message. Ciphertexts vary greatly in terms of their code system and symbol sets used with more or less distinguishable symbols. Ciphertexts can be embedded in clearly written text, or shorter or longer sequences of cleartext can be embedded in the ciphertext. The ciphers used mostly in historical times are substitutions (simple, homophonic, or polyphonic), with or without nomenclatures, encoded as digits or symbol sequences, with or without spaces. So the circumstances are different from those in modern cryptography which focuses on methods (algorithms) and their strengths and assumes that the algorithm is applied correctly. For both historical and modern cryptology, attack vectors outside the algorithm are applied like implementation flaws and side-channel attacks. In this chapter, we give an introduction to the field of historical cryptology and present an overview of how researchers today process historical encrypted sources. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
DAG |
Approved |
no |
Call Number |
Admin @ si @ MFK2024 |
Serial |
4020 |
Permanent link to this record |
|
|
|
Author |
Mustafa Hajij; Mathilde Papillon; Florian Frantzen; Jens Agerberg; Ibrahem AlJabea; Ruben Ballester; Claudio Battiloro; Guillermo Bernardez; Tolga Birdal; Aiden Brent; Peter Chin; Sergio Escalera; Simone Fiorellino; Odin Hoff Gardaa; Gurusankar Gopalakrishnan; Devendra Govil; Josef Hoppe; Maneel Reddy Karri; Jude Khouja; Manuel Lecha; Neal Livesay; Jan Meibner; Soham Mukherjee; Alexander Nikitin; Theodore Papamarkou; Jaro Prilepok; Karthikeyan Natesan Ramamurthy; Paul Rosen; Aldo Guzman-Saenz; Alessandro Salatiello; Shreyas N. Samaga; Simone Scardapane; Michael T. Schaub; Luca Scofano; Indro Spinelli; Lev Telyatnikov; Quang Truong; Robin Walters; Maosheng Yang; Olga Zaghen; Ghada Zamzmi; Ali Zia; Nina Miolane |
Title |
TopoX: A Suite of Python Packages for Machine Learning on Topological Domains |
Type |
Miscellaneous |
Year |
2024 |
Publication |
Arxiv |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
|
Keywords |
|
Abstract |
We introduce TopoX, a Python software suite that provides reliable and user-friendly building blocks for computing and machine learning on topological domains that extend graphs: hypergraphs, simplicial, cellular, path and combinatorial complexes. TopoX consists of three packages: TopoNetX facilitates constructing and computing on these domains, including working with nodes, edges and higher-order cells; TopoEmbedX provides methods to embed topological domains into vector spaces, akin to popular graph-based embedding algorithms such as node2vec; TopoModelx is built on top of PyTorch and offers a comprehensive toolbox of higher-order message passing functions for neural networks on topological domains. The extensively documented and unit-tested source code of TopoX is available under MIT license at this https URL. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
HUPBA;MILAB |
Approved |
no |
Call Number |
Admin @ si @ HPF2024 |
Serial |
4021 |
Permanent link to this record |
|
|
|
Author |
German Barquero; Sergio Escalera; Cristina Palmero |
Title |
Seamless Human Motion Composition with Blended Positional Encodings |
Type |
Miscellaneous |
Year |
2024 |
Publication |
Arxiv |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
|
Keywords |
|
Abstract |
Conditional human motion generation is an important topic with many applications in virtual reality, gaming, and robotics. While prior works have focused on generating motion guided by text, music, or scenes, these typically result in isolated motions confined to short durations. Instead, we address the generation of long, continuous sequences guided by a series of varying textual descriptions. In this context, we introduce FlowMDM, the first diffusion-based model that generates seamless Human Motion Compositions (HMC) without any postprocessing or redundant denoising steps. For this, we introduce the Blended Positional Encodings, a technique that leverages both absolute and relative positional encodings in the denoising chain. More specifically, global motion coherence is recovered at the absolute stage, whereas smooth and realistic transitions are built at the relative stage. As a result, we achieve state-of-the-art results in terms of accuracy, realism, and smoothness on the Babel and HumanML3D datasets. FlowMDM excels when trained with only a single description per motion sequence thanks to its Pose-Centric Cross-ATtention, which makes it robust against varying text descriptions at inference time. Finally, to address the limitations of existing HMC metrics, we propose two new metrics: the Peak Jerk and the Area Under the Jerk, to detect abrupt transitions. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
HUPBA;MILAB |
Approved |
no |
Call Number |
Admin @ si @ BEP2024 |
Serial |
4022 |
Permanent link to this record |
|
|
|
Author |
Ayan Banerjee; Sanket Biswas; Josep Llados; Umapada Pal |
Title |
GraphKD: Exploring Knowledge Distillation Towards Document Object Detection with Structured Graph Creation |
Type |
Miscellaneous |
Year |
2024 |
Publication |
Arxiv |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
|
Keywords |
|
Abstract |
Object detection in documents is a key step to automate the structural elements identification process in a digital or scanned document through understanding the hierarchical structure and relationships between different elements. Large and complex models, while achieving high accuracy, can be computationally expensive and memory-intensive, making them impractical for deployment on resource constrained devices. Knowledge distillation allows us to create small and more efficient models that retain much of the performance of their larger counterparts. Here we present a graph-based knowledge distillation framework to correctly identify and localize the document objects in a document image. Here, we design a structured graph with nodes containing proposal-level features and edges representing the relationship between the different proposal regions. Also, to reduce text bias an adaptive node sampling strategy is designed to prune the weight distribution and put more weightage on non-text nodes. We encode the complete graph as a knowledge representation and transfer it from the teacher to the student through the proposed distillation loss by effectively capturing both local and global information concurrently. Extensive experimentation on competitive benchmarks demonstrates that the proposed framework outperforms the current state-of-the-art approaches. The code will be available at: this https URL. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
DAG |
Approved |
no |
Call Number |
Admin @ si @ BBL2024b |
Serial |
4023 |
Permanent link to this record |
|
|
|
Author |
Tao Wu; Kai Wang; Chuanming Tang; Jianlin Zhang |
Title |
Diffusion-based network for unsupervised landmark detection |
Type |
Journal Article |
Year |
2024 |
Publication |
Knowledge-Based Systems |
Abbreviated Journal |
|
Volume |
292 |
Issue |
|
Pages |
111627 |
Keywords |
|
Abstract |
Landmark detection is a fundamental task aiming at identifying specific landmarks that serve as representations of distinct object features within an image. However, the present landmark detection algorithms often adopt complex architectures and are trained in a supervised manner using large datasets to achieve satisfactory performance. When faced with limited data, these algorithms tend to experience a notable decline in accuracy. To address these drawbacks, we propose a novel diffusion-based network (DBN) for unsupervised landmark detection, which leverages the generation ability of the diffusion models to detect the landmark locations. In particular, we introduce a dual-branch encoder (DualE) for extracting visual features and predicting landmarks. Additionally, we lighten the decoder structure for faster inference, referred to as LightD. By this means, we avoid relying on extensive data comparison and the necessity of designing complex architectures as in previous methods. Experiments on CelebA, AFLW, 300W and Deepfashion benchmarks have shown that DBN performs state-of-the-art compared to the existing methods. Furthermore, DBN shows robustness even when faced with limited data cases. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
LAMP |
Approved |
no |
Call Number |
Admin @ si @ WWT2024 |
Serial |
4024 |
Permanent link to this record |