Author Souhail Bakkali; Sanket Biswas; Zuheng Ming; Mickael Coustaty; Marçal Rusiñol; Oriol Ramos Terrades; Josep Llados
  Title TransferDoc: A Self-Supervised Transferable Document Representation Learning Model Unifying Vision and Language Type Miscellaneous
  Year 2023 Publication Arxiv Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract The field of visual document understanding has witnessed a rapid growth in emerging challenges and powerful multi-modal strategies. However, they rely on an extensive amount of document data to learn their pretext objectives in a "pre-train-then-fine-tune" paradigm and thus, suffer a significant performance drop in real-world online industrial settings. One major reason is the over-reliance on OCR engines to extract local positional information within a document page. Therefore, this hinders the model's generalizability, flexibility and robustness due to the lack of capturing global information within a document image. We introduce TransferDoc, a cross-modal transformer-based architecture pre-trained in a self-supervised fashion using three novel pretext objectives. TransferDoc learns richer semantic concepts by unifying language and visual representations, which enables the production of more transferable models. Besides, two novel downstream tasks have been introduced for a "closer-to-real" industrial evaluation scenario where TransferDoc outperforms other state-of-the-art approaches.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume (down) Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes DAG Approved no  
  Call Number Admin @ si @ BBM2023 Serial 3995  

Author Hunor Laczko; Meysam Madadi; Sergio Escalera; Jordi Gonzalez
  Title A Generative Multi-Resolution Pyramid and Normal-Conditioning 3D Cloth Draping Type Conference Article
  Year 2024 Publication Winter Conference on Applications of Computer Vision Abbreviated Journal  
  Volume Issue Pages 8709-8718  
  Keywords  
  Abstract RGB cloth generation has been deeply studied in the related literature, however, 3D garment generation remains an open problem. In this paper, we build a conditional variational autoencoder for 3D garment generation and draping. We propose a pyramid network to add garment details progressively in a canonical space, i.e. unposing and unshaping the garments w.r.t. the body. We study conditioning the network on surface normal UV maps, as an intermediate representation, which is an easier problem to optimize than 3D coordinates. Our results on two public datasets, CLOTH3D and CAPE, show that our model is robust, controllable in terms of detail generation by the use of multi-resolution pyramids, and achieves state-of-the-art results that can highly generalize to unseen garments, poses, and shapes even when training with small amounts of data.  
  Address Waikoloa; Hawaii; USA; January 2024  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume (down) Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference WACV  
  Notes ISE; HUPBA Approved no  
  Call Number Admin @ si @ LME2024 Serial 3996  

Author Henry Velesaca; Gisel Bastidas-Guacho; Mohammad Rouhani; Angel Sappa
  Title Multimodal image registration techniques: a comprehensive survey Type Journal Article
  Year 2024 Publication Multimedia Tools and Applications Abbreviated Journal MTAP  
  Volume Issue Pages  
  Keywords  
  Abstract This manuscript presents a review of state-of-the-art techniques proposed in the literature for multimodal image registration, addressing instances where images from different modalities need to be precisely aligned in the same reference system. This scenario arises when the images to be registered come from different modalities, among the visible and thermal spectral bands, 3D-RGB, or flash-no flash, or NIR-visible. The review spans different techniques from classical approaches to more modern ones based on deep learning, aiming to highlight the particularities required at each step in the registration pipeline when dealing with multimodal images. It is noteworthy that medical images are excluded from this review due to their specific characteristics, including the use of both active and passive sensors or the non-rigid nature of the body contained in the image.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume (down) Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes MSIAU Approved no  
  Call Number Admin @ si @ VBR2024 Serial 3997  

Author Patricia Suarez; Dario Carpio; Angel Sappa
  Title Enhancement of guided thermal image super-resolution approaches Type Journal Article
  Year 2024 Publication Neurocomputing Abbreviated Journal NEUCOM  
  Volume 573 Issue 127197 Pages 1-17  
  Keywords  
  Abstract Guided image processing techniques are widely used to extract meaningful information from a guiding image and facilitate the enhancement of the guided one. This paper specifically addresses the challenge of guided thermal image super-resolution, where a low-resolution thermal image is enhanced using a high-resolution visible spectrum image. We propose a new strategy that enhances outcomes from current guided super-resolution methods. This is achieved by transforming the initial guiding data into a representation resembling a thermal-like image, which is more closely in sync with the intended output. Experimental results with upscale factors of 8 and 16 demonstrate the outstanding performance of our approach in guided thermal image super-resolution, obtained by mapping the original guiding information to a thermal-like image representation.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume (down) Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes MSIAU Approved no  
  Call Number Admin @ si @ SCS2024 Serial 3998  

Author Justine Giroux; Mohammad Reza Karimi Dastjerdi; Yannick Hold-Geoffroy; Javier Vazquez; Jean François Lalonde
  Title Towards a Perceptual Evaluation Framework for Lighting Estimation Type Conference Article
  Year 2024 Publication Arxiv Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Progress in lighting estimation is tracked by computing existing image quality assessment (IQA) metrics on images from standard datasets. While this may appear to be a reasonable approach, we demonstrate that doing so does not correlate to human preference when the estimated lighting is used to relight a virtual scene into a real photograph. To study this, we design a controlled psychophysical experiment where human observers must choose their preference amongst rendered scenes lit using a set of lighting estimation algorithms selected from the recent literature, and use it to analyse how these algorithms perform according to human perception. Then, we demonstrate that none of the most popular IQA metrics from the literature, taken individually, correctly represent human perception. Finally, we show that by learning a combination of existing IQA metrics, we can more accurately represent human preference. This provides a new perceptual framework to help evaluate future lighting estimation algorithms.  
  Address Seattle; USA; June 2024  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume (down) Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference CVPR  
  Notes MACO; CIC Approved no  
  Call Number Admin @ si @ GDH2024 Serial 3999  

Author Trevor Canham; Javier Vazquez; D Long; Richard F. Murray; Michael S Brown
  Title Noise Prism: A Novel Multispectral Visualization Technique Type Journal Article
  Year 2021 Publication 31st Color and Imaging Conference Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract A novel technique for visualizing multispectral images is proposed. Inspired by how prisms work, our method spreads spectral information over a chromatic noise pattern. This is accomplished by populating the pattern with pixels representing each measurement band at a count proportional to its measured intensity. The method is advantageous because it allows for lightweight encoding and visualization of spectral information while maintaining the color appearance of the stimulus. A four alternative forced choice (4AFC) experiment was conducted to validate the method’s information-carrying capacity in displaying metameric stimuli of varying colors and spectral basis functions. The scores ranged from 100% to 20% (less than chance given the 4AFC task), with many conditions falling somewhere in between at statistically significant intervals. Using this data, color and texture difference metrics can be evaluated and optimized to predict the legibility of the visualization technique.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume (down) Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference CIC  
  Notes MACO; CIC Approved no  
  Call Number Admin @ si @ CVL2021 Serial 4000  

Author Mohamed Ramzy Ibrahim; Robert Benavente; Daniel Ponsa; Felipe Lumbreras
  Title SWViT-RRDB: Shifted Window Vision Transformer Integrating Residual in Residual Dense Block for Remote Sensing Super-Resolution Type Conference Article
  Year 2024 Publication 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Remote sensing applications, impacted by acquisition season and sensor variety, require high-resolution images. Transformer-based models improve satellite image super-resolution but are less effective than convolutional neural networks (CNNs) at extracting local details, crucial for image clarity. This paper introduces SWViT-RRDB, a new deep learning model for satellite imagery super-resolution. The SWViT-RRDB, combining transformer with convolution and attention blocks, overcomes the limitations of existing models by better representing small objects in satellite images. In this model, a pipeline of residual fusion group (RFG) blocks is used to combine the multi-headed self-attention (MSA) with residual in residual dense block (RRDB). This combines global and local image data for better super-resolution. Additionally, an overlapping cross-attention block (OCAB) is used to enhance fusion and allow interaction between neighboring pixels to maintain long-range pixel dependencies across the image. The SWViT-RRDB model and its larger variants outperform state-of-the-art (SoTA) models on two different satellite datasets in terms of PSNR and SSIM.  
  Address Rome; Italy; February 2024  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume (down) Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes MSIAU Approved no  
  Call Number Admin @ si @ RBP2024 Serial 4004  

Author Mingyi Yang; Fei Yang; Luka Murn; Marc Gorriz Blanch; Juil Sock; Shuai Wan; Fuzheng Yang; Luis Herranz
  Title Task-Switchable Pre-Processor for Image Compression for Multiple Machine Vision Tasks Type Journal Article
  Year 2024 Publication IEEE Transactions on Circuits and Systems for Video Technology Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Visual content is increasingly being processed by machines for various automated content analysis tasks instead of being consumed by humans. Despite the existence of several compression methods tailored for machine tasks, few consider real-world scenarios with multiple tasks. In this paper, we aim to address this gap by proposing a task-switchable pre-processor that optimizes input images specifically for machine consumption prior to encoding by an off-the-shelf codec designed for human consumption. The proposed task-switchable pre-processor adeptly maintains relevant semantic information based on the specific characteristics of different downstream tasks, while effectively suppressing irrelevant information to reduce bitrate. To enhance the processing of semantic information for diverse tasks, we leverage pre-extracted semantic features to modulate the pixel-to-pixel mapping within the pre-processor. By switching between different modulations, multiple tasks can be seamlessly incorporated into the system. Extensive experiments demonstrate the practicality and simplicity of our approach. It significantly reduces the number of parameters required for handling multiple tasks while still delivering impressive performance. Our method showcases the potential to achieve efficient and effective compression for machine vision tasks, supporting the evolving demands of real-world applications.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume (down) Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes xxx Approved no  
  Call Number Admin @ si @ YYM2024 Serial 4007  

Author Mohamed Ramzy Ibrahim; Robert Benavente; Daniel Ponsa; Felipe Lumbreras
  Title Unveiling the Influence of Image Super-Resolution on Aerial Scene Classification Type Conference Article
  Year 2023 Publication Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications Abbreviated Journal  
  Volume 14469 Issue Pages 214–228  
  Keywords  
  Abstract Deep learning has made significant advances in recent years, and as a result, it is now in a stage where it can achieve outstanding results in tasks requiring visual understanding of scenes. However, its performance tends to decline when dealing with low-quality images. The advent of super-resolution (SR) techniques has started to have an impact on the field of remote sensing by enabling the restoration of fine details and enhancing image quality, which could help to increase performance in other vision tasks. However, in previous works, contradictory results for scene visual understanding were achieved when SR techniques were applied. In this paper, we present an experimental study on the impact of SR on enhancing aerial scene classification. Through the analysis of different state-of-the-art SR algorithms, including traditional methods and deep learning-based approaches, we unveil the transformative potential of SR in overcoming the limitations of low-resolution (LR) aerial imagery. By enhancing spatial resolution, more fine details are captured, opening the door for an improvement in scene understanding. We also discuss the effect of different image scales on the quality of SR and its effect on aerial scene classification. Our experimental work demonstrates the significant impact of SR on enhancing aerial scene classification compared to LR images, opening new avenues for improved remote sensing applications.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume (down) Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference CIARP  
  Notes MSIAU Approved no  
  Call Number Admin @ si @ IBP2023 Serial 4008  

Author Patricia Suarez; Dario Carpio; Angel Sappa
  Title Depth Map Estimation from a Single 2D Image Type Conference Article
  Year 2023 Publication 17th International Conference on Signal-Image Technology & Internet-Based Systems Abbreviated Journal  
  Volume Issue Pages 347-353  
  Keywords  
  Abstract This paper presents an innovative architecture based on a Cycle Generative Adversarial Network (CycleGAN) for the synthesis of high-quality depth maps from monocular images. The proposed architecture leverages a diverse set of loss functions, including cycle consistency, contrastive, identity, and least square losses, to facilitate the generation of depth maps that exhibit realism and high fidelity. A notable feature of the approach is its ability to synthesize depth maps from grayscale images without the need for paired training data. Extensive comparisons with different state-of-the-art methods show the superiority of the proposed approach in both quantitative metrics and visual quality. This work addresses the challenge of depth map synthesis and offers significant advancements in the field.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume (down) Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference SITIS  
  Notes MSIAU Approved no  
  Call Number Admin @ si @ SCS2023b Serial 4009  

Author Rafael E. Rivadeneira; Henry Velesaca; Angel Sappa
  Title Object Detection in Very Low-Resolution Thermal Images through a Guided-Based Super-Resolution Approach Type Conference Article
  Year 2023 Publication 17th International Conference on Signal-Image Technology & Internet-Based Systems Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract This work proposes a novel approach that integrates super-resolution techniques with off-the-shelf object detection methods to tackle the problem of handling very low-resolution thermal images. The suggested approach begins by enhancing the low-resolution (LR) thermal images through a guided super-resolution strategy, leveraging a high-resolution (HR) visible spectrum image. Subsequently, object detection is performed on the high-resolution thermal image. The experimental results demonstrate tremendous improvements in comparison with both scenarios: when object detection is performed on the LR thermal image alone, as well as when object detection is conducted on the up-sampled LR thermal image. Moreover, the proposed approach proves highly valuable in camouflaged scenarios where objects might remain undetected in visible spectrum images.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume (down) Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference SITIS  
  Notes MSIAU Approved no  
  Call Number Admin @ si @ RVS2023 Serial 4010  

Author Patricia Suarez; Dario Carpio; Angel Sappa
  Title Boosting Guided Super-Resolution Performance with Synthesized Images Type Conference Article
  Year 2023 Publication 17th International Conference on Signal-Image Technology & Internet-Based Systems Abbreviated Journal  
  Volume Issue Pages 189-195  
  Keywords  
  Abstract Guided image processing techniques are widely used for extracting information from a guiding image to aid in the processing of the guided one. These images may be sourced from different modalities, such as 2D and 3D, or different spectral bands, like visible and infrared. In the case of guided cross-spectral super-resolution, features from the two modal images are extracted and efficiently merged to migrate guidance information from one image, usually high-resolution (HR), toward the guided one, usually low-resolution (LR). Different approaches have been recently proposed focusing on the development of architectures for feature extraction and merging in the cross-spectral domains, but none of them care about the different nature of the given images. This paper focuses on the specific problem of guided thermal image super-resolution, where an LR thermal image is enhanced by an HR visible spectrum image. To improve existing guided super-resolution techniques, a novel scheme is proposed that maps the original guiding information to a thermal image-like representation that is similar to the output. Experimental results evaluating five different approaches demonstrate that the best results are achieved when the guiding and guided images share the same domain.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume (down) Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference SITIS  
  Notes MSIAU Approved no  
  Call Number Admin @ si @ SCS2023c Serial 4011  

Author Ruben Perez Tito; Khanh Nguyen; Marlon Tobaben; Raouf Kerkouche; Mohamed Ali Souibgui; Kangsoo Jung; Lei Kang; Ernest Valveny; Antti Honkela; Mario Fritz; Dimosthenis Karatzas
  Title Privacy-Aware Document Visual Question Answering Type Miscellaneous
  Year 2023 Publication Arxiv Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Document Visual Question Answering (DocVQA) is a fast growing branch of document understanding. Despite the fact that documents contain sensitive or copyrighted information, none of the current DocVQA methods offers strong privacy guarantees. In this work, we explore privacy in the domain of DocVQA for the first time. We highlight privacy issues in state of the art multi-modal LLM models used for DocVQA, and explore possible solutions. Specifically, we focus on the invoice processing use case as a realistic, widely used scenario for document understanding, and propose a large scale DocVQA dataset comprising invoice documents and associated questions and answers. We employ a federated learning scheme, that reflects the real-life distribution of documents in different businesses, and we explore the use case where the ID of the invoice issuer is the sensitive information to be protected. We demonstrate that non-private models tend to memorise, behaviour that can lead to exposing private information. We then evaluate baseline training schemes employing federated learning and differential privacy in this multi-modal scenario, where the sensitive information might be exposed through any of the two input modalities: vision (document image) or language (OCR tokens). Finally, we design an attack exploiting the memorisation effect of the model, and demonstrate its effectiveness in probing different DocVQA models.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume (down) Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes DAG Approved no  
  Call Number Admin @ si @ PNT2023 Serial 4012  

Author Daniel Marczak; Sebastian Cygert; Tomasz Trzcinski; Bartlomiej Twardowski
  Title Revisiting Supervision for Continual Representation Learning Type Miscellaneous
  Year 2023 Publication Arxiv Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract In the field of continual learning, models are designed to learn tasks one after the other. While most research has centered on supervised continual learning, recent studies have highlighted the strengths of self-supervised continual representation learning. The improved transferability of representations built with self-supervised methods is often associated with the role played by the multi-layer perceptron projector. In this work, we depart from this observation and reexamine the role of supervision in continual representation learning. We reckon that additional information, such as human annotations, should not deteriorate the quality of representations. Our findings show that supervised models, when enhanced with a multi-layer perceptron head, can outperform self-supervised models in continual representation learning.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume (down) Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes xxx Approved no  
  Call Number Admin @ si @ MCT2023 Serial 4013  

Author Jose Luis Gomez; Manuel Silva; Antonio Seoane; Agnes Borras; Mario Noriega; German Ros; Jose Antonio Iglesias; Antonio Lopez
  Title All for One, and One for All: UrbanSyn Dataset, the third Musketeer of Synthetic Driving Scenes Type Miscellaneous
  Year 2023 Publication Arxiv Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract We introduce UrbanSyn, a photorealistic dataset acquired through semi-procedurally generated synthetic urban driving scenarios. Developed using high-quality geometry and materials, UrbanSyn provides pixel-level ground truth, including depth, semantic segmentation, and instance segmentation with object bounding boxes and occlusion degree. It complements GTAV and Synscapes datasets to form what we coin as the 'Three Musketeers'. We demonstrate the value of the Three Musketeers in unsupervised domain adaptation for image semantic segmentation. Results on real-world datasets, Cityscapes, Mapillary Vistas, and BDD100K, establish new benchmarks, largely attributed to UrbanSyn. We make UrbanSyn openly and freely accessible (this http URL).  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume (down) Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes ADAS Approved no  
  Call Number Admin @ si @ GSS2023 Serial 4015  