Author Mickael Cormier; Andreas Specker; Julio C. S. Jacques; Lucas Florin; Jurgen Metzler; Thomas B. Moeslund; Kamal Nasrollahi; Sergio Escalera; Jurgen Beyerer
Title UPAR Challenge: Pedestrian Attribute Recognition and Attribute-based Person Retrieval – Dataset, Design, and Results Type Conference Article
Year 2023 Publication 2023 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops Abbreviated Journal
Volume Issue Pages 166-175
Keywords
Abstract In civilian video security monitoring, retrieving and tracking a person of interest often rely on witness testimony and their appearance description. Deployed systems rely on a large amount of annotated training data and are expected to show consistent performance in diverse areas and generalize well between diverse settings w.r.t. different viewpoints, illumination, resolution, occlusions, and poses for indoor and outdoor scenes. However, for such generalization, the system would require a large amount of varied annotated data for training and evaluation. The WACV 2023 Pedestrian Attribute Recognition and Attribute-based Person Retrieval Challenge (UPAR-Challenge) aimed to spotlight the problem of domain gaps in a real-world surveillance context and highlight the challenges and limitations of existing methods. The UPAR dataset, composed of 40 important binary attributes over 12 attribute categories across four datasets, was extended with data captured from a low-flying UAV from the P-DESTRE dataset. To this aim, 0.6M additional annotations were manually labeled and validated. Each track evaluated the robustness of the competing methods to domain shifts by training on limited data from a specific domain and evaluating using data from unseen domains. The challenge attracted 41 registered participants, but only one team managed to outperform the baseline on one track, emphasizing the task's difficulty. This work describes the challenge design, the adopted dataset, obtained results, as well as future directions on the topic.
Address Waikoloa; Hawaii; USA; January 2023
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference WACVW
Notes HUPBA Approved no
Call Number Admin @ si @ CSJ2023 Serial 3902
Permanent link to this record
 

 
Author Soumya Jahagirdar; Minesh Mathew; Dimosthenis Karatzas; CV Jawahar
Title Watching the News: Towards VideoQA Models that can Read Type Conference Article
Year 2023 Publication Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Video Question Answering methods focus on commonsense reasoning and visual cognition of objects or persons and their interactions over time. Current VideoQA approaches ignore the textual information present in the video. Instead, we argue that textual information is complementary to the action and provides essential contextualisation cues to the reasoning process. To this end, we propose a novel VideoQA task that requires reading and understanding the text in the video. To explore this direction, we focus on news videos and require QA systems to comprehend and answer questions about the topics presented by combining visual and textual cues in the video. We introduce the "NewsVideoQA" dataset that comprises more than 8,600 QA pairs on 3,000+ news videos obtained from diverse news channels from around the world. We demonstrate the limitations of current Scene Text VQA and VideoQA methods and propose ways to incorporate scene text information into VideoQA methods.
Address Waikoloa; Hawaii; USA; January 2023
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference WACV
Notes DAG Approved no
Call Number Admin @ si @ JMK2023 Serial 3899
Permanent link to this record
 

 
Author Marcos V Conde; Florin Vasluianu; Javier Vazquez; Radu Timofte
Title Perceptual image enhancement for smartphone real-time applications Type Conference Article
Year 2023 Publication Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision Abbreviated Journal
Volume Issue Pages 1848-1858
Keywords
Abstract Recent advances in camera designs and imaging pipelines allow us to capture high-quality images using smartphones. However, due to the small size and lens limitations of the smartphone cameras, we commonly find artifacts or degradation in the processed images. The most common unpleasant effects are noise artifacts, diffraction artifacts, blur, and HDR overexposure. Deep learning methods for image restoration can successfully remove these artifacts. However, most approaches are not suitable for real-time applications on mobile devices due to their heavy computation and memory requirements. In this paper, we propose LPIENet, a lightweight network for perceptual image enhancement, with the focus on deploying it on smartphones. Our experiments show that, with much fewer parameters and operations, our model can deal with the mentioned artifacts and achieve competitive performance compared with state-of-the-art methods on standard benchmarks. Moreover, to prove the efficiency and reliability of our approach, we deployed the model directly on commercial smartphones and evaluated its performance. Our model can process 2K resolution images under 1 second in mid-level commercial smartphones.
Address Waikoloa; Hawaii; USA; January 2023
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference WACV
Notes MACO; CIC Approved no
Call Number Admin @ si @ CVV2023 Serial 3900
Permanent link to this record
 

 
Author Dipam Goswami; R Schuster; Joost Van de Weijer; Didier Stricker
Title Attribution-aware Weight Transfer: A Warm-Start Initialization for Class-Incremental Semantic Segmentation Type Conference Article
Year 2023 Publication Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision Abbreviated Journal
Volume Issue Pages 3195-3204
Keywords
Abstract Attribution-aware Weight Transfer: A Warm-Start Initialization for Class-Incremental Semantic Segmentation. D Goswami, R Schuster, J van de Weijer, D Stricker. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023, pp. 3195-3204
Address Waikoloa; Hawaii; USA; January 2023
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference WACV
Notes LAMP Approved no
Call Number Admin @ si @ GSW2023 Serial 3901
Permanent link to this record
 

 
Author Patricia Suarez; Angel Sappa
Title Toward a Thermal Image-Like Representation Type Conference Article
Year 2023 Publication Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications Abbreviated Journal
Volume Issue Pages 133-140
Keywords
Abstract This paper proposes a novel model to obtain thermal image-like representations to be used as an input in any thermal image compressive sensing approach (e.g., thermal image filtering, enhancing, super-resolution). Thermal images offer interesting information about the objects in the scene, in addition to their temperature. Unfortunately, in most cases thermal cameras acquire low resolution/quality images. Hence, in order to improve these images, there are several state-of-the-art approaches that exploit complementary information from a low-cost channel (visible image) to increase the image quality of an expensive channel (infrared image). In these SOTA approaches, visible images are fused at different levels without taking into account that the images acquire information at different bands of the spectrum. In this paper, a novel approach is proposed to generate thermal image-like representations from low-cost visible images by means of a contrastive cycled GAN network. The obtained representations (synthetic thermal images) can later be used to improve the low-quality thermal image of the same scene. Experimental results on different datasets are presented.
Address Lisboa; Portugal; February 2023
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference VISIGRAPP
Notes MSIAU Approved no
Call Number Admin @ si @ SuS2023b Serial 3927
Permanent link to this record
 

 
Author David Dueñas; Mostafa Kamal; Petia Radeva
Title Efficient Deep Learning Ensemble for Skin Lesion Classification Type Conference Article
Year 2023 Publication Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications Abbreviated Journal
Volume Issue Pages 303-314
Keywords
Abstract Vision Transformers (ViTs) are deep learning techniques that have been gaining in popularity in recent years. In this work, we study the performance of ViTs and Convolutional Neural Networks (CNNs) on skin lesion classification tasks, specifically melanoma diagnosis. We show that regardless of the performance of both architectures, an ensemble of them can improve their generalization. We also present an adaptation to the Gram-OOD* method (detecting Out-of-distribution (OOD) using Gram matrices) for skin lesion images. Moreover, the integration of super-convergence was critical to success in building models with strict computing and training time constraints. We evaluated our ensemble of ViTs and CNNs, demonstrating that generalization is enhanced by placing first in the 2019 and third in the 2020 ISIC Challenge Live Leaderboards (available at https://challenge.isic-archive.com/leaderboards/live/).
Address Lisboa; Portugal; February 2023
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference VISIGRAPP
Notes MILAB Approved no
Call Number Admin @ si @ DKR2023 Serial 3928
Permanent link to this record
 

 
Author Patricia Suarez; Dario Carpio; Angel Sappa
Title Depth Map Estimation from a Single 2D Image Type Conference Article
Year 2023 Publication 17th International Conference on Signal-Image Technology & Internet-Based Systems Abbreviated Journal
Volume Issue Pages 347-353
Keywords
Abstract This paper presents an innovative architecture based on a Cycle Generative Adversarial Network (CycleGAN) for the synthesis of high-quality depth maps from monocular images. The proposed architecture leverages a diverse set of loss functions, including cycle consistency, contrastive, identity, and least square losses, to facilitate the generation of depth maps that exhibit realism and high fidelity. A notable feature of the approach is its ability to synthesize depth maps from grayscale images without the need for paired training data. Extensive comparisons with different state-of-the-art methods show the superiority of the proposed approach in both quantitative metrics and visual quality. This work addresses the challenge of depth map synthesis and offers significant advancements in the field.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference SITIS
Notes MSIAU Approved no
Call Number Admin @ si @ SCS2023b Serial 4009
Permanent link to this record
 

 
Author Rafael E. Rivadeneira; Henry Velesaca; Angel Sappa
Title Object Detection in Very Low-Resolution Thermal Images through a Guided-Based Super-Resolution Approach Type Conference Article
Year 2023 Publication 17th International Conference on Signal-Image Technology & Internet-Based Systems Abbreviated Journal
Volume Issue Pages
Keywords
Abstract This work proposes a novel approach that integrates super-resolution techniques with off-the-shelf object detection methods to tackle the problem of handling very low-resolution thermal images. The suggested approach begins by enhancing the low-resolution (LR) thermal images through a guided super-resolution strategy, leveraging a high-resolution (HR) visible spectrum image. Subsequently, object detection is performed on the high-resolution thermal image. The experimental results demonstrate tremendous improvements in comparison with both scenarios: when object detection is performed on the LR thermal image alone, as well as when object detection is conducted on the up-sampled LR thermal image. Moreover, the proposed approach proves highly valuable in camouflaged scenarios where objects might remain undetected in visible spectrum images.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference SITIS
Notes MSIAU Approved no
Call Number Admin @ si @ RVS2023 Serial 4010
Permanent link to this record
 

 
Author Patricia Suarez; Dario Carpio; Angel Sappa
Title Boosting Guided Super-Resolution Performance with Synthesized Images Type Conference Article
Year 2023 Publication 17th International Conference on Signal-Image Technology & Internet-Based Systems Abbreviated Journal
Volume Issue Pages 189-195
Keywords
Abstract Guided image processing techniques are widely used for extracting information from a guiding image to aid in the processing of the guided one. These images may be sourced from different modalities, such as 2D and 3D, or different spectral bands, like visible and infrared. In the case of guided cross-spectral super-resolution, features from the two modal images are extracted and efficiently merged to migrate guidance information from one image, usually high-resolution (HR), toward the guided one, usually low-resolution (LR). Different approaches have been recently proposed focusing on the development of architectures for feature extraction and merging in the cross-spectral domains, but none of them care about the different nature of the given images. This paper focuses on the specific problem of guided thermal image super-resolution, where an LR thermal image is enhanced by an HR visible spectrum image. To improve existing guided super-resolution techniques, a novel scheme is proposed that maps the original guiding information to a thermal image-like representation that is similar to the output. Experimental results evaluating five different approaches demonstrate that the best results are achieved when the guiding and guided images share the same domain.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference SITIS
Notes MSIAU Approved no
Call Number Admin @ si @ SCS2023c Serial 4011
Permanent link to this record
 

 
Author Sonia Baeza; Debora Gil; Carles Sanchez; Guillermo Torres; Ignasi Garcia Olive; Ignasi Guasch; Samuel Garcia Reina; Felipe Andreo; Jose Luis Mate; Jose Luis Vercher; Antonio Rosell
Title Biopsia virtual radiomica para el diagnóstico histológico de nódulos pulmonares – Resultados intermedios del proyecto Radiolung Type Conference Article
Year 2023 Publication SEPAR Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Poster
Address Granada; Spain; June 2023
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference SEPAR
Notes IAM Approved no
Call Number Admin @ si @ BGS2023 Serial 3951
Permanent link to this record
 

 
Author Dipam Goswami; Yuyang Liu; Bartlomiej Twardowski; Joost Van de Weijer
Title FeCAM: Exploiting the Heterogeneity of Class Distributions in Exemplar-Free Continual Learning Type Conference Article
Year 2023 Publication 37th Annual Conference on Neural Information Processing Systems Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Poster
Address New Orleans; USA; December 2023
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference NEURIPS
Notes LAMP Approved no
Call Number Admin @ si @ GLT2023 Serial 3934
Permanent link to this record
 

 
Author Kai Wang; Fei Yang; Shiqi Yang; Muhammad Atif Butt; Joost Van de Weijer
Title Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing Type Conference Article
Year 2023 Publication 37th Annual Conference on Neural Information Processing Systems Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Poster
Address New Orleans; USA; December 2023
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference NEURIPS
Notes LAMP Approved no
Call Number Admin @ si @ WYY2023 Serial 3935
Permanent link to this record
 

 
Author ChuanMing Fang; Kai Wang; Joost Van de Weijer
Title IterInv: Iterative Inversion for Pixel-Level T2I Models Type Conference Article
Year 2023 Publication 37th Annual Conference on Neural Information Processing Systems Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Large-scale text-to-image diffusion models have been a ground-breaking development in generating convincing images following an input text prompt. The goal of image editing research is to give users control over the generated images by modifying the text prompt. Current image editing techniques commonly rely on DDIM inversion based on Latent Diffusion Models (LDM). However, large pretrained T2I models that work in the latent space, such as LDM, lose details because of the first compression stage with an autoencoder mechanism. In contrast, another mainstream T2I pipeline working on the pixel level, such as Imagen and DeepFloyd-IF, avoids this problem. These pipelines are commonly composed of several stages, normally a text-to-image stage followed by several super-resolution stages. In this case, DDIM inversion is unable to find the initial noise to generate the original image, given that the super-resolution diffusion models are not compatible with the DDIM technique. According to our experimental findings, iteratively concatenating the noisy image as the condition is the root of this problem. Based on this observation, we develop an iterative inversion (IterInv) technique for this stream of T2I models and verify IterInv with the open-source DeepFloyd-IF model. By combining our method IterInv with a popular image editing method, we prove the application prospects of IterInv. The code will be released at \url{this https URL}.
Address New Orleans; USA; December 2023
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference NEURIPS
Notes LAMP Approved no
Call Number Admin @ si @ FWW2023 Serial 3936
Permanent link to this record
 

 
Author Christian Keilstrup Ingwersen; Artur Xarles; Albert Clapes; Meysam Madadi; Janus Nortoft Jensen; Morten Rieger Hannemose; Anders Bjorholm Dahl; Sergio Escalera
Title Video-based Skill Assessment for Golf: Estimating Golf Handicap Type Conference Article
Year 2023 Publication Proceedings of the 6th International Workshop on Multimedia Content Analysis in Sports Abbreviated Journal
Volume Issue Pages 31-39
Keywords
Abstract Automated skill assessment in sports using video-based analysis holds great potential for revolutionizing coaching methodologies. This paper focuses on the problem of skill determination in golfers by leveraging deep learning models applied to a large database of video recordings of golf swings. We investigate different regression, ranking and classification based methods and compare to a simple baseline approach. The performance is evaluated using mean squared error (MSE) as well as computing the percentages of correctly ranked pairs based on the Kendall correlation. Our results demonstrate an improvement over the baseline, with a 35% lower mean squared error and 68% correctly ranked pairs. However, achieving fine-grained skill assessment remains challenging. This work contributes to the development of AI-driven coaching systems and advances the understanding of video-based skill determination in the context of golf.
Address Ottawa; Canada; October 2023
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference MMSports
Notes HUPBA Approved no
Call Number Admin @ si @ KXC2023 Serial 3929
Permanent link to this record
 

 
Author Artur Xarles; Sergio Escalera; Thomas B. Moeslund; Albert Clapes
Title ASTRA: An Action Spotting TRAnsformer for Soccer Videos Type Conference Article
Year 2023 Publication Proceedings of the 6th International Workshop on Multimedia Content Analysis in Sports Abbreviated Journal
Volume Issue Pages 93-102
Keywords
Abstract In this paper, we introduce ASTRA, a Transformer-based model designed for the task of Action Spotting in soccer matches. ASTRA addresses several challenges inherent in the task and dataset, including the requirement for precise action localization, the presence of a long-tail data distribution, non-visibility in certain actions, and inherent label noise. To do so, ASTRA incorporates (a) a Transformer encoder-decoder architecture to achieve the desired output temporal resolution and to produce precise predictions, (b) a balanced mixup strategy to handle the long-tail distribution of the data, (c) an uncertainty-aware displacement head to capture the label variability, and (d) input audio signal to enhance detection of non-visible actions. Results demonstrate the effectiveness of ASTRA, achieving a tight Average-mAP of 66.82 on the test set. Moreover, in the SoccerNet 2023 Action Spotting challenge, we secure the 3rd position with an Average-mAP of 70.21 on the challenge set.
Address Ottawa; Canada; October 2023
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference MMSports
Notes HUPBA Approved no
Call Number Admin @ si @ XEM2023 Serial 3970
Permanent link to this record