Records | |||||
---|---|---|---|---|---|
Author | ChuanMing Fang; Kai Wang; Joost Van de Weijer | ||||
Title | IterInv: Iterative Inversion for Pixel-Level T2I Models | Type | Conference Article | ||
Year | 2023 | Publication | 37th Annual Conference on Neural Information Processing Systems | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | Large-scale text-to-image diffusion models have been a ground-breaking development in generating convincing images from an input text prompt. The goal of image editing research is to give users control over the generated images by modifying the text prompt. Current image editing techniques commonly rely on DDIM inversion built on Latent Diffusion Models (LDM). However, large pretrained T2I models that operate in a latent space, such as LDM, lose details due to the initial autoencoder compression stage. In contrast, another mainstream family of T2I pipelines operates at the pixel level, such as Imagen and DeepFloyd-IF, and avoids this problem. These pipelines are typically composed of several stages, normally a text-to-image stage followed by several super-resolution stages. In this case, DDIM inversion cannot find the initial noise that regenerates the original image, because the super-resolution diffusion models are not compatible with the DDIM technique. According to our experimental findings, iteratively concatenating the noisy image as the condition is the root of this problem. Based on this observation, we develop an iterative inversion (IterInv) technique for this stream of T2I models and verify IterInv with the open-source DeepFloyd-IF model. By combining IterInv with a popular image editing method, we demonstrate its application prospects. The code will be released. (For context, a generic sketch of single-stage DDIM inversion follows this record.) | ||||
Address | New Orleans; USA; December 2023 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | NEURIPS | ||
Notes | LAMP | Approved | no | ||
Call Number | Admin @ si @ FWW2023 | Serial | 3936 | ||
Permanent link to this record | |||||
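For context on the record above, the following is a minimal sketch of plain DDIM inversion for a single-stage diffusion model, which is the procedure the abstract says breaks down for cascaded pixel-level models. The noise predictor `eps_model(x, t, cond)` and the 1-D schedule tensor `alphas_cumprod` are assumed interfaces; this is the generic technique, not the IterInv algorithm itself.

```python
import torch

@torch.no_grad()
def ddim_invert(x0, eps_model, alphas_cumprod, timesteps, cond=None):
    """Deterministic DDIM inversion sketch: map a clean image x0 back towards
    the initial noise x_T by running the DDIM update in reverse (eta = 0).

    Assumptions (not from the paper): `eps_model(x, t, cond)` predicts the
    added noise, and `alphas_cumprod` is a 1-D tensor of cumulative noise
    schedule values indexed by integer timestep.
    """
    x = x0
    # timesteps is an increasing list, e.g. [0, 20, 40, ..., T]
    for t_prev, t in zip(timesteps[:-1], timesteps[1:]):
        a_prev = alphas_cumprod[t_prev]
        a_t = alphas_cumprod[t]
        eps = eps_model(x, torch.tensor([t_prev], device=x.device), cond)
        # Predicted clean image at the current step.
        pred_x0 = (x - (1 - a_prev).sqrt() * eps) / a_prev.sqrt()
        # Re-noise towards the next (larger) timestep with the same eps.
        x = a_t.sqrt() * pred_x0 + (1 - a_t).sqrt() * eps
    return x  # approximation of the initial noise x_T
```

The abstract's observation is that cascaded super-resolution stages repeatedly concatenate the noisy image as a condition, which this single-stage recurrence does not model; IterInv addresses that case iteratively.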
Author | Sonia Baeza; Debora Gil; Carles Sanchez; Guillermo Torres; Ignasi Garcia Olive; Ignasi Guasch; Samuel Garcia Reina; Felipe Andreo; Jose Luis Mate; Jose Luis Vercher; Antonio Rosell | ||||
Title | Biopsia virtual radiomica para el diagnóstico histológico de nódulos pulmonares – Resultados intermedios del proyecto Radiolung [Virtual radiomic biopsy for the histological diagnosis of pulmonary nodules – Interim results of the Radiolung project] | Type | Conference Article | ||
Year | 2023 | Publication | SEPAR | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | Poster | ||||
Address | Granada; Spain; June 2023 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | SEPAR | ||
Notes | IAM | Approved | no | ||
Call Number | Admin @ si @ BGS2023 | Serial | 3951 | ||
Permanent link to this record | |||||
Author | Patricia Suarez; Dario Carpio; Angel Sappa | ||||
Title | Depth Map Estimation from a Single 2D Image | Type | Conference Article | ||
Year | 2023 | Publication | 17th International Conference on Signal-Image Technology & Internet-Based Systems | Abbreviated Journal | |
Volume | Issue | Pages | 347-353 | ||
Keywords | |||||
Abstract | This paper presents an innovative architecture based on a Cycle Generative Adversarial Network (CycleGAN) for the synthesis of high-quality depth maps from monocular images. The proposed architecture leverages a diverse set of loss functions, including cycle consistency, contrastive, identity, and least-squares losses, to facilitate the generation of depth maps that exhibit realism and high fidelity. A notable feature of the approach is its ability to synthesize depth maps from grayscale images without the need for paired training data. Extensive comparisons with different state-of-the-art methods show the superiority of the proposed approach in both quantitative metrics and visual quality. This work addresses the challenge of depth map synthesis and offers significant advancements in the field. (An illustrative sketch of how such loss terms are typically combined follows this record.) | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | SITIS | ||
Notes | MSIAU | Approved | no | ||
Call Number | Admin @ si @ SCS2023b | Serial | 4009 | ||
Permanent link to this record | |||||
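The abstract above lists cycle-consistency, contrastive, identity, and least-squares losses. As a hedged illustration of how such terms are typically combined in a CycleGAN-style generator objective for unpaired grayscale-to-depth translation, see the sketch below; the module names (`G_gray2depth`, `G_depth2gray`, `D_depth`) and loss weights are hypothetical, the contrastive term is omitted, and this is not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def cyclegan_style_generator_loss(gray, depth, G_gray2depth, G_depth2gray,
                                  D_depth, w_cyc=10.0, w_idt=5.0):
    """Illustrative combination of CycleGAN-style generator loss terms.
    Module names and weights are assumptions; the contrastive term used in
    the paper is not shown here.
    """
    fake_depth = G_gray2depth(gray)        # translate grayscale to depth
    rec_gray = G_depth2gray(fake_depth)    # cycle back to the source domain

    # Least-squares adversarial term (LSGAN): push discriminator scores on
    # generated depth maps towards 1 (i.e. "real").
    d_fake = D_depth(fake_depth)
    adv = F.mse_loss(d_fake, torch.ones_like(d_fake))

    # Cycle-consistency term: the round trip should reproduce the input.
    cyc = F.l1_loss(rec_gray, gray)

    # Identity term: a real depth map fed to G_gray2depth should be unchanged
    # (both domains are single-channel here, so shapes match).
    idt = F.l1_loss(G_gray2depth(depth), depth)

    return adv + w_cyc * cyc + w_idt * idt
```

In practice the discriminator is trained separately with its own least-squares objective on real versus generated depth maps; only the generator-side terms are shown here.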
Author | Rafael E. Rivadeneira; Henry Velesaca; Angel Sappa | ||||
Title | Object Detection in Very Low-Resolution Thermal Images through a Guided-Based Super-Resolution Approach | Type | Conference Article | ||
Year | 2023 | Publication | 17th International Conference on Signal-Image Technology & Internet-Based Systems | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | This work proposes a novel approach that integrates super-resolution techniques with off-the-shelf object detection methods to tackle the problem of handling very low-resolution thermal images. The suggested approach begins by enhancing the low-resolution (LR) thermal images through a guided super-resolution strategy, leveraging a high-resolution (HR) visible spectrum image. Subsequently, object detection is performed on the high-resolution thermal image. The experimental results demonstrate tremendous improvements over both baseline scenarios: object detection performed on the LR thermal image alone, and object detection performed on the up-sampled LR thermal image. Moreover, the proposed approach proves highly valuable in camouflaged scenarios where objects might remain undetected in visible spectrum images. (A structural sketch of the two-stage pipeline follows this record.) | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | SITIS | ||
Notes | MSIAU | Approved | no | ||
Call Number | Admin @ si @ RVS2023 | Serial | 4010 | ||
Permanent link to this record | |||||
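The approach above is a two-stage pipeline: guided super-resolution of the LR thermal image using an HR visible image, then an off-the-shelf detector on the result. A structural sketch under assumed interfaces (`guided_sr_model(lr, guide)`, `detector(img)`) is given below, together with the up-sampled-LR baseline mentioned in the abstract; neither reflects the authors' specific components.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def detect_with_guided_sr(lr_thermal, hr_visible, guided_sr_model, detector):
    """Two-stage sketch: (1) super-resolve the LR thermal image (B, 1, h, w)
    guided by a registered HR visible image, (2) run an off-the-shelf object
    detector on the enhanced thermal image. Interfaces are assumptions."""
    hr_thermal = guided_sr_model(lr_thermal, hr_visible)   # guided SR stage
    return detector(hr_thermal)                            # detection stage

@torch.no_grad()
def detect_on_upsampled_lr(lr_thermal, detector, scale=4):
    """Baseline mentioned in the abstract: detect directly on a plainly
    up-sampled LR thermal image (bicubic and x4 are assumptions)."""
    up = F.interpolate(lr_thermal, scale_factor=scale,
                       mode="bicubic", align_corners=False)
    return detector(up)
```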
Author | Patricia Suarez; Dario Carpio; Angel Sappa | ||||
Title | Boosting Guided Super-Resolution Performance with Synthesized Images | Type | Conference Article | ||
Year | 2023 | Publication | 17th International Conference on Signal-Image Technology & Internet-Based Systems | Abbreviated Journal | |
Volume | Issue | Pages | 189-195 | ||
Keywords | |||||
Abstract | Guided image processing techniques are widely used for extracting information from a guiding image to aid in the processing of the guided one. These images may be sourced from different modalities, such as 2D and 3D, or different spectral bands, like visible and infrared. In the case of guided cross-spectral super-resolution, features from the two modal images are extracted and efficiently merged to migrate guidance information from one image, usually high-resolution (HR), toward the guided one, usually low-resolution (LR). Different approaches have recently been proposed that focus on the development of architectures for feature extraction and merging in the cross-spectral domains, but none of them takes into account the different nature of the given images. This paper focuses on the specific problem of guided thermal image super-resolution, where an LR thermal image is enhanced by an HR visible spectrum image. To improve existing guided super-resolution techniques, a novel scheme is proposed that maps the original guiding information to a thermal image-like representation that is similar to the output. Experimental results evaluating five different approaches demonstrate that the best results are achieved when the guiding and guided images share the same domain. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | SITIS | ||
Notes | MSIAU | Approved | no | ||
Call Number | Admin @ si @ SCS2023c | Serial | 4011 | ||
Permanent link to this record | |||||
Author | Patricia Suarez; Angel Sappa | ||||
Title | Toward a Thermal Image-Like Representation | Type | Conference Article | ||
Year | 2023 | Publication | Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications | Abbreviated Journal | |
Volume | Issue | Pages | 133-140 | ||
Keywords | |||||
Abstract | This paper proposes a novel model to obtain thermal image-like representations to be used as input to any thermal image processing approach (e.g., thermal image filtering, enhancement, super-resolution). Thermal images offer interesting information about the objects in the scene in addition to their temperature. Unfortunately, in most cases thermal cameras acquire low-resolution/low-quality images. Hence, in order to improve these images, several state-of-the-art approaches exploit complementary information from a low-cost channel (visible image) to increase the image quality of an expensive channel (infrared image). In these state-of-the-art approaches, visible images are fused at different levels without taking into account that the images capture information at different bands of the spectrum. In this paper, a novel approach is proposed to generate thermal image-like representations from low-cost visible images by means of a contrastive cycled GAN network. The obtained representations (synthetic thermal images) can later be used to improve the low-quality thermal image of the same scene. Experimental results on different datasets are presented. (A sketch of a patchwise contrastive term of this kind follows this record.) | ||||
Address | Lisboa; Portugal; February 2023 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | VISIGRAPP | ||
Notes | MSIAU | Approved | no | ||
Call Number | Admin @ si @ SuS2023b | Serial | 3927 | ||
Permanent link to this record | |||||
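The record above relies on a contrastive cycled GAN to map visible images to thermal image-like representations. As a hedged illustration of the contrastive ingredient only, the sketch below shows a patchwise InfoNCE term of the kind used in contrastive image-to-image translation; the feature-sampling scheme and temperature are assumptions, and this is not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def patchwise_infonce(feat_src, feat_out, tau=0.07):
    """Illustrative patchwise InfoNCE term: the feature of each generated
    (thermal-like) patch should match the source (visible) feature sampled at
    the same spatial location (positive) and repel the others (negatives).

    feat_src, feat_out: (num_patches, dim) features sampled at corresponding
    positions of the input and the synthesized output. `tau` is an assumption.
    """
    src = F.normalize(feat_src, dim=1)
    out = F.normalize(feat_out, dim=1)
    logits = out @ src.t() / tau                        # (N, N) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)             # positives on diagonal
```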
Author | David Dueñas; Mostafa Kamal; Petia Radeva | ||||
Title | Efficient Deep Learning Ensemble for Skin Lesion Classification | Type | Conference Article | ||
Year | 2023 | Publication | Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications | Abbreviated Journal | |
Volume | Issue | Pages | 303-314 | ||
Keywords | |||||
Abstract | Vision Transformers (ViTs) are deep learning techniques that have been gaining popularity in recent years. In this work, we study the performance of ViTs and Convolutional Neural Networks (CNNs) on skin lesion classification tasks, specifically melanoma diagnosis. We show that, regardless of the performance of both architectures, an ensemble of them can improve their generalization. We also present an adaptation of the Gram-OOD* method (detecting out-of-distribution (OOD) samples using Gram matrices) for skin lesion images. Moreover, the integration of super-convergence was critical to success in building models with strict computing and training time constraints. We evaluated our ensemble of ViTs and CNNs, demonstrating that generalization is enhanced, by placing first in the 2019 and third in the 2020 ISIC Challenge Live Leaderboards (available at https://challenge.isic-archive.com/leaderboards/live/). (A minimal sketch of probability-level ensembling follows this record.) | ||||
Address | Lisboa; Portugal; February 2023 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | VISIGRAPP | ||
Notes | MILAB | Approved | no | ||
Call Number | Admin @ si @ DKR2023 | Serial | 3928 | ||
Permanent link to this record | |||||
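The record above attributes the generalization gain to ensembling ViTs and CNNs. A minimal sketch of probability-level ensembling is given below; equal weighting of softmax outputs is an assumption and not necessarily the combination rule used in the paper.

```python
import torch

@torch.no_grad()
def ensemble_predict(image_batch, models, weights=None):
    """Average class probabilities from heterogeneous classifiers (e.g. a mix
    of ViTs and CNNs trained on the same skin-lesion label set).

    Equal weighting is the default; per-model weights are an assumption.
    """
    if weights is None:
        weights = [1.0 / len(models)] * len(models)
    probs = None
    for w, model in zip(weights, models):
        p = torch.softmax(model(image_batch), dim=1)   # (B, num_classes)
        probs = w * p if probs is None else probs + w * p
    return probs.argmax(dim=1), probs
```

A usage example would be `ensemble_predict(batch, [vit_model, cnn_model])`, where both models output logits over the same label set.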
Author | Soumya Jahagirdar; Minesh Mathew; Dimosthenis Karatzas; CV Jawahar | ||||
Title | Watching the News: Towards VideoQA Models that can Read | Type | Conference Article | ||
Year | 2023 | Publication | Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision | Abbreviated Journal |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | Video Question Answering methods focus on commonsense reasoning and visual cognition of objects or persons and their interactions over time. Current VideoQA approaches ignore the textual information present in the video. Instead, we argue that textual information is complementary to the action and provides essential contextualisation cues to the reasoning process. To this end, we propose a novel VideoQA task that requires reading and understanding the text in the video. To explore this direction, we focus on news videos and require QA systems to comprehend and answer questions about the topics presented by combining visual and textual cues in the video. We introduce the "NewsVideoQA" dataset that comprises more than 8,600 QA pairs on 3,000+ news videos obtained from diverse news channels from around the world. We demonstrate the limitations of current Scene Text VQA and VideoQA methods and propose ways to incorporate scene text information into VideoQA methods. | ||||
Address | Waikoloa; Hawaii; USA; January 2023 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | WACV | ||
Notes | DAG | Approved | no | ||
Call Number | Admin @ si @ JMK2023 | Serial | 3899 | ||
Permanent link to this record | |||||
Author | Marcos V Conde; Florin Vasluianu; Javier Vazquez; Radu Timofte | ||||
Title | Perceptual image enhancement for smartphone real-time applications | Type | Conference Article | ||
Year | 2023 | Publication | Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision | Abbreviated Journal | |
Volume | Issue | Pages | 1848-1858 | ||
Keywords | |||||
Abstract | Recent advances in camera designs and imaging pipelines allow us to capture high-quality images using smartphones. However, due to the small size and lens limitations of smartphone cameras, we commonly find artifacts or degradation in the processed images. The most common unpleasant effects are noise artifacts, diffraction artifacts, blur, and HDR overexposure. Deep learning methods for image restoration can successfully remove these artifacts. However, most approaches are not suitable for real-time applications on mobile devices due to their heavy computation and memory requirements. In this paper, we propose LPIENet, a lightweight network for perceptual image enhancement, with the focus on deploying it on smartphones. Our experiments show that, with much fewer parameters and operations, our model can deal with the mentioned artifacts and achieve competitive performance compared with state-of-the-art methods on standard benchmarks. Moreover, to prove the efficiency and reliability of our approach, we deployed the model directly on commercial smartphones and evaluated its performance. Our model can process 2K-resolution images in under 1 second on mid-level commercial smartphones. (A generic parameter-count and timing sketch follows this record.) | ||||
Address | Waikoloa; Hawaii; USA; January 2023 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | WACV | ||
Notes | MACO; CIC | Approved | no | ||
Call Number | Admin @ si @ CVV2023 | Serial | 3900 | ||
Permanent link to this record | |||||
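The record above emphasizes parameter count and sub-second 2K-image latency on smartphones. Purely as a generic illustration of how such numbers are often gathered for any PyTorch enhancement network, here is a rough benchmarking sketch; the input size, warm-up, and timing loop are assumptions and say nothing about the actual LPIENet measurements, which were taken on-device.

```python
import time
import torch

def measure_model(model, height=1080, width=2048, device="cpu", runs=10):
    """Rough benchmark: parameter count and average forward-pass time on a
    2K-resolution RGB input. Purely illustrative; real mobile deployment is
    measured on the device itself after model conversion."""
    model = model.eval().to(device)
    n_params = sum(p.numel() for p in model.parameters())
    x = torch.randn(1, 3, height, width, device=device)
    with torch.no_grad():
        model(x)                          # warm-up pass
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
        elapsed = (time.perf_counter() - start) / runs
    return n_params, elapsed
```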
Author | Dipam Goswami; Rene Schuster; Joost Van de Weijer; Didier Stricker | ||||
Title | Attribution-aware Weight Transfer: A Warm-Start Initialization for Class-Incremental Semantic Segmentation | Type | Conference Article | ||
Year | 2023 | Publication | Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision | Abbreviated Journal | |
Volume | Issue | Pages | 3195-3204 | ||
Keywords | |||||
Abstract |||||
Address | Waikoloa; Hawaii; USA; January 2023 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | WACV | ||
Notes | LAMP | Approved | no | ||
Call Number | Admin @ si @ GSW2023 | Serial | 3901 | ||
Permanent link to this record | |||||
Author | Mickael Cormier; Andreas Specker; Julio C. S. Jacques; Lucas Florin; Jurgen Metzler; Thomas B. Moeslund; Kamal Nasrollahi; Sergio Escalera; Jurgen Beyerer | ||||
Title | UPAR Challenge: Pedestrian Attribute Recognition and Attribute-based Person Retrieval – Dataset, Design, and Results | Type | Conference Article | ||
Year | 2023 | Publication | 2023 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops | Abbreviated Journal | |
Volume | Issue | Pages | 166-175 | ||
Keywords | |||||
Abstract | In civilian video security monitoring, retrieving and tracking a person of interest often rely on witness testimony and their appearance description. Deployed systems rely on a large amount of annotated training data and are expected to show consistent performance in diverse areas and generalize well between diverse settings w.r.t. different viewpoints, illumination, resolution, occlusions, and poses for indoor and outdoor scenes. However, for such generalization, the system would require a large amount of varied annotated data for training and evaluation. The WACV 2023 Pedestrian Attribute Recognition and Attribute-based Person Retrieval Challenge (UPAR-Challenge) aimed to spotlight the problem of domain gaps in a real-world surveillance context and highlight the challenges and limitations of existing methods. The UPAR dataset, composed of 40 important binary attributes over 12 attribute categories across four datasets, was extended with data captured from a low-flying UAV from the P-DESTRE dataset. To this aim, 0.6M additional annotations were manually labeled and validated. Each track evaluated the robustness of the competing methods to domain shifts by training on limited data from a specific domain and evaluating using data from unseen domains. The challenge attracted 41 registered participants, but only one team managed to outperform the baseline on one track, emphasizing the task's difficulty. This work describes the challenge design, the adopted dataset, the obtained results, and future directions on the topic. | ||||
Address | Waikoloa; Hawaii; USA; January 2023 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | WACVW | ||
Notes | HUPBA | Approved | no | ||
Call Number | Admin @ si @ CSJ2023 | Serial | 3902 | ||
Permanent link to this record |