Senmao Li, Joost van de Weijer, Taihang Hu, Fahad Shahbaz Khan, Qibin Hou, Yaxing Wang, et al. (2023). StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing.
Abstract: A significant research effort is focused on exploiting the amazing capacities of pretrained diffusion models for image editing. Existing methods either finetune the model or invert the image in the latent space of the pretrained model. However, they suffer from two problems: (1) unsatisfying results in selected regions and unexpected changes in nonselected regions; (2) they require careful text prompt editing, where the prompt must include all visual objects in the input image. To address this, we propose two improvements: (1) optimizing only the input of the value linear network in the cross-attention layers is sufficiently powerful to reconstruct a real image; (2) we propose attention regularization to preserve the object-like attention maps after editing, enabling accurate style editing without invoking significant structural changes. We further improve the editing technique used for the unconditional branch of classifier-free guidance, as well as the conditional one used by P2P. Extensive prompt-editing experiments on a variety of images demonstrate, qualitatively and quantitatively, that our method has superior editing capabilities to existing and concurrent works.
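The first improvement relies on a property of cross-attention: the attention map is computed only from the queries and the key input, so changing the value input alters the layer's output without moving the attention. The toy NumPy sketch below illustrates that idea; it is not the authors' implementation. The names (`cross_attention`, `fit_value_input`, `recon_loss`), the single-head layer with random weights, and the plain gradient-descent fit are hypothetical stand-ins for the diffusion U-Net setting.

```python
import numpy as np

def softmax_rows(x):
    x = x - x.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=1, keepdims=True)

def cross_attention(q_feats, key_input, value_input, W_q, W_k, W_v):
    """Single-head cross-attention; keys come from key_input, values from value_input."""
    Q, K, V = q_feats @ W_q, key_input @ W_k, value_input @ W_v
    A = softmax_rows(Q @ K.T / np.sqrt(Q.shape[1]))  # attention map: pixels x tokens
    return A @ V, A

def recon_loss(E, A, W_v, target):
    return 0.5 * np.sum((A @ E @ W_v - target) ** 2)

def fit_value_input(q_feats, key_input, target, W_q, W_k, W_v, steps=2000):
    """Hypothetical fit: gradient descent on the value input only; the key input stays fixed."""
    _, A = cross_attention(q_feats, key_input, key_input, W_q, W_k, W_v)  # A ignores values
    # Step = 1 / Lipschitz constant of the gradient, so the loss never increases.
    step = 1.0 / (np.linalg.norm(A, 2) ** 2 * np.linalg.norm(W_v, 2) ** 2)
    E = key_input.copy()  # initialise the value input from the prompt embedding
    for _ in range(steps):
        residual = A @ E @ W_v - target
        E -= step * (A.T @ residual @ W_v.T)  # gradient of recon_loss w.r.t. E
    return E, A

# Usage on random toy data (hypothetical sizes).
rng = np.random.default_rng(0)
n_pix, n_tok, d_in, d = 16, 4, 8, 8
q = rng.normal(size=(n_pix, d_in))
prompt = rng.normal(size=(n_tok, d_in))
W_q, W_k, W_v = (rng.normal(size=(d_in, d)) for _ in range(3))
target, A_true = cross_attention(q, prompt, rng.normal(size=(n_tok, d_in)), W_q, W_k, W_v)
E, A = fit_value_input(q, prompt, target, W_q, W_k, W_v)
loss_before = recon_loss(prompt, A, W_v, target)
loss_after = recon_loss(E, A, W_v, target)
```

Because the step size is the reciprocal of the gradient's Lipschitz constant, no update can increase the reconstruction loss, and the attention map `A` is the same before and after fitting, which is why the object layout encoded by the prompt survives.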
|
A. Pujol, Jose Luis Alba, & Juan J. Villanueva. (2001). Supervised Hausdorff-based measures for face recognition.
|
Angel Sappa. (2004). Surface Model Generation from Range Images of Industrial Environments.
|
Oriol Ramos Terrades, Salvatore Tabbone, L. Wendling, & Ernest Valveny. (2004). Symbol Recognition based on a Multiresolution Analysis of the Radon Transform.
|
Ernest Valveny, & Philippe Dosch. (2004). Symbol Recognition Contest: A Synthesis.
|
Josep Llados, & Gemma Sanchez. (2003). Symbol Recognition Using Graphs.
|
Marçal Rusiñol, & Josep Llados. (2005). Symbol Spotting in Technical Drawings Using Vectorial Signatures.
|
Gemma Sanchez, & Josep Llados. (2003). Syntactic models to represent perceptually regular repetitive patterns in graphic documents.
|
Gemma Sanchez, & Josep Llados. (2004). Syntactic models to represent perceptually regular repetitive patterns in graphic documents.
|
David Lloret, & Joan Serrat. (1999). System for calibration of a stereotatic frame.
|
David Lloret, & Derek L.G. Hill. (1999). System for live fusion of 2-D ultrasound scans to pre-interventional MR volumes of a patient.
|
Damian Sojka, Yuyang Liu, Dipam Goswami, Sebastian Cygert, Bartłomiej Twardowski, & Joost van de Weijer. (2023). Technical Report for ICCV 2023 Visual Continual Learning Challenge: Continuous Test-time Adaptation for Semantic Segmentation.
Abstract: The goal of the challenge is to develop a test-time adaptation (TTA) method that adapts the model to gradually changing domains in video sequences for the semantic segmentation task. It is based on SHIFT, a synthetic driving video dataset. The source model is trained on images taken during daytime in clear weather. Domain changes at test time are mainly caused by varying weather conditions and times of day. The TTA methods are evaluated on each image sequence (video) separately, meaning the model is reset to the source model state before the next sequence. Images arrive one at a time, and a prediction has to be made at the arrival of each frame. Each sequence is composed of 401 images and starts in the source domain, then gradually drifts to a different one (changing weather or time of day) until the middle of the sequence. In the second half of the sequence, the domain gradually shifts back to the source one. Ground truth data is available only for the validation split of the SHIFT dataset, which contains only six sequences that start and end in the source domain. We conduct an analysis specifically on those sequences. Ground truth data for the test split, on which the developed TTA methods are evaluated for leaderboard ranking, is not publicly available.
The proposed solution secured third place in the challenge and received an innovation award. Contrary to the solutions that scored better, we did not use any external pretrained models or specialized data augmentations, to keep the solution as general as possible. We have focused on analyzing the distributional shift and developing a method that can adapt to changing data dynamics and generalize across different scenarios.
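The evaluation protocol described in this abstract (a reset to the source model before every sequence, frames processed one at a time, 401 frames drifting away from the source domain and back) can be sketched as an online loop. This is a generic illustration of the protocol, not the report's adaptation method. The linear drift shape, the predict-then-adapt order, and the toy `RunningMeanModel` (a single running statistic standing in for a segmentation network) are all assumptions.

```python
import copy
from dataclasses import dataclass

def drift_schedule(n_frames=401):
    """Assumed linear drift: 0 = source domain, 1 = fully shifted at mid-sequence."""
    mid = (n_frames - 1) // 2
    return [min(t, n_frames - 1 - t) / mid for t in range(n_frames)]

@dataclass
class RunningMeanModel:
    """Hypothetical stand-in for a segmentation model with one adaptable statistic."""
    mean: float = 0.0
    momentum: float = 0.1

    def predict(self, frame: float) -> float:
        return frame - self.mean  # toy "prediction": input normalised by the running mean

    def adapt(self, frame: float) -> None:
        # Unsupervised update from the incoming frame; no labels are used.
        self.mean = (1 - self.momentum) * self.mean + self.momentum * frame

def evaluate_tta(source_model, sequences):
    """Online TTA protocol: reset per sequence, predict on each frame's arrival, then adapt."""
    all_preds = []
    for seq in sequences:
        model = copy.deepcopy(source_model)  # reset to the source state before every sequence
        preds = []
        for frame in seq:  # frames arrive one at a time
            preds.append(model.predict(frame))  # prediction is made before adapting
            model.adapt(frame)
        all_preds.append(preds)
    return all_preds

# Usage: two toy "videos" whose shift strength follows the schedule.
schedule = drift_schedule()
source = RunningMeanModel()
seqs = [[s * 1.0 for s in schedule], [s * 2.0 for s in schedule]]
preds = evaluate_tta(source, seqs)
```

Deep-copying the source model at the start of each sequence is what implements the reset; without it, the running mean accumulated on the first sequence would leak into the first predictions of the second.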
|
Craig Von Land, Ricardo Toledo, & Juan J. Villanueva. (1996). TeleRegion: Tele-Applications for European Regions.
|
A. Pujol, Felipe Lumbreras, Javier Varona, & Juan J. Villanueva. (1999). Template matching through invariant eigenspace projection.
|
Antonio Lopez, J. Hilgenstock, A. Busse, Ramon Baldrich, Felipe Lumbreras, & Joan Serrat. (2008). Temporal Coherence Analysis for Intelligent Headlight Control.
Keywords: Intelligent Headlights
|