toggle visibility Search & Display Options

Select All    Deselect All
 |   | 
Details
   print
  Records Links
Author Jose Elias Yauri; M. Lagos; H. Vega-Huerta; P. de-la-Cruz; G.L.E Maquen-Niño; E. Condor-Tinoco edit  doi
openurl 
  Title Detection of Epileptic Seizures Based-on Channel Fusion and Transformer Network in EEG Recordings Type Journal Article
  Year 2023 Publication International Journal of Advanced Computer Science and Applications Abbreviated Journal IJACSA  
  Volume 14 Issue 5 Pages 1067-1074  
  Keywords Epilepsy; epilepsy detection; EEG; EEG channel fusion; convolutional neural network; self-attention  
  Abstract According to the World Health Organization, epilepsy affects more than 50 million people in the world, and specifically, 80% of them live in developing countries. Therefore, epilepsy has become among the major public issue for many governments and deserves to be engaged. Epilepsy is characterized by uncontrollable seizures in the subject due to a sudden abnormal functionality of the brain. Recurrence of epilepsy attacks change people’s lives and interferes with their daily activities. Although epilepsy has no cure, it could be mitigated with an appropriated diagnosis and medication. Usually, epilepsy diagnosis is based on the analysis of an electroencephalogram (EEG) of the patient. However, the process of searching for seizure patterns in a multichannel EEG recording is a visual demanding and time consuming task, even for experienced neurologists. Despite the recent progress in automatic recognition of epilepsy, the multichannel nature of EEG recordings still challenges current methods. In this work, a new method to detect epilepsy in multichannel EEG recordings is proposed. First, the method uses convolutions to perform channel fusion, and next, a self-attention network extracts temporal features to classify between interictal and ictal epilepsy states. The method was validated in the public CHB-MIT dataset using the k-fold cross-validation and achieved 99.74% of specificity and 99.15% of sensitivity, surpassing current approaches.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language (up) Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes IAM Approved no  
  Call Number Admin @ si @ Serial 3856  
Permanent link to this record
 

 
Author M. Altillawi; S. Li; S.M. Prakhya; Z. Liu; Joan Serrat edit  doi
openurl 
  Title Implicit Learning of Scene Geometry From Poses for Global Localization Type Journal Article
  Year 2024 Publication IEEE Robotics and Automation Letters Abbreviated Journal ROBOTAUTOMLET  
  Volume 9 Issue 2 Pages 955-962  
  Keywords Localization; Localization and mapping; Deep learning for visual perception; Visual learning  
  Abstract Global visual localization estimates the absolute pose of a camera using a single image, in a previously mapped area. Obtaining the pose from a single image enables many robotics and augmented/virtual reality applications. Inspired by latest advances in deep learning, many existing approaches directly learn and regress 6 DoF pose from an input image. However, these methods do not fully utilize the underlying scene geometry for pose regression. The challenge in monocular relocalization is the minimal availability of supervised training data, which is just the corresponding 6 DoF poses of the images. In this letter, we propose to utilize these minimal available labels (i.e., poses) to learn the underlying 3D geometry of the scene and use the geometry to estimate the 6 DoF camera pose. We present a learning method that uses these pose labels and rigid alignment to learn two 3D geometric representations ( X, Y, Z coordinates ) of the scene, one in camera coordinate frame and the other in global coordinate frame. Given a single image, it estimates these two 3D scene representations, which are then aligned to estimate a pose that matches the pose label. This formulation allows for the active inclusion of additional learning constraints to minimize 3D alignment errors between the two 3D scene representations, and 2D re-projection errors between the 3D global scene representation and 2D image pixels, resulting in improved localization accuracy. During inference, our model estimates the 3D scene geometry in camera and global frames and aligns them rigidly to obtain pose in real-time. We evaluate our work on three common visual localization datasets, conduct ablation studies, and show that our method exceeds state-of-the-art regression methods' pose accuracy on all datasets.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language (up) Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN 2377-3766 ISBN Medium  
  Area Expedition Conference  
  Notes ADAS Approved no  
  Call Number Admin @ si @ Serial 3857  
Permanent link to this record
 

 
Author P. Canals; Simone Balocco; O. Diaz; J. Li; A. Garcia Tornel; M. Olive Gadea; M. Ribo edit  url
doi  openurl
  Title A fully automatic method for vascular tortuosity feature extraction in the supra-aortic region: unraveling possibilities in stroke treatment planning Type Journal Article
  Year 2023 Publication Computerized Medical Imaging and Graphics Abbreviated Journal CMIG  
  Volume 104 Issue 102170 Pages  
  Keywords Artificial intelligence; Deep learning; Stroke; Thrombectomy; Vascular feature extraction; Vascular tortuosity  
  Abstract Vascular tortuosity of supra-aortic vessels is widely considered one of the main reasons for failure and delays in endovascular treatment of large vessel occlusion in patients with acute ischemic stroke. Characterization of tortuosity is a challenging task due to the lack of objective, robust and effective analysis tools. We present a fully automatic method for arterial segmentation, vessel labelling and tortuosity feature extraction applied to the supra-aortic region. A sample of 566 computed tomography angiography scans from acute ischemic stroke patients (aged 74.8 ± 12.9, 51.0% females) were used for training, validation and testing of a segmentation module based on a U-Net architecture (162 cases) and a vessel labelling module powered by a graph U-Net (566 cases). Successively, 30 cases were processed for testing of a tortuosity feature extraction module. Measurements obtained through automatic processing were compared to manual annotations from two observers for a thorough validation of the method. The proposed feature extraction method presented similar performance to the inter-rater variability observed in the measurement of 33 geometrical and morphological features of the arterial anatomy in the supra-aortic region. This system will contribute to the development of more complex models to advance the treatment of stroke by adding immediate automation, objectivity, repeatability and robustness to the vascular tortuosity characterization of patients.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language (up) Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes MILAB Approved no  
  Call Number Admin @ si @ CBD2023 Serial 4005  
Permanent link to this record
 

 
Author Francesc Net; Marc Folia; Pep Casals; Lluis Gomez edit  url
openurl 
  Title Transductive Learning for Near-Duplicate Image Detection in Scanned Photo Collections Type Conference Article
  Year 2023 Publication 17th International Conference on Document Analysis and Recognition Abbreviated Journal  
  Volume 14191 Issue Pages 3-17  
  Keywords Image deduplication; Near-duplicate images detection; Transductive Learning; Photographic Archives; Deep Learning  
  Abstract This paper presents a comparative study of near-duplicate image detection techniques in a real-world use case scenario, where a document management company is commissioned to manually annotate a collection of scanned photographs. Detecting duplicate and near-duplicate photographs can reduce the time spent on manual annotation by archivists. This real use case differs from laboratory settings as the deployment dataset is available in advance, allowing the use of transductive learning. We propose a transductive learning approach that leverages state-of-the-art deep learning architectures such as convolutional neural networks (CNNs) and Vision Transformers (ViTs). Our approach involves pre-training a deep neural network on a large dataset and then fine-tuning the network on the unlabeled target collection with self-supervised learning. The results show that the proposed approach outperforms the baseline methods in the task of near-duplicate image detection in the UKBench and an in-house private dataset.  
  Address San Jose; CA; USA; August 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language (up) Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICDAR  
  Notes DAG Approved no  
  Call Number Admin @ si @ NFC2023 Serial 3859  
Permanent link to this record
 

 
Author Khanh Nguyen; Ali Furkan Biten; Andres Mafla; Lluis Gomez; Dimosthenis Karatzas edit  url
openurl 
  Title Show, Interpret and Tell: Entity-Aware Contextualised Image Captioning in Wikipedia Type Conference Article
  Year 2023 Publication Proceedings of the 37th AAAI Conference on Artificial Intelligence Abbreviated Journal  
  Volume 37 Issue 2 Pages 1940-1948  
  Keywords  
  Abstract Humans exploit prior knowledge to describe images, and are able to adapt their explanation to specific contextual information given, even to the extent of inventing plausible explanations when contextual information and images do not match. In this work, we propose the novel task of captioning Wikipedia images by integrating contextual knowledge. Specifically, we produce models that jointly reason over Wikipedia articles, Wikimedia images and their associated descriptions to produce contextualized captions. The same Wikimedia image can be used to illustrate different articles, and the produced caption needs to be adapted to the specific context allowing us to explore the limits of the model to adjust captions to different contextual information. Dealing with out-of-dictionary words and Named Entities is a challenging task in this domain. To address this, we propose a pre-training objective, Masked Named Entity Modeling (MNEM), and show that this pretext task results to significantly improved models. Furthermore, we verify that a model pre-trained in Wikipedia generalizes well to News Captioning datasets. We further define two different test splits according to the difficulty of the captioning task. We offer insights on the role and the importance of each modality and highlight the limitations of our model.  
  Address Washington; USA; February 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language (up) Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference AAAI  
  Notes DAG Approved no  
  Call Number Admin @ si @ NBM2023 Serial 3860  
Permanent link to this record
 

 
Author Parichehr Behjati; Pau Rodriguez; Carles Fernandez; Isabelle Hupont; Armin Mehri; Jordi Gonzalez edit  url
openurl 
  Title Single image super-resolution based on directional variance attention network Type Journal Article
  Year 2023 Publication Pattern Recognition Abbreviated Journal PR  
  Volume 133 Issue Pages 108997  
  Keywords  
  Abstract Recent advances in single image super-resolution (SISR) explore the power of deep convolutional neural networks (CNNs) to achieve better performance. However, most of the progress has been made by scaling CNN architectures, which usually raise computational demands and memory consumption. This makes modern architectures less applicable in practice. In addition, most CNN-based SR methods do not fully utilize the informative hierarchical features that are helpful for final image recovery. In order to address these issues, we propose a directional variance attention network (DiVANet), a computationally efficient yet accurate network for SISR. Specifically, we introduce a novel directional variance attention (DiVA) mechanism to capture long-range spatial dependencies and exploit inter-channel dependencies simultaneously for more discriminative representations. Furthermore, we propose a residual attention feature group (RAFG) for parallelizing attention and residual block computation. The output of each residual block is linearly fused at the RAFG output to provide access to the whole feature hierarchy. In parallel, DiVA extracts most relevant features from the network for improving the final output and preventing information loss along the successive operations inside the network. Experimental results demonstrate the superiority of DiVANet over the state of the art in several datasets, while maintaining relatively low computation and memory footprint. The code is available at https://github.com/pbehjatii/DiVANet.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language (up) Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes ISE Approved no  
  Call Number Admin @ si @ BPF2023 Serial 3861  
Permanent link to this record
 

 
Author Wenjuan Gong; Yue Zhang; Wei Wang; Peng Cheng; Jordi Gonzalez edit  url
openurl 
  Title Meta-MMFNet: Meta-learning-based Multi-model Fusion Network for Micro-expression Recognition Type Journal Article
  Year 2023 Publication ACM Transactions on Multimedia Computing, Communications, and Applications Abbreviated Journal TMCCA  
  Volume 20 Issue 2 Pages 1–20  
  Keywords  
  Abstract Despite its wide applications in criminal investigations and clinical communications with patients suffering from autism, automatic micro-expression recognition remains a challenging problem because of the lack of training data and imbalanced classes problems. In this study, we proposed a meta-learning-based multi-model fusion network (Meta-MMFNet) to solve the existing problems. The proposed method is based on the metric-based meta-learning pipeline, which is specifically designed for few-shot learning and is suitable for model-level fusion. The frame difference and optical flow features were fused, deep features were extracted from the fused feature, and finally in the meta-learning-based framework, weighted sum model fusion method was applied for micro-expression classification. Meta-MMFNet achieved better results than state-of-the-art methods on four datasets. The code is available at https://github.com/wenjgong/meta-fusion-based-method.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language (up) Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes ISE Approved no  
  Call Number Admin @ si @ GZW2023 Serial 3862  
Permanent link to this record
 

 
Author Bonifaz Stuhr; Jurgen Brauer; Bernhard Schick; Jordi Gonzalez edit   pdf
url  openurl
  Title Masked Discriminators for Content-Consistent Unpaired Image-to-Image Translation Type Miscellaneous
  Year 2023 Publication Arxiv Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract A common goal of unpaired image-to-image translation is to preserve content consistency between source images and translated images while mimicking the style of the target domain. Due to biases between the datasets of both domains, many methods suffer from inconsistencies caused by the translation process. Most approaches introduced to mitigate these inconsistencies do not constrain the discriminator, leading to an even more ill-posed training setup. Moreover, none of these approaches is designed for larger crop sizes. In this work, we show that masking the inputs of a global discriminator for both domains with a content-based mask is sufficient to reduce content inconsistencies significantly. However, this strategy leads to artifacts that can be traced back to the masking process. To reduce these artifacts, we introduce a local discriminator that operates on pairs of small crops selected with a similarity sampling strategy. Furthermore, we apply this sampling strategy to sample global input crops from the source and target dataset. In addition, we propose feature-attentive denormalization to selectively incorporate content-based statistics into the generator stream. In our experiments, we show that our method achieves state-of-the-art performance in photorealistic sim-to-real translation and weather translation and also performs well in day-to-night translation. Additionally, we propose the cKVD metric, which builds on the sKVD metric and enables the examination of translation quality at the class or category level.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language (up) Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes ISE Approved no  
  Call Number Admin @ si @ SBS2023 Serial 3863  
Permanent link to this record
 

 
Author Wenwen Fu; Zhihong An; Wendong Huang; Haoran Sun; Wenjuan Gong; Jordi Gonzalez edit  url
openurl 
  Title A Spatio-Temporal Spotting Network with Sliding Windows for Micro-Expression Detection Type Journal Article
  Year 2023 Publication Electronics Abbreviated Journal ELEC  
  Volume 12 Issue 18 Pages 3947  
  Keywords micro-expression spotting; sliding window; key frame extraction  
  Abstract Micro-expressions reveal underlying emotions and are widely applied in political psychology, lie detection, law enforcement and medical care. Micro-expression spotting aims to detect the temporal locations of facial expressions from video sequences and is a crucial task in micro-expression recognition. In this study, the problem of micro-expression spotting is formulated as micro-expression classification per frame. We propose an effective spotting model with sliding windows called the spatio-temporal spotting network. The method involves a sliding window detection mechanism, combines the spatial features from the local key frames and the global temporal features and performs micro-expression spotting. The experiments are conducted on the CAS(ME)2 database and the SAMM Long Videos database, and the results demonstrate that the proposed method outperforms the state-of-the-art method by 30.58% for the CAS(ME)2 and 23.98% for the SAMM Long Videos according to overall F-scores.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language (up) Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes ISE Approved no  
  Call Number Admin @ si @ FAH2023 Serial 3864  
Permanent link to this record
 

 
Author Luca Ginanni Corradini; Simone Balocco; Luciano Maresca; Silvio Vitale; Matteo Stefanini edit  url
doi  openurl
  Title Anatomical Modifications After Stent Implantation: A Comparative Analysis Between CGuard, Wallstent, and Roadsaver Carotid Stents Type Journal Article
  Year 2023 Publication Journal of Endovascular Therapy Abbreviated Journal  
  Volume 30 Issue 1 Pages 18-24  
  Keywords Ginanni Corradini L, Balocco S, Maresca L, Vitale S, Stefanini M.  
  Abstract Abstract
Purpose:
Carotid revascularization can be associated with modifications of the vascular geometry, which may lead to complications. The changes on the vessel angulation before and after a carotid WallStent (WS) implantation are compared against 2 new dual-layer devices, CGuard (CG) and RoadSaver (RS).
Materials and Methods:
The study prospectively recruited 217 consecutive patients (112 GC, 73 WS, and 32 RS, respectively). Angiography projections were explored and the one having a higher arterial angle was selected as a basal view. After stent implantation, a stent control angiography was performed selecting the projection having the maximal angle. The same procedure is followed in all the 3 stent types to guarantee comparable conditions. The angulation changes on the stented segments were quantified from both angiographies. The statistical analysis quantitatively compared the pre-and post-angles for the 3 stent types. The results are qualitatively illustrated using boxplots. Finally, the relation between pre- and post-angles measurements is analyzed using linear regression.
Results:
For CG, no statistical difference in the axial vessel geometry between the basal and postprocedural angles was found. For WS and RS, statistical difference was found between pre- and post-angles. The regression analysis shows that CG induces lower changes from the original curvature with respect to WS and RS.
Conclusion:
Based on our results, CG determines minor changes over the basal morphology than WS and RS stents. Hence, CG respects better the native vessel anatomy than the other stents.
Level of Evidence: Level 4, Case Series.
 
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language (up) Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes xxx Approved no  
  Call Number Admin @ si @ GBM2023 Serial 4006  
Permanent link to this record
 

 
Author Maciej Wielgosz; Antonio Lopez; Muhamad Naveed Riaz edit   pdf
url  openurl
  Title CARLA-BSP: a simulated dataset with pedestrians Type Miscellaneous
  Year 2023 Publication Arxiv Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract We present a sample dataset featuring pedestrians generated using the ARCANE framework, a new framework for generating datasets in CARLA (0.9.13). We provide use cases for pedestrian detection, autoencoding, pose estimation, and pose lifting. We also showcase baseline results.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language (up) Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes ADAS Approved no  
  Call Number Admin @ si @ WLN2023 Serial 3866  
Permanent link to this record
 

 
Author Akhil Gurram; Antonio Lopez edit   pdf
url  openurl
  Title On the Metrics for Evaluating Monocular Depth Estimation Type Miscellaneous
  Year 2023 Publication Arxiv Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Monocular Depth Estimation (MDE) is performed to produce 3D information that can be used in downstream tasks such as those related to on-board perception for Autonomous Vehicles (AVs) or driver assistance. Therefore, a relevant arising question is whether the standard metrics for MDE assessment are a good indicator of the accuracy of future MDE-based driving-related perception tasks. We address this question in this paper. In particular, we take the task of 3D object detection on point clouds as a proxy of on-board perception. We train and test state-of-the-art 3D object detectors using 3D point clouds coming from MDE models. We confront the ranking of object detection results with the ranking given by the depth estimation metrics of the MDE models. We conclude that, indeed, MDE evaluation metrics give rise to a ranking of methods that reflects relatively well the 3D object detection results we may expect. Among the different metrics, the absolute relative (abs-rel) error seems to be the best for that purpose.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language (up) Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes ADAS Approved no  
  Call Number Admin @ si @ GuL2023 Serial 3867  
Permanent link to this record
 

 
Author David Pujol Perich; Albert Clapes; Sergio Escalera edit   pdf
url  openurl
  Title SADA: Semantic adversarial unsupervised domain adaptation for Temporal Action Localization Type Miscellaneous
  Year 2023 Publication Arxiv Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Temporal Action Localization (TAL) is a complex task that poses relevant challenges, particularly when attempting to generalize on new -- unseen -- domains in real-world applications. These scenarios, despite realistic, are often neglected in the literature, exposing these solutions to important performance degradation. In this work, we tackle this issue by introducing, for the first time, an approach for Unsupervised Domain Adaptation (UDA) in sparse TAL, which we refer to as Semantic Adversarial unsupervised Domain Adaptation (SADA). Our contributions are threefold: (1) we pioneer the development of a domain adaptation model that operates on realistic sparse action detection benchmarks; (2) we tackle the limitations of global-distribution alignment techniques by introducing a novel adversarial loss that is sensitive to local class distributions, ensuring finer-grained adaptation; and (3) we present a novel set of benchmarks based on EpicKitchens100 and CharadesEgo, that evaluate multiple domain shifts in a comprehensive manner. Our experiments indicate that SADA improves the adaptation across domains when compared to fully supervised state-of-the-art and alternative UDA methods, attaining a performance boost of up to 6.14% mAP.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language (up) Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes HUPBA Approved no  
  Call Number Admin @ si @ PCE2023 Serial 4014  
Permanent link to this record
 

 
Author Hao Fang; Ajian Liu; Jun Wan; Sergio Escalera; Chenxu Zhao; Xu Zhang; Stan Z Li; Zhen Lei edit   pdf
url  openurl
  Title Surveillance Face Anti-spoofing Type Journal Article
  Year 2024 Publication IEEE Transactions on Information Forensics and Security Abbreviated Journal TIFS  
  Volume 19 Issue Pages 1535-1546  
  Keywords  
  Abstract Face Anti-spoofing (FAS) is essential to secure face recognition systems from various physical attacks. However, recent research generally focuses on short-distance applications (i.e., phone unlocking) while lacking consideration of long-distance scenes (i.e., surveillance security checks). In order to promote relevant research and fill this gap in the community, we collect a large-scale Surveillance High-Fidelity Mask (SuHiFiMask) dataset captured under 40 surveillance scenes, which has 101 subjects from different age groups with 232 3D attacks (high-fidelity masks), 200 2D attacks (posters, portraits, and screens), and 2 adversarial attacks. In this scene, low image resolution and noise interference are new challenges faced in surveillance FAS. Together with the SuHiFiMask dataset, we propose a Contrastive Quality-Invariance Learning (CQIL) network to alleviate the performance degradation caused by image quality from three aspects: (1) An Image Quality Variable module (IQV) is introduced to recover image information associated with discrimination by combining the super-resolution network. (2) Using generated sample pairs to simulate quality variance distributions to help contrastive learning strategies obtain robust feature representation under quality variation. (3) A Separate Quality Network (SQN) is designed to learn discriminative features independent of image quality. Finally, a large number of experiments verify the quality of the SuHiFiMask dataset and the superiority of the proposed CQIL.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language (up) Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes HUPBA Approved no  
  Call Number Admin @ si @ FLW2024 Serial 3869  
Permanent link to this record
 

 
Author Senmao Li; Joost van de Weijer; Taihang Hu; Fahad Shahbaz Khan; Qibin Hou; Yaxing Wang; Jian Yang edit   pdf
url  openurl
  Title StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing Type Miscellaneous
  Year 2023 Publication Arxiv Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract A significant research effort is focused on exploiting the amazing capacities of pretrained diffusion models for the editing of images. They either finetune the model, or invert the image in the latent space of the pretrained model. However, they suffer from two problems: (1) Unsatisfying results for selected regions, and unexpected changes in nonselected regions. (2) They require careful text prompt editing where the prompt should include all visual objects in the input image. To address this, we propose two improvements: (1) Only optimizing the input of the value linear network in the cross-attention layers, is sufficiently powerful to reconstruct a real image. (2) We propose attention regularization to preserve the object-like attention maps after editing, enabling us to obtain accurate style editing without invoking significant structural changes. We further improve the editing technique which is used for the unconditional branch of classifier-free guidance, as well as the conditional one as used by P2P. Extensive experimental prompt-editing results on a variety of images, demonstrate qualitatively and quantitatively that our method has superior editing capabilities than existing and concurrent works.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language (up) Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes LAMP Approved no  
  Call Number Admin @ si @ LWH2023 Serial 3870  
Permanent link to this record
Select All    Deselect All
 |   | 
Details
   print

Save Citations:
Export Records: