|
Records |
Links |
|
Author |
Marco Cotogni; Fei Yang; Claudio Cusano; Andrew Bagdanov; Joost Van de Weijer |
![download PDF file pdf](img/file_PDF.gif)
![find record details (via OpenURL) openurl](img/xref.gif)
|
|
Title |
Exemplar-free Continual Learning of Vision Transformers via Gated Class-Attention and Cascaded Feature Drift Compensation |
Type |
Miscellaneous |
|
Year |
2023 |
Publication |
ARXIV |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages ![sorted by First Page field, descending order (down)](img/sort_desc.gif) |
|
|
|
Keywords |
|
|
|
Abstract |
We propose a new method for exemplar-free class incremental training of ViTs. The main challenge of exemplar-free continual learning is maintaining plasticity of the learner without causing catastrophic forgetting of previously learned tasks. This is often achieved via exemplar replay which can help recalibrate previous task classifiers to the feature drift which occurs when learning new tasks. Exemplar replay, however, comes at the cost of retaining samples from previous tasks which for many applications may not be possible. To address the problem of continual ViT training, we first propose gated class-attention to minimize the drift in the final ViT transformer block. This mask-based gating is applied to class-attention mechanism of the last transformer block and strongly regulates the weights crucial for previous tasks. Importantly, gated class-attention does not require the task-ID during inference, which distinguishes it from other parameter isolation methods. Secondly, we propose a new method of feature drift compensation that accommodates feature drift in the backbone when learning new tasks. The combination of gated class-attention and cascaded feature drift compensation allows for plasticity towards new tasks while limiting forgetting of previous ones. Extensive experiments performed on CIFAR-100, Tiny-ImageNet and ImageNet100 demonstrate that our exemplar-free method obtains competitive results when compared to rehearsal based ViT methods. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
LAMP |
Approved |
no |
|
|
Call Number |
Admin @ si @ CYC2023 |
Serial |
3981 |
|
Permanent link to this record |
|
|
|
|
Author |
Anders Skaarup Johansen; Kamal Nasrollahi; Sergio Escalera; Thomas B. Moeslund |
![goto web page url](img/www.gif)
![find record details (via OpenURL) openurl](img/xref.gif)
|
|
Title |
Who Cares about the Weather? Inferring Weather Conditions for Weather-Aware Object Detection in Thermal Images |
Type |
Journal Article |
|
Year |
2023 |
Publication |
Applied Sciences |
Abbreviated Journal |
AS |
|
|
Volume |
13 |
Issue |
18 |
Pages ![sorted by First Page field, descending order (down)](img/sort_desc.gif) |
|
|
|
Keywords |
thermal; object detection; concept drift; conditioning; weather recognition |
|
|
Abstract |
Deployments of real-world object detection systems often experience a degradation in performance over time due to concept drift. Systems that leverage thermal cameras are especially susceptible because the respective thermal signatures of objects and their surroundings are highly sensitive to environmental changes. In this study, two types of weather-aware latent conditioning methods are investigated. The proposed method aims to guide two object detectors, (YOLOv5 and Deformable DETR) to become weather-aware. This is achieved by leveraging an auxiliary branch that predicts weather-related information while conditioning intermediate layers of the object detector. While the conditioning methods proposed do not directly improve the accuracy of baseline detectors, it can be observed that conditioned networks manage to extract a weather-related signal from the thermal images, thus resulting in a decreased miss rate at the cost of increased false positives. The extracted signal appears noisy and is thus challenging to regress accurately. This is most likely a result of the qualitative nature of the thermal sensor; thus, further work is needed to identify an ideal method for optimizing the conditioning branch, as well as to further improve the accuracy of the system. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
HUPBA |
Approved |
no |
|
|
Call Number |
Admin @ si @ SNE2023 |
Serial |
3983 |
|
Permanent link to this record |
|
|
|
|
Author |
Lei Kang; Lichao Zhang; Dazhi Jiang |
![goto web page url](img/www.gif)
![find record details (via OpenURL) openurl](img/xref.gif)
|
|
Title |
Learning Robust Self-Attention Features for Speech Emotion Recognition with Label-Adaptive Mixup |
Type |
Conference Article |
|
Year |
2023 |
Publication |
IEEE International Conference on Acoustics, Speech and Signal Processing |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages ![sorted by First Page field, descending order (down)](img/sort_desc.gif) |
|
|
|
Keywords |
|
|
|
Abstract |
Speech Emotion Recognition (SER) is to recognize human emotions in a natural verbal interaction scenario with machines, which is considered as a challenging problem due to the ambiguous human emotions. Despite the recent progress in SER, state-of-the-art models struggle to achieve a satisfactory performance. We propose a self-attention based method with combined use of label-adaptive mixup and center loss. By adapting label probabilities in mixup and fitting center loss to the mixup training scheme, our proposed method achieves a superior performance to the state-of-the-art methods. |
|
|
Address |
Rodhes Islands; Greece; June 2023 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICASSP |
|
|
Notes |
LAMP |
Approved |
no |
|
|
Call Number |
Admin @ si @ KZJ2023 |
Serial |
3984 |
|
Permanent link to this record |
|
|
|
|
Author |
Daniel Marczak; Grzegorz Rypesc; Sebastian Cygert; Tomasz Trzcinski; Bartłomiej Twardowski |
![download PDF file pdf](img/file_PDF.gif)
![find record details (via OpenURL) openurl](img/xref.gif)
|
|
Title |
Generalized Continual Category Discovery |
Type |
Miscellaneous |
|
Year |
2023 |
Publication |
arxiv |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages ![sorted by First Page field, descending order (down)](img/sort_desc.gif) |
|
|
|
Keywords |
|
|
|
Abstract |
Most of Continual Learning (CL) methods push the limit of supervised learning settings, where an agent is expected to learn new labeled tasks and not forget previous knowledge. However, these settings are not well aligned with real-life scenarios, where a learning agent has access to a vast amount of unlabeled data encompassing both novel (entirely unlabeled) classes and examples from known classes. Drawing inspiration from Generalized Category Discovery (GCD), we introduce a novel framework that relaxes this assumption. Precisely, in any task, we allow for the existence of novel and known classes, and one must use continual version of unsupervised learning methods to discover them. We call this setting Generalized Continual Category Discovery (GCCD). It unifies CL and GCD, bridging the gap between synthetic benchmarks and real-life scenarios. With a series of experiments, we present that existing methods fail to accumulate knowledge from subsequent tasks in which unlabeled samples of novel classes are present. In light of these limitations, we propose a method that incorporates both supervised and unsupervised signals and mitigates the forgetting through the use of centroid adaptation. Our method surpasses strong CL methods adopted for GCD techniques and presents a superior representation learning performance. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
LAMP |
Approved |
no |
|
|
Call Number |
Admin @ si @ MRC2023 |
Serial |
3985 |
|
Permanent link to this record |
|
|
|
|
Author |
Hao Wu; Alejandro Ariza-Casabona; Bartłomiej Twardowski; Tri Kurniawan Wijaya |
![download PDF file pdf](img/file_PDF.gif)
![find record details (via OpenURL) openurl](img/xref.gif)
|
|
Title |
MM-GEF: Multi-modal representation meet collaborative filtering |
Type |
Miscellaneous |
|
Year |
2023 |
Publication |
ARXIV |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages ![sorted by First Page field, descending order (down)](img/sort_desc.gif) |
|
|
|
Keywords |
|
|
|
Abstract |
In modern e-commerce, item content features in various modalities offer accurate yet comprehensive information to recommender systems. The majority of previous work either focuses on learning effective item representation during modelling user-item interactions, or exploring item-item relationships by analysing multi-modal features. Those methods, however, fail to incorporate the collaborative item-user-item relationships into the multi-modal feature-based item structure. In this work, we propose a graph-based item structure enhancement method MM-GEF: Multi-Modal recommendation with Graph Early-Fusion, which effectively combines the latent item structure underlying multi-modal contents with the collaborative signals. Instead of processing the content feature in different modalities separately, we show that the early-fusion of multi-modal features provides significant improvement. MM-GEF learns refined item representations by injecting structural information obtained from both multi-modal and collaborative signals. Through extensive experiments on four publicly available datasets, we demonstrate systematical improvements of our method over state-of-the-art multi-modal recommendation methods. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
LAMP |
Approved |
no |
|
|
Call Number |
Admin @ si @ WAT2023 |
Serial |
3988 |
|
Permanent link to this record |
|
|
|
|
Author |
Anthony Cioppa; Silvio Giancola; Vladimir Somers; Floriane Magera; Xin Zhou; Hassan Mkhallati; Adrien Deliège; Jan Held; Carlos Hinojosa; Amir M. Mansourian; Pierre Miralles; Olivier Barnich; Christophe De Vleeschouwer; Alexandre Alahi; Bernard Ghanem; Marc Van Droogenbroeck; Abdullah Kamal; Adrien Maglo; Albert Clapes; Amr Abdelaziz; Artur Xarles; Astrid Orcesi; Atom Scott; Bin Liu; Byoungkwon Lim; Chen Chen; Fabian Deuser; Feng Yan; Fufu Yu; Gal Shitrit; Guanshuo Wang; Gyusik Choi; Hankyul Kim; Hao Guo; Hasby Fahrudin; Hidenari Koguchi; Håkan Ardo; Ibrahim Salah; Ido Yerushalmy; Iftikar Muhammad; Ikuma Uchida; Ishay Beery; Jaonary Rabarisoa; Jeongae Lee; Jiajun Fu; Jianqin Yin; Jinghang Xu; Jongho Nang; Julien Denize; Junjie Li; Junpei Zhang; Juntae Kim; Kamil Synowiec; Kenji Kobayashi; Kexin Zhang; Konrad Habel; Kota Nakajima; Licheng Jiao; Lin Ma; Lizhi Wang; Luping Wang; Menglong Li; Mengying Zhou; Mohamed Nasr; Mohamed Abdelwahed; Mykola Liashuha; Nikolay Falaleev; Norbert Oswald; Qiong Jia; Quoc-Cuong Pham; Ran Song; Romain Herault; Rui Peng; Ruilong Chen; Ruixuan Liu; Ruslan Baikulov; Ryuto Fukushima; Sergio Escalera; Seungcheon Lee; Shimin Chen; Shouhong Ding; Taiga Someya; Thomas B. Moeslund; Tianjiao Li; Wei Shen; Wei Zhang; Wei Li; Wei Dai; Weixin Luo; Wending Zhao; Wenjie Zhang; Xinquan Yang; Yanbiao Ma; Yeeun Joo; Yingsen Zeng; Yiyang Gan; Yongqiang Zhu; Yujie Zhong; Zheng Ruan; Zhiheng Li; Zhijian Huang; Ziyu Meng |
![download PDF file pdf](img/file_PDF.gif)
![find record details (via OpenURL) openurl](img/xref.gif)
|
|
Title |
SoccerNet 2023 Challenges Results |
Type |
Miscellaneous |
|
Year |
2023 |
Publication |
Arxiv |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages ![sorted by First Page field, descending order (down)](img/sort_desc.gif) |
|
|
|
Keywords |
|
|
|
Abstract |
The SoccerNet 2023 challenges were the third annual video understanding challenges organized by the SoccerNet team. For this third edition, the challenges were composed of seven vision-based tasks split into three main themes. The first theme, broadcast video understanding, is composed of three high-level tasks related to describing events occurring in the video broadcasts: (1) action spotting, focusing on retrieving all timestamps related to global actions in soccer, (2) ball action spotting, focusing on retrieving all timestamps related to the soccer ball change of state, and (3) dense video captioning, focusing on describing the broadcast with natural language and anchored timestamps. The second theme, field understanding, relates to the single task of (4) camera calibration, focusing on retrieving the intrinsic and extrinsic camera parameters from images. The third and last theme, player understanding, is composed of three low-level tasks related to extracting information about the players: (5) re-identification, focusing on retrieving the same players across multiple views, (6) multiple object tracking, focusing on tracking players and the ball through unedited video streams, and (7) jersey number recognition, focusing on recognizing the jersey number of players from tracklets. Compared to the previous editions of the SoccerNet challenges, tasks (2-3-7) are novel, including new annotations and data, task (4) was enhanced with more data and annotations, and task (6) now focuses on end-to-end approaches. More information on the tasks, challenges, and leaderboards are available on this https URL. Baselines and development kits can be found on this https URL. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
HUPBA |
Approved |
no |
|
|
Call Number |
Admin @ si @ CGS2023 |
Serial |
3991 |
|
Permanent link to this record |
|
|
|
|
Author |
Damian Sojka; Yuyang Liu; Dipam Goswami; Sebastian Cygert; Bartłomiej Twardowski; Joost van de Weijer |
![download PDF file pdf](img/file_PDF.gif)
![find record details (via OpenURL) openurl](img/xref.gif)
|
|
Title |
Technical Report for ICCV 2023 Visual Continual Learning Challenge: Continuous Test-time Adaptation for Semantic Segmentation |
Type |
Miscellaneous |
|
Year |
2023 |
Publication |
Arxiv |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages ![sorted by First Page field, descending order (down)](img/sort_desc.gif) |
|
|
|
Keywords |
|
|
|
Abstract |
The goal of the challenge is to develop a test-time adaptation (TTA) method, which could adapt the model to gradually changing domains in video sequences for semantic segmentation task. It is based on a synthetic driving video dataset – SHIFT. The source model is trained on images taken during daytime in clear weather. Domain changes at test-time are mainly caused by varying weather conditions and times of day. The TTA methods are evaluated in each image sequence (video) separately, meaning the model is reset to the source model state before the next sequence. Images come one by one and a prediction has to be made at the arrival of each frame. Each sequence is composed of 401 images and starts with the source domain, then gradually drifts to a different one (changing weather or time of day) until the middle of the sequence. In the second half of the sequence, the domain gradually shifts back to the source one. Ground truth data is available only for the validation split of the SHIFT dataset, in which there are only six sequences that start and end with the source domain. We conduct an analysis specifically on those sequences. Ground truth data for test split, on which the developed TTA methods are evaluated for leader board ranking, are not publicly available.
The proposed solution secured a 3rd place in a challenge and received an innovation award. Contrary to the solutions that scored better, we did not use any external pretrained models or specialized data augmentations, to keep the solutions as general as possible. We have focused on analyzing the distributional shift and developing a method that could adapt to changing data dynamics and generalize across different scenarios. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
LAMP |
Approved |
no |
|
|
Call Number |
Admin @ si @ SLG2023 |
Serial |
3993 |
|
Permanent link to this record |
|
|
|
|
Author |
Mert Kilickaya; Joost van de Weijer; Yuki M. Asano |
![download PDF file pdf](img/file_PDF.gif)
![find record details (via OpenURL) openurl](img/xref.gif)
|
|
Title |
Towards Label-Efficient Incremental Learning: A Survey |
Type |
Miscellaneous |
|
Year |
2023 |
Publication |
Arxiv |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages ![sorted by First Page field, descending order (down)](img/sort_desc.gif) |
|
|
|
Keywords |
|
|
|
Abstract |
The current dominant paradigm when building a machine learning model is to iterate over a dataset over and over until convergence. Such an approach is non-incremental, as it assumes access to all images of all categories at once. However, for many applications, non-incremental learning is unrealistic. To that end, researchers study incremental learning, where a learner is required to adapt to an incoming stream of data with a varying distribution while preventing forgetting of past knowledge. Significant progress has been made, however, the vast majority of works focus on the fully supervised setting, making these algorithms label-hungry thus limiting their real-life deployment. To that end, in this paper, we make the first attempt to survey recently growing interest in label-efficient incremental learning. We identify three subdivisions, namely semi-, few-shot- and self-supervised learning to reduce labeling efforts. Finally, we identify novel directions that can further enhance label-efficiency and improve incremental learning scalability. Project website: this https URL. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
LAMP |
Approved |
no |
|
|
Call Number |
Admin @ si @ KWA2023 |
Serial |
3994 |
|
Permanent link to this record |
|
|
|
|
Author |
Souhail Bakkali; Sanket Biswas; Zuheng Ming; Mickael Coustaty; Marçal Rusiñol; Oriol Ramos Terrades; Josep Llados |
![download PDF file pdf](img/file_PDF.gif)
![find record details (via OpenURL) openurl](img/xref.gif)
|
|
Title |
TransferDoc: A Self-Supervised Transferable Document Representation Learning Model Unifying Vision and Language |
Type |
Miscellaneous |
|
Year |
2023 |
Publication |
Arxiv |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages ![sorted by First Page field, descending order (down)](img/sort_desc.gif) |
|
|
|
Keywords |
|
|
|
Abstract |
The field of visual document understanding has witnessed a rapid growth in emerging challenges and powerful multi-modal strategies. However, they rely on an extensive amount of document data to learn their pretext objectives in a ``pre-train-then-fine-tune'' paradigm and thus, suffer a significant performance drop in real-world online industrial settings. One major reason is the over-reliance on OCR engines to extract local positional information within a document page. Therefore, this hinders the model's generalizability, flexibility and robustness due to the lack of capturing global information within a document image. We introduce TransferDoc, a cross-modal transformer-based architecture pre-trained in a self-supervised fashion using three novel pretext objectives. TransferDoc learns richer semantic concepts by unifying language and visual representations, which enables the production of more transferable models. Besides, two novel downstream tasks have been introduced for a ``closer-to-real'' industrial evaluation scenario where TransferDoc outperforms other state-of-the-art approaches. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
Admin @ si @ BBM2023 |
Serial |
3995 |
|
Permanent link to this record |
|
|
|
|
Author |
Henry Velesaca; Gisel Bastidas-Guacho; Mohammad Rouhani; Angel Sappa |
![goto web page url](img/www.gif)
|
|
Title |
Multimodal image registration techniques: a comprehensive survey |
Type |
Journal Article |
|
Year |
2024 |
Publication |
Multimedia Tools and Applications |
Abbreviated Journal |
MTAP |
|
|
Volume |
|
Issue |
|
Pages ![sorted by First Page field, descending order (down)](img/sort_desc.gif) |
|
|
|
Keywords |
|
|
|
Abstract |
This manuscript presents a review of state-of-the-art techniques proposed in the literature for multimodal image registration, addressing instances where images from different modalities need to be precisely aligned in the same reference system. This scenario arises when the images to be registered come from different modalities, among the visible and thermal spectral bands, 3D-RGB, or flash-no flash, or NIR-visible. The review spans different techniques from classical approaches to more modern ones based on deep learning, aiming to highlight the particularities required at each step in the registration pipeline when dealing with multimodal images. It is noteworthy that medical images are excluded from this review due to their specific characteristics, including the use of both active and passive sensors or the non-rigid nature of the body contained in the image. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
MSIAU |
Approved |
no |
|
|
Call Number |
Admin @ si @ VBR2024 |
Serial |
3997 |
|
Permanent link to this record |
|
|
|
|
Author |
Justine Giroux; Mohammad Reza Karimi Dastjerdi; Yannick Hold-Geoffroy; Javier Vazquez; Jean François Lalonde |
![download PDF file pdf](img/file_PDF.gif)
![find record details (via OpenURL) openurl](img/xref.gif)
|
|
Title |
Towards a Perceptual Evaluation Framework for Lighting Estimation |
Type |
Conference Article |
|
Year |
2024 |
Publication |
Arxiv |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages ![sorted by First Page field, descending order (down)](img/sort_desc.gif) |
|
|
|
Keywords |
|
|
|
Abstract |
rogress in lighting estimation is tracked by computing existing image quality assessment (IQA) metrics on images from standard datasets. While this may appear to be a reasonable approach, we demonstrate that doing so does not correlate to human preference when the estimated lighting is used to relight a virtual scene into a real photograph. To study this, we design a controlled psychophysical experiment where human observers must choose their preference amongst rendered scenes lit using a set of lighting estimation algorithms selected from the recent literature, and use it to analyse how these algorithms perform according to human perception. Then, we demonstrate that none of the most popular IQA metrics from the literature, taken individually, correctly represent human perception. Finally, we show that by learning a combination of existing IQA metrics, we can more accurately represent human preference. This provides a new perceptual framework to help evaluate future lighting estimation algorithms. |
|
|
Address |
Seattle; USA; June 2024 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
CVPR |
|
|
Notes |
MACO; CIC |
Approved |
no |
|
|
Call Number |
Admin @ si @ GDH2024 |
Serial |
3999 |
|
Permanent link to this record |
|
|
|
|
Author |
Trevor Canham; Javier Vazquez; D Long; Richard F. Murray; Michael S Brown |
![download PDF file pdf](img/file_PDF.gif)
|
|
Title |
Noise Prism: A Novel Multispectral Visualization Technique |
Type |
Journal Article |
|
Year |
2021 |
Publication |
31st Color and Imaging Conference |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages ![sorted by First Page field, descending order (down)](img/sort_desc.gif) |
|
|
|
Keywords |
|
|
|
Abstract |
A novel technique for visualizing multispectral images is proposed. Inspired by how prisms work, our method spreads spectral information over a chromatic noise pattern. This is accomplished by populating the pattern with pixels representing each measurement band at a count proportional to its measured intensity. The method is advantageous because it allows for lightweight encoding and visualization of spectral information
while maintaining the color appearance of the stimulus. A four alternative forced choice (4AFC) experiment was conducted to validate the method’s information-carrying capacity in displaying metameric stimuli of varying colors and spectral basis functions. The scores ranged from 100% to 20% (less than chance given the 4AFC task), with many conditions falling somewhere in between at statistically significant intervals. Using this data, color and texture difference metrics can be evaluated and optimized to predict the legibility of the visualization technique. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
CIC |
|
|
Notes |
MACO; CIC |
Approved |
no |
|
|
Call Number |
Admin @ si @ CVL2021 |
Serial |
4000 |
|
Permanent link to this record |
|
|
|
|
Author |
Mohamed Ramzy Ibrahim; Robert Benavente; Daniel Ponsa; Felipe Lumbreras |
![goto web page url](img/www.gif)
|
|
Title |
SWViT-RRDB: Shifted Window Vision Transformer Integrating Residual in Residual Dense Block for Remote Sensing Super-Resolution |
Type |
Conference Article |
|
Year |
2024 |
Publication |
19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages ![sorted by First Page field, descending order (down)](img/sort_desc.gif) |
|
|
|
Keywords |
|
|
|
Abstract |
Remote sensing applications, impacted by acquisition season and sensor variety, require high-resolution images. Transformer-based models improve satellite image super-resolution but are less effective than convolutional neural networks (CNNs) at extracting local details, crucial for image clarity. This paper introduces SWViT-RRDB, a new deep learning model for satellite imagery super-resolution. The SWViT-RRDB, combining transformer with convolution and attention blocks, overcomes the limitations of existing models by better representing small objects in satellite images. In this model, a pipeline of residual fusion group (RFG) blocks is used to combine the multi-headed self-attention (MSA) with residual in residual dense block (RRDB). This combines global and local image data for better super-resolution. Additionally, an overlapping cross-attention block (OCAB) is used to enhance fusion and allow interaction between neighboring pixels to maintain long-range pixel dependencies across the image. The SWViT-RRDB model and its larger variants outperform state-of-the-art (SoTA) models on two different satellite datasets in terms of PSNR and SSIM. |
|
|
Address |
Roma; Italia; February 2024 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
MSIAU |
Approved |
no |
|
|
Call Number |
Admin @ si @ RBP2024 |
Serial |
4004 |
|
Permanent link to this record |
|
|
|
|
Author |
Mingyi Yang; Fei Yang; Luka Murn; Marc Gorriz Blanch; Juil Sock; Shuai Wan; Fuzheng Yang; Luis Herranz |
![goto web page url](img/www.gif)
![find record details (via OpenURL) openurl](img/xref.gif)
|
|
Title |
Task-Switchable Pre-Processor for Image Compression for Multiple Machine Vision Tasks |
Type |
Journal Article |
|
Year |
2024 |
Publication |
IEEE Transactions on Circuits and Systems for Video Technology |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages ![sorted by First Page field, descending order (down)](img/sort_desc.gif) |
|
|
|
Keywords |
M Yang, F Yang, L Murn, MG Blanch, J Sock, S Wan, F Yang, L Herranz |
|
|
Abstract |
Visual content is increasingly being processed by machines for various automated content analysis tasks instead of being consumed by humans. Despite the existence of several compression methods tailored for machine tasks, few consider real-world scenarios with multiple tasks. In this paper, we aim to address this gap by proposing a task-switchable pre-processor that optimizes input images specifically for machine consumption prior to encoding by an off-the-shelf codec designed for human consumption. The proposed task-switchable pre-processor adeptly maintains relevant semantic information based on the specific characteristics of different downstream tasks, while effectively suppressing irrelevant information to reduce bitrate. To enhance the processing of semantic information for diverse tasks, we leverage pre-extracted semantic features to modulate the pixel-to-pixel mapping within the pre-processor. By switching between different modulations, multiple tasks can be seamlessly incorporated into the system. Extensive experiments demonstrate the practicality and simplicity of our approach. It significantly reduces the number of parameters required for handling multiple tasks while still delivering impressive performance. Our method showcases the potential to achieve efficient and effective compression for machine vision tasks, supporting the evolving demands of real-world applications. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
xxx |
Approved |
no |
|
|
Call Number |
Admin @ si @ YYM2024 |
Serial |
4007 |
|
Permanent link to this record |
|
|
|
|
Author |
Rafael E. Rivadeneira; Henry Velesaca; Angel Sappa |
![goto web page url](img/www.gif)
![find record details (via OpenURL) openurl](img/xref.gif)
|
|
Title |
Object Detection in Very Low-Resolution Thermal Images through a Guided-Based Super-Resolution Approach |
Type |
Conference Article |
|
Year |
2023 |
Publication |
17th International Conference on Signal-Image Technology & Internet-Based Systems |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages ![sorted by First Page field, descending order (down)](img/sort_desc.gif) |
|
|
|
Keywords |
|
|
|
Abstract |
This work proposes a novel approach that integrates super-resolution techniques with off-the-shelf object detection methods to tackle the problem of handling very low-resolution thermal images. The suggested approach begins by enhancing the low-resolution (LR) thermal images through a guided super-resolution strategy, leveraging a high-resolution (HR) visible spectrum image. Subsequently, object detection is performed on the high-resolution thermal image. The experimental results demonstrate tremendous improvements in comparison with both scenarios: when object detection is performed on the LR thermal image alone, as well as when object detection is conducted on the up-sampled LR thermal image. Moreover, the proposed approach proves highly valuable in camouflaged scenarios where objects might remain undetected in visible spectrum images. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
SITIS |
|
|
Notes |
MSIAU |
Approved |
no |
|
|
Call Number |
Admin @ si @ RVS2023 |
Serial |
4010 |
|
Permanent link to this record |