Publicacions CVC -- Query Results

<< 1 2 3 4 5 6 7 8 9 10 >>

Details

Records
Author	Shiqi Yang; Yaxing Wang; Kai Wang; Shangling Jui; Joost Van de Weijer
Title	Local Prediction Aggregation: A Frustratingly Easy Source-free Domain Adaptation Method			Type	Miscellaneous
Year	2022	Publication	Arxiv	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	We propose a simple but effective source-free domain adaptation (SFDA) method. Treating SFDA as an unsupervised clustering problem and following the intuition that local neighbors in feature space should have more similar predictions than other features, we propose to optimize an objective of prediction consistency. This objective encourages local neighborhood features in feature space to have similar predictions while features farther away in feature space have dissimilar predictions, leading to efficient feature clustering and cluster assignment simultaneously. For efficient training, we seek to optimize an upper-bound of the objective resulting in two simple terms. Furthermore, we relate popular existing methods in domain adaptation, source-free domain adaptation and contrastive learning via the perspective of discriminability and diversity. The experimental results prove the superiority of our method, and our method can be adopted as a simple but strong baseline for future research in SFDA. Our method can be also adapted to source-free open-set and partial-set DA which further shows the generalization ability of our method. Code is available in this https URL.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	LAMP; 600.147			Approved	no
Call Number	Admin @ si @ YWW2022b			Serial	3815
Permanent link to this record



Author	Andres Mafla
Title	Leveraging Scene Text Information for Image Interpretation			Type	Book Whole
Year	2022	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	Until recently, most computer vision models remained illiterate, largely ignoring the semantically rich and explicit information contained in scene text. Recent progress in scene text detection and recognition has recently allowed exploring its role in a diverse set of open computer vision problems, e.g. image classification, image-text retrieval, image captioning, and visual question answering to name a few. The explicit semantics of scene text closely requires specific modeling similar to language. However, scene text is a particular signal that has to be interpreted according to a comprehensive perspective that encapsulates all the visual cues in an image. Incorporating this information is a straightforward task for humans, but if we are unfamiliar with a language or scripture, achieving a complete world understanding is impossible (e.a. visiting a foreign country with a different alphabet). Despite the importance of scene text, modeling it requires considering the several ways in which scene text interacts with an image, processing and fusing an additional modality. In this thesis, we mainly focus on two tasks, scene text-based fine-grained image classification, and cross-modal retrieval. In both studied tasks we identify existing limitations in current approaches and propose plausible solutions. Concretely, in each chapter: i) We define a compact way to embed scene text that generalizes to unseen words at training time while performing in real-time. ii) We incorporate the previously learned scene text embedding to create an image-level descriptor that overcomes optical character recognition (OCR) errors which is well-suited to the fine-grained image classification task. iii) We design a region-level reasoning network that learns the interaction through semantics among salient visual regions and scene text instances. iv) We employ scene text information in image-text matching and introduce the Scene Text Aware Cross-Modal retrieval StacMR task. We gather a dataset that incorporates scene text and design a model suited for the newly studied modality. v) We identify the drawbacks of current retrieval metrics in cross-modal retrieval. An image captioning metric is proposed as a way of better evaluating semantics in retrieved results. Ample experimentation shows that incorporating such semantics into a model yields better semantic results while requiring significantly less data to converge.
Address
Corporate Author				Thesis	Ph.D. thesis
Publisher	IMPRIMA	Place of Publication		Editor	Dimosthenis Karatzas;Lluis Gomez
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN	978-84-124793-6-2	Medium
Area		Expedition		Conference
Notes	DAG			Approved	no
Call Number	Admin @ si @ Maf2022			Serial	3756
Permanent link to this record



Author	Ali Furkan Biten; Lluis Gomez; Dimosthenis Karatzas
Title	Let there be a clock on the beach: Reducing Object Hallucination in Image Captioning			Type	Conference Article
Year	2022	Publication	Winter Conference on Applications of Computer Vision	Abbreviated Journal
Volume		Issue		Pages	1381-1390
Keywords	Measurement; Training; Visualization; Analytical models; Computer vision; Computational modeling; Training data
Abstract	Explaining an image with missing or non-existent objects is known as object bias (hallucination) in image captioning. This behaviour is quite common in the state-of-the-art captioning models which is not desirable by humans. To decrease the object hallucination in captioning, we propose three simple yet efficient training augmentation method for sentences which requires no new training data or increase in the model size. By extensive analysis, we show that the proposed methods can significantly diminish our models’ object bias on hallucination metrics. Moreover, we experimentally demonstrate that our methods decrease the dependency on the visual features. All of our code, configuration files and model weights are available online.
Address	Virtual; Waikoloa; Hawai; USA; January 2022
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	WACV
Notes	DAG; 600.155; 302.105			Approved	no
Call Number	Admin @ si @ BGK2022			Serial	3662
Permanent link to this record



Author	Javier Rodenas; Bhalaji Nagarajan; Marc Bolaños; Petia Radeva
Title	Learning Multi-Subset of Classes for Fine-Grained Food Recognition			Type	Conference Article
Year	2022	Publication	7th International Workshop on Multimedia Assisted Dietary Management	Abbreviated Journal
Volume		Issue		Pages	17–26
Keywords
Abstract	Food image recognition is a complex computer vision task, because of the large number of fine-grained food classes. Fine-grained recognition tasks focus on learning subtle discriminative details to distinguish similar classes. In this paper, we introduce a new method to improve the classification of classes that are more difficult to discriminate based on Multi-Subsets learning. Using a pre-trained network, we organize classes in multiple subsets using a clustering technique. Later, we embed these subsets in a multi-head model structure. This structure has three distinguishable parts. First, we use several shared blocks to learn the generalized representation of the data. Second, we use multiple specialized blocks focusing on specific subsets that are difficult to distinguish. Lastly, we use a fully connected layer to weight the different subsets in an end-to-end manner by combining the neuron outputs. We validated our proposed method using two recent state-of-the-art vision transformers on three public food recognition datasets. Our method was successful in learning the confused classes better and we outperformed the state-of-the-art on the three datasets.
Address	Lisboa; Portugal; October 2022
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	MADiMa
Notes	MILAB			Approved	no
Call Number	Admin @ si @ RNB2022			Serial	3797
Permanent link to this record



Author	Xavier Soria; Gonzalo Pomboza-Junez; Angel Sappa
Title	LDC: Lightweight Dense CNN for Edge Detection			Type	Journal Article
Year	2022	Publication	IEEE Access	Abbreviated Journal	ACCESS
Volume	10	Issue		Pages	68281-68290
Keywords
Abstract	This paper presents a Lightweight Dense Convolutional (LDC) neural network for edge detection. The proposed model is an adaptation of two state-of-the-art approaches, but it requires less than 4% of parameters in comparison with these approaches. The proposed architecture generates thin edge maps and reaches the highest score (i.e., ODS) when compared with lightweight models (models with less than 1 million parameters), and reaches a similar performance when compare with heavy architectures (models with about 35 million parameters). Both quantitative and qualitative results and comparisons with state-of-the-art models, using different edge detection datasets, are provided. The proposed LDC does not use pre-trained weights and requires straightforward hyper-parameter settings. The source code is released at https://github.com/xavysp/LDC
Address	27 June 2022
Corporate Author				Thesis
Publisher	IEEE	Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	MSIAU; MACO; 600.160; 600.167			Approved	no
Call Number	Admin @ si @ SPS2022			Serial	3751
Permanent link to this record



Author	Ali Furkan Biten; Andres Mafla; Lluis Gomez; Dimosthenis Karatzas
Title	Is An Image Worth Five Sentences? A New Look into Semantics for Image-Text Matching			Type	Conference Article
Year	2022	Publication	Winter Conference on Applications of Computer Vision	Abbreviated Journal
Volume		Issue		Pages	1391-1400
Keywords	Measurement; Training; Integrated circuits; Annotations; Semantics; Training data; Semisupervised learning
Abstract	The task of image-text matching aims to map representations from different modalities into a common joint visual-textual embedding. However, the most widely used datasets for this task, MSCOCO and Flickr30K, are actually image captioning datasets that offer a very limited set of relationships between images and sentences in their ground-truth annotations. This limited ground truth information forces us to use evaluation metrics based on binary relevance: given a sentence query we consider only one image as relevant. However, many other relevant images or captions may be present in the dataset. In this work, we propose two metrics that evaluate the degree of semantic relevance of retrieved items, independently of their annotated binary relevance. Additionally, we incorporate a novel strategy that uses an image captioning metric, CIDEr, to define a Semantic Adaptive Margin (SAM) to be optimized in a standard triplet loss. By incorporating our formulation to existing models, a large improvement is obtained in scenarios where available training data is limited. We also demonstrate that the performance on the annotated image-caption pairs is maintained while improving on other non-annotated relevant items when employing the full training set. The code for our new metric can be found at github. com/furkanbiten/ncsmetric and the model implementation at github. com/andrespmd/semanticadaptive_margin.
Address	Virtual; Waikoloa; Hawai; USA; January 2022
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	WACV
Notes	DAG; 600.155; 302.105;			Approved	no
Call Number	Admin @ si @ BMG2022			Serial	3663
Permanent link to this record



Author	Ahmed M. A. Salih; Ilaria Boscolo Galazzo; Federica Cruciani; Lorenza Brusini; Petia Radeva
Title	Investigating Explainable Artificial Intelligence for MRI-based Classification of Dementia: a New Stability Criterion for Explainable Methods			Type	Conference Article
Year	2022	Publication	29th IEEE International Conference on Image Processing	Abbreviated Journal
Volume		Issue		Pages
Keywords	Image processing; Stability criteria; Machine learning; Robustness; Alzheimer's disease; Monitoring
Abstract	Individuals diagnosed with Mild Cognitive Impairment (MCI) have shown an increased risk of developing Alzheimer’s Disease (AD). As such, early identification of dementia represents a key prognostic element, though hampered by complex disease patterns. Increasing efforts have focused on Machine Learning (ML) to build accurate classification models relying on a multitude of clinical/imaging variables. However, ML itself does not provide sensible explanations related to the model mechanism and feature contribution. Explainable Artificial Intelligence (XAI) represents the enabling technology in this framework, allowing to understand ML outcomes and derive human-understandable explanations. In this study, we aimed at exploring ML combined with MRI-based features and XAI to solve this classification problem and interpret the outcome. In particular, we propose a new method to assess the robustness of feature rankings provided by XAI methods, especially when multicollinearity exists. Our findings indicate that our method was able to disentangle the list of the informative features underlying dementia, with important implications for aiding personalized monitoring plans.
Address	Bordeaux; France; October 2022
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ICIP
Notes	MILAB			Approved	no
Call Number	Admin @ si @ SBC2022			Serial	3789
Permanent link to this record



Author	Vishwesh Pillai; Pranav Mehar; Manisha Das; Deep Gupta; Petia Radeva
Title	Integrated Hierarchical and Flat Classifiers for Food Image Classification using Epistemic Uncertainty			Type	Conference Article
Year	2022	Publication	IEEE International Conference on Signal Processing and Communications	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	The problem of food image recognition is an essential one in today’s context because health conditions such as diabetes, obesity, and heart disease require constant monitoring of a person’s diet. To automate this process, several models are available to recognize food images. Due to a considerable number of unique food dishes and various cuisines, a traditional flat classifier ceases to perform well. To address this issue, prediction schemes consisting of both flat and hierarchical classifiers, with the analysis of epistemic uncertainty are used to switch between the classifiers. However, the accuracy of the predictions made using epistemic uncertainty data remains considerably low. Therefore, this paper presents a prediction scheme using three different threshold criteria that helps to increase the accuracy of epistemic uncertainty predictions. The performance of the proposed method is demonstrated using several experiments performed on the MAFood-121 dataset. The experimental results validate the proposal performance and show that the proposed threshold criteria help to increase the overall accuracy of the predictions by correctly classifying the uncertainty distribution of the samples.
Address	Bangalore; India; July 2022
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	SPCOM
Notes	MILAB; no menciona			Approved	no
Call Number	Admin @ si @ PMD2022			Serial	3796
Permanent link to this record



Author	Minesh Mathew; Viraj Bagal; Ruben Tito; Dimosthenis Karatzas; Ernest Valveny; C.V. Jawahar
Title	InfographicVQA			Type	Conference Article
Year	2022	Publication	Winter Conference on Applications of Computer Vision	Abbreviated Journal
Volume		Issue		Pages	1697-1706
Keywords	Document Analysis Datasets; Evaluation and Comparison of Vision Algorithms; Vision and Languages
Abstract	Infographics communicate information using a combination of textual, graphical and visual elements. This work explores the automatic understanding of infographic images by using a Visual Question Answering technique. To this end, we present InfographicVQA, a new dataset comprising a diverse collection of infographics and question-answer annotations. The questions require methods that jointly reason over the document layout, textual content, graphical elements, and data visualizations. We curate the dataset with an emphasis on questions that require elementary reasoning and basic arithmetic skills. For VQA on the dataset, we evaluate two Transformer-based strong baselines. Both the baselines yield unsatisfactory results compared to near perfect human performance on the dataset. The results suggest that VQA on infographics--images that are designed to communicate information quickly and clearly to human brain--is ideal for benchmarking machine understanding of complex document images. The dataset is available for download at docvqa. org
Address	Virtual; Waikoloa; Hawai; USA; January 2022
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	WACV
Notes	DAG; 600.155			Approved	no
Call Number	MBT2022			Serial	3625
Permanent link to this record



Author	Carles Onielfa; Carles Casacuberta; Sergio Escalera
Title	Influence in Social Networks Through Visual Analysis of Image Memes			Type	Conference Article
Year	2022	Publication	Artificial Intelligence Research and Development	Abbreviated Journal
Volume	356	Issue		Pages	71-80
Keywords
Abstract	Memes evolve and mutate through their diffusion in social media. They have the potential to propagate ideas and, by extension, products. Many studies have focused on memes, but none so far, to our knowledge, on the users that post them, their relationships, and the reach of their influence. In this article, we define a meme influence graph together with suitable metrics to visualize and quantify influence between users who post memes, and we also describe a process to implement our definitions using a new approach to meme detection based on text-to-image area ratio and contrast. After applying our method to a set of users of the social media platform Instagram, we conclude that our metrics add information to already existing user characteristics.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	HuPBA; no menciona			Approved	no
Call Number	Admin @ si @ OCE2022			Serial	3799
Permanent link to this record



Author	Kai Wang; Xialei Liu; Andrew Bagdanov; Luis Herranz; Shangling Jui; Joost Van de Weijer
Title	Incremental Meta-Learning via Episodic Replay Distillation for Few-Shot Image Recognition			Type	Conference Article
Year	2022	Publication	CVPR 2022 Workshop on Continual Learning (CLVision, 3rd Edition)	Abbreviated Journal
Volume		Issue		Pages	3728-3738
Keywords	Training; Computer vision; Image recognition; Upper bound; Conferences; Pattern recognition; Task analysis
Abstract	In this paper we consider the problem of incremental meta-learning in which classes are presented incrementally in discrete tasks. We propose Episodic Replay Distillation (ERD), that mixes classes from the current task with exemplars from previous tasks when sampling episodes for meta-learning. To allow the training to benefit from a large as possible variety of classes, which leads to more gener- alizable feature representations, we propose the cross-task meta loss. Furthermore, we propose episodic replay distillation that also exploits exemplars for improved knowledge distillation. Experiments on four datasets demonstrate that ERD surpasses the state-of-the-art. In particular, on the more challenging one-shot, long task sequence scenarios, we reduce the gap between Incremental Meta-Learning and the joint-training upper bound from 3.5% / 10.1% / 13.4% / 11.7% with the current state-of-the-art to 2.6% / 2.9% / 5.0% / 0.2% with our method on Tiered-ImageNet / Mini-ImageNet / CIFAR100 / CUB, respectively.
Address	New Orleans, USA; 20 June 2022
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	CVPRW
Notes	LAMP; 600.147			Approved	no
Call Number	Admin @ si @ WLB2022			Serial	3686
Permanent link to this record



Author	Ana Garcia Rodriguez; Yael Tudela; Henry Cordova; S. Carballal; I. Ordas; L. Moreira; E. Vaquero; O. Ortiz; L. Rivero; F. Javier Sanchez; Miriam Cuatrecasas; Maria Pellise; Jorge Bernal; Gloria Fernandez Esparrach
Title	In vivo computer-aided diagnosis of colorectal polyps using white light endoscopy			Type	Journal Article
Year	2022	Publication	Endoscopy International Open	Abbreviated Journal	ENDIO
Volume	10	Issue	9	Pages	E1201-E1207
Keywords
Abstract	Background and study aims Artificial intelligence is currently able to accurately predict the histology of colorectal polyps. However, systems developed to date use complex optical technologies and have not been tested in vivo. The objective of this study was to evaluate the efficacy of a new deep learning-based optical diagnosis system, ATENEA, in a real clinical setting using only high-definition white light endoscopy (WLE) and to compare its performance with endoscopists. Methods ATENEA was prospectively tested in real life on consecutive polyps detected in colorectal cancer screening colonoscopies at Hospital Clínic. No images were discarded, and only WLE was used. The in vivo ATENEA's prediction (adenoma vs non-adenoma) was compared with the prediction of four staff endoscopists without specific training in optical diagnosis for the study purposes. Endoscopists were blind to the ATENEA output. Histology was the gold standard. Results Ninety polyps (median size: 5 mm, range: 2-25) from 31 patients were included of which 69 (76.7 %) were adenomas. ATENEA correctly predicted the histology in 63 of 69 (91.3 %, 95 % CI: 82 %-97 %) adenomas and 12 of 21 (57.1 %, 95 % CI: 34 %-78 %) non-adenomas while endoscopists made correct predictions in 52 of 69 (75.4 %, 95 % CI: 60 %-85 %) and 20 of 21 (95.2 %, 95 % CI: 76 %-100 %), respectively. The global accuracy was 83.3 % (95 % CI: 74%-90 %) and 80 % (95 % CI: 70 %-88 %) for ATENEA and endoscopists, respectively. Conclusion ATENEA can accurately be used for in vivo characterization of colorectal polyps, enabling the endoscopist to make direct decisions. ATENEA showed a global accuracy similar to that of endoscopists despite an unsatisfactory performance for non-adenomatous lesions.
Address	2022 Sep 14
Corporate Author				Thesis
Publisher	PMID	Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	ISE; 600.157			Approved	no
Call Number	Admin @ si @ GTC2022b			Serial	3752
Permanent link to this record



Author	Pau Torras; Arnau Baro; Alicia Fornes; Lei Kang
Title	Improving Handwritten Music Recognition through Language Model Integration			Type	Conference Article
Year	2022	Publication	4th International Workshop on Reading Music Systems (WoRMS2022)	Abbreviated Journal
Volume		Issue		Pages	42-46
Keywords	optical music recognition; historical sources; diversity; music theory; digital humanities
Abstract	Handwritten Music Recognition, especially in the historical domain, is an inherently challenging endeavour; paper degradation artefacts and the ambiguous nature of handwriting make recognising such scores an error-prone process, even for the current state-of-the-art Sequence to Sequence models. In this work we propose a way of reducing the production of statistically implausible output sequences by fusing a Language Model into a recognition Sequence to Sequence model. The idea is leveraging visually-conditioned and context-conditioned output distributions in order to automatically find and correct any mistakes that would otherwise break context significantly. We have found this approach to improve recognition results to 25.15 SER (%) from a previous best of 31.79 SER (%) in the literature.
Address	November 18, 2022
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	WoRMS
Notes	DAG; 600.121; 600.162; 602.230			Approved	no
Call Number	Admin @ si @ TBF2022			Serial	3735
Permanent link to this record



Author	Yecong Wan; Yuanshuo Cheng; Miingwen Shao; Jordi Gonzalez
Title	Image rain removal and illumination enhancement done in one go			Type	Journal Article
Year	2022	Publication	Knowledge-Based Systems	Abbreviated Journal	KBS
Volume	252	Issue		Pages	109244
Keywords
Abstract	Rain removal plays an important role in the restoration of degraded images. Recently, CNN-based methods have achieved remarkable success. However, these approaches neglect that the appearance of real-world rain is often accompanied by low light conditions, which will further degrade the image quality, thereby hindering the restoration mission. Therefore, it is very indispensable to jointly remove the rain and enhance illumination for real-world rain image restoration. To this end, we proposed a novel spatially-adaptive network, dubbed SANet, which can remove the rain and enhance illumination in one go with the guidance of degradation mask. Meanwhile, to fully utilize negative samples, a contrastive loss is proposed to preserve more natural textures and consistent illumination. In addition, we present a new synthetic dataset, named DarkRain, to boost the development of rain image restoration algorithms in practical scenarios. DarkRain not only contains different degrees of rain, but also considers different lighting conditions, and more realistically simulates real-world rainfall scenarios. SANet is extensively evaluated on the proposed dataset and attains new state-of-the-art performance against other combining methods. Moreover, after a simple transformation, our SANet surpasses existing the state-of-the-art algorithms in both rain removal and low-light image enhancement.
Address	Sept 2022
Corporate Author				Thesis
Publisher	Elsevier	Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	ISE; 600.157; 600.168			Approved	no
Call Number	Admin @ si @ WCS2022			Serial	3744
Permanent link to this record



Author	Yasuko Sugito; Javier Vazquez; Trevor Canham; Marcelo Bertalmio
Title	Image quality evaluation in professional HDR/WCG production questions the need for HDR metrics			Type	Journal Article
Year	2022	Publication	IEEE Transactions on Image Processing	Abbreviated Journal	TIP
Volume	31	Issue		Pages	5163 - 5177
Keywords	Measurement; Image color analysis; Image coding; Production; Dynamic range; Brightness; Extraterrestrial measurements
Abstract	In the quality evaluation of high dynamic range and wide color gamut (HDR/WCG) images, a number of works have concluded that native HDR metrics, such as HDR visual difference predictor (HDR-VDP), HDR video quality metric (HDR-VQM), or convolutional neural network (CNN)-based visibility metrics for HDR content, provide the best results. These metrics consider only the luminance component, but several color difference metrics have been specifically developed for, and validated with, HDR/WCG images. In this paper, we perform subjective evaluation experiments in a professional HDR/WCG production setting, under a real use case scenario. The results are quite relevant in that they show, firstly, that the performance of HDR metrics is worse than that of a classic, simple standard dynamic range (SDR) metric applied directly to the HDR content; and secondly, that the chrominance metrics specifically developed for HDR/WCG imaging have poor correlation with observer scores and are also outperformed by an SDR metric. Based on these findings, we show how a very simple framework for creating color HDR metrics, that uses only luminance SDR metrics, transfer functions, and classic color spaces, is able to consistently outperform, by a considerable margin, state-of-the-art HDR metrics on a varied set of HDR content, for both perceptual quantization (PQ) and Hybrid Log-Gamma (HLG) encoding, luminance and chroma distortions, and on different color spaces of common use.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	600.161; 611.007			Approved	no
Call Number	Admin @ si @ SVG2022			Serial	3683
Permanent link to this record