Home | << 1 2 3 4 5 6 7 8 9 10 >> |
Records | |||||
---|---|---|---|---|---|
Author | Ana Garcia Rodriguez; Yael Tudela; Henry Cordova; S. Carballal; I. Ordas; L. Moreira; E. Vaquero; O. Ortiz; L. Rivero; F. Javier Sanchez; Miriam Cuatrecasas; Maria Pellise; Jorge Bernal; Gloria Fernandez Esparrach | ||||
Title | First in Vivo Computer-Aided Diagnosis of Colorectal Polyps using White Light Endoscopy | Type | Journal Article | ||
Year | 2022 | Publication | Endoscopy | Abbreviated Journal | END |
Volume | 54 | Issue | Pages | ||
Keywords | |||||
Abstract | |||||
Address | 2022/04/14 | ||||
Corporate Author | Thesis | ||||
Publisher | Georg Thieme Verlag KG | Place of Publication | Editor | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | ISE | Approved | no | ||
Call Number | Admin @ si @ GTC2022a | Serial | 3746 | ||
Permanent link to this record | |||||
Author | Lei Kang; Pau Riba; Marçal Rusiñol; Alicia Fornes; Mauricio Villegas | ||||
Title | Pay Attention to What You Read: Non-recurrent Handwritten Text-Line Recognition | Type | Journal Article | ||
Year | 2022 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 129 | Issue | Pages | 108766 | |
Keywords | |||||
Abstract | The advent of recurrent neural networks for handwriting recognition marked an important milestone reaching impressive recognition accuracies despite the great variability that we observe across different writing styles. Sequential architectures are a perfect fit to model text lines, not only because of the inherent temporal aspect of text, but also to learn probability distributions over sequences of characters and words. However, using such recurrent paradigms comes at a cost at training stage, since their sequential pipelines prevent parallelization. In this work, we introduce a non-recurrent approach to recognize handwritten text by the use of transformer models. We propose a novel method that bypasses any recurrence. By using multi-head self-attention layers both at the visual and textual stages, we are able to tackle character recognition as well as to learn language-related dependencies of the character sequences to be decoded. Our model is unconstrained to any predefined vocabulary, being able to recognize out-of-vocabulary words, i.e. words that do not appear in the training vocabulary. We significantly advance over prior art and demonstrate that satisfactory recognition accuracies are yielded even in few-shot learning scenarios. | ||||
Address | Sept. 2022 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | DAG; 600.121; 600.162 | Approved | no | ||
Call Number | Admin @ si @ KRR2022 | Serial | 3556 | ||
Permanent link to this record | |||||
Author | Diego Velazquez; Pau Rodriguez; Josep M. Gonfaus; Xavier Roca; Jordi Gonzalez | ||||
Title | A Closer Look at Embedding Propagation for Manifold Smoothing | Type | Journal Article | ||
Year | 2022 | Publication | Journal of Machine Learning Research | Abbreviated Journal | JMLR |
Volume | 23 | Issue | 252 | Pages | 1-27 |
Keywords | Regularization; emi-supervised learning; self-supervised learning; adversarial robustness; few-shot classification | ||||
Abstract | Supervised training of neural networks requires a large amount of manually annotated data and the resulting networks tend to be sensitive to out-of-distribution (OOD) data.
Self- and semi-supervised training schemes reduce the amount of annotated data required during the training process. However, OOD generalization remains a major challenge for most methods. Strategies that promote smoother decision boundaries play an important role in out-of-distribution generalization. For example, embedding propagation (EP) for manifold smoothing has recently shown to considerably improve the OOD performance for few-shot classification. EP achieves smoother class manifolds by building a graph from sample embeddings and propagating information through the nodes in an unsupervised manner. In this work, we extend the original EP paper providing additional evidence and experiments showing that it attains smoother class embedding manifolds and improves results in settings beyond few-shot classification. Concretely, we show that EP improves the robustness of neural networks against multiple adversarial attacks as well as semi- and self-supervised learning performance. |
||||
Address | 9/2022 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | Approved | no | |||
Call Number | Admin @ si @ VRG2022 | Serial | 3762 | ||
Permanent link to this record | |||||
Author | S.K. Jemni; Mohamed Ali Souibgui; Yousri Kessentini; Alicia Fornes | ||||
Title | Enhance to Read Better: A Multi-Task Adversarial Network for Handwritten Document Image Enhancement | Type | Journal Article | ||
Year | 2022 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 123 | Issue | Pages | 108370 | |
Keywords | |||||
Abstract | Handwritten document images can be highly affected by degradation for different reasons: Paper ageing, daily-life scenarios (wrinkles, dust, etc.), bad scanning process and so on. These artifacts raise many readability issues for current Handwritten Text Recognition (HTR) algorithms and severely devalue their efficiency. In this paper, we propose an end to end architecture based on Generative Adversarial Networks (GANs) to recover the degraded documents into a and form. Unlike the most well-known document binarization methods, which try to improve the visual quality of the degraded document, the proposed architecture integrates a handwritten text recognizer that promotes the generated document image to be more readable. To the best of our knowledge, this is the first work to use the text information while binarizing handwritten documents. Extensive experiments conducted on degraded Arabic and Latin handwritten documents demonstrate the usefulness of integrating the recognizer within the GAN architecture, which improves both the visual quality and the readability of the degraded document images. Moreover, we outperform the state of the art in H-DIBCO challenges, after fine tuning our pre-trained model with synthetically degraded Latin handwritten images, on this task. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | DAG; 600.124; 600.121; 602.230 | Approved | no | ||
Call Number | Admin @ si @ JSK2022 | Serial | 3613 | ||
Permanent link to this record | |||||
Author | Mohamed Ali Souibgui; Ali Furkan Biten; Sounak Dey; Alicia Fornes; Yousri Kessentini; Lluis Gomez; Dimosthenis Karatzas; Josep Llados | ||||
Title | One-shot Compositional Data Generation for Low Resource Handwritten Text Recognition | Type | Conference Article | ||
Year | 2022 | Publication | Winter Conference on Applications of Computer Vision | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | Document Analysis | ||||
Abstract | Low resource Handwritten Text Recognition (HTR) is a hard problem due to the scarce annotated data and the very limited linguistic information (dictionaries and language models). This appears, for example, in the case of historical ciphered manuscripts, which are usually written with invented alphabets to hide the content. Thus, in this paper we address this problem through a data generation technique based on Bayesian Program Learning (BPL). Contrary to traditional generation approaches, which require a huge amount of annotated images, our method is able to generate human-like handwriting using only one sample of each symbol from the desired alphabet. After generating symbols, we create synthetic lines to train state-of-the-art HTR architectures in a segmentation free fashion. Quantitative and qualitative analyses were carried out and confirm the effectiveness of the proposed method, achieving competitive results compared to the usage of real annotated data. | ||||
Address | Virtual; January 2022 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | WACV | ||
Notes | DAG; 602.230; 600.140 | Approved | no | ||
Call Number | Admin @ si @ SBD2022 | Serial | 3615 | ||
Permanent link to this record | |||||
Author | Minesh Mathew; Viraj Bagal; Ruben Tito; Dimosthenis Karatzas; Ernest Valveny; C.V. Jawahar | ||||
Title | InfographicVQA | Type | Conference Article | ||
Year | 2022 | Publication | Winter Conference on Applications of Computer Vision | Abbreviated Journal | |
Volume | Issue | Pages | 1697-1706 | ||
Keywords | Document Analysis Datasets; Evaluation and Comparison of Vision Algorithms; Vision and Languages | ||||
Abstract | Infographics communicate information using a combination of textual, graphical and visual elements. This work explores the automatic understanding of infographic images by using a Visual Question Answering technique. To this end, we present InfographicVQA, a new dataset comprising a diverse collection of infographics and question-answer annotations. The questions require methods that jointly reason over the document layout, textual content, graphical elements, and data visualizations. We curate the dataset with an emphasis on questions that require elementary reasoning and basic arithmetic skills. For VQA on the dataset, we evaluate two Transformer-based strong baselines. Both the baselines yield unsatisfactory results compared to near perfect human performance on the dataset. The results suggest that VQA on infographics--images that are designed to communicate information quickly and clearly to human brain--is ideal for benchmarking machine understanding of complex document images. The dataset is available for download at docvqa. org | ||||
Address | Virtual; Waikoloa; Hawai; USA; January 2022 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | WACV | ||
Notes | DAG; 600.155 | Approved | no | ||
Call Number | MBT2022 | Serial | 3625 | ||
Permanent link to this record | |||||
Author | Joakim Bruslund Haurum; Meysam Madadi; Sergio Escalera; Thomas B. Moeslund | ||||
Title | Multi-Task Classification of Sewer Pipe Defects and Properties Using a Cross-Task Graph Neural Network Decoder | Type | Conference Article | ||
Year | 2022 | Publication | Winter Conference on Applications of Computer Vision | Abbreviated Journal | |
Volume | Issue | Pages | 2806-2817 | ||
Keywords | Vision Systems; Applications Multi-Task Classification | ||||
Abstract | The sewerage infrastructure is one of the most important and expensive infrastructures in modern society. In order to efficiently manage the sewerage infrastructure, automated sewer inspection has to be utilized. However, while sewer
defect classification has been investigated for decades, little attention has been given to classifying sewer pipe properties such as water level, pipe material, and pipe shape, which are needed to evaluate the level of sewer pipe deterioration. In this work we classify sewer pipe defects and properties concurrently and present a novel decoder-focused multi-task classification architecture Cross-Task Graph Neural Network (CT-GNN), which refines the disjointed per-task predictions using cross-task information. The CT-GNN architecture extends the traditional disjointed task-heads decoder, by utilizing a cross-task graph and unique class node embeddings. The cross-task graph can either be determined a priori based on the conditional probability between the task classes or determined dynamically using self-attention. CT-GNN can be added to any backbone and trained end-toend at a small increase in the parameter count. We achieve state-of-the-art performance on all four classification tasks in the Sewer-ML dataset, improving defect classification and water level classification by 5.3 and 8.0 percentage points, respectively. We also outperform the single task methods as well as other multi-task classification approaches while introducing 50 times fewer parameters than previous modelfocused approaches. |
||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | WACV | ||
Notes | HUPBA; no proj | Approved | no | ||
Call Number | Admin @ si @ BME2022 | Serial | 3638 | ||
Permanent link to this record | |||||
Author | Meysam Madadi; Sergio Escalera; Xavier Baro; Jordi Gonzalez | ||||
Title | End-to-end Global to Local CNN Learning for Hand Pose Recovery in Depth data | Type | Journal Article | ||
Year | 2022 | Publication | IET Computer Vision | Abbreviated Journal | IETCV |
Volume | 16 | Issue | 1 | Pages | 50-66 |
Keywords | Computer vision; data acquisition; human computer interaction; learning (artificial intelligence); pose estimation | ||||
Abstract | Despite recent advances in 3D pose estimation of human hands, especially thanks to the advent of CNNs and depth cameras, this task is still far from being solved. This is mainly due to the highly non-linear dynamics of fingers, which make hand model training a challenging task. In this paper, we exploit a novel hierarchical tree-like structured CNN, in which branches are trained to become specialized in predefined subsets of hand joints, called local poses. We further fuse local pose features, extracted from hierarchical CNN branches, to learn higher order dependencies among joints in the final pose by end-to-end training. Lastly, the loss function used is also defined to incorporate appearance and physical constraints about doable hand motion and deformation. Finally, we introduce a non-rigid data augmentation approach to increase the amount of training depth data. Experimental results suggest that feeding a tree-shaped CNN, specialized in local poses, into a fusion network for modeling joints correlations and dependencies, helps to increase the precision of final estimations, outperforming state-of-the-art results on NYU and SyntheticHand datasets. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | HUPBA; ISE; 600.098; 600.119 | Approved | no | ||
Call Number | Admin @ si @ MEB2022 | Serial | 3652 | ||
Permanent link to this record | |||||
Author | Razieh Rastgoo; Kourosh Kiani; Sergio Escalera | ||||
Title | Real-time Isolated Hand Sign Language RecognitioN Using Deep Networks and SVD | Type | Journal | ||
Year | 2022 | Publication | Journal of Ambient Intelligence and Humanized Computing | Abbreviated Journal | |
Volume | 13 | Issue | Pages | 591–611 | |
Keywords | |||||
Abstract | One of the challenges in computer vision models, especially sign language, is real-time recognition. In this work, we present a simple yet low-complex and efficient model, comprising single shot detector, 2D convolutional neural network, singular value decomposition (SVD), and long short term memory, to real-time isolated hand sign language recognition (IHSLR) from RGB video. We employ the SVD method as an efficient, compact, and discriminative feature extractor from the estimated 3D hand keypoints coordinators. Despite the previous works that employ the estimated 3D hand keypoints coordinates as raw features, we propose a novel and revolutionary way to apply the SVD to the estimated 3D hand keypoints coordinates to get more discriminative features. SVD method is also applied to the geometric relations between the consecutive segments of each finger in each hand and also the angles between these sections. We perform a detailed analysis of recognition time and accuracy. One of our contributions is that this is the first time that the SVD method is applied to the hand pose parameters. Results on four datasets, RKS-PERSIANSIGN (99.5±0.04), First-Person (91±0.06), ASVID (93±0.05), and isoGD (86.1±0.04), confirm the efficiency of our method in both accuracy (mean+std) and time recognition. Furthermore, our model outperforms or gets competitive results with the state-of-the-art alternatives in IHSLR and hand action recognition. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | HUPBA; no proj | Approved | no | ||
Call Number | Admin @ si @ RKE2022a | Serial | 3660 | ||
Permanent link to this record | |||||
Author | Ali Furkan Biten; Lluis Gomez; Dimosthenis Karatzas | ||||
Title | Let there be a clock on the beach: Reducing Object Hallucination in Image Captioning | Type | Conference Article | ||
Year | 2022 | Publication | Winter Conference on Applications of Computer Vision | Abbreviated Journal | |
Volume | Issue | Pages | 1381-1390 | ||
Keywords | Measurement; Training; Visualization; Analytical models; Computer vision; Computational modeling; Training data | ||||
Abstract | Explaining an image with missing or non-existent objects is known as object bias (hallucination) in image captioning. This behaviour is quite common in the state-of-the-art captioning models which is not desirable by humans. To decrease the object hallucination in captioning, we propose three simple yet efficient training augmentation method for sentences which requires no new training data or increase
in the model size. By extensive analysis, we show that the proposed methods can significantly diminish our models’ object bias on hallucination metrics. Moreover, we experimentally demonstrate that our methods decrease the dependency on the visual features. All of our code, configuration files and model weights are available online. |
||||
Address | Virtual; Waikoloa; Hawai; USA; January 2022 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | WACV | ||
Notes | DAG; 600.155; 302.105 | Approved | no | ||
Call Number | Admin @ si @ BGK2022 | Serial | 3662 | ||
Permanent link to this record | |||||
Author | Ali Furkan Biten; Andres Mafla; Lluis Gomez; Dimosthenis Karatzas | ||||
Title | Is An Image Worth Five Sentences? A New Look into Semantics for Image-Text Matching | Type | Conference Article | ||
Year | 2022 | Publication | Winter Conference on Applications of Computer Vision | Abbreviated Journal | |
Volume | Issue | Pages | 1391-1400 | ||
Keywords | Measurement; Training; Integrated circuits; Annotations; Semantics; Training data; Semisupervised learning | ||||
Abstract | The task of image-text matching aims to map representations from different modalities into a common joint visual-textual embedding. However, the most widely used datasets for this task, MSCOCO and Flickr30K, are actually image captioning datasets that offer a very limited set of relationships between images and sentences in their ground-truth annotations. This limited ground truth information forces us to use evaluation metrics based on binary relevance: given a sentence query we consider only one image as relevant. However, many other relevant images or captions may be present in the dataset. In this work, we propose two metrics that evaluate the degree of semantic relevance of retrieved items, independently of their annotated binary relevance. Additionally, we incorporate a novel strategy that uses an image captioning metric, CIDEr, to define a Semantic Adaptive Margin (SAM) to be optimized in a standard triplet loss. By incorporating our formulation to existing models, a large improvement is obtained in scenarios where available training data is limited. We also demonstrate that the performance on the annotated image-caption pairs is maintained while improving on other non-annotated relevant items when employing the full training set. The code for our new metric can be found at github. com/furkanbiten/ncsmetric and the model implementation at github. com/andrespmd/semanticadaptive_margin. | ||||
Address | Virtual; Waikoloa; Hawai; USA; January 2022 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | WACV | ||
Notes | DAG; 600.155; 302.105; | Approved | no | ||
Call Number | Admin @ si @ BMG2022 | Serial | 3663 | ||
Permanent link to this record | |||||
Author | Mohamed Ali Souibgui; Sanket Biswas; Sana Khamekhem Jemni; Yousri Kessentini; Alicia Fornes; Josep Llados; Umapada Pal | ||||
Title | DocEnTr: An End-to-End Document Image Enhancement Transformer | Type | Conference Article | ||
Year | 2022 | Publication | 26th International Conference on Pattern Recognition | Abbreviated Journal | |
Volume | Issue | Pages | 1699-1705 | ||
Keywords | Degradation; Head; Optical character recognition; Self-supervised learning; Benchmark testing; Transformers; Magnetic heads | ||||
Abstract | Document images can be affected by many degradation scenarios, which cause recognition and processing difficulties. In this age of digitization, it is important to denoise them for proper usage. To address this challenge, we present a new encoder-decoder architecture based on vision transformers to enhance both machine-printed and handwritten document images, in an end-to-end fashion. The encoder operates directly on the pixel patches with their positional information without the use of any convolutional layers, while the decoder reconstructs a clean image from the encoded patches. Conducted experiments show a superiority of the proposed model compared to the state-of the-art methods on several DIBCO benchmarks. Code and models will be publicly available at: https://github.com/dali92002/DocEnTR | ||||
Address | August 21-25, 2022 , Montréal Québec | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICPR | ||
Notes | DAG; 600.121; 600.162; 602.230; 600.140 | Approved | no | ||
Call Number | Admin @ si @ SBJ2022 | Serial | 3730 | ||
Permanent link to this record | |||||
Author | Fei Yang; Yaxing Wang; Luis Herranz; Yongmei Cheng; Mikhail Mozerov | ||||
Title | A Novel Framework for Image-to-image Translation and Image Compression | Type | Journal Article | ||
Year | 2022 | Publication | Neurocomputing | Abbreviated Journal | NEUCOM |
Volume | 508 | Issue | Pages | 58-70 | |
Keywords | |||||
Abstract | Data-driven paradigms using machine learning are becoming ubiquitous in image processing and communications. In particular, image-to-image (I2I) translation is a generic and widely used approach to image processing problems, such as image synthesis, style transfer, and image restoration. At the same time, neural image compression has emerged as a data-driven alternative to traditional coding approaches in visual communications. In this paper, we study the combination of these two paradigms into a joint I2I compression and translation framework, focusing on multi-domain image synthesis. We first propose distributed I2I translation by integrating quantization and entropy coding into an I2I translation framework (i.e. I2Icodec). In practice, the image compression functionality (i.e. autoencoding) is also desirable, requiring to deploy alongside I2Icodec a regular image codec. Thus, we further propose a unified framework that allows both translation and autoencoding capabilities in a single codec. Adaptive residual blocks conditioned on the translation/compression mode provide flexible adaptation to the desired functionality. The experiments show promising results in both I2I translation and image compression using a single model. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | LAMP | Approved | no | ||
Call Number | Admin @ si @ YWH2022 | Serial | 3679 | ||
Permanent link to this record | |||||
Author | Alex Gomez-Villa; Adrian Martin; Javier Vazquez; Marcelo Bertalmio; Jesus Malo | ||||
Title | On the synthesis of visual illusions using deep generative models | Type | Journal Article | ||
Year | 2022 | Publication | Journal of Vision | Abbreviated Journal | JOV |
Volume | 22(8) | Issue | 2 | Pages | 1-18 |
Keywords | |||||
Abstract | Visual illusions expand our understanding of the visual system by imposing constraints in the models in two different ways: i) visual illusions for humans should induce equivalent illusions in the model, and ii) illusions synthesized from the model should be compelling for human viewers too. These constraints are alternative strategies to find good vision models. Following the first research strategy, recent studies have shown that artificial neural network architectures also have human-like illusory percepts when stimulated with classical hand-crafted stimuli designed to fool humans. In this work we focus on the second (less explored) strategy: we propose a framework to synthesize new visual illusions using the optimization abilities of current automatic differentiation techniques. The proposed framework can be used with classical vision models as well as with more recent artificial neural network architectures. This framework, validated by psychophysical experiments, can be used to study the difference between a vision model and the actual human perception and to optimize the vision model to decrease this difference. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | LAMP; 600.161; 611.007 | Approved | no | ||
Call Number | Admin @ si @ GMV2022 | Serial | 3682 | ||
Permanent link to this record | |||||
Author | Yasuko Sugito; Javier Vazquez; Trevor Canham; Marcelo Bertalmio | ||||
Title | Image quality evaluation in professional HDR/WCG production questions the need for HDR metrics | Type | Journal Article | ||
Year | 2022 | Publication | IEEE Transactions on Image Processing | Abbreviated Journal | TIP |
Volume | 31 | Issue | Pages | 5163 - 5177 | |
Keywords | Measurement; Image color analysis; Image coding; Production; Dynamic range; Brightness; Extraterrestrial measurements | ||||
Abstract | In the quality evaluation of high dynamic range and wide color gamut (HDR/WCG) images, a number of works have concluded that native HDR metrics, such as HDR visual difference predictor (HDR-VDP), HDR video quality metric (HDR-VQM), or convolutional neural network (CNN)-based visibility metrics for HDR content, provide the best results. These metrics consider only the luminance component, but several color difference metrics have been specifically developed for, and validated with, HDR/WCG images. In this paper, we perform subjective evaluation experiments in a professional HDR/WCG production setting, under a real use case scenario. The results are quite relevant in that they show, firstly, that the performance of HDR metrics is worse than that of a classic, simple standard dynamic range (SDR) metric applied directly to the HDR content; and secondly, that the chrominance metrics specifically developed for HDR/WCG imaging have poor correlation with observer scores and are also outperformed by an SDR metric. Based on these findings, we show how a very simple framework for creating color HDR metrics, that uses only luminance SDR metrics, transfer functions, and classic color spaces, is able to consistently outperform, by a considerable margin, state-of-the-art HDR metrics on a varied set of HDR content, for both perceptual quantization (PQ) and Hybrid Log-Gamma (HLG) encoding, luminance and chroma distortions, and on different color spaces of common use. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | 600.161; 611.007 | Approved | no | ||
Call Number | Admin @ si @ SVG2022 | Serial | 3683 | ||
Permanent link to this record |