|   | 
Details
   web
Records
Author Jorge Charco; Angel Sappa; Boris X. Vintimilla; Henry Velesaca
Title (up) Human Body Pose Estimation in Multi-view Environments Type Book Chapter
Year 2022 Publication ICT Applications for Smart Cities. Intelligent Systems Reference Library Abbreviated Journal
Volume 224 Issue Pages 79-99
Keywords
Abstract This chapter tackles the challenging problem of human pose estimation in multi-view environments to handle scenes with self-occlusions. The proposed approach starts by first estimating the camera pose—extrinsic parameters—in multi-view scenarios; due to few real image datasets, different virtual scenes are generated by using a special simulator, for training and testing the proposed convolutional neural network based approaches. Then, these extrinsic parameters are used to establish the relation between different cameras into the multi-view scheme, which captures the pose of the person from different points of view at the same time. The proposed multi-view scheme allows to robustly estimate human body joints’ position even in situations where they are occluded. This would help to avoid possible false alarms in behavioral analysis systems of smart cities, as well as applications for physical therapy, safe moving assistance for the elderly among other. The chapter concludes by presenting experimental results in real scenes by using state-of-the-art and the proposed multi-view approaches.
Address September 2022
Corporate Author Thesis
Publisher Springer Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title ISRL
Series Volume Series Issue Edition
ISSN ISBN 978-3-031-06306-0 Medium
Area Expedition Conference
Notes MSIAU; MACO Approved no
Call Number Admin @ si @ CSV2022b Serial 3810
Permanent link to this record
 

 
Author Jorge Charco; Angel Sappa; Boris X. Vintimilla
Title (up) Human Pose Estimation through a Novel Multi-view Scheme Type Conference Article
Year 2022 Publication 17th International Conference on Computer Vision Theory and Applications (VISAPP 2022) Abbreviated Journal
Volume 5 Issue Pages 855-862
Keywords Multi-view Scheme; Human Pose Estimation; Relative Camera Pose; Monocular Approach
Abstract This paper presents a multi-view scheme to tackle the challenging problem of the self-occlusion in human pose estimation problem. The proposed approach first obtains the human body joints of a set of images, which are captured from different views at the same time. Then, it enhances the obtained joints by using a
multi-view scheme. Basically, the joints from a given view are used to enhance poorly estimated joints from another view, especially intended to tackle the self occlusions cases. A network architecture initially proposed for the monocular case is adapted to be used in the proposed multi-view scheme. Experimental results and
comparisons with the state-of-the-art approaches on Human3.6m dataset are presented showing improvements in the accuracy of body joints estimations.
Address On line; Feb 6, 2022 – Feb 8, 2022
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 2184-4321 ISBN 978-989-758-555-5 Medium
Area Expedition Conference VISAPP
Notes MSIAU; 600.160 Approved no
Call Number Admin @ si @ CSV2022 Serial 3689
Permanent link to this record
 

 
Author Guillem Martinez; Maya Aghaei; Martin Dijkstra; Bhalaji Nagarajan; Femke Jaarsma; Jaap van de Loosdrecht; Petia Radeva; Klaas Dijkstra
Title (up) Hyper-Spectral Imaging for Overlapping Plastic Flakes Segmentation Type Conference Article
Year 2022 Publication 47th International Conference on Acoustics, Speech, and Signal Processing Abbreviated Journal
Volume Issue Pages
Keywords Hyper-spectral imaging; plastic sorting; multi-label segmentation; bitfield encoding
Abstract In this paper, we propose a deformable convolution-based generative adversarial network (DCNGAN) for perceptual quality enhancement of compressed videos. DCNGAN is also adaptive to the quantization parameters (QPs). Compared with optical flows, deformable convolutions are more effective and efficient to align frames. Deformable convolutions can operate on multiple frames, thus leveraging more temporal information, which is beneficial for enhancing the perceptual quality of compressed videos. Instead of aligning frames in a pairwise manner, the deformable convolution can process multiple frames simultaneously, which leads to lower computational complexity. Experimental results demonstrate that the proposed DCNGAN outperforms other state-of-the-art compressed video quality enhancement algorithms.
Address Singapore; May 2022
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICASSP
Notes MILAB; no proj Approved no
Call Number Admin @ si @ MAD2022 Serial 3767
Permanent link to this record
 

 
Author Angel Sappa (ed)
Title (up) ICT Applications for Smart Cities Type Book Whole
Year 2022 Publication ICT Applications for Smart Cities Abbreviated Journal
Volume 224 Issue Pages
Keywords Computational Intelligence; Intelligent Systems; Smart Cities; ICT Applications; Machine Learning; Pattern Recognition; Computer Vision; Image Processing
Abstract Part of the book series: Intelligent Systems Reference Library (ISRL)

This book is the result of four-year work in the framework of the Ibero-American Research Network TICs4CI funded by the CYTED program. In the following decades, 85% of the world's population is expected to live in cities; hence, urban centers should be prepared to provide smart solutions for problems ranging from video surveillance and intelligent mobility to the solid waste recycling processes, just to mention a few. More specifically, the book describes underlying technologies and practical implementations of several successful case studies of ICTs developed in the following smart city areas:

• Urban environment monitoring
• Intelligent mobility
• Waste recycling processes
• Video surveillance
• Computer-aided diagnose in healthcare systems
• Computer vision-based approaches for efficiency in production processes

The book is intended for researchers and engineers in the field of ICTs for smart cities, as well as to anyone who wants to know about state-of-the-art approaches and challenges on this field.
Address September 2022
Corporate Author Thesis
Publisher Springer Place of Publication Editor Angel Sappa
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title ISRL
Series Volume Series Issue Edition
ISSN ISBN 978-3-031-06306-0 Medium
Area Expedition Conference
Notes MSIAU; MACO Approved no
Call Number Admin @ si @ Sap2022 Serial 3812
Permanent link to this record
 

 
Author Michael Teutsch; Angel Sappa; Riad I. Hammoud
Title (up) Image and Video Enhancement Type Book Chapter
Year 2022 Publication Computer Vision in the Infrared Spectrum. Synthesis Lectures on Computer Vision Abbreviated Journal
Volume Issue Pages 9-21
Keywords
Abstract Image and video enhancement aims at improving the signal quality relative to imaging artifacts such as noise and blur or atmospheric perturbations such as turbulence and haze. It is usually performed in order to assist humans in analyzing image and video content or simply to present humans visually appealing images and videos. However, image and video enhancement can also be used as a preprocessing technique to ease the task and thus improve the performance of subsequent automatic image content analysis algorithms: preceding dehazing can improve object detection as shown by [23] or explicit turbulence modeling can improve moving object detection as discussed by [24]. But it remains an open question whether image and video enhancement should rather be performed explicitly as a preprocessing step or implicitly for example by feeding affected images directly to a neural network for image content analysis like object detection [25]. Especially for real-time video processing at low latency it can be better to handle image perturbation implicitly in order to minimize the processing time of an algorithm. This can be achieved by making algorithms for image content analysis robust or even invariant to perturbations such as noise or blur. Additionally, mistakes of an individual preprocessing module can obviously affect the quality of the entire processing pipeline.
Address
Corporate Author Thesis
Publisher Springer Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title SLCV
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MSIAU; MACO Approved no
Call Number Admin @ si @ TSH2022a Serial 3807
Permanent link to this record
 

 
Author Yasuko Sugito; Javier Vazquez; Trevor Canham; Marcelo Bertalmio
Title (up) Image quality evaluation in professional HDR/WCG production questions the need for HDR metrics Type Journal Article
Year 2022 Publication IEEE Transactions on Image Processing Abbreviated Journal TIP
Volume 31 Issue Pages 5163 - 5177
Keywords Measurement; Image color analysis; Image coding; Production; Dynamic range; Brightness; Extraterrestrial measurements
Abstract In the quality evaluation of high dynamic range and wide color gamut (HDR/WCG) images, a number of works have concluded that native HDR metrics, such as HDR visual difference predictor (HDR-VDP), HDR video quality metric (HDR-VQM), or convolutional neural network (CNN)-based visibility metrics for HDR content, provide the best results. These metrics consider only the luminance component, but several color difference metrics have been specifically developed for, and validated with, HDR/WCG images. In this paper, we perform subjective evaluation experiments in a professional HDR/WCG production setting, under a real use case scenario. The results are quite relevant in that they show, firstly, that the performance of HDR metrics is worse than that of a classic, simple standard dynamic range (SDR) metric applied directly to the HDR content; and secondly, that the chrominance metrics specifically developed for HDR/WCG imaging have poor correlation with observer scores and are also outperformed by an SDR metric. Based on these findings, we show how a very simple framework for creating color HDR metrics, that uses only luminance SDR metrics, transfer functions, and classic color spaces, is able to consistently outperform, by a considerable margin, state-of-the-art HDR metrics on a varied set of HDR content, for both perceptual quantization (PQ) and Hybrid Log-Gamma (HLG) encoding, luminance and chroma distortions, and on different color spaces of common use.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes 600.161; 611.007 Approved no
Call Number Admin @ si @ SVG2022 Serial 3683
Permanent link to this record
 

 
Author Yecong Wan; Yuanshuo Cheng; Miingwen Shao; Jordi Gonzalez
Title (up) Image rain removal and illumination enhancement done in one go Type Journal Article
Year 2022 Publication Knowledge-Based Systems Abbreviated Journal KBS
Volume 252 Issue Pages 109244
Keywords
Abstract Rain removal plays an important role in the restoration of degraded images. Recently, CNN-based methods have achieved remarkable success. However, these approaches neglect that the appearance of real-world rain is often accompanied by low light conditions, which will further degrade the image quality, thereby hindering the restoration mission. Therefore, it is very indispensable to jointly remove the rain and enhance illumination for real-world rain image restoration. To this end, we proposed a novel spatially-adaptive network, dubbed SANet, which can remove the rain and enhance illumination in one go with the guidance of degradation mask. Meanwhile, to fully utilize negative samples, a contrastive loss is proposed to preserve more natural textures and consistent illumination. In addition, we present a new synthetic dataset, named DarkRain, to boost the development of rain image restoration algorithms in practical scenarios. DarkRain not only contains different degrees of rain, but also considers different lighting conditions, and more realistically simulates real-world rainfall scenarios. SANet is extensively evaluated on the proposed dataset and attains new state-of-the-art performance against other combining methods. Moreover, after a simple transformation, our SANet surpasses existing the state-of-the-art algorithms in both rain removal and low-light image enhancement.
Address Sept 2022
Corporate Author Thesis
Publisher Elsevier Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ISE; 600.157; 600.168 Approved no
Call Number Admin @ si @ WCS2022 Serial 3744
Permanent link to this record
 

 
Author Pau Torras; Arnau Baro; Alicia Fornes; Lei Kang
Title (up) Improving Handwritten Music Recognition through Language Model Integration Type Conference Article
Year 2022 Publication 4th International Workshop on Reading Music Systems (WoRMS2022) Abbreviated Journal
Volume Issue Pages 42-46
Keywords optical music recognition; historical sources; diversity; music theory; digital humanities
Abstract Handwritten Music Recognition, especially in the historical domain, is an inherently challenging endeavour; paper degradation artefacts and the ambiguous nature of handwriting make recognising such scores an error-prone process, even for the current state-of-the-art Sequence to Sequence models. In this work we propose a way of reducing the production of statistically implausible output sequences by fusing a Language Model into a recognition Sequence to Sequence model. The idea is leveraging visually-conditioned and context-conditioned output distributions in order to automatically find and correct any mistakes that would otherwise break context significantly. We have found this approach to improve recognition results to 25.15 SER (%) from a previous best of 31.79 SER (%) in the literature.
Address November 18, 2022
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference WoRMS
Notes DAG; 600.121; 600.162; 602.230 Approved no
Call Number Admin @ si @ TBF2022 Serial 3735
Permanent link to this record
 

 
Author Ana Garcia Rodriguez; Yael Tudela; Henry Cordova; S. Carballal; I. Ordas; L. Moreira; E. Vaquero; O. Ortiz; L. Rivero; F. Javier Sanchez; Miriam Cuatrecasas; Maria Pellise; Jorge Bernal; Gloria Fernandez Esparrach
Title (up) In vivo computer-aided diagnosis of colorectal polyps using white light endoscopy Type Journal Article
Year 2022 Publication Endoscopy International Open Abbreviated Journal ENDIO
Volume 10 Issue 9 Pages E1201-E1207
Keywords
Abstract Background and study aims Artificial intelligence is currently able to accurately predict the histology of colorectal polyps. However, systems developed to date use complex optical technologies and have not been tested in vivo. The objective of this study was to evaluate the efficacy of a new deep learning-based optical diagnosis system, ATENEA, in a real clinical setting using only high-definition white light endoscopy (WLE) and to compare its performance with endoscopists. Methods ATENEA was prospectively tested in real life on consecutive polyps detected in colorectal cancer screening colonoscopies at Hospital Clínic. No images were discarded, and only WLE was used. The in vivo ATENEA's prediction (adenoma vs non-adenoma) was compared with the prediction of four staff endoscopists without specific training in optical diagnosis for the study purposes. Endoscopists were blind to the ATENEA output. Histology was the gold standard. Results Ninety polyps (median size: 5 mm, range: 2-25) from 31 patients were included of which 69 (76.7 %) were adenomas. ATENEA correctly predicted the histology in 63 of 69 (91.3 %, 95 % CI: 82 %-97 %) adenomas and 12 of 21 (57.1 %, 95 % CI: 34 %-78 %) non-adenomas while endoscopists made correct predictions in 52 of 69 (75.4 %, 95 % CI: 60 %-85 %) and 20 of 21 (95.2 %, 95 % CI: 76 %-100 %), respectively. The global accuracy was 83.3 % (95 % CI: 74%-90 %) and 80 % (95 % CI: 70 %-88 %) for ATENEA and endoscopists, respectively. Conclusion ATENEA can accurately be used for in vivo characterization of colorectal polyps, enabling the endoscopist to make direct decisions. ATENEA showed a global accuracy similar to that of endoscopists despite an unsatisfactory performance for non-adenomatous lesions.
Address 2022 Sep 14
Corporate Author Thesis
Publisher PMID Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ISE; 600.157 Approved no
Call Number Admin @ si @ GTC2022b Serial 3752
Permanent link to this record
 

 
Author Kai Wang; Xialei Liu; Andrew Bagdanov; Luis Herranz; Shangling Jui; Joost Van de Weijer
Title (up) Incremental Meta-Learning via Episodic Replay Distillation for Few-Shot Image Recognition Type Conference Article
Year 2022 Publication CVPR 2022 Workshop on Continual Learning (CLVision, 3rd Edition) Abbreviated Journal
Volume Issue Pages 3728-3738
Keywords Training; Computer vision; Image recognition; Upper bound; Conferences; Pattern recognition; Task analysis
Abstract In this paper we consider the problem of incremental meta-learning in which classes are presented incrementally in discrete tasks. We propose Episodic Replay Distillation (ERD), that mixes classes from the current task with exemplars from previous tasks when sampling episodes for meta-learning. To allow the training to benefit from a large as possible variety of classes, which leads to more gener-
alizable feature representations, we propose the cross-task meta loss. Furthermore, we propose episodic replay distillation that also exploits exemplars for improved knowledge distillation. Experiments on four datasets demonstrate that ERD surpasses the state-of-the-art. In particular, on the more challenging one-shot, long task sequence scenarios, we reduce the gap between Incremental Meta-Learning and
the joint-training upper bound from 3.5% / 10.1% / 13.4% / 11.7% with the current state-of-the-art to 2.6% / 2.9% / 5.0% / 0.2% with our method on Tiered-ImageNet / Mini-ImageNet / CIFAR100 / CUB, respectively.
Address New Orleans, USA; 20 June 2022
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference CVPRW
Notes LAMP; 600.147 Approved no
Call Number Admin @ si @ WLB2022 Serial 3686
Permanent link to this record
 

 
Author Carles Onielfa; Carles Casacuberta; Sergio Escalera
Title (up) Influence in Social Networks Through Visual Analysis of Image Memes Type Conference Article
Year 2022 Publication Artificial Intelligence Research and Development Abbreviated Journal
Volume 356 Issue Pages 71-80
Keywords
Abstract Memes evolve and mutate through their diffusion in social media. They have the potential to propagate ideas and, by extension, products. Many studies have focused on memes, but none so far, to our knowledge, on the users that post them, their relationships, and the reach of their influence. In this article, we define a meme influence graph together with suitable metrics to visualize and quantify influence between users who post memes, and we also describe a process to implement our definitions using a new approach to meme detection based on text-to-image area ratio and contrast. After applying our method to a set of users of the social media platform Instagram, we conclude that our metrics add information to already existing user characteristics.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HuPBA; no menciona Approved no
Call Number Admin @ si @ OCE2022 Serial 3799
Permanent link to this record
 

 
Author Minesh Mathew; Viraj Bagal; Ruben Tito; Dimosthenis Karatzas; Ernest Valveny; C.V. Jawahar
Title (up) InfographicVQA Type Conference Article
Year 2022 Publication Winter Conference on Applications of Computer Vision Abbreviated Journal
Volume Issue Pages 1697-1706
Keywords Document Analysis Datasets; Evaluation and Comparison of Vision Algorithms; Vision and Languages
Abstract Infographics communicate information using a combination of textual, graphical and visual elements. This work explores the automatic understanding of infographic images by using a Visual Question Answering technique. To this end, we present InfographicVQA, a new dataset comprising a diverse collection of infographics and question-answer annotations. The questions require methods that jointly reason over the document layout, textual content, graphical elements, and data visualizations. We curate the dataset with an emphasis on questions that require elementary reasoning and basic arithmetic skills. For VQA on the dataset, we evaluate two Transformer-based strong baselines. Both the baselines yield unsatisfactory results compared to near perfect human performance on the dataset. The results suggest that VQA on infographics--images that are designed to communicate information quickly and clearly to human brain--is ideal for benchmarking machine understanding of complex document images. The dataset is available for download at docvqa. org
Address Virtual; Waikoloa; Hawai; USA; January 2022
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference WACV
Notes DAG; 600.155 Approved no
Call Number MBT2022 Serial 3625
Permanent link to this record
 

 
Author Vishwesh Pillai; Pranav Mehar; Manisha Das; Deep Gupta; Petia Radeva
Title (up) Integrated Hierarchical and Flat Classifiers for Food Image Classification using Epistemic Uncertainty Type Conference Article
Year 2022 Publication IEEE International Conference on Signal Processing and Communications Abbreviated Journal
Volume Issue Pages
Keywords
Abstract The problem of food image recognition is an essential one in today’s context because health conditions such as diabetes, obesity, and heart disease require constant monitoring of a person’s diet. To automate this process, several models are available to recognize food images. Due to a considerable number of unique food dishes and various cuisines, a traditional flat classifier ceases to perform well. To address this issue, prediction schemes consisting of both flat and hierarchical classifiers, with the analysis of epistemic uncertainty are used to switch between the classifiers. However, the accuracy of the predictions made using epistemic uncertainty data remains considerably low. Therefore, this paper presents a prediction scheme using three different threshold criteria that helps to increase the accuracy of epistemic uncertainty predictions. The performance of the proposed method is demonstrated using several experiments performed on the MAFood-121 dataset. The experimental results validate the proposal performance and show that the proposed threshold criteria help to increase the overall accuracy of the predictions by correctly classifying the uncertainty distribution of the samples.
Address Bangalore; India; July 2022
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference SPCOM
Notes MILAB; no menciona Approved no
Call Number Admin @ si @ PMD2022 Serial 3796
Permanent link to this record
 

 
Author Ahmed M. A. Salih; Ilaria Boscolo Galazzo; Federica Cruciani; Lorenza Brusini; Petia Radeva
Title (up) Investigating Explainable Artificial Intelligence for MRI-based Classification of Dementia: a New Stability Criterion for Explainable Methods Type Conference Article
Year 2022 Publication 29th IEEE International Conference on Image Processing Abbreviated Journal
Volume Issue Pages
Keywords Image processing; Stability criteria; Machine learning; Robustness; Alzheimer's disease; Monitoring
Abstract Individuals diagnosed with Mild Cognitive Impairment (MCI) have shown an increased risk of developing Alzheimer’s Disease (AD). As such, early identification of dementia represents a key prognostic element, though hampered by complex disease patterns. Increasing efforts have focused on Machine Learning (ML) to build accurate classification models relying on a multitude of clinical/imaging variables. However, ML itself does not provide sensible explanations related to the model mechanism and feature contribution. Explainable Artificial Intelligence (XAI) represents the enabling technology in this framework, allowing to understand ML outcomes and derive human-understandable explanations. In this study, we aimed at exploring ML combined with MRI-based features and XAI to solve this classification problem and interpret the outcome. In particular, we propose a new method to assess the robustness of feature rankings provided by XAI methods, especially when multicollinearity exists. Our findings indicate that our method was able to disentangle the list of the informative features underlying dementia, with important implications for aiding personalized monitoring plans.
Address Bordeaux; France; October 2022
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICIP
Notes MILAB Approved no
Call Number Admin @ si @ SBC2022 Serial 3789
Permanent link to this record
 

 
Author Ali Furkan Biten; Andres Mafla; Lluis Gomez; Dimosthenis Karatzas
Title (up) Is An Image Worth Five Sentences? A New Look into Semantics for Image-Text Matching Type Conference Article
Year 2022 Publication Winter Conference on Applications of Computer Vision Abbreviated Journal
Volume Issue Pages 1391-1400
Keywords Measurement; Training; Integrated circuits; Annotations; Semantics; Training data; Semisupervised learning
Abstract The task of image-text matching aims to map representations from different modalities into a common joint visual-textual embedding. However, the most widely used datasets for this task, MSCOCO and Flickr30K, are actually image captioning datasets that offer a very limited set of relationships between images and sentences in their ground-truth annotations. This limited ground truth information forces us to use evaluation metrics based on binary relevance: given a sentence query we consider only one image as relevant. However, many other relevant images or captions may be present in the dataset. In this work, we propose two metrics that evaluate the degree of semantic relevance of retrieved items, independently of their annotated binary relevance. Additionally, we incorporate a novel strategy that uses an image captioning metric, CIDEr, to define a Semantic Adaptive Margin (SAM) to be optimized in a standard triplet loss. By incorporating our formulation to existing models, a large improvement is obtained in scenarios where available training data is limited. We also demonstrate that the performance on the annotated image-caption pairs is maintained while improving on other non-annotated relevant items when employing the full training set. The code for our new metric can be found at github. com/furkanbiten/ncsmetric and the model implementation at github. com/andrespmd/semanticadaptive_margin.
Address Virtual; Waikoloa; Hawai; USA; January 2022
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference WACV
Notes DAG; 600.155; 302.105; Approved no
Call Number Admin @ si @ BMG2022 Serial 3663
Permanent link to this record