|   | 
Details
   web
Records
Author Joana Maria Pujadas-Mora; Alicia Fornes; Oriol Ramos Terrades; Josep Llados; Jialuo Chen; Miquel Valls-Figols; Anna Cabre
Title The Barcelona Historical Marriage Database and the Baix Llobregat Demographic Database. From Algorithms for Handwriting Recognition to Individual-Level Demographic and Socioeconomic Data Type Journal
Year 2022 Publication Historical Life Course Studies Abbreviated Journal HLCS
Volume 12 Issue Pages 99-132
Keywords Individual demographic databases; Computer vision, Record linkage; Social mobility; Inequality; Migration; Word spotting; Handwriting recognition; Local censuses; Marriage Licences
Abstract The Barcelona Historical Marriage Database (BHMD) gathers records of the more than 600,000 marriages celebrated in the Diocese of Barcelona and their taxation registered in Barcelona Cathedral's so-called Marriage Licenses Books for the long period 1451–1905 and the BALL Demographic Database brings together the individual information recorded in the population registers, censuses and fiscal censuses of the main municipalities of the county of Baix Llobregat (Barcelona). In this ongoing collection 263,786 individual observations have been assembled, dating from the period between 1828 and 1965 by December 2020. The two databases started as part of different interdisciplinary research projects at the crossroads of Historical Demography and Computer Vision. Their construction uses artificial intelligence and computer vision methods as Handwriting Recognition to reduce the time of execution. However, its current state still requires some human intervention which explains the implemented crowdsourcing and game sourcing experiences. Moreover, knowledge graph techniques have allowed the application of advanced record linkage to link the same individuals and families across time and space. Moreover, we will discuss the main research lines using both databases developed so far in historical demography.
Address June 23, 2022
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG; 600.121; 600.162; 602.230; 600.140 Approved no
Call Number Admin @ si @ PFR2022 Serial 3737
Permanent link to this record
 

 
Author Razieh Rastgoo; Kourosh Kiani; Sergio Escalera; Vassilis Athitsos; Mohammad Sabokrou
Title All You Need In Sign Language Production Type Miscellaneous
Year 2022 Publication Arxiv Abbreviated Journal
Volume Issue Pages
Keywords Sign Language Production; Sign Language Recog- nition; Sign Language Translation; Deep Learning; Survey; Deaf
Abstract Sign Language is the dominant form of communication language used in the deaf and hearing-impaired community. To make an easy and mutual communication between the hearing-impaired and the hearing communities, building a robust system capable of translating the spoken language into sign language and vice versa is fundamental.
To this end, sign language recognition and production are two necessary parts for making such a two-way system. Signlanguage recognition and production need to cope with some critical challenges. In this survey, we review recent advances in
Sign Language Production (SLP) and related areas using deep learning. To have more realistic perspectives to sign language, we present an introduction to the Deaf culture, Deaf centers, psychological perspective of sign language, the main differences between spoken language and sign language. Furthermore, we present the fundamental components of a bi-directional sign language translation system, discussing the main challenges in this area. Also, the backbone architectures and methods in SLP are briefly introduced and the proposed taxonomy on SLP is presented. Finally, a general framework for SLP and performance evaluation, and also a discussion on the recent developments, advantages, and limitations in SLP, commenting on possible lines for future research are presented.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HuPBA; no menciona Approved no
Call Number Admin @ si @ RKE2022c Serial 3698
Permanent link to this record
 

 
Author Idoia Ruiz; Joan Serrat
Title Hierarchical Novelty Detection for Traffic Sign Recognition Type Journal Article
Year 2022 Publication Sensors Abbreviated Journal SENS
Volume 22 Issue 12 Pages 4389
Keywords Novelty detection; hierarchical classification; deep learning; traffic sign recognition; autonomous driving; computer vision
Abstract Recent works have made significant progress in novelty detection, i.e., the problem of detecting samples of novel classes, never seen during training, while classifying those that belong to known classes. However, the only information this task provides about novel samples is that they are unknown. In this work, we leverage hierarchical taxonomies of classes to provide informative outputs for samples of novel classes. We predict their closest class in the taxonomy, i.e., its parent class. We address this problem, known as hierarchical novelty detection, by proposing a novel loss, namely Hierarchical Cosine Loss that is designed to learn class prototypes along with an embedding of discriminative features consistent with the taxonomy. We apply it to traffic sign recognition, where we predict the parent class semantics for new types of traffic signs. Our model beats state-of-the art approaches on two large scale traffic sign benchmarks, Mapillary Traffic Sign Dataset (MTSD) and Tsinghua-Tencent 100K (TT100K), and performs similarly on natural images benchmarks (AWA2, CUB). For TT100K and MTSD, our approach is able to detect novel samples at the correct nodes of the hierarchy with 81% and 36% of accuracy, respectively, at 80% known class accuracy.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ADAS; 600.154 Approved no
Call Number Admin @ si @ RuS2022 Serial 3684
Permanent link to this record
 

 
Author Saad Minhas; Zeba Khanam; Shoaib Ehsan; Klaus McDonald Maier; Aura Hernandez-Sabate
Title Weather Classification by Utilizing Synthetic Data Type Journal Article
Year 2022 Publication Sensors Abbreviated Journal SENS
Volume 22 Issue 9 Pages 3193
Keywords Weather classification; synthetic data; dataset; autonomous car; computer vision; advanced driver assistance systems; deep learning; intelligent transportation systems
Abstract Weather prediction from real-world images can be termed a complex task when targeting classification using neural networks. Moreover, the number of images throughout the available datasets can contain a huge amount of variance when comparing locations with the weather those images are representing. In this article, the capabilities of a custom built driver simulator are explored specifically to simulate a wide range of weather conditions. Moreover, the performance of a new synthetic dataset generated by the above simulator is also assessed. The results indicate that the use of synthetic datasets in conjunction with real-world datasets can increase the training efficiency of the CNNs by as much as 74%. The article paves a way forward to tackle the persistent problem of bias in vision-based datasets.
Address 21 April 2022
Corporate Author Thesis
Publisher MDPI Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes IAM; 600.139; 600.159; 600.166; 600.145; Approved no
Call Number Admin @ si @ MKE2022 Serial 3761
Permanent link to this record
 

 
Author Rafael E. Rivadeneira; Angel Sappa; Boris X. Vintimilla; Riad I. Hammoud
Title A Novel Domain Transfer-Based Approach for Unsupervised Thermal Image Super-Resolution Type Journal Article
Year 2022 Publication Sensors Abbreviated Journal SENS
Volume 22 Issue 6 Pages 2254
Keywords Thermal image super-resolution; unsupervised super-resolution; thermal images; attention module; semiregistered thermal images
Abstract This paper presents a transfer domain strategy to tackle the limitations of low-resolution thermal sensors and generate higher-resolution images of reasonable quality. The proposed technique employs a CycleGAN architecture and uses a ResNet as an encoder in the generator along with an attention module and a novel loss function. The network is trained on a multi-resolution thermal image dataset acquired with three different thermal sensors. Results report better performance benchmarking results on the 2nd CVPR-PBVS-2021 thermal image super-resolution challenge than state-of-the-art methods. The code of this work is available online.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MSIAU; Approved no
Call Number Admin @ si @ RSV2022b Serial 3688
Permanent link to this record
 

 
Author Aura Hernandez-Sabate; Jose Elias Yauri; Pau Folch; Miquel Angel Piera; Debora Gil
Title Recognition of the Mental Workloads of Pilots in the Cockpit Using EEG Signals Type Journal Article
Year 2022 Publication Applied Sciences Abbreviated Journal APPLSCI
Volume 12 Issue 5 Pages 2298
Keywords Cognitive states; Mental workload; EEG analysis; Neural networks; Multimodal data fusion
Abstract The commercial flightdeck is a naturally multi-tasking work environment, one in which interruptions are frequent come in various forms, contributing in many cases to aviation incident reports. Automatic characterization of pilots’ workloads is essential to preventing these kind of incidents. In addition, minimizing the physiological sensor network as much as possible remains both a challenge and a requirement. Electroencephalogram (EEG) signals have shown high correlations with specific cognitive and mental states, such as workload. However, there is not enough evidence in the literature to validate how well models generalize in cases of new subjects performing tasks with workloads similar to the ones included during the model’s training. In this paper, we propose a convolutional neural network to classify EEG features across different mental workloads in a continuous performance task test that partly measures working memory and working memory capacity. Our model is valid at the general population level and it is able to transfer task learning to pilot mental workload recognition in a simulated operational environment.
Address February 2022
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes IAM; ADAS; 600.139; 600.145; 600.118 Approved no
Call Number Admin @ si @ HYF2022 Serial 3720
Permanent link to this record
 

 
Author Guillermo Torres; Sonia Baeza; Carles Sanchez; Ignasi Guasch; Antoni Rosell; Debora Gil
Title An Intelligent Radiomic Approach for Lung Cancer Screening Type Journal Article
Year 2022 Publication Applied Sciences Abbreviated Journal APPLSCI
Volume 12 Issue 3 Pages 1568
Keywords Lung cancer; Early diagnosis; Screening; Neural networks; Image embedding; Architecture optimization
Abstract The efficiency of lung cancer screening for reducing mortality is hindered by the high rate of false positives. Artificial intelligence applied to radiomics could help to early discard benign cases from the analysis of CT scans. The available amount of data and the fact that benign cases are a minority, constitutes a main challenge for the successful use of state of the art methods (like deep learning), which can be biased, over-fitted and lack of clinical reproducibility. We present an hybrid approach combining the potential of radiomic features to characterize nodules in CT scans and the generalization of the feed forward networks. In order to obtain maximal reproducibility with minimal training data, we propose an embedding of nodules based on the statistical significance of radiomic features for malignancy detection. This representation space of lesions is the input to a feed
forward network, which architecture and hyperparameters are optimized using own-defined metrics of the diagnostic power of the whole system. Results of the best model on an independent set of patients achieve 100% of sensitivity and 83% of specificity (AUC = 0.94) for malignancy detection.
Address Jan 2022
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes IAM; 600.139; 600.145 Approved no
Call Number Admin @ si @ TBS2022 Serial 3699
Permanent link to this record
 

 
Author Carles Onielfa; Carles Casacuberta; Sergio Escalera
Title Influence in Social Networks Through Visual Analysis of Image Memes Type Conference Article
Year 2022 Publication Artificial Intelligence Research and Development Abbreviated Journal
Volume 356 Issue Pages 71-80
Keywords
Abstract Memes evolve and mutate through their diffusion in social media. They have the potential to propagate ideas and, by extension, products. Many studies have focused on memes, but none so far, to our knowledge, on the users that post them, their relationships, and the reach of their influence. In this article, we define a meme influence graph together with suitable metrics to visualize and quantify influence between users who post memes, and we also describe a process to implement our definitions using a new approach to meme detection based on text-to-image area ratio and contrast. After applying our method to a set of users of the social media platform Instagram, we conclude that our metrics add information to already existing user characteristics.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HuPBA; no menciona Approved no
Call Number Admin @ si @ OCE2022 Serial 3799
Permanent link to this record
 

 
Author Penny Tarling; Mauricio Cantor; Albert Clapes; Sergio Escalera
Title Deep learning with self-supervision and uncertainty regularization to count fish in underwater images Type Journal Article
Year 2022 Publication PloS One Abbreviated Journal Plos
Volume 17 Issue 5 Pages e0267759
Keywords
Abstract Effective conservation actions require effective population monitoring. However, accurately counting animals in the wild to inform conservation decision-making is difficult. Monitoring populations through image sampling has made data collection cheaper, wide-reaching and less intrusive but created a need to process and analyse this data efficiently. Counting animals from such data is challenging, particularly when densely packed in noisy images. Attempting this manually is slow and expensive, while traditional computer vision methods are limited in their generalisability. Deep learning is the state-of-the-art method for many computer vision tasks, but it has yet to be properly explored to count animals. To this end, we employ deep learning, with a density-based regression approach, to count fish in low-resolution sonar images. We introduce a large dataset of sonar videos, deployed to record wild Lebranche mullet schools (Mugil liza), with a subset of 500 labelled images. We utilise abundant unlabelled data in a self-supervised task to improve the supervised counting task. For the first time in this context, by introducing uncertainty quantification, we improve model training and provide an accompanying measure of prediction uncertainty for more informed biological decision-making. Finally, we demonstrate the generalisability of our proposed counting framework through testing it on a recent benchmark dataset of high-resolution annotated underwater images from varying habitats (DeepFish). From experiments on both contrasting datasets, we demonstrate our network outperforms the few other deep learning models implemented for solving this task. By providing an open-source framework along with training data, our study puts forth an efficient deep learning template for crowd counting aquatic animals thereby contributing effective methods to assess natural populations from the ever-increasing visual data.
Address
Corporate Author Thesis
Publisher Public Library of Science Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HuPBA Approved no
Call Number Admin @ si @ TCC2022 Serial 3743
Permanent link to this record
 

 
Author Sonia Baeza; Debora Gil; I.Garcia Olive; M.Salcedo; J.Deportos; Carles Sanchez; Guillermo Torres; G.Moragas; Antoni Rosell
Title A novel intelligent radiomic analysis of perfusion SPECT/CT images to optimize pulmonary embolism diagnosis in COVID-19 patients Type Journal Article
Year 2022 Publication EJNMMI Physics Abbreviated Journal EJNMMI-PHYS
Volume 9 Issue 1, Article 84 Pages 1-17
Keywords
Abstract Background: COVID-19 infection, especially in cases with pneumonia, is associated with a high rate of pulmonary embolism (PE). In patients with contraindications for CT pulmonary angiography (CTPA) or non-diagnostic CTPA, perfusion single-photon emission computed tomography/computed tomography (Q-SPECT/CT) is a diagnostic alternative. The goal of this study is to develop a radiomic diagnostic system to detect PE based only on the analysis of Q-SPECT/CT scans.
Methods: This radiomic diagnostic system is based on a local analysis of Q-SPECT/CT volumes that includes both CT and Q-SPECT values for each volume point. We present a combined approach that uses radiomic features extracted from each scan as input into a fully connected classifcation neural network that optimizes a weighted crossentropy loss trained to discriminate between three diferent types of image patterns (pixel sample level): healthy lungs (control group), PE and pneumonia. Four types of models using diferent confguration of parameters were tested.
Results: The proposed radiomic diagnostic system was trained on 20 patients (4,927 sets of samples of three types of image patterns) and validated in a group of 39 patients (4,410 sets of samples of three types of image patterns). In the training group, COVID-19 infection corresponded to 45% of the cases and 51.28% in the test group. In the test group, the best model for determining diferent types of image patterns with PE presented a sensitivity, specifcity, positive predictive value and negative predictive value of 75.1%, 98.2%, 88.9% and 95.4%, respectively. The best model for detecting
pneumonia presented a sensitivity, specifcity, positive predictive value and negative predictive value of 94.1%, 93.6%, 85.2% and 97.6%, respectively. The area under the curve (AUC) was 0.92 for PE and 0.91 for pneumonia. When the results obtained at the pixel sample level are aggregated into regions of interest, the sensitivity of the PE increases to 85%, and all metrics improve for pneumonia.
Conclusion: This radiomic diagnostic system was able to identify the diferent lung imaging patterns and is a frst step toward a comprehensive intelligent radiomic system to optimize the diagnosis of PE by Q-SPECT/CT.
Address 5 dec 2022
Corporate Author Thesis
Publisher Springer Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes IAM Approved no
Call Number Admin @ si @ BGG2022 Serial 3759
Permanent link to this record
 

 
Author Xavier Otazu; Xim Cerda-Company
Title The contribution of luminance and chromatic channels to color assimilation Type Journal Article
Year 2022 Publication Journal of Vision Abbreviated Journal JOV
Volume 22(6) Issue 10 Pages 1-15
Keywords
Abstract Color induction is the phenomenon where the physical and the perceived colors of an object differ owing to the color distribution and the spatial configuration of the surrounding objects. Previous works studying this phenomenon on the lsY MacLeod–Boynton color space, show that color assimilation is present only when the magnocellular pathway (i.e., the Y axis) is activated (i.e., when there are luminance differences). Concretely, the authors showed that the effect is mainly induced by the koniocellular pathway (s axis), but not by the parvocellular pathway (l axis), suggesting that when magnocellular pathway is activated it inhibits the koniocellular pathway. In the present work, we study whether parvo-, konio-, and magnocellular pathways may influence on each other through the color induction effect. Our results show that color assimilation does not depend on a chromatic–chromatic interaction, and that chromatic assimilation is driven by the interaction between luminance and chromatic channels (mainly the magno- and the koniocellular pathways). Our results also show that chromatic induction is greatly decreased when all three visual pathways are simultaneously activated, and that chromatic pathways could influence each other through the magnocellular (luminance) pathway. In addition, we observe that chromatic channels can influence the luminance channel, hence inducing a small brightness induction. All these results show that color induction is a highly complex process where interactions between the several visual pathways are yet unknown and should be studied in greater detail.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes Neurobit; 600.128; 600.120; 600.158 Approved no
Call Number Admin @ si @ OtC2022 Serial 3685
Permanent link to this record
 

 
Author David Berga; Xavier Otazu
Title A neurodynamic model of saliency prediction in v1 Type Journal Article
Year 2022 Publication Neural Computation Abbreviated Journal NEURALCOMPUT
Volume 34 Issue 2 Pages 378-414
Keywords
Abstract Lateral connections in the primary visual cortex (V1) have long been hypothesized to be responsible for several visual processing mechanisms such as brightness induction, chromatic induction, visual discomfort, and bottom-up visual attention (also named saliency). Many computational models have been developed to independently predict these and other visual processes, but no computational model has been able to reproduce all of them simultaneously. In this work, we show that a biologically plausible computational model of lateral interactions of V1 is able to simultaneously predict saliency and all the aforementioned visual processes. Our model's architecture (NSWAM) is based on Penacchio's neurodynamic model of lateral connections of V1. It is defined as a network of firing rate neurons, sensitive to visual features such as brightness, color, orientation, and scale. We tested NSWAM saliency predictions using images from several eye tracking data sets. We show that the accuracy of predictions obtained by our architecture, using shuffled metrics, is similar to other state-of-the-art computational methods, particularly with synthetic images (CAT2000-Pattern and SID4VAM) that mainly contain low-level features. Moreover, we outperform other biologically inspired saliency models that are specifically designed to exclusively reproduce saliency. We show that our biologically plausible model of lateral connections can simultaneously explain different visual processes present in V1 (without applying any type of training or optimization and keeping the same parameterization for all the visual processes). This can be useful for the definition of a unified architecture of the primary visual cortex.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes NEUROBIT; 600.128; 600.120 Approved no
Call Number Admin @ si @ BeO2022 Serial 3696
Permanent link to this record
 

 
Author Hugo Bertiche; Meysam Madadi; Sergio Escalera
Title Neural Cloth Simulation Type Journal Article
Year 2022 Publication ACM Transactions on Graphics Abbreviated Journal ACMTGraph
Volume 41 Issue 6 Pages 1-14
Keywords
Abstract We present a general framework for the garment animation problem through unsupervised deep learning inspired in physically based simulation. Existing trends in the literature already explore this possibility. Nonetheless, these approaches do not handle cloth dynamics. Here, we propose the first methodology able to learn realistic cloth dynamics unsupervisedly, and henceforth, a general formulation for neural cloth simulation. The key to achieve this is to adapt an existing optimization scheme for motion from simulation based methodologies to deep learning. Then, analyzing the nature of the problem, we devise an architecture able to automatically disentangle static and dynamic cloth subspaces by design. We will show how this improves model performance. Additionally, this opens the possibility of a novel motion augmentation technique that greatly improves generalization. Finally, we show it also allows to control the level of motion in the predictions. This is a useful, never seen before, tool for artists. We provide of detailed analysis of the problem to establish the bases of neural cloth simulation and guide future research into the specifics of this domain.



ACM Transactions on GraphicsVolume 41Issue 6December 2022 Article No.: 220pp 1–
Address Dec 2022
Corporate Author Thesis
Publisher ACM Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes Approved no
Call Number Admin @ si @ BME2022b Serial 3779
Permanent link to this record
 

 
Author Wenjuan Gong; Zhang Yue; Wei Wang; Cheng Peng; Jordi Gonzalez
Title Meta-MMFNet: Meta-Learning Based Multi-Model Fusion Network for Micro-Expression Recognition Type Journal Article
Year 2022 Publication ACM Transactions on Multimedia Computing, Communications, and Applications Abbreviated Journal ACMTMC
Volume Issue Pages
Keywords Feature Fusion; Model Fusion; Meta-Learning; Micro-Expression Recognition
Abstract Despite its wide applications in criminal investigations and clinical communications with patients suffering from autism, automatic micro-expression recognition remains a challenging problem because of the lack of training data and imbalanced classes problems. In this study, we proposed a meta-learning based multi-model fusion network (Meta-MMFNet) to solve the existing problems. The proposed method is based on the metric-based meta-learning pipeline, which is specifically designed for few-shot learning and is suitable for model-level fusion. The frame difference and optical flow features were fused, deep features were extracted from the fused feature, and finally in the meta-learning-based framework, weighted sum model fusion method was applied for micro-expression classification. Meta-MMFNet achieved better results than state-of-the-art methods on four datasets. The code is available at https://github.com/wenjgong/meta-fusion-based-method.
Address May 2022
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ISE; 600.157 Approved no
Call Number Admin @ si @ GYW2022 Serial 3692
Permanent link to this record
 

 
Author Alex Falcon; Swathikiran Sudhakaran; Giuseppe Serra; Sergio Escalera; Oswald Lanz
Title Relevance-based Margin for Contrastively-trained Video Retrieval Models Type Conference Article
Year 2022 Publication ICMR '22: Proceedings of the 2022 International Conference on Multimedia Retrieval Abbreviated Journal
Volume Issue Pages 146-157
Keywords
Abstract Video retrieval using natural language queries has attracted increasing interest due to its relevance in real-world applications, from intelligent access in private media galleries to web-scale video search. Learning the cross-similarity of video and text in a joint embedding space is the dominant approach. To do so, a contrastive loss is usually employed because it organizes the embedding space by putting similar items close and dissimilar items far. This framework leads to competitive recall rates, as they solely focus on the rank of the groundtruth items. Yet, assessing the quality of the ranking list is of utmost importance when considering intelligent retrieval systems, since multiple items may share similar semantics, hence a high relevance. Moreover, the aforementioned framework uses a fixed margin to separate similar and dissimilar items, treating all non-groundtruth items as equally irrelevant. In this paper we propose to use a variable margin: we argue that varying the margin used during training based on how much relevant an item is to a given query, i.e. a relevance-based margin, easily improves the quality of the ranking lists measured through nDCG and mAP. We demonstrate the advantages of our technique using different models on EPIC-Kitchens-100 and YouCook2. We show that even if we carefully tuned the fixed margin, our technique (which does not have the margin as a hyper-parameter) would still achieve better performance. Finally, extensive ablation studies and qualitative analysis support the robustness of our approach. Code will be released at \urlhttps://github.com/aranciokov/RelevanceMargin-ICMR22.
Address Newwark, NJ, USA, 27 June 2022
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICMR
Notes HuPBA; no menciona Approved no
Call Number Admin @ si @ FSS2022 Serial 3808
Permanent link to this record