|   | 
Details
   web
Records
Author Ali Furkan Biten; Andres Mafla; Lluis Gomez; Dimosthenis Karatzas
Title Is An Image Worth Five Sentences? A New Look into Semantics for Image-Text Matching Type Conference Article
Year 2022 Publication Winter Conference on Applications of Computer Vision Abbreviated Journal
Volume Issue (up) Pages 1391-1400
Keywords Measurement; Training; Integrated circuits; Annotations; Semantics; Training data; Semisupervised learning
Abstract The task of image-text matching aims to map representations from different modalities into a common joint visual-textual embedding. However, the most widely used datasets for this task, MSCOCO and Flickr30K, are actually image captioning datasets that offer a very limited set of relationships between images and sentences in their ground-truth annotations. This limited ground truth information forces us to use evaluation metrics based on binary relevance: given a sentence query we consider only one image as relevant. However, many other relevant images or captions may be present in the dataset. In this work, we propose two metrics that evaluate the degree of semantic relevance of retrieved items, independently of their annotated binary relevance. Additionally, we incorporate a novel strategy that uses an image captioning metric, CIDEr, to define a Semantic Adaptive Margin (SAM) to be optimized in a standard triplet loss. By incorporating our formulation to existing models, a large improvement is obtained in scenarios where available training data is limited. We also demonstrate that the performance on the annotated image-caption pairs is maintained while improving on other non-annotated relevant items when employing the full training set. The code for our new metric can be found at github. com/furkanbiten/ncsmetric and the model implementation at github. com/andrespmd/semanticadaptive_margin.
Address Virtual; Waikoloa; Hawai; USA; January 2022
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference WACV
Notes DAG; 600.155; 302.105; Approved no
Call Number Admin @ si @ BMG2022 Serial 3663
Permanent link to this record
 

 
Author Mohamed Ali Souibgui; Sanket Biswas; Sana Khamekhem Jemni; Yousri Kessentini; Alicia Fornes; Josep Llados; Umapada Pal
Title DocEnTr: An End-to-End Document Image Enhancement Transformer Type Conference Article
Year 2022 Publication 26th International Conference on Pattern Recognition Abbreviated Journal
Volume Issue (up) Pages 1699-1705
Keywords Degradation; Head; Optical character recognition; Self-supervised learning; Benchmark testing; Transformers; Magnetic heads
Abstract Document images can be affected by many degradation scenarios, which cause recognition and processing difficulties. In this age of digitization, it is important to denoise them for proper usage. To address this challenge, we present a new encoder-decoder architecture based on vision transformers to enhance both machine-printed and handwritten document images, in an end-to-end fashion. The encoder operates directly on the pixel patches with their positional information without the use of any convolutional layers, while the decoder reconstructs a clean image from the encoded patches. Conducted experiments show a superiority of the proposed model compared to the state-of the-art methods on several DIBCO benchmarks. Code and models will be publicly available at: https://github.com/dali92002/DocEnTR
Address August 21-25, 2022 , Montréal Québec
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICPR
Notes DAG; 600.121; 600.162; 602.230; 600.140 Approved no
Call Number Admin @ si @ SBJ2022 Serial 3730
Permanent link to this record
 

 
Author Fei Yang; Yaxing Wang; Luis Herranz; Yongmei Cheng; Mikhail Mozerov
Title A Novel Framework for Image-to-image Translation and Image Compression Type Journal Article
Year 2022 Publication Neurocomputing Abbreviated Journal NEUCOM
Volume 508 Issue (up) Pages 58-70
Keywords
Abstract Data-driven paradigms using machine learning are becoming ubiquitous in image processing and communications. In particular, image-to-image (I2I) translation is a generic and widely used approach to image processing problems, such as image synthesis, style transfer, and image restoration. At the same time, neural image compression has emerged as a data-driven alternative to traditional coding approaches in visual communications. In this paper, we study the combination of these two paradigms into a joint I2I compression and translation framework, focusing on multi-domain image synthesis. We first propose distributed I2I translation by integrating quantization and entropy coding into an I2I translation framework (i.e. I2Icodec). In practice, the image compression functionality (i.e. autoencoding) is also desirable, requiring to deploy alongside I2Icodec a regular image codec. Thus, we further propose a unified framework that allows both translation and autoencoding capabilities in a single codec. Adaptive residual blocks conditioned on the translation/compression mode provide flexible adaptation to the desired functionality. The experiments show promising results in both I2I translation and image compression using a single model.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes LAMP Approved no
Call Number Admin @ si @ YWH2022 Serial 3679
Permanent link to this record
 

 
Author Yasuko Sugito; Javier Vazquez; Trevor Canham; Marcelo Bertalmio
Title Image quality evaluation in professional HDR/WCG production questions the need for HDR metrics Type Journal Article
Year 2022 Publication IEEE Transactions on Image Processing Abbreviated Journal TIP
Volume 31 Issue (up) Pages 5163 - 5177
Keywords Measurement; Image color analysis; Image coding; Production; Dynamic range; Brightness; Extraterrestrial measurements
Abstract In the quality evaluation of high dynamic range and wide color gamut (HDR/WCG) images, a number of works have concluded that native HDR metrics, such as HDR visual difference predictor (HDR-VDP), HDR video quality metric (HDR-VQM), or convolutional neural network (CNN)-based visibility metrics for HDR content, provide the best results. These metrics consider only the luminance component, but several color difference metrics have been specifically developed for, and validated with, HDR/WCG images. In this paper, we perform subjective evaluation experiments in a professional HDR/WCG production setting, under a real use case scenario. The results are quite relevant in that they show, firstly, that the performance of HDR metrics is worse than that of a classic, simple standard dynamic range (SDR) metric applied directly to the HDR content; and secondly, that the chrominance metrics specifically developed for HDR/WCG imaging have poor correlation with observer scores and are also outperformed by an SDR metric. Based on these findings, we show how a very simple framework for creating color HDR metrics, that uses only luminance SDR metrics, transfer functions, and classic color spaces, is able to consistently outperform, by a considerable margin, state-of-the-art HDR metrics on a varied set of HDR content, for both perceptual quantization (PQ) and Hybrid Log-Gamma (HLG) encoding, luminance and chroma distortions, and on different color spaces of common use.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes 600.161; 611.007 Approved no
Call Number Admin @ si @ SVG2022 Serial 3683
Permanent link to this record
 

 
Author Kai Wang; Xialei Liu; Andrew Bagdanov; Luis Herranz; Shangling Jui; Joost Van de Weijer
Title Incremental Meta-Learning via Episodic Replay Distillation for Few-Shot Image Recognition Type Conference Article
Year 2022 Publication CVPR 2022 Workshop on Continual Learning (CLVision, 3rd Edition) Abbreviated Journal
Volume Issue (up) Pages 3728-3738
Keywords Training; Computer vision; Image recognition; Upper bound; Conferences; Pattern recognition; Task analysis
Abstract In this paper we consider the problem of incremental meta-learning in which classes are presented incrementally in discrete tasks. We propose Episodic Replay Distillation (ERD), that mixes classes from the current task with exemplars from previous tasks when sampling episodes for meta-learning. To allow the training to benefit from a large as possible variety of classes, which leads to more gener-
alizable feature representations, we propose the cross-task meta loss. Furthermore, we propose episodic replay distillation that also exploits exemplars for improved knowledge distillation. Experiments on four datasets demonstrate that ERD surpasses the state-of-the-art. In particular, on the more challenging one-shot, long task sequence scenarios, we reduce the gap between Incremental Meta-Learning and
the joint-training upper bound from 3.5% / 10.1% / 13.4% / 11.7% with the current state-of-the-art to 2.6% / 2.9% / 5.0% / 0.2% with our method on Tiered-ImageNet / Mini-ImageNet / CIFAR100 / CUB, respectively.
Address New Orleans, USA; 20 June 2022
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference CVPRW
Notes LAMP; 600.147 Approved no
Call Number Admin @ si @ WLB2022 Serial 3686
Permanent link to this record
 

 
Author Zhaocheng Liu; Luis Herranz; Fei Yang; Saiping Zhang; Shuai Wan; Marta Mrak; Marc Gorriz
Title Slimmable Video Codec Type Conference Article
Year 2022 Publication CVPR 2022 Workshop and Challenge on Learned Image Compression (CLIC 2022, 5th Edition) Abbreviated Journal
Volume Issue (up) Pages 1742-1746
Keywords
Abstract Neural video compression has emerged as a novel paradigm combining trainable multilayer neural net-works and machine learning, achieving competitive rate-distortion (RD) performances, but still remaining impractical due to heavy neural architectures, with large memory and computational demands. In addition, models are usually optimized for a single RD tradeoff. Recent slimmable image codecs can dynamically adjust their model capacity to gracefully reduce the memory and computation requirements, without harming RD performance. In this paper we propose a slimmable video codec (SlimVC), by integrating a slimmable temporal entropy model in a slimmable autoencoder. Despite a significantly more complex architecture, we show that slimming remains a powerful mechanism to control rate, memory footprint, computational cost and latency, all being important requirements for practical video compression.
Address Virtual; 19 June 2022
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference CVPRW
Notes MACO; 601.379; 601.161 Approved no
Call Number Admin @ si @ LHY2022 Serial 3687
Permanent link to this record
 

 
Author Jorge Charco; Angel Sappa; Boris X. Vintimilla
Title Human Pose Estimation through a Novel Multi-view Scheme Type Conference Article
Year 2022 Publication 17th International Conference on Computer Vision Theory and Applications (VISAPP 2022) Abbreviated Journal
Volume 5 Issue (up) Pages 855-862
Keywords Multi-view Scheme; Human Pose Estimation; Relative Camera Pose; Monocular Approach
Abstract This paper presents a multi-view scheme to tackle the challenging problem of the self-occlusion in human pose estimation problem. The proposed approach first obtains the human body joints of a set of images, which are captured from different views at the same time. Then, it enhances the obtained joints by using a
multi-view scheme. Basically, the joints from a given view are used to enhance poorly estimated joints from another view, especially intended to tackle the self occlusions cases. A network architecture initially proposed for the monocular case is adapted to be used in the proposed multi-view scheme. Experimental results and
comparisons with the state-of-the-art approaches on Human3.6m dataset are presented showing improvements in the accuracy of body joints estimations.
Address On line; Feb 6, 2022 – Feb 8, 2022
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 2184-4321 ISBN 978-989-758-555-5 Medium
Area Expedition Conference VISAPP
Notes MSIAU; 600.160 Approved no
Call Number Admin @ si @ CSV2022 Serial 3689
Permanent link to this record
 

 
Author Rafael E. Rivadeneira; Angel Sappa; Boris X. Vintimilla
Title Multi-Image Super-Resolution for Thermal Images Type Conference Article
Year 2022 Publication 17th International Conference on Computer Vision Theory and Applications (VISAPP 2022) Abbreviated Journal
Volume 4 Issue (up) Pages 635-642
Keywords Thermal Images; Multi-view; Multi-frame; Super-Resolution; Deep Learning; Attention Block
Abstract This paper proposes a novel CNN architecture for the multi-thermal image super-resolution problem. In the proposed scheme, the multi-images are synthetically generated by downsampling and slightly shifting the given image; noise is also added to each of these synthesized images. The proposed architecture uses two
attention blocks paths to extract high-frequency details taking advantage of the large information extracted from multiple images of the same scene. Experimental results are provided, showing the proposed scheme has overcome the state-of-the-art approaches.
Address Online; Feb 6-8, 2022
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference VISAPP
Notes MSIAU; 601.349 Approved no
Call Number Admin @ si @ RSV2022a Serial 3690
Permanent link to this record
 

 
Author Wenjuan Gong; Zhang Yue; Wei Wang; Cheng Peng; Jordi Gonzalez
Title Meta-MMFNet: Meta-Learning Based Multi-Model Fusion Network for Micro-Expression Recognition Type Journal Article
Year 2022 Publication ACM Transactions on Multimedia Computing, Communications, and Applications Abbreviated Journal ACMTMC
Volume Issue (up) Pages
Keywords Feature Fusion; Model Fusion; Meta-Learning; Micro-Expression Recognition
Abstract Despite its wide applications in criminal investigations and clinical communications with patients suffering from autism, automatic micro-expression recognition remains a challenging problem because of the lack of training data and imbalanced classes problems. In this study, we proposed a meta-learning based multi-model fusion network (Meta-MMFNet) to solve the existing problems. The proposed method is based on the metric-based meta-learning pipeline, which is specifically designed for few-shot learning and is suitable for model-level fusion. The frame difference and optical flow features were fused, deep features were extracted from the fused feature, and finally in the meta-learning-based framework, weighted sum model fusion method was applied for micro-expression classification. Meta-MMFNet achieved better results than state-of-the-art methods on four datasets. The code is available at https://github.com/wenjgong/meta-fusion-based-method.
Address May 2022
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ISE; 600.157 Approved no
Call Number Admin @ si @ GYW2022 Serial 3692
Permanent link to this record
 

 
Author Mohamed Ramzy Ibrahim; Robert Benavente; Felipe Lumbreras; Daniel Ponsa
Title 3DRRDB: Super Resolution of Multiple Remote Sensing Images using 3D Residual in Residual Dense Blocks Type Conference Article
Year 2022 Publication CVPR 2022 Workshop on IEEE Perception Beyond the Visible Spectrum workshop series (PBVS, 18th Edition) Abbreviated Journal
Volume Issue (up) Pages
Keywords Training; Solid modeling; Three-dimensional displays; PSNR; Convolution; Superresolution; Pattern recognition
Abstract The rapid advancement of Deep Convolutional Neural Networks helped in solving many remote sensing problems, especially the problems of super-resolution. However, most state-of-the-art methods focus more on Single Image Super-Resolution neglecting Multi-Image Super-Resolution. In this work, a new proposed 3D Residual in Residual Dense Blocks model (3DRRDB) focuses on remote sensing Multi-Image Super-Resolution for two different single spectral bands. The proposed 3DRRDB model explores the idea of 3D convolution layers in deeply connected Dense Blocks and the effect of local and global residual connections with residual scaling in Multi-Image Super-Resolution. The model tested on the Proba-V challenge dataset shows a significant improvement above the current state-of-the-art models scoring a Corrected Peak Signal to Noise Ratio (cPSNR) of 48.79 dB and 50.83 dB for Near Infrared (NIR) and RED Bands respectively. Moreover, the proposed 3DRRDB model scores a Corrected Structural Similarity Index Measure (cSSIM) of 0.9865 and 0.9909 for NIR and RED bands respectively.
Address New Orleans, USA; 19 June 2022
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference CVPRW
Notes MSIAU; 600.130 Approved no
Call Number Admin @ si @ IBL2022 Serial 3693
Permanent link to this record
 

 
Author Adria Molina; Lluis Gomez; Oriol Ramos Terrades; Josep Llados
Title A Generic Image Retrieval Method for Date Estimation of Historical Document Collections Type Conference Article
Year 2022 Publication Document Analysis Systems.15th IAPR International Workshop, (DAS2022) Abbreviated Journal
Volume 13237 Issue (up) Pages 583–597
Keywords Date estimation; Document retrieval; Image retrieval; Ranking loss; Smooth-nDCG
Abstract Date estimation of historical document images is a challenging problem, with several contributions in the literature that lack of the ability to generalize from one dataset to others. This paper presents a robust date estimation system based in a retrieval approach that generalizes well in front of heterogeneous collections. We use a ranking loss function named smooth-nDCG to train a Convolutional Neural Network that learns an ordination of documents for each problem. One of the main usages of the presented approach is as a tool for historical contextual retrieval. It means that scholars could perform comparative analysis of historical images from big datasets in terms of the period where they were produced. We provide experimental evaluation on different types of documents from real datasets of manuscript and newspaper images.
Address La Rochelle, France; May 22–25, 2022
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference DAS
Notes DAG; 600.140; 600.121 Approved no
Call Number Admin @ si @ MGR2022 Serial 3694
Permanent link to this record
 

 
Author Josep Brugues Pujolras; Lluis Gomez; Dimosthenis Karatzas
Title A Multilingual Approach to Scene Text Visual Question Answering Type Conference Article
Year 2022 Publication Document Analysis Systems.15th IAPR International Workshop, (DAS2022) Abbreviated Journal
Volume Issue (up) Pages 65-79
Keywords Scene text; Visual question answering; Multilingual word embeddings; Vision and language; Deep learning
Abstract Scene Text Visual Question Answering (ST-VQA) has recently emerged as a hot research topic in Computer Vision. Current ST-VQA models have a big potential for many types of applications but lack the ability to perform well on more than one language at a time due to the lack of multilingual data, as well as the use of monolingual word embeddings for training. In this work, we explore the possibility to obtain bilingual and multilingual VQA models. In that regard, we use an already established VQA model that uses monolingual word embeddings as part of its pipeline and substitute them by FastText and BPEmb multilingual word embeddings that have been aligned to English. Our experiments demonstrate that it is possible to obtain bilingual and multilingual VQA models with a minimal loss in performance in languages not used during training, as well as a multilingual model trained in multiple languages that match the performance of the respective monolingual baselines.
Address La Rochelle, France; May 22–25, 2022
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference DAS
Notes DAG; 611.004; 600.155; 601.002 Approved no
Call Number Admin @ si @ BGK2022b Serial 3695
Permanent link to this record
 

 
Author Miquel Angel Piera; Jose Luis Muñoz; Debora Gil; Gonzalo Martin; Jordi Manzano
Title A Socio-Technical Simulation Model for the Design of the Future Single Pilot Cockpit: An Opportunity to Improve Pilot Performance Type Journal Article
Year 2022 Publication IEEE Access Abbreviated Journal ACCESS
Volume 10 Issue (up) Pages 22330-22343
Keywords Human factors ; Performance evaluation ; Simulation; Sociotechnical systems ; System performance
Abstract The future deployment of single pilot operations must be supported by new cockpit computer services. Such services require an adaptive context-aware integration of technical functionalities with the concurrent tasks that a pilot must deal with. Advanced artificial intelligence supporting services and improved communication capabilities are the key enabling technologies that will render future cockpits more integrated with the present digitalized air traffic management system. However, an issue in the integration of such technologies is the lack of socio-technical analysis in the design of these teaming mechanisms. A key factor in determining how and when a service support should be provided is the dynamic evolution of pilot workload. This paper investigates how the socio-technical model-based systems engineering approach paves the way for the design of a digital assistant framework by formalizing this workload. The model was validated in an Airbus A-320 cockpit simulator, and the results confirmed the degraded pilot behavioral model and the performance impact according to different contextual flight deck information. This study contributes to practical knowledge for designing human-machine task-sharing systems.
Address Feb 2022
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes IAM; Approved no
Call Number Admin @ si @ PMG2022 Serial 3697
Permanent link to this record
 

 
Author Razieh Rastgoo; Kourosh Kiani; Sergio Escalera; Vassilis Athitsos; Mohammad Sabokrou
Title All You Need In Sign Language Production Type Miscellaneous
Year 2022 Publication Arxiv Abbreviated Journal
Volume Issue (up) Pages
Keywords Sign Language Production; Sign Language Recog- nition; Sign Language Translation; Deep Learning; Survey; Deaf
Abstract Sign Language is the dominant form of communication language used in the deaf and hearing-impaired community. To make an easy and mutual communication between the hearing-impaired and the hearing communities, building a robust system capable of translating the spoken language into sign language and vice versa is fundamental.
To this end, sign language recognition and production are two necessary parts for making such a two-way system. Signlanguage recognition and production need to cope with some critical challenges. In this survey, we review recent advances in
Sign Language Production (SLP) and related areas using deep learning. To have more realistic perspectives to sign language, we present an introduction to the Deaf culture, Deaf centers, psychological perspective of sign language, the main differences between spoken language and sign language. Furthermore, we present the fundamental components of a bi-directional sign language translation system, discussing the main challenges in this area. Also, the backbone architectures and methods in SLP are briefly introduced and the proposed taxonomy on SLP is presented. Finally, a general framework for SLP and performance evaluation, and also a discussion on the recent developments, advantages, and limitations in SLP, commenting on possible lines for future research are presented.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HuPBA; no menciona Approved no
Call Number Admin @ si @ RKE2022c Serial 3698
Permanent link to this record
 

 
Author Bojana Gajic; Ariel Amato; Ramon Baldrich; Joost Van de Weijer; Carlo Gatta
Title Area Under the ROC Curve Maximization for Metric Learning Type Conference Article
Year 2022 Publication CVPR 2022 Workshop on Efficien Deep Learning for Computer Vision (ECV 2022, 5th Edition) Abbreviated Journal
Volume Issue (up) Pages
Keywords Training; Computer vision; Conferences; Area measurement; Benchmark testing; Pattern recognition
Abstract Most popular metric learning losses have no direct relation with the evaluation metrics that are subsequently applied to evaluate their performance. We hypothesize that training a metric learning model by maximizing the area under the ROC curve (which is a typical performance measure of recognition systems) can induce an implicit ranking suitable for retrieval problems. This hypothesis is supported by previous work that proved that a curve dominates in ROC space if and only if it dominates in Precision-Recall space. To test this hypothesis, we design and maximize an approximated, derivable relaxation of the area under the ROC curve. The proposed AUC loss achieves state-of-the-art results on two large scale retrieval benchmark datasets (Stanford Online Products and DeepFashion In-Shop). Moreover, the AUC loss achieves comparable performance to more complex, domain specific, state-of-the-art methods for vehicle re-identification.
Address New Orleans, USA; 20 June 2022
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference CVPRW
Notes CIC; LAMP; Approved no
Call Number Admin @ si @ GAB2022 Serial 3700
Permanent link to this record