|   | 
Details
   web
Records
Author Adarsh Tiwari; Sanket Biswas; Josep Llados
Title Can Pre-trained Language Models Help in Understanding Handwritten Symbols? Type Conference Article
Year 2023 Publication 17th International Conference on Document Analysis and Recognition Abbreviated Journal
Volume 14193 Issue Pages 199–211
Keywords
Abstract The emergence of transformer models like BERT, GPT-2, GPT-3, RoBERTa, T5 for natural language understanding tasks has opened the floodgates towards solving a wide array of machine learning tasks in other modalities like images, audio, music, sketches and so on. These language models are domain-agnostic and as a result could be applied to 1-D sequences of any kind. However, the key challenge lies in bridging the modality gap so that they could generate strong features beneficial for out-of-domain tasks. This work focuses on leveraging the power of such pre-trained language models and discusses the challenges in predicting challenging handwritten symbols and alphabets.
Address San Jose; CA; USA; August 2023
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICDAR
Notes DAG Approved no
Call Number Admin @ si @ TBL2023 Serial 3908
Permanent link to this record
 

 
Author Jun Wan; Guodong Guo; Sergio Escalera; Hugo Jair Escalante; Stan Z Li
Title Best Solutions Proposed in the Context of the Face Anti-spoofing Challenge Series Type Book Chapter
Year 2023 Publication Advances in Face Presentation Attack Detection Abbreviated Journal
Volume Issue Pages 37–78
Keywords
Abstract The PAD competitions we organized attracted more than 835 teams from home and abroad, most of them from the industry, which shows that the topic of face anti-spoofing is closely related to daily life, and there is an urgent need for advanced algorithms to solve its application needs. Specifically, the Chalearn LAP multi-modal face anti-spoofing attack detection challenge attracted more than 300 teams for the development phase with a total of 13 teams qualifying for the final round; the Chalearn Face Anti-spoofing Attack Detection Challenge attracted 340 teams in the development stage, and finally, 11 and 8 teams have submitted their codes in the single-modal and multi-modal face anti-spoofing recognition challenges, respectively; the 3D High-Fidelity Mask Face Presentation Attack Detection Challenge attracted 195 teams for the development phase with a total of 18 teams qualifying for the final round. All the results were verified and re-run by the organizing team, and the results were used for the final ranking. In this chapter, we briefly the methods developed by the teams participating in each competition, and introduce the algorithm details of the top-three ranked teams in detail.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HUPBA Approved no
Call Number Admin @ si @ WGE2023d Serial 3958
Permanent link to this record
 

 
Author Jun Wan; Guodong Guo; Sergio Escalera; Hugo Jair Escalante; Stan Z Li
Title Face Presentation Attack Detection (PAD) Challenges Type Book Chapter
Year 2023 Publication Advances in Face Presentation Attack Detection Abbreviated Journal
Volume Issue Pages 17–35
Keywords
Abstract In recent years, the security of face recognition systems has been increasingly threatened. Face Anti-spoofing (FAS) is essential to secure face recognition systems primarily from various attacks. In order to attract researchers and push forward the state of the art in Face Presentation Attack Detection (PAD), we organized three editions of Face Anti-spoofing Workshop and Competition at CVPR 2019, CVPR 2020, and ICCV 2021, which have attracted more than 800 teams from academia and industry, and greatly promoted the algorithms to overcome many challenging problems. In this chapter, we introduce the detailed competition process, including the challenge phases, timeline and evaluation metrics. Along with the workshop, we will introduce the corresponding dataset for each competition including data acquisition details, data processing, statistics, and evaluation protocol. Finally, we provide the available link to download the datasets used in the challenges.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title SLCV
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HUPBA Approved no
Call Number Admin @ si @ WGE2023b Serial 3956
Permanent link to this record
 

 
Author Jun Wan; Guodong Guo; Sergio Escalera; Hugo Jair Escalante; Stan Z Li
Title Face Anti-spoofing Progress Driven by Academic Challenges Type Book Chapter
Year 2023 Publication Advances in Face Presentation Attack Detection Abbreviated Journal
Volume Issue Pages 1–15
Keywords
Abstract With the ubiquity of facial authentication systems and the prevalence of security cameras around the world, the impact that facial presentation attack techniques may have is huge. However, research progress in this field has been slowed by a number of factors, including the lack of appropriate and realistic datasets, ethical and privacy issues that prevent the recording and distribution of facial images, the little attention that the community has given to potential ethnic biases among others. This chapter provides an overview of contributions derived from the organization of academic challenges in the context of face anti-spoofing detection. Specifically, we discuss the limitations of benchmarks and summarize our efforts in trying to boost research by the community via the participation in academic challenges
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title SLCV
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HUPBA Approved no
Call Number Admin @ si @ WGE2023c Serial 3957
Permanent link to this record
 

 
Author Alejandro Ariza-Casabona; Bartlomiej Twardowski; Tri Kurniawan Wijaya
Title Exploiting Graph Structured Cross-Domain Representation for Multi-domain Recommendation Type Conference Article
Year 2023 Publication European Conference on Information Retrieval – ECIR 2023: Advances in Information Retrieval Abbreviated Journal
Volume 13980 Issue Pages 49–65
Keywords
Abstract Multi-domain recommender systems benefit from cross-domain representation learning and positive knowledge transfer. Both can be achieved by introducing a specific modeling of input data (i.e. disjoint history) or trying dedicated training regimes. At the same time, treating domains as separate input sources becomes a limitation as it does not capture the interplay that naturally exists between domains. In this work, we efficiently learn multi-domain representation of sequential users’ interactions using graph neural networks. We use temporal intra- and inter-domain interactions as contextual information for our method called MAGRec (short for Multi-dom Ain Graph-based Recommender). To better capture all relations in a multi-domain setting, we learn two graph-based sequential representations simultaneously: domain-guided for recent user interest, and general for long-term interest. This approach helps to mitigate the negative knowledge transfer problem from multiple domains and improve overall representation. We perform experiments on publicly available datasets in different scenarios where MAGRec consistently outperforms state-of-the-art methods. Furthermore, we provide an ablation study and discuss further extensions of our method.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ECIR
Notes LAMP Approved no
Call Number Admin @ si @ ATK2023 Serial 3933
Permanent link to this record
 

 
Author Mickael Coustaty; Alicia Fornes
Title Document Analysis and Recognition – ICDAR 2023 Workshops Type Book Whole
Year 2023 Publication Document Analysis and Recognition – ICDAR 2023 Workshops Abbreviated Journal
Volume 14194 Issue 2 Pages
Keywords
Abstract
Address San Jose; USA; August 2023
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICDAR
Notes DAG Approved no
Call Number Admin @ si @ CoF2023 Serial 3852
Permanent link to this record
 

 
Author Jun Wan; Guodong Guo; Sergio Escalera; Hugo Jair Escalante; Stan Z Li
Title Advances in Face Presentation Attack Detection Type Book Whole
Year 2023 Publication Advances in Face Presentation Attack Detection Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HUPBA Approved no
Call Number Admin @ si @ WGE2023a Serial 3955
Permanent link to this record
 

 
Author Qingshan Chen; Zhenzhen Quan; Yifan Hu; Yujun Li; Zhi Liu; Mikhail Mozerov
Title MSIF: multi-spectrum image fusion method for cross-modality person re-identification Type Journal Article
Year 2023 Publication International Journal of Machine Learning and Cybernetics Abbreviated Journal IJMLC
Volume Issue Pages
Keywords
Abstract Sketch-RGB cross-modality person re-identification (ReID) is a challenging task that aims to match a sketch portrait drawn by a professional artist with a full-body photo taken by surveillance equipment to deal with situations where the monitoring equipment is damaged at the accident scene. However, sketch portraits only provide highly abstract frontal body contour information and lack other important features such as color, pose, behavior, etc. The difference in saliency between the two modalities brings new challenges to cross-modality person ReID. To overcome this problem, this paper proposes a novel dual-stream model for cross-modality person ReID, which is able to mine modality-invariant features to reduce the discrepancy between sketch and camera images end-to-end. More specifically, we propose a multi-spectrum image fusion (MSIF) method, which aims to exploit the image appearance changes brought by multiple spectrums and guide the network to mine modality-invariant commonalities during training. It only processes the spectrum of the input images without adding additional calculations and model complexity, which can be easily integrated into other models. Moreover, we introduce a joint structure via a generalized mean pooling (GMP) layer and a self-attention (SA) mechanism to balance background and texture information and obtain the regional features with a large amount of information in the image. To further shrink the intra-class distance, a weighted regularized triplet (WRT) loss is developed without introducing additional hyperparameters. The model was first evaluated on the PKU Sketch ReID dataset, and extensive experimental results show that the Rank-1/mAP accuracy of our method is 87.00%/91.12%, reaching the current state-of-the-art performance. To further validate the effectiveness of our approach in handling cross-modality person ReID, we conducted experiments on two commonly used IR-RGB datasets (SYSU-MM01 and RegDB). The obtained results show that our method achieves competitive performance. These results confirm the ability of our method to effectively process images from different modalities.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes LAMP Approved no
Call Number Admin @ si @ CQH2023 Serial 3885
Permanent link to this record
 

 
Author Razieh Rastgoo; Kourosh Kiani; Sergio Escalera
Title ZS-GR: zero-shot gesture recognition from RGB-D videos Type Journal Article
Year 2023 Publication Multimedia Tools and Applications Abbreviated Journal MTAP
Volume 82 Issue Pages 43781-43796
Keywords
Abstract Gesture Recognition (GR) is a challenging research area in computer vision. To tackle the annotation bottleneck in GR, we formulate the problem of Zero-Shot Gesture Recognition (ZS-GR) and propose a two-stream model from two input modalities: RGB and Depth videos. To benefit from the vision Transformer capabilities, we use two vision Transformer models, for human detection and visual features representation. We configure a transformer encoder-decoder architecture, as a fast and accurate human detection model, to overcome the challenges of the current human detection models. Considering the human keypoints, the detected human body is segmented into nine parts. A spatio-temporal representation from human body is obtained using a vision Transformer and a LSTM network. A semantic space maps the visual features to the lingual embedding of the class labels via a Bidirectional Encoder Representations from Transformers (BERT) model. We evaluated the proposed model on five datasets, Montalbano II, MSR Daily Activity 3D, CAD-60, NTU-60, and isoGD obtaining state-of-the-art results compared to state-of-the-art ZS-GR models as well as the Zero-Shot Action Recognition (ZS-AR).
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HUPBA Approved no
Call Number Admin @ si @ RKE2023a Serial 3879
Permanent link to this record
 

 
Author Razieh Rastgoo; Kourosh Kiani; Sergio Escalera
Title A deep co-attentive hand-based video question answering framework using multi-view skeleton Type Journal Article
Year 2023 Publication Multimedia Tools and Applications Abbreviated Journal MTAP
Volume 82 Issue Pages 1401–1429
Keywords
Abstract In this paper, we present a novel hand –based Video Question Answering framework, entitled Multi-View Video Question Answering (MV-VQA), employing the Single Shot Detector (SSD), Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Bidirectional Encoder Representations from Transformers (BERT), and Co-Attention mechanism with RGB videos as the inputs. Our model includes three main blocks: vision, language, and attention. In the vision block, we employ a novel representation to obtain some efficient multiview features from the hand object using the combination of five 3DCNNs and one LSTM network. To obtain the question embedding, we use the BERT model in language block. Finally, we employ a co-attention mechanism on vision and language features to recognize the final answer. For the first time, we propose such a hand-based Video-QA framework including the multi-view hand skeleton features combined with the question embedding and co-attention mechanism. Our framework is capable of processing the arbitrary numbers of questions in the dataset annotations. There are different application domains for this framework. Here, as an application domain, we applied our framework to dynamic hand gesture recognition for the first time. Since the main object in dynamic hand gesture recognition is the human hand, we performed a step-by-step analysis of the hand detection and multi-view hand skeleton impact on the model performance. Evaluation results on five datasets, including two datasets in VideoQA, two datasets in dynamic hand gesture, and one dataset in hand action recognition show that MV-VQA outperforms state-of-the-art alternatives.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HUPBA Approved no
Call Number Admin @ si @ RKE2023b Serial 3881
Permanent link to this record
 

 
Author Rafael E. Rivadeneira; Henry Velesaca; Angel Sappa
Title Object Detection in Very Low-Resolution Thermal Images through a Guided-Based Super-Resolution Approach Type Conference Article
Year 2023 Publication 17th International Conference on Signal-Image Technology & Internet-Based Systems Abbreviated Journal
Volume Issue Pages
Keywords
Abstract This work proposes a novel approach that integrates super-resolution techniques with off-the-shelf object detection methods to tackle the problem of handling very low-resolution thermal images. The suggested approach begins by enhancing the low-resolution (LR) thermal images through a guided super-resolution strategy, leveraging a high-resolution (HR) visible spectrum image. Subsequently, object detection is performed on the high-resolution thermal image. The experimental results demonstrate tremendous improvements in comparison with both scenarios: when object detection is performed on the LR thermal image alone, as well as when object detection is conducted on the up-sampled LR thermal image. Moreover, the proposed approach proves highly valuable in camouflaged scenarios where objects might remain undetected in visible spectrum images.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference SITIS
Notes MSIAU Approved no
Call Number Admin @ si @ RVS2023 Serial 4010
Permanent link to this record
 

 
Author Yi Xiao; Felipe Codevilla; Diego Porres; Antonio Lopez
Title Scaling Vision-Based End-to-End Autonomous Driving with Multi-View Attention Learning Type Conference Article
Year 2023 Publication International Conference on Intelligent Robots and Systems Abbreviated Journal
Volume Issue Pages
Keywords
Abstract On end-to-end driving, human driving demonstrations are used to train perception-based driving models by imitation learning. This process is supervised on vehicle signals (e.g., steering angle, acceleration) but does not require extra costly supervision (human labeling of sensor data). As a representative of such vision-based end-to-end driving models, CILRS is commonly used as a baseline to compare with new driving models. So far, some latest models achieve better performance than CILRS by using expensive sensor suites and/or by using large amounts of human-labeled data for training. Given the difference in performance, one may think that it is not worth pursuing vision-based pure end-to-end driving. However, we argue that this approach still has great value and potential considering cost and maintenance. In this paper, we present CIL++, which improves on CILRS by both processing higher-resolution images using a human-inspired HFOV as an inductive bias and incorporating a proper attention mechanism. CIL++ achieves competitive performance compared to models which are more costly to develop. We propose to replace CILRS with CIL++ as a strong vision-based pure end-to-end driving baseline supervised by only vehicle signals and trained by conditional imitation learning.
Address Detroit; USA; October 2023
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference IROS
Notes ADAS Approved no
Call Number Admin @ si @ XCP2023 Serial 3930
Permanent link to this record
 

 
Author Armin Mehri; Parichehr Behjati; Dario Carpio; Angel Sappa
Title SRFormer: Efficient Yet Powerful Transformer Network for Single Image Super Resolution Type Journal Article
Year 2023 Publication IEEE Access Abbreviated Journal ACCESS
Volume 11 Issue Pages
Keywords
Abstract Recent breakthroughs in single image super resolution have investigated the potential of deep Convolutional Neural Networks (CNNs) to improve performance. However, CNNs based models suffer from their limited fields and their inability to adapt to the input content. Recently, Transformer based models were presented, which demonstrated major performance gains in Natural Language Processing and Vision tasks while mitigating the drawbacks of CNNs. Nevertheless, Transformer computational complexity can increase quadratically for high-resolution images, and the fact that it ignores the original structures of the image by converting them to the 1D structure can make it problematic to capture the local context information and adapt it for real-time applications. In this paper, we present, SRFormer, an efficient yet powerful Transformer-based architecture, by making several key designs in the building of Transformer blocks and Transformer layers that allow us to consider the original structure of the image (i.e., 2D structure) while capturing both local and global dependencies without raising computational demands or memory consumption. We also present a Gated Multi-Layer Perceptron (MLP) Feature Fusion module to aggregate the features of different stages of Transformer blocks by focusing on inter-spatial relationships while adding minor computational costs to the network. We have conducted extensive experiments on several super-resolution benchmark datasets to evaluate our approach. SRFormer demonstrates superior performance compared to state-of-the-art methods from both Transformer and Convolutional networks, with an improvement margin of 0.1∼0.53dB . Furthermore, while SRFormer has almost the same model size, it outperforms SwinIR by 0.47% and inference time by half the time of SwinIR. The code will be available on GitHub.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MSIAU Approved no
Call Number Admin @ si @ MBC2023 Serial 3887
Permanent link to this record
 

 
Author Qingshan Chen; Zhenzhen Quan; Yujun Li; Chao Zhai; Mikhail Mozerov
Title An Unsupervised Domain Adaption Approach for Cross-Modality RGB-Infrared Person Re-Identification Type Journal Article
Year 2023 Publication IEEE Sensors Journal Abbreviated Journal IEEE-SENS
Volume 23 Issue 24 Pages
Keywords Q. Chen, Z. Quan, Y. Li, C. Zhai and M. G. Mozerov
Abstract Dual-camera systems commonly employed in surveillance serve as the foundation for RGB-infrared (IR) cross-modality person re-identification (ReID). However, significant modality differences give rise to inferior performance compared to single-modality scenarios. Furthermore, most existing studies in this area rely on supervised training with meticulously labeled datasets. Labeling RGB-IR image pairs is more complex than labeling conventional image data, and deploying pretrained models on unlabeled datasets can lead to catastrophic performance degradation. In contrast to previous solutions that focus solely on cross-modality or domain adaptation issues, this article presents an end-to-end unsupervised domain adaptation (UDA) framework for the cross-modality person ReID, which can simultaneously address both of these challenges. This model employs source domain classes, target domain clusters, and unclustered instance samples for the training, maximizing the comprehensive use of the dataset. Moreover, it addresses the problem of mismatched clustering labels between the two modalities in the target domain by incorporating a label matching module that reassigns reliable clusters with labels, ensuring correspondence between different modality labels. We construct the loss function by incorporating distinctiveness loss and multiplicity loss, both of which are determined by the similarity of neighboring features in the predicted feature space and the difference between distant features. This approach enables efficient feature clustering and cluster class assignment to occur concurrently. Eight UDA cross-modality person ReID experiments are conducted on three real datasets and six synthetic datasets. The experimental results unequivocally demonstrate that the proposed model outperforms the existing state-of-the-art algorithms to a significant degree. Notably, in RegDB → RegDB_light, the Rank-1 accuracy exhibits a remarkable improvement of 8.24%.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes LAMP Approved no
Call Number Admin @ si @ CQL2023 Serial 3884
Permanent link to this record
 

 
Author Shiqi Yang; Yaxing Wang; Joost Van de Weijer; Luis Herranz; Shangling Jui; Jian Yang
Title Trust Your Good Friends: Source-Free Domain Adaptation by Reciprocal Neighborhood Clustering Type Journal Article
Year 2023 Publication IEEE Transactions on Pattern Analysis and Machine Intelligence Abbreviated Journal TPAMI
Volume 45 Issue 12 Pages 15883-15895
Keywords
Abstract Domain adaptation (DA) aims to alleviate the domain shift between source domain and target domain. Most DA methods require access to the source data, but often that is not possible (e.g., due to data privacy or intellectual property). In this paper, we address the challenging source-free domain adaptation (SFDA) problem, where the source pretrained model is adapted to the target domain in the absence of source data. Our method is based on the observation that target data, which might not align with the source domain classifier, still forms clear clusters. We capture this intrinsic structure by defining local affinity of the target data, and encourage label consistency among data with high local affinity. We observe that higher affinity should be assigned to reciprocal neighbors. To aggregate information with more context, we consider expanded neighborhoods with small affinity values. Furthermore, we consider the density around each target sample, which can alleviate the negative impact of potential outliers. In the experimental results we verify that the inherent structure of the target features is an important source of information for domain adaptation. We demonstrate that this local structure can be efficiently captured by considering the local neighbors, the reciprocal neighbors, and the expanded neighborhood. Finally, we achieve state-of-the-art performance on several 2D image and 3D point cloud recognition datasets.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes LAMP; MACO Approved no
Call Number Admin @ si @ YWW2023 Serial 3889
Permanent link to this record