%0 Journal Article
%T MSIF: multi-spectrum image fusion method for cross-modality person re-identification
%A Qingshan Chen
%A Zhenzhen Quan
%A Yifan Hu
%A Yujun Li
%A Zhi Liu
%A Mikhail Mozerov
%J International Journal of Machine Learning and Cybernetics
%D 2023
%F Qingshan Chen2023
%O LAMP
%O exported from refbase (http://refbase.cvc.uab.es/show.php?record=3885), last updated on Fri, 12 Jan 2024 16:24:41 +0100
%X Sketch-RGB cross-modality person re-identification (ReID) is a challenging task that aims to match a sketch portrait drawn by a professional artist with a full-body photo taken by surveillance equipment to deal with situations where the monitoring equipment is damaged at the accident scene. However, sketch portraits only provide highly abstract frontal body contour information and lack other important features such as color, pose, behavior, etc. The difference in saliency between the two modalities brings new challenges to cross-modality person ReID. To overcome this problem, this paper proposes a novel dual-stream model for cross-modality person ReID, which is able to mine modality-invariant features to reduce the discrepancy between sketch and camera images end-to-end. More specifically, we propose a multi-spectrum image fusion (MSIF) method, which aims to exploit the image appearance changes brought by multiple spectrums and guide the network to mine modality-invariant commonalities during training. It only processes the spectrum of the input images without adding additional calculations and model complexity, which can be easily integrated into other models. Moreover, we introduce a joint structure via a generalized mean pooling (GMP) layer and a self-attention (SA) mechanism to balance background and texture information and obtain the regional features with a large amount of information in the image. To further shrink the intra-class distance, a weighted regularized triplet (WRT) loss is developed without introducing additional hyperparameters. The model was first evaluated on the PKU Sketch ReID dataset, and extensive experimental results show that the Rank-1/mAP accuracy of our method is 87.00%/91.12%, reaching the current state-of-the-art performance. To further validate the effectiveness of our approach in handling cross-modality person ReID, we conducted experiments on two commonly used IR-RGB datasets (SYSU-MM01 and RegDB). The obtained results show that our method achieves competitive performance. These results confirm the ability of our method to effectively process images from different modalities.
%U https://link.springer.com/article/10.1007/s13042-023-01932-4