Transformer-based RGBT Tracking with Spatio-Temporal Information Fusion

Yuan, Di
Zhang, Haiping
Liu, Qiao
Chang, Xiaojun
He, Zhenyu
Department
Computer Vision
Type
Journal article
Date
2025
Language
English
Abstract
RGBT (RGB-Thermal) tracking methods usually take an RGB tracker as the base model and then fully fine-tune it on an RGBT dataset. These methods ignore the differences between target features in the RGB domain and in the thermal infrared (TIR) domain. In addition, existing RGBT trackers match only the spatial features of the initial template and the search image and ignore the role of temporal information, so the tracker fails in complex scenarios such as target appearance change and occlusion. To address these problems, we propose a simple and efficient tracker called STTrack. The tracker adopts a symmetric dual-stream structure consisting of several FT Transformer (Fine Tuning Transformer) Encoders, a prediction head, and an online update module. First, the FT Transformer Encoder adds some trainable parameters to the frozen pre-trained RGB-based tracker, transferring its feature extraction capability from the RGB domain to the TIR domain and enhancing the model’s perception of cross-modal data. Second, the output features of the RGB and TIR modalities are fused and fed into the prediction head to obtain the target’s position. Finally, the online update module obtains an online template with temporal information, which complements the spatial information provided by the initial template. The spatio-temporal information provided by the dual templates improves the RGBT tracker’s ability to locate targets in complex environments. Extensive quantitative and qualitative experiments demonstrate that our approach achieves state-of-the-art performance on the four most popular RGBT benchmarks and runs in real time at 32 FPS.
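The three ideas in the abstract (trainable parameters added to a frozen backbone, fusion of RGB and TIR features, and an online template kept alongside the initial one) can be illustrated with a minimal sketch. This is not the STTrack implementation. The abstract does not specify the adapter design, the fusion operator, or the update criterion, so the bottleneck adapter, the element-wise sum, the confidence threshold, and every name below (`AdaptedLinear`, `fuse`, `DualTemplate`) are assumptions chosen only for illustration.

```python
import numpy as np


class AdaptedLinear:
    """Frozen linear layer plus a small bottleneck adapter (hypothetical design)."""

    def __init__(self, w_frozen, bottleneck, rng):
        d_in, d_out = w_frozen.shape
        self.w_frozen = w_frozen  # pre-trained RGB weights, never updated
        # Only these two matrices would be trained on RGBT data (assumption).
        self.down = rng.normal(0.0, 0.02, (d_in, bottleneck))
        self.up = np.zeros((bottleneck, d_out))  # zero init: adapter starts as a no-op

    def __call__(self, x):
        frozen_out = x @ self.w_frozen
        adapter_out = np.maximum(x @ self.down, 0.0) @ self.up  # ReLU bottleneck
        return frozen_out + adapter_out


def fuse(rgb_feat, tir_feat):
    """Assumed fusion: element-wise sum of the two modality features."""
    return rgb_feat + tir_feat


class DualTemplate:
    """Keeps the fixed initial template and a replaceable online template."""

    def __init__(self, initial, threshold=0.7):
        self.initial = initial      # spatial reference from the first frame
        self.online = initial       # temporal reference, refreshed during tracking
        self.threshold = threshold  # hypothetical confidence threshold

    def update(self, candidate, confidence):
        """Replace the online template only when the prediction is confident."""
        if confidence > self.threshold:
            self.online = candidate
            return True
        return False
```

Zero-initializing the up-projection is a common adapter convention: before any training the adapted layer reproduces the frozen RGB layer exactly, so fine-tuning starts from the pre-trained behavior. Gating the template update on confidence is likewise one plausible way to avoid storing an occluded or drifted target as the online template.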
Citation
D. Yuan, H. Zhang, Q. Liu, X. Chang and Z. He, "Transformer-based RGBT Tracking with Spatio-Temporal Information Fusion," in IEEE Sensors Journal, doi: 10.1109/JSEN.2025.3575188
Source
IEEE Sensors Journal
Keywords
RGBT Tracking, Adapter Learning, Temporal Information, Spatio-Temporal Information Fusion
Publisher
IEEE