Self-supervised transformer architecture search via multi-level masked information fusion
Yan, Caixia ; Chang, Xiaojun ; Zhang, Weizhan ; Li, Zhihui ; Yao, Lina ; Luo, Minnan ; Tian, Feng
Department
Computer Vision
Type
Journal article
Date
2025
Language
English
Abstract
Transformer Architecture Search (TAS) aims to discover desirable transformer architectures without extensive manual design and tuning. However, existing methods rely heavily on labeled data, hindering their deployment in real-world scenarios. To address this issue, we pioneer a self-supervised TAS task that aims to uncover optimal vision transformer architectures from unlabeled data alone. To accomplish this challenging task, we develop a Block-Wisely Self-Supervised One-Shot TAS framework underpinned by multi-level masked information fusion, termed BWSS-TAS. Specifically, BWSS-TAS designs a siamese encoder–decoder architecture for the supernet to enable Masked Image Modeling (MIM)-based self-supervised training. To tackle the co-training bias of the large weight-sharing space, we modularize the whole supernet into smaller blocks and perform image reconstruction by distilling and fusing multi-level spatial-semantic information. To pursue reliable architecture search, we develop a block-wisely self-supervised evolutionary algorithm equipped with a unified unsupervised evaluation metric. The architecture evolution process is guided by the non-linear ranking fusion of two complementary dimensions of masked information, i.e., teacher–student consistency and image reconstruction quality. Extensive experiments demonstrate that the searched tiny/small/base-sized transformer architectures perform favorably against state-of-the-art TAS methods, even without using manual labels. Moreover, BWSS-TAS generalizes well to various data domains and tasks via self-supervised, information-fusion-oriented architecture discovery.
Citation
C. Yan et al., “Self-supervised transformer architecture search via multi-level masked information fusion,” Information Fusion, vol. 126, p. 103640, Feb. 2026, doi: 10.1016/j.inffus.2025.103640.
Source
Information Fusion
Keywords
Multi-Level Information Fusion, Transformer Architecture Search, Self-Supervised Learning, Masked Image Modeling
Publisher
Elsevier
