Vanishing Feature: Diagnosing Model Merging and Beyond
Qu, Xingyu; Horvath, Samuel
Author
Qu, Xingyu
Supervisor
Horvath, Samuel
Department
Machine Learning
Type
Conference proceeding
Date
2025
Language
English
Abstract
Model merging offers an efficient way to combine pre-trained neural networks but often suffers from inconsistent performance, especially when merging models trained from different initializations. We identify the “vanishing feature” phenomenon, in which input-induced features diminish as they propagate through the merged model, degrading its performance. Through theoretical and empirical analysis, we show that this phenomenon underlies challenges such as variance collapse and explains the effectiveness of techniques such as permutation-based merging and post-merging normalization. We further show that existing normalization strategies can be enhanced by targeting the vanishing feature issue precisely. Leveraging these insights, we propose the “Preserve-First Merging” (PFM) strategy, which preserves early-layer features and, for the first time, enables merged VGG16 models on CIFAR-10 to surpass the original models without post-training. Finally, we demonstrate that the vanishing feature phenomenon extends to other settings, such as model pruning: applying post-pruning normalization to mitigate it significantly improves one-shot pruning performance at high sparsity, offering a simple and effective post-pruning solution. The code is available at https://github.com/XingyuQu/VF.
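The abstract's central diagnostic, input-induced features shrinking as they propagate through a merged model, can be probed with a short script. Below is a minimal PyTorch sketch, assuming two same-architecture checkpoints merged by naive weight averaging; the hook-based measurement and the specific ratio ||f(x) − f(0)|| / ||f(x)|| are illustrative assumptions, not the authors' exact protocol (see the linked repository for that).

```python
# Illustrative sketch (assumptions, not the authors' exact protocol):
# measure how the "input-induced" part of each layer's activation,
# activation(x) minus activation(0), shrinks through a merged model.
import torch
import torch.nn as nn

def average_state_dicts(sd_a, sd_b, alpha=0.5):
    """Naive weight-space merge of two same-architecture checkpoints."""
    return {k: (1 - alpha) * sd_a[k] + alpha * sd_b[k] for k in sd_a}

@torch.no_grad()
def input_induced_ratios(model, x):
    """Per-layer ratio ||f(x) - f(0)|| / ||f(x)|| over Conv/Linear outputs.

    Ratios decaying toward zero with depth indicate the
    vanishing-feature phenomenon described in the abstract.
    """
    layers = [m for m in model.modules()
              if isinstance(m, (nn.Conv2d, nn.Linear))]

    def capture(store):
        def fn(module, inp, out):
            store.append(out.detach())
        return fn

    # Record activations for a real input batch x ...
    acts_x, acts_0 = [], []
    handles = [m.register_forward_hook(capture(acts_x)) for m in layers]
    model(x)
    for h in handles:
        h.remove()

    # ... and for the all-zero input as a baseline.
    handles = [m.register_forward_hook(capture(acts_0)) for m in layers]
    model(torch.zeros_like(x))
    for h in handles:
        h.remove()

    return [((ax - a0).norm() / (ax.norm() + 1e-12)).item()
            for ax, a0 in zip(acts_x, acts_0)]
```

As a usage pattern, one would compute these ratios for each endpoint model and for the naive average of their state dicts: in the paper's account, the merged model's ratios decay sharply with depth while the endpoints' do not, which is precisely what post-merging normalization and Preserve-First Merging are designed to counteract.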
Citation
X. Qu and S. Horvath, “Vanishing Feature: Diagnosing Model Merging and Beyond,” in Proc. 2nd Conf. Parsimony and Learning (CPAL), Stanford, CA, USA, Mar. 24–27, 2025, Proc. Mach. Learn. Res., vol. 280, pp. 1051–1086.
Source
Proceedings of Machine Learning Research
Conference
2nd Conference on Parsimony and Learning, CPAL 2025
Keywords
Learning Systems, Machine Learning, Empirical Analysis, Merging Strategy, Model Merging, Model Pruning, Normalization, Normalization Strategies, Original Model, Performance, Trained Neural Networks
Publisher
ML Research Press
