
Rethinking softmax in incremental learning

Zhai, Zheng
Zhang, Jiali
Wang, Haiyu
Wu, Mingxin
Yang, Keshun
Qiao, Xiaoyan
Sun, Qiang
Department
Statistics and Data Science
Type
Journal article
Date
2026
Language
English
Abstract
Mitigating catastrophic forgetting remains a fundamental challenge in incremental learning. This paper identifies a key limitation of the widely used softmax cross-entropy loss: the non-identifiability inherent in the standard softmax cross-entropy distillation loss. To address this issue, we propose two complementary strategies: (1) adopting an imbalance-invariant distillation loss to mitigate the adverse effect of imbalanced weights during distillation, and (2) regularizing the original prediction/distillation loss with shift-sensitive alternatives, which render the optimization problem identifiable and proactively prevent imbalance from arising. These strategies form the foundation of five novel approaches that can be seamlessly integrated into existing distillation-based incremental learning frameworks such as LWF, LWM, and LUCIR. We validate the effectiveness of our approaches through extensive numerical experiments, demonstrating consistent improvements in predictive accuracy and substantial reductions in forgetting. For example, in a 10-task incremental learning setting on CIFAR-100, our methods improve the average accuracy of three widely used approaches (LWF, LWM, and LUCIR) by 11.8%, 11.5%, and 12.8%, respectively, while reducing their average forgetting rates by 16.5%, 16.8%, and 13.8%, respectively. Our code is publicly available at https://github.com/nexais/RethinkSoftmax.
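The non-identifiability the abstract refers to can be illustrated by softmax's invariance to a uniform shift of the logits: adding any constant to every logit leaves the output probabilities unchanged, so a distillation loss defined purely on softmax outputs cannot pin down the logits themselves. A minimal NumPy sketch of this property (an illustrative assumption, not the paper's code):

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; this does not change the result.
    e = np.exp(z - z.max())
    return e / e.sum()

z = np.array([2.0, 1.0, 0.5])
shifted = z + 7.3  # add an arbitrary constant to every logit

# The two logit vectors differ, yet their softmax outputs are identical,
# so any loss computed from softmax probabilities cannot tell them apart.
print(np.allclose(softmax(z), softmax(shifted)))  # True
```

A shift-sensitive regularizer of the kind the paper proposes breaks exactly this degeneracy by making the loss depend on the absolute scale of the logits, not only on their differences.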
Citation
Z. Zhai et al., “Rethinking softmax in incremental learning,” Neural Networks, vol. 193, p. 108017, Jan. 2026, doi: 10.1016/j.neunet.2025.108017
Source
Neural Networks
Keywords
Catastrophic Forgetting, Continual Learning, Distillation Loss, Incremental Learning, Life-long Learning, Entropy, Adverse Effect, Cross Entropy, Entropy Loss, Identifiability, Optimization Problems, Distillation
Publisher
Elsevier