
Exact and Rich Feature Learning Dynamics of Two-Layer Linear Networks

Wei Huang
Wuyang Chen
Zhiqiang Xu
Zhangyang Wang
Taiji Suzuki
Department
Machine Learning
Type
Conference proceeding
Date
2025
Language
English
Abstract
Deep neural networks exhibit rich training dynamics under gradient descent updates. The root of this phenomenon is the non-convex optimization of deep neural networks, which has been studied extensively in recent theoretical work. However, previous works either did not consider gradient descent steps in a non-asymptotic manner or considered only a few of them, resulting in an incomplete characterization of the network's stage-wise learning behavior and of the evolutionary trajectory of its parameters and outputs. In this work, we characterize how a network's feature learning unfolds during training in a regression setting. We analyze the dynamics of two quantities of a two-layer linear network: the projection of the first layer's weights onto the feature vector, and the weights of the second layer. The former indicates how well the network fits the feature vector from the input data, and the latter represents the output magnitude learned by the network. More importantly, by formulating the dynamics of these two quantities as a non-linear system, we give a precise characterization of the training trajectory, demonstrating the rich feature learning dynamics of the linear neural network. Moreover, we establish a connection between the feature learning dynamics and the neural tangent kernel, illustrating the presence of feature learning beyond lazy training. Experimental simulations corroborate our theoretical findings, confirming the validity of our conclusions.
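
The following minimal simulation is a sketch of the setting the abstract describes: a two-layer linear network f(x) = v^T W x trained by gradient descent on a regression task whose targets lie along a single feature direction. It tracks the two quantities the paper analyzes, the projection of the first-layer weights onto the feature vector and the norm of the second-layer weights. The data model (targets y = beta * <mu, x>), the dimensions, the learning rate, and the initialization scales are illustrative assumptions, not the paper's exact construction.

    import numpy as np

    rng = np.random.default_rng(0)
    d, m, n = 50, 20, 200          # input dim, hidden width, sample size (assumed)
    lr, steps, beta = 0.05, 500, 2.0

    mu = rng.normal(size=d)
    mu /= np.linalg.norm(mu)       # unit feature direction
    X = rng.normal(size=(n, d))    # inputs
    y = beta * (X @ mu)            # targets carried by the single feature mu

    W = rng.normal(size=(m, d)) / np.sqrt(d)   # first-layer weights
    v = rng.normal(size=m) / np.sqrt(m)        # second-layer weights

    for t in range(steps + 1):
        pred = X @ W.T @ v                     # network output v^T W x
        err = pred - y
        if t % 100 == 0:
            proj = np.linalg.norm(W @ mu)      # alignment of W with the feature
            print(f"step {t:4d}  |W mu| = {proj:.3f}  |v| = {np.linalg.norm(v):.3f}")
        # gradients of the squared loss (1/2n) * ||pred - y||^2
        grad_W = np.outer(v, err @ X) / n
        grad_v = W @ (err @ X) / n
        W -= lr * grad_W
        v -= lr * grad_v

As training proceeds, the printed projection |W mu| and second-layer norm |v| grow jointly until the network's output scale v^T W mu approaches beta, a coupled behavior of the kind the paper formulates as a non-linear system.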
Citation
W. Huang, W. Chen, Z. Xu, Z. Wang, and T. Suzuki, “Exact and Rich Feature Learning Dynamics of Two-Layer Linear Networks,” in Proc. 2nd Conf. Parsimony and Learning (CPAL), Stanford, CA, USA, Mar. 24–27, 2025, Proc. Mach. Learn. Res., vol. 280, pp. 1087–1111.
Source
Proceedings of Machine Learning Research
Conference
2nd Conference on Parsimony and Learning, CPAL 2025
Keywords
Deep Neural Networks, Dynamics, Federated Learning, Gradient Methods, Learning Systems, Linear Systems, Network Layers, Feature Learning, Features Vector, Gradient-descent, Learning Behavior, Neural-networks, Non-asymptotic, Nonconvex Optimization, Rich Features, Second Layer, Two-layer, Linear Networks
Publisher
ML Research Press