Loading...
Gaussian approximation and multiplier bootstrap for polyak-ruppert averaged linear stochastic approximation with applications to td learning
Samsonov, Sergey ; Moulines, Eric ; Shao, Qi-Man ; Zhang, Zhuo-Song ; Naumov, Alexey
Samsonov, Sergey
Moulines, Eric
Shao, Qi-Man
Zhang, Zhuo-Song
Naumov, Alexey
Supervisor
Department
Machine Learning
Embargo End Date
Type
Conference proceeding
Date
2024
License
Language
English
Collections
Research Projects
Organizational Units
Journal Issue
Abstract
In this paper, we obtain the Berry–Esseen bound for multivariate normal approximation for the Polyak-Ruppert averaged iterates of the linear stochastic approximation (LSA) algorithm with decreasing step size. Moreover, we prove the non-asymptotic validity of the confidence intervals for parameter estimation with LSA based on multiplier bootstrap. This procedure updates the LSA estimate together with a set of randomly perturbed LSA estimates upon the arrival of subsequent observations. We illustrate our findings in the setting of temporal difference learning with linear function approximation.
Citation
S. Samsonov, E. Moulines Ecole Polytechnique, M. Qi-Man Shao, Z.-S. Zhang, and A. Naumov, “Gaussian Approximation and Multiplier Bootstrap for Polyak-Ruppert Averaged Linear Stochastic Approximation with Applications to TD Learning,” Adv Neural Inf Process Syst, vol. 37, pp. 12408–12460, Dec. 2024.
Source
Advances in Neural Information Processing Systems (NeurIPS 2024)
Conference
Keywords
Gaussian approximation, Multiplier bootstrap, Polyak-Ruppert averaging, Linear stochastic approximation, Temporal difference learning
Subjects
Source
Publisher
NEURIPS
