Item

Federated UCBVI: Communication-Efficient Federated Regret Minimization with Heterogeneous Agents

Labbi, Safwan
Tiapkin, Daniil
Mancini, Lorenzo
Mangold, Paul
Moulines, Eric
Supervisor
Department
Machine Learning
Embargo End Date
Type
Conference proceeding
Date
2025
License
Language
English
Collections
Research Projects
Organizational Units
Journal Issue
Abstract
In this paper, we present the Federated Upper Confidence Bound Value Iteration algorithm (Fed-UCBVI), a novel extension of the UCBVI algorithm (Azar et al., 2017) tailored for the federated learning framework. We prove that the regret of Fed-UCBVI scales as Õ(√H3|S
A|T/M), with a small additional term due to heterogeneity, where |S| is the number of states, |A| is the number of actions, H is the episode length, M is the number of agents, and T is the number of episodes. Notably, in the single-agent setting, this upper bound matches the minimax lower bound up to polylogarithmic factors, while in the multi-agent scenario, Fed-UCBVI has linear speed-up. To conduct our analysis, we introduce a new measure of heterogeneity, which may hold independent theoretical interest. Furthermore, we show that, unlike existing federated reinforcement learning approaches, Fed-UCBVI's communication complexity only marginally increases with the number of agents.
Citation
S. Labbi, D. Tiapkin, L. Mancini, P. Mangold, and E. Moulines, “Federated UCBVI: Communication-Efficient Federated Regret Minimization with Heterogeneous Agents,” in Proc. 28th Int. Conf. Artif. Intell. Stat. (AISTATS 2025), vol. 258, Proc. Mach. Learn. Res., Mai Khao, Thailand, May 3–5, 2025, pp. 1315–1323.
Source
Proceedings of Machine Learning Research
Conference
28th International Conference on Artificial Intelligence and Statistics, AISTATS 2025
Keywords
Agents, Computational Methods, Federated Learning, Intelligent Agents, Iterative Methods, Optimization, Heterogeneous Agents, Learning Frameworks, Low Bound, Minimax, Number Of State, Regret Minimization, Single-agent, Upper Bound, Upper Confidence Bound, Value Iteration Algorithm, Multi Agent Systems
Subjects
Source
28th International Conference on Artificial Intelligence and Statistics, AISTATS 2025
Publisher
ML Research Press
DOI
Full-text link