
Transfer Q-learning for finite-horizon Markov decision processes

Chen, Elynn
Li, Sai
Jordan, Michael I.
Department
Machine Learning
Type
Journal article
Date
2025
Language
English
Abstract
Time-inhomogeneous finite-horizon Markov decision processes (MDPs) are frequently employed to model decision-making in dynamic treatment regimes and other statistical reinforcement learning (RL) scenarios. These fields, especially healthcare and business, often face challenges such as high-dimensional state spaces and time-inhomogeneity of the MDP, compounded by insufficient sample availability, which complicates informed decision-making. To overcome these challenges, we investigate knowledge transfer within time-inhomogeneous finite-horizon MDPs by leveraging data from both a target RL task and several related source tasks. We develop transfer learning (TL) algorithms that are adaptable to both batch and online Q-learning, integrating valuable insights from offline source studies. The proposed transfer Q-learning algorithm contains a novel re-targeting step that enables cross-stage transfer along the multiple stages of an RL task, in addition to the usual cross-task transfer of supervised learning. We establish the first theoretical justifications of TL in RL tasks by showing a faster rate of convergence of the Q*-function estimation in the offline RL transfer, and a lower regret bound in the offline-to-online RL transfer, under stage-wise reward similarity and mild design similarity across tasks. Empirical evidence from both synthetic and real datasets is presented to evaluate the proposed algorithm and support our theoretical results.
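The backward inductive Q-learning the abstract builds on can be sketched as follows. This is a minimal illustration for a tabular, time-inhomogeneous finite-horizon MDP, not the paper's high-dimensional estimator or its transfer/re-targeting step; all function and variable names here are hypothetical.

```python
import numpy as np

def backward_inductive_q(batch, H, n_states, n_actions):
    """Estimate stage-wise Q-functions of a time-inhomogeneous
    finite-horizon MDP by backward induction over batch data.

    batch[h] is a list of (s, a, r, s_next) transitions observed
    at stage h. Returns an array Q of shape (H, n_states, n_actions),
    where Q[h] estimates the stage-h optimal Q-function.
    """
    # Q[H] is the terminal stage: identically zero.
    Q = np.zeros((H + 1, n_states, n_actions))
    for h in range(H - 1, -1, -1):  # induct backward from the last stage
        totals = np.zeros((n_states, n_actions))
        counts = np.zeros((n_states, n_actions))
        for s, a, r, s_next in batch[h]:
            # Bellman target: reward plus greedy value at the next stage.
            totals[s, a] += r + Q[h + 1, s_next].max()
            counts[s, a] += 1
        # Empirical mean of the targets where data exist.
        mask = counts > 0
        totals[mask] /= counts[mask]
        Q[h] = totals
    return Q[:H]
```

Because each stage has its own Q-function, the estimate at stage h reuses only the already-fitted stage-(h+1) values, mirroring the stage-wise structure that the paper's cross-stage transfer exploits.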
Citation
E. Chen, S. Li, and M. I. Jordan, “Transfer Q-learning for finite-horizon Markov decision processes,” Electronic Journal of Statistics, vol. 19, no. 2, pp. 5289–5312, 2025, doi: 10.1214/25-EJS2459.
Source
Electronic Journal of Statistics
Keywords
Backward Inductive Q-Learning, Dynamic Treatment Regimes, High-Dimensional Estimation, Time-Inhomogeneous Finite-Horizon Markov Decision Process, Transfer Learning
Publisher
Institute of Mathematical Statistics