Curriculum learning approach for off-policy deep reinforcement learning algorithms
Cantero, Mario
Department: Machine Learning
Embargo End Date: 2024-01-01
Type: Thesis
Date: 2024
Language: English
Abstract
Deep reinforcement learning (DRL) trains agents to make decisions in complex environments. However, making DRL algorithms efficient and stable is difficult, particularly with high-dimensional state spaces and sparse rewards. In recent years, curriculum learning has been increasingly used as a promising technique to address these challenges. This thesis presents a novel curriculum learning approach tailored to off-policy DRL algorithms. The research focuses on improving the final performance and learning stability of DRL agents without altering the environment or the tasks. The proposed method makes strategic use of the experience replay buffer, ordering experiences by their temporal difference error (TDE) to create a curriculum. This lets agents learn from simpler to more complex experiences, potentially accelerating learning and improving performance. Experiments were conducted in two popular environments, Lunar Lander and Bipedal Walker, with the established algorithms DQN and SAC. With DQN, the results indicate that the curriculum approach significantly improves the average evaluation reward and reduces the time needed to reach the task's goal compared with the baseline. With SAC, however, the curriculum approach yielded no significant improvement. These findings suggest that TDE can serve as an efficient indicator of task difficulty and that employing a curriculum can lead to more effective learning in DRL systems. The research contributes an alternative curriculum learning method that enhances the practicality of DRL in real-world applications where data collection is expensive or risky.
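The abstract states only that replay-buffer experiences are ordered by TD error so the agent learns from simpler (low-TDE) to more complex (high-TDE) experiences. It does not give the pacing schedule or implementation details. The following is a minimal sketch under stated assumptions, not the thesis's implementation. The names `td_errors`, `CurriculumReplayBuffer`, `progress`, and `min_fraction` are hypothetical. The linear schedule, which starts from the easiest fraction of the buffer and widens to the whole buffer, is likewise an assumption. The TD error shown is the standard one-step DQN target.

```python
import numpy as np


def td_errors(q_sa, rewards, next_q_max, dones, gamma=0.99):
    """Absolute one-step DQN TD error:
    |r + gamma * (1 - done) * max_a' Q_target(s', a') - Q(s, a)|."""
    targets = rewards + gamma * (1.0 - dones) * next_q_max
    return np.abs(targets - q_sa)


class CurriculumReplayBuffer:
    """Hypothetical buffer: samples low-TD-error ("easy") transitions first and
    widens the candidate pool toward high-TD-error ones as training progresses.
    The linear pacing schedule below is an assumption, not taken from the thesis."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.transitions = []
        self.errors = []
        self.rng = np.random.default_rng(seed)

    def add(self, transition, td_error):
        if len(self.transitions) >= self.capacity:
            # Evict the oldest transition (FIFO), as in a standard replay buffer.
            self.transitions.pop(0)
            self.errors.pop(0)
        self.transitions.append(transition)
        self.errors.append(float(td_error))

    def update_errors(self, indices, new_errors):
        # Refresh stored TD errors after a gradient step, since Q keeps changing.
        for i, e in zip(indices, new_errors):
            self.errors[i] = float(e)

    def sample(self, batch_size, progress, min_fraction=0.2):
        """progress in [0, 1] is the fraction of training completed (assumed)."""
        n = len(self.transitions)
        if n == 0:
            raise ValueError("cannot sample from an empty buffer")
        # Assumed linear pacing: pool grows from min_fraction of the buffer to all of it.
        p = min(max(progress, 0.0), 1.0)
        fraction = min_fraction + (1.0 - min_fraction) * p
        pool_size = min(n, max(batch_size, int(np.ceil(fraction * n))))
        # Sort indices from smallest to largest TD error (easy -> hard).
        order = np.argsort(self.errors)
        pool = order[:pool_size]
        # Sample with replacement only if the pool is smaller than the batch.
        idx = self.rng.choice(pool, size=batch_size, replace=pool_size < batch_size)
        return idx, [self.transitions[i] for i in idx]
```

Early in training (`progress=0.0`), only the lowest-TDE transitions are eligible. By `progress=1.0`, sampling is uniform over the whole buffer. Re-sorting on every call costs O(n log n), which is acceptable for a sketch; a large buffer would more likely keep a sorted or tree-based index. For SAC, the target would instead be the soft Bellman target with the entropy term, which is not shown here.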
Citation
M. Cantero, "Curriculum learning approach for off-policy deep reinforcement learning algorithms," M.S. thesis, Machine Learning, MBZUAI, Abu Dhabi, UAE, 2024.
