I am constantly collecting papers on Continual Reinforcement Learning published at high-profile ML conferences or journals. Although continual RL faces many challenges, e.g., the lack of a formal definition and incomplete foundations, it is still a valuable research field that helps address non-stationary environments in RL. Please feel free to let me know if you feel I have missed some important papers.
Contact : Ke Sun, ksun6@ualberta.ca
Inspired by the interplay between the hippocampus and cerebral cortex in the human learning and memory system, this study proposes a dual-learner framework comprising a fast learner and a meta learner, which are coupled to perform distinct yet complementary roles. The fast learner focuses on knowledge transfer, while the meta learner ensures knowledge integration.
This paper tries to redefine RL from the perspective of continual learning. It investigates the four foundations of RL and provides their alternatives to define RL in the general continual learning or non-Markovian environment.
- Continual Knowledge Adaptation for Reinforcement Learning (NeurIPS 2025)
- Tackling Continual Offline RL through Selective Weights Activation on Aligned Spaces (NeurIPS 2025)
This paper leverages vector quantization to align the different state and action spaces of various tasks through selective weight activation. The experiments are on CW10, MuJoCo, and D4RL.
This paper uses a VAE to continually learn a generative model, freezing the previous model to replay past experience. At the same time, to facilitate plasticity, a new learner is encouraged to explore state-action pairs beyond the scope of the generative/world model, i.e., beyond past knowledge, which leads to the exploration of new knowledge in the new environment.
This paper learns an online world model and acts by planning with model predictive control, building a unified world dynamics model to handle catastrophic forgetting. An online agent is also developed and evaluated on a newly proposed continual benchmark environment.
This paper follows Fast TRAC (NeurIPS 2024), focusing on plasticity loss with mostly the same experimental setup: Gym Control, Procgen, and MinAtar. Its novelty is that it establishes the connection between plasticity and churn via the NTK matrix.
This position paper emphasizes the importance of how an algorithm's performance is evaluated in continual RL. The authors propose k-percent tuning as a fairer alternative to lifetime tuning, the practice of tuning hyperparameters over the agent's entire lifetime.
- Prevalence of Negative Transfer in Continual Reinforcement Learning: Analyses and a Simple Baseline (ICLR 2025)
This paper highlights the prevalence of negative transfer within the loss of plasticity (even for fine-tuning algorithms that do not consider catastrophic forgetting) through extensive experiments on Metaworld, DMC control, and Atari games. To address this issue, dual actor networks are used: one is periodically reset to learn the current task, and the other distills all knowledge from the large replay buffer via behavior cloning.
The authors bring ideas from parameter-free online convex optimization to design a new optimizer (built on SGD or Adam) that addresses the loss of plasticity in continual RL. Experiments include Procgen, Atari, and Gym Control environments, which are more flexible.
- Parseval Regularization for Continual Reinforcement Learning (NeurIPS 2024)
This paper focuses on the loss of plasticity in continual RL from the perspective of optimization, one of the important aspects of continual RL. The authors propose Parseval regularization, which maintains the orthogonality of the weight matrices and thus eases optimization when new tasks arrive.
Like PackNet, this paper designs a growing NN that uses an attention module to integrate the outputs of previous policies and the current policy. The growing NN is then deployed for the new task and empirically achieves strong performance in terms of both plasticity (forward transfer and learning curve) and catastrophic forgetting (average performance) in Metaworld and Atari games.
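To make the orthogonality idea concrete, here is a minimal pure-Python sketch of a Parseval-style penalty, the squared Frobenius norm of `W W^T - s I`. The function name `parseval_penalty`, the `scale` parameter, and the choice of applying it to the rows of a single matrix are assumptions made for illustration; the paper's exact formulation and coefficient may differ.

```python
def parseval_penalty(weight, scale=1.0):
    """Squared Frobenius norm of (W W^T - scale * I).

    `weight` is a matrix given as a list of rows. The penalty is zero
    exactly when the rows are mutually orthogonal with squared norm `scale`.
    (Illustrative form only; not taken verbatim from the paper.)
    """
    n_rows = len(weight)
    penalty = 0.0
    for i in range(n_rows):
        for j in range(n_rows):
            # Entry (i, j) of the Gram matrix W W^T.
            gram_ij = sum(a * b for a, b in zip(weight[i], weight[j]))
            target = scale if i == j else 0.0
            penalty += (gram_ij - target) ** 2
    return penalty


orthogonal = [[1.0, 0.0], [0.0, 1.0]]
stretched = [[2.0, 0.0], [0.0, 1.0]]
print(parseval_penalty(orthogonal), parseval_penalty(stretched))
```

In training, a term like this would typically be added to the RL loss with a small coefficient, so that the weights stay close to orthogonal while the agent moves from task to task.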
- Loss of plasticity in deep continual learning (Nature 2024) 🔥
This paper empirically studies the loss of plasticity from supervised learning to reinforcement learning and proposes the continual backpropagation algorithm, which partially mitigates the issue.
- Replay-enhanced Continual Reinforcement Learning (TMLR 2023)
This paper incorporates normalized Q function and policy distillation (KL regularization) in SAC to address the reward scaling and catastrophic forgetting issues in continual RL. Experiments include several common baselines and are conducted in Continual World.
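As a rough illustration of the KL regularizer used for policy distillation, the sketch below computes the closed-form KL divergence between two diagonal Gaussian policies, the usual parameterization of SAC actors before tanh squashing. The function name, the direction of the KL, and which policy plays the role of the teacher are assumptions here, not details confirmed by the paper.

```python
import math


def gaussian_kl(mean_p, std_p, mean_q, std_q):
    """KL(p || q) for diagonal Gaussians given per-dimension means and stds."""
    kl = 0.0
    for mp, sp, mq, sq in zip(mean_p, std_p, mean_q, std_q):
        kl += math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2.0 * sq ** 2) - 0.5
    return kl


# Hypothetical usage: penalize the current policy for drifting away from the
# policy stored for an earlier task, evaluated on a replayed state.
old_mean, old_std = [0.0], [1.0]
new_mean, new_std = [1.0], [1.0]
distill_loss = gaussian_kl(old_mean, old_std, new_mean, new_std)
print(distill_loss)
```

Averaged over replayed states from earlier tasks, a term of this kind keeps the new policy close to its past behavior and so limits forgetting.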
This paper incrementally expands the subspace of policies to balance the agent's size and performance in continual RL. Although the results are reasonable, the method may suffer from high computational cost, and increasing the agent's size may not be admissible in practice.
- A Definition of Continual Reinforcement Learning (NeurIPS 2023)
The authors are the first to provide a conceptual definition of continual RL, but unfortunately, no practical algorithms are proposed.
- Prediction and Control in Continual Reinforcement Learning (NeurIPS 2023) 🔥
This paper studies value-based continual RL in both prediction and control settings, and proposes decomposing the value function into two components that are updated at different timescales, mirroring the functions of the neocortex and hippocampus.
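The two-timescale decomposition can be sketched in the tabular case as follows. This is a simplified illustration under stated assumptions: the class name, the learning rates, and the rule of fully resetting the fast component at consolidation are hypothetical choices, not the paper's exact algorithm.

```python
class TwoTimescaleValue:
    """Tabular value estimate V(s) = permanent[s] + transient[s]."""

    def __init__(self, n_states, fast_lr=0.5, slow_lr=0.05, gamma=0.9):
        self.permanent = [0.0] * n_states  # slow component, retains knowledge
        self.transient = [0.0] * n_states  # fast component, tracks the current task
        self.fast_lr = fast_lr
        self.slow_lr = slow_lr
        self.gamma = gamma

    def value(self, state):
        return self.permanent[state] + self.transient[state]

    def td_update(self, state, reward, next_state, done=False):
        """Fast timescale: TD(0) on the combined value, applied to the transient part."""
        bootstrap = 0.0 if done else self.gamma * self.value(next_state)
        td_error = reward + bootstrap - self.value(state)
        self.transient[state] += self.fast_lr * td_error
        return td_error

    def consolidate(self):
        """Slow timescale: move the permanent part toward the combined value,
        then reset the transient part (assumed reset rule)."""
        for s in range(len(self.permanent)):
            self.permanent[s] += self.slow_lr * self.transient[s]
            self.transient[s] = 0.0


v = TwoTimescaleValue(n_states=2)
v.td_update(state=0, reward=1.0, next_state=1, done=True)
v.consolidate()
print(v.permanent, v.transient)
```

The fast component adapts quickly to whatever task is currently active, while the slow component changes only at consolidation, which is what protects it from abrupt task switches.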
- COOM: A Game Benchmark for Continual Reinforcement Learning (NeurIPS 2023)
Benchmark on Images: Tensorflow
The paper proposes sparse prompting to address continual RL problems, which learns overcomplete dictionaries to produce sparse masks as prompts, extracting a sub-network for each task from a meta-policy network.
- Task-Agnostic Continual Reinforcement Learning: Gaining Insights and Overcoming Challenges (CoLLAs 2023)
The paper considers continual RL from the perspective of POMDP, and it also involves some insightful discussions.
- Loss of Plasticity in Continual Deep Reinforcement Learning (CoLLAs 2023) 🔥
This paper studies continual learning of value-based RL in Atari games under two settings: (1) a standard setting: 10 games * 20M, repeated 5 times; (2) a milder setting: mode changes within one game. The authors find that loss of plasticity and catastrophic interference often occur, accompanied by diminishing weights. They therefore propose using the concatenated ReLU, (ReLU(x), ReLU(-x)), to help propagate gradients and reduce the loss of plasticity.
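The concatenated ReLU is simple enough to show directly. The sketch below is a minimal pure-Python version operating on a list of pre-activations; the function name `crelu` is chosen here for illustration.

```python
def crelu(xs):
    """Concatenated ReLU: [ReLU(x) for each x] followed by [ReLU(-x) for each x]."""
    positive = [max(x, 0.0) for x in xs]
    negative = [max(-x, 0.0) for x in xs]
    return positive + negative


print(crelu([1.5, -2.0]))  # → [1.5, 0.0, 0.0, 2.0]
```

For every nonzero input, exactly one of the two halves is active, so a gradient path always exists. The output has twice the width of the input, so the following layer must be sized accordingly.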
- CoMPS: Continual Meta Policy Search (ICLR 2022)
This paper studies continual RL from the perspective of meta-learning.
As an early attempt, this paper focuses only on the loss of plasticity, assuming that previous data are stored and previous reward functions are given. The interesting idea is the introduction of importance weights, estimated by a learned classifier, over a progressively growing replay buffer. Experiments are on a suite of simulated robotics environments.
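One standard way to turn a classifier into importance weights is the density-ratio trick, sketched below. It assumes a binary classifier that outputs the probability that a sample comes from the current task rather than the old buffer, trained on balanced classes; whether the paper uses exactly this estimator is an assumption, and `importance_weight` is a hypothetical helper name.

```python
def importance_weight(prob_current, eps=1e-6):
    """Density-ratio estimate p_current(x) / p_old(x) = p / (1 - p).

    `prob_current` is the classifier's probability that x belongs to the
    current task. Clipping with `eps` avoids division by zero.
    """
    p = min(max(prob_current, eps), 1.0 - eps)
    return p / (1.0 - p)


# Old samples that look like current-task data receive larger weights.
weights = [importance_weight(p) for p in (0.2, 0.5, 0.8)]
print(weights)
```

Samples from the replay buffer can then be reweighted by these values so that stored experience resembling the current task contributes more to the update.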
- Disentangling Transfer in Continual Reinforcement Learning (NeurIPS 2022)
This paper empirically investigates the impact of different components in SAC on continual RL, and proposes the behavior cloning method to combine improvements.
This paper investigates how well model-based RL adapts to local reward changes and reveals 4 failure modes. It finds that a large replay buffer of old data hurts adaptivity/plasticity, while a small one tends to lead to catastrophic forgetting, suggesting a trade-off that must be managed to tackle more ambitious continual RL problems.
Benchmark: Continual World, 10 manipulation tasks from MetaWorld
- Towards Continual Reinforcement Learning: A Review and Perspectives (Journal of Artificial Intelligence Research (JAIR), 2020)
The first review of continual RL. However, most of the related papers it covers are about continual learning rather than RL.
The authors propose a policy consolidation method, in which the policy network interacts with a series of hidden networks at different timescales to mitigate catastrophic forgetting.
This paper incorporates a synaptic model into RL agents to mitigate catastrophic forgetting in continual RL. The study is inspired by neuroscience, but its experiments are restricted to tabular settings.
This paper brings state abstraction to lifelong RL and provides a theoretical analysis.
Remark. There are also a few papers before 2022 collected in https://github.com/ContinualAI/continual-learning-papers#continual-reinforcement-learning.