datake/Papers-Of-Continual-RL

Papers-Continual-RL

I am constantly collecting papers on Continual Reinforcement Learning published at high-profile ML conferences and in journals. Although continual RL faces many challenges, e.g., the lack of a formal definition and incomplete foundations, it remains a valuable research field that helps address the challenges of non-stationary environments in RL. Please feel free to let me know if you feel I have missed some important papers.

Contact : Ke Sun, ksun6@ualberta.ca

2026

Inspired by the interplay between the hippocampus and the cerebral cortex in the human learning and memory system, this study proposes a dual-learner framework comprising a fast learner and a meta learner, which are coupled to perform distinct yet complementary roles. The fast learner focuses on knowledge transfer, while the meta learner ensures knowledge integration.

2025

This paper tries to redefine RL from the perspective of continual learning. It investigates the four foundations of RL and proposes alternatives to them in order to define RL in general continual learning or non-Markovian environments.

This paper leverages vector quantization to align the different state and action spaces of various tasks through selective weight activations. The experiments are on CW10, MuJoCo, and D4RL.

This paper uses a VAE to continually learn a generative model, freezing the previous one to replay past experience. At the same time, to facilitate plasticity, a new learner is encouraged to explore state-action pairs beyond the scope of the generative/world model, i.e., beyond past knowledge, which leads to the exploration of new knowledge in the new environment.

This paper learns an online world model and acts by planning via model predictive control, constructing a unified world dynamics model to handle the catastrophic forgetting issue. An online agent is also developed and evaluated on a proposed continual bench environment.

This paper follows Fast TRAC (NeurIPS 2024), focusing on plasticity loss with a mostly identical experimental setup: Gym Control, Procgen, and MinAtar. The novelty of this paper is that it establishes the connection between plasticity and churn via the NTK matrix.

This position paper emphasizes the importance of how an algorithm's performance is evaluated in the context of continual RL. The authors propose k-percent tuning to promote a fairer evaluation than the standard practice of lifetime tuning.

This paper highlights the prevalence of negative transfer within the loss of plasticity (even for fine-tuning algorithms that do not consider catastrophic forgetting) through extensive experiments on Metaworld, DMC control, and Atari games. To address this issue, dual actor networks are used: one is periodically reset to learn the current task, and the other distills all knowledge from the large replay buffer via behavior cloning.

2024

The authors bring ideas from parameter-free online convex optimization to design a new optimizer (built on top of SGD or Adam) that addresses the loss of plasticity issue in continual RL. Experiments include Procgen, Atari, and Gym Control environments, which are more flexible.

This paper focuses on the loss of plasticity in continual RL from the perspective of optimization, one of the important aspects of continual RL. The authors propose to use Parseval regularization, which maintains the orthogonality of the weight matrices and thus eases optimization in the presence of new tasks.
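To make the orthogonality idea concrete, here is a minimal sketch of one common form of such a penalty, the squared Frobenius norm of `W W^T - s I`. The function name `parseval_penalty`, the `scale` parameter, and the plain list-of-rows matrix format are assumptions made for illustration; the exact form and coefficient used in the paper may differ.

```python
def parseval_penalty(W, scale=1.0):
    """Squared Frobenius norm of (W W^T - scale * I).

    W is a matrix given as a list of rows (hypothetical format for this sketch).
    The penalty is zero exactly when the rows of W are mutually orthogonal
    and each row has squared norm equal to `scale`.
    """
    n = len(W)
    total = 0.0
    for i in range(n):
        for j in range(n):
            # Entry (i, j) of the Gram matrix W W^T is the dot product of rows i and j.
            gram_ij = sum(a * b for a, b in zip(W[i], W[j]))
            target = scale if i == j else 0.0
            total += (gram_ij - target) ** 2
    return total


# An orthogonal matrix incurs no penalty; a sheared one does.
orthogonal = [[1.0, 0.0], [0.0, 1.0]]
sheared = [[1.0, 1.0], [0.0, 1.0]]
print(parseval_penalty(orthogonal))  # 0.0
print(parseval_penalty(sheared))     # 3.0
```

In training, a term like this would be added to the RL loss for each regularized layer, weighted by a small coefficient, so that gradient steps pull the weights back toward orthogonality.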

Like PackNet, this paper designs a growing NN that uses an attention module to integrate the outputs of previous policies and the current policy. The growing NN is then deployed for the new task and empirically achieves strong performance in terms of both plasticity (forward transfer and learning curve) and catastrophic forgetting (average performance) in Metaworld and Atari games.

This paper empirically studies the loss of plasticity issue from supervised learning to reinforcement learning and proposes the continual backpropagation algorithm, which partially mitigates it.

2023

This paper incorporates a normalized Q function and policy distillation (KL regularization) into SAC to address the reward scaling and catastrophic forgetting issues in continual RL. Experiments include several common baselines and are conducted in Continual World.

This paper incrementally expands the subspace of policies to balance the agent's size and performance in continual RL. Although the results are reasonable, the method may suffer from high computational cost, and increasing the agent's size may not be admissible in practice.

The authors are the first to provide a conceptual definition of continual RL, but unfortunately, no practical algorithms are proposed.

This paper studies value-based continual RL in both prediction and control settings, and proposes to decompose the value function into two components that are updated at different timescales, mirroring the functions of the neocortex and hippocampus.

Benchmark on Images: Tensorflow

The paper proposes sparse prompting to address continual RL problems, which learns overcomplete dictionaries to produce sparse masks as prompts, extracting a sub-network for each task from a meta-policy network.

The paper considers continual RL from the perspective of POMDP, and it also involves some insightful discussions.

This paper studies the continual learning of value-based RL in Atari games under two settings: (1) a standard setting: 10 games * 20M, repeated 5 times; (2) a milder setting: mode changes within one game. The authors find that loss of plasticity and catastrophic interference often occur together with diminishing weights. Therefore, they propose to use the concatenated ReLU, (ReLU(x), ReLU(-x)), to help propagate the gradient and reduce the loss of plasticity.
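The concatenated ReLU mentioned above can be sketched in a few lines. This is an illustrative version operating on a plain Python list of pre-activations; the function name `crelu` is an assumption, and in a real network the same operation would be applied to a tensor along the feature dimension.

```python
def crelu(pre_activations):
    """Concatenated ReLU: returns [ReLU(x), ReLU(-x)] for a list of values.

    The output has twice as many features as the input. For every nonzero
    input, exactly one of the two halves is active, so a gradient path
    exists regardless of the input's sign.
    """
    positive_part = [max(x, 0.0) for x in pre_activations]
    negative_part = [max(-x, 0.0) for x in pre_activations]
    return positive_part + negative_part


features = crelu([-2.0, 0.5, 3.0])
print(features)  # [0.0, 0.5, 3.0, 2.0, 0.0, 0.0]
```

Note that the layer following a CReLU must accept twice as many inputs as it would after a plain ReLU.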

2022 and Before

This paper studies continual RL from the perspective of meta-learning.

As an early attempt, this paper focuses only on the loss of plasticity, assuming that the previous data are stored and the previous reward functions are given. The interesting idea, however, is the introduction of importance weights, which are estimated by a learned classifier within a progressively growing replay buffer. Experiments are on a suite of simulated robotics environments.

This paper empirically investigates the impact of different components of SAC on continual RL, and proposes a behavior cloning method that combines the resulting improvements.

This paper investigates how well model-based RL adapts to local reward changes and reveals 4 failure modes. It finds that a large replay buffer of old data hurts adaptivity/plasticity, while a small one tends to lead to catastrophic forgetting, suggesting a trade-off that must be managed to tackle more ambitious continual RL problems.

Benchmark: Continual World, 10 manipulation tasks from MetaWorld

The first review of continual RL; however, most of the related papers are about continual learning rather than RL.

The authors propose a policy consolidation method, in which the policy network interacts with a series of hidden networks at different timescales to mitigate catastrophic forgetting.

This paper incorporates a synaptic model into RL agents to mitigate catastrophic forgetting in continual RL. This study is inspired by neuroscience, but its experiments are restricted to tabular settings.

This paper brings state abstraction to lifelong RL and provides a theoretical analysis.

Remark. A few more papers from before 2022 are collected at https://github.com/ContinualAI/continual-learning-papers#continual-reinforcement-learning.
