In model-free multi-task reinforcement learning (RL), abundant work shows that a shared policy network can improve performance across tasks. The rationale is that an agent can learn what all tasks have in common, which effectively increases the number of samples available for each task. In model-based multi-task RL, however, we found evidence that a dynamics model can suffer from task confusion or catastrophic interference when it is trained on multiple tasks at once.
To mitigate this problem, we use a set of distinct dynamics models that are orchestrated by a task classification network. The classifier is trained to detect the task at hand and selects one of the dynamics models for planning. Once a dynamics model is chosen, we use it to generate imaginary rollouts that determine the agent's next actions in the real environment. Separating the task dynamics into different networks avoids catastrophic interference to a large degree, which leads to higher-quality and more stable rollouts. When we used a single dynamics model for all tasks, we found clear evidence of task confusion: the model was prone to randomly switching tasks in the middle of a rollout, which rendered the rollout unusable for planning. We conjecture that, in contrast to learning a shared policy, learning the task dynamics depends more on modeling the details of the individual tasks correctly, since this is where the tasks differ from each other.
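The control flow described above (classify the task, select one dynamics model, plan with imaginary rollouts) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the implementation used in this work: the one-dimensional toy dynamics, the names `MODELS`, `classify_task`, `rollout`, `plan`, and `act`, the use of prediction error as a stand-in for the trained task classification network, and exhaustive search over a small discrete action set as the planner.

```python
import itertools
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical type: a dynamics model maps (state, action) to the next state.
Dynamics = Callable[[float, float], float]

# Two toy tasks with conflicting dynamics (assumed for illustration only).
MODELS: Dict[str, Dynamics] = {
    "normal": lambda state, action: state + action,
    "mirrored": lambda state, action: state - action,
}


def classify_task(transition: Tuple[float, float, float],
                  models: Dict[str, Dynamics]) -> str:
    """Stand-in for the task classifier: pick the task whose model
    best explains one observed (state, action, next_state) transition."""
    state, action, next_state = transition
    return min(models, key=lambda t: abs(models[t](state, action) - next_state))


def rollout(model: Dynamics, state: float, actions: Sequence[float]) -> List[float]:
    """Generate an imaginary rollout using a single dynamics model."""
    states = []
    for action in actions:
        state = model(state, action)
        states.append(state)
    return states


def plan(model: Dynamics, state: float, reward: Callable[[float], float],
         horizon: int = 4, action_set: Sequence[float] = (-1.0, 0.0, 1.0)) -> float:
    """Score every action sequence of length `horizon` by its imagined
    return and give back the first action of the best sequence."""
    best_first, best_return = action_set[0], float("-inf")
    for actions in itertools.product(action_set, repeat=horizon):
        ret = sum(reward(s) for s in rollout(model, state, actions))
        if ret > best_return:
            best_first, best_return = actions[0], ret
    return best_first


def act(transition: Tuple[float, float, float], state: float,
        models: Dict[str, Dynamics], reward: Callable[[float], float]) -> float:
    """Detect the task, select its dynamics model, and plan with it alone."""
    task = classify_task(transition, models)
    return plan(models[task], state, reward)


def reward(state: float) -> float:
    """Toy reward (assumed): stay close to the target state 3.0."""
    return -abs(state - 3.0)


print(act((0.0, 1.0, 1.0), 1.0, MODELS, reward))    # 1.0
print(act((0.0, 1.0, -1.0), -1.0, MODELS, reward))  # -1.0
```

Because each rollout is produced by exactly one model, the imagined trajectory cannot switch tasks midway, which is the failure mode reported above for a single shared dynamics model.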
Modular Networks Prevent Catastrophic Interference in Model-Based Multi-task Reinforcement Learning