Going Beyond Behaviour Cloning With Off-Policy Reinforcement Learning

Abstract

In the first part of the talk we will introduce the common terms used in standard online RL. After that we will define the offline RL setting, describing applications and benchmarks. We will then focus on behavioural cloning (BC) as a simple and stable baseline for learning a policy from offline interaction data. As a particular instance of BC, we will describe the Decision Transformer, a recently proposed method that leverages the transformer architecture to tackle the offline RL setting. In the second part of the talk, we will explore how off-policy RL algorithms originally designed for the online setting (such as SAC) can be adapted to handle the distribution shift necessary to improve on the policy that generated the offline data, without online feedback. We will find that this reduces to a problem of quantifying and managing uncertainty. In the third and last part of the talk, we will first review classical offline reinforcement learning methods, including ways to evaluate and improve policies from offline data via importance sampling, and discuss the challenges and applicability of these methods. Then, we will review modern offline RL methods, including policy constraint methods and model-based offline RL methods. In policy constraint methods, we encourage the new policy to stay close to the behaviour policy observed in the offline dataset, while in model-based offline RL methods, we quantify the uncertainty of the learned model and use it to discourage the new policy from visiting uncertain regions.
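
To make the policy constraint idea mentioned above more concrete, here is a minimal, illustrative sketch of a behaviour-cloning-regularised actor update in PyTorch, in the spirit of methods such as TD3+BC. The networks, coefficient name (`alpha`), and toy batch are assumptions for illustration only, not material from the talk itself.

```python
# Minimal sketch of a policy-constraint actor update (in the spirit of TD3+BC).
# All names and dimensions here are illustrative, not taken from the talk.

import torch
import torch.nn as nn

state_dim, action_dim, batch_size = 8, 2, 32

# Simple deterministic actor and Q-function critic (illustrative architectures).
actor = nn.Sequential(
    nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, action_dim), nn.Tanh()
)
critic = nn.Sequential(
    nn.Linear(state_dim + action_dim, 64), nn.ReLU(), nn.Linear(64, 1)
)
optimiser = torch.optim.Adam(actor.parameters(), lr=3e-4)

# Stand-in for a batch sampled from the offline dataset.
states = torch.randn(batch_size, state_dim)
dataset_actions = torch.rand(batch_size, action_dim) * 2 - 1

alpha = 2.5  # trades off value improvement against staying close to the data

policy_actions = actor(states)

# Maximise the critic's value of the policy's actions...
q_values = critic(torch.cat([states, policy_actions], dim=-1))
# ...while penalising deviation from the actions observed in the dataset.
bc_penalty = ((policy_actions - dataset_actions) ** 2).mean()

actor_loss = -q_values.mean() + alpha * bc_penalty

optimiser.zero_grad()
actor_loss.backward()
optimiser.step()  # only the actor is updated in this sketch
```

Increasing `alpha` pulls the learned policy back towards pure behaviour cloning, while `alpha = 0` recovers an unconstrained off-policy update that is prone to exploiting the critic's errors on out-of-distribution actions.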

Date
Feb 15, 2023
Location
Online
Adam Jelley
PhD Student in Deep Reinforcement Learning
