This week, we look at a self-imitation learning method that imitates diverse past experience for better exploration. We also summarize an environment probing policy...
This week, we summarize a new transfer learning method using the Transformer reward model, and a world model controller that does not require training the...
In this issue, we focus on replacing inductive bias with adaptive solutions (DeepMind), learning off-policy from expert experience (Google Brain), and learning a shared model...
This week, we summarize two benchmark papers. The first paper benchmarks 11 model-based RL algorithms in 18 continuous control environments, and the second paper benchmarks...
This week, we first introduce an ensemble of primitives without a high-level meta-policy that can make decentralized decisions. We then look at a deep learning...