In this issue, we look at two papers combating catastrophic interference. Memento combats interference by training two independent agents where the second agent takes off...
In this issue, we look at using intrinsic rewards to encourage cooperation in a two-agent MDP. We also look at replacing maximization in Q-learning over all...
In this issue, we look at Google and MIT's study on the observational overfitting phenomenon and how overparametrization helps generalization, a new family of algorithms...
In this issue, we look at MuZero, DeepMind's new algorithm that learns a model, matches AlphaZero's performance in Chess, Shogi, and Go, and achieves...
In this issue, we look at an algorithm that uses sibling trajectories to escape local optima in distance-based shaped rewards, and an algorithm that dynamically...
In this issue, we look at a robot hand manipulating and "solving" the Rubik's Cube. We also look at the comparative performance of human-agnostic and human-aware...
In this issue, we look at Action Grammar RL, a hierarchical RL framework that adds new macro-actions, improving the performance of DDQN and SAC in Atari...
In this issue, we look at LASER, DeepMind's improvement to V-trace that achieves state-of-the-art sample efficiency in Atari environments. We also look at Google AI...
In this issue, we look at OpenAI's work on multi-agent hide and seek and the behaviors that emerge. We also look at Mila's population-based exploration...
In this issue, we look at reinforcement learning from a wider perspective. We look at new environments and experiments that are designed to test and...
This week, we first look at Free-Lunch Saliency, a built-in interpretability module that does not deteriorate performance. Then, we look at HRL-BC, a combination of...
This week, we look at a self-imitation learning method that imitates diverse past experience for better exploration. We also summarize an environment-probing policy...
This week, we summarize a new transfer learning method using the Transformer reward model, and a world model controller that does not require training the...
In this issue, we focus on replacing inductive bias with adaptive solutions (DeepMind), learning off-policy from expert experience (Google Brain), and learning a shared model...
This week, we summarize two benchmark papers. The first paper benchmarks 11 model-based RL algorithms in 18 continuous control environments, and the second paper benchmarks...
This week, we first introduce an ensemble of primitives without a high-level meta-policy that can make decentralized decisions. We then look at a deep learning...
This week, we first look at ST-DIM, an unsupervised state representation learning method from MILA and Microsoft Research. We also look at UC Berkeley's new policy...
This week, we introduce three papers on replay-based RL and model-based RL. The first paper introduces SoRB, a way to combine experience replay and planning....
This week, we introduce MineRL, a new RL competition using human priors to solve Minecraft. We also introduce OPE, a method of off-policy evaluation through...
This week, we introduce a survey of domain randomization techniques for sim-to-real transfer and ToyBox, a suite of redesigned Atari environments for experimental evaluation of...
In this issue, we summarize the use of information asymmetry in KL-regularized objectives to regularize the policy, the challenges of deploying deep RL into...
In this issue, we introduce 'ray interference,' a possible cause of performance plateaus in deep reinforcement learning conjectured by Google DeepMind. We also introduce a...
In this issue, we introduce VICE-RAQ by UC Berkeley and T-REX by UT Austin and Preferred Networks. VICE-RAQ trains a classifier to infer rewards from...
In this issue, we look at Google Brain's algorithm for learning by playing, DeepMind's thoughts on multi-agent intelligence, and DeepMind's new navigation environment using Google...
In this issue, we look at SimPLe, a model-based RL algorithm that achieves near-state-of-the-art results on the Arcade Learning Environment (ALE). We also look at Neural...
This week, we introduce the Obstacle Tower Challenge, a new RL competition by Unity, Hanabi Learning Environment, a multi-agent environment by DeepMind, and Spinning Up...
This week, we look at AlphaStar, a StarCraft II AI; PSRO_rN, an evaluation algorithm that encourages a diverse population of well-trained agents; and a novel Meta-RL approach...
In this issue, we introduce a new curriculum learning algorithm by Uber AI Labs, a model-free planning algorithm by DeepMind, and an optical-flow-based control algorithm by Intel...
In this issue, we introduce the DeepTraffic competition from Lex Fridman's MIT Deep Learning for Self-Driving Cars course. We also review a new paper on...
In this issue, we discuss DeepMind's hyperparameter tuning for AlphaGo, a hierarchical RL model for a MOBA game from Tencent, and a GAN-based sim-to-real algorithm from...
In this inaugural issue of the RL Weekly newsletter, we discuss Soft Actor-Critic (SAC) from BAIR, the new TextWorld competition by Microsoft Research, and AsDDPG...