RL Weekly 8: World Discovery Models, MuJoCo Soccer Environment, and Deep Planning Network
World Discovery Models
What it is
DeepMind introduced a new agent that uses Neural Differential Information Gain Optimisation (NDIGO) to build an accurate world model. NDIGO measures the novelty of an observation by how much it helps the agent predict future observations. The authors show that NDIGO outperforms state-of-the-art methods in terms of the quality of the learned representation.
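In other words, the intrinsic reward is the reduction in future-prediction loss attributable to the newest observation. Here is a minimal sketch of that idea; `predict_loss`, `history`, and the observation variables are hypothetical stand-ins for the paper's learned prediction network and its inputs:

```python
def ndigo_intrinsic_reward(predict_loss, history, obs_t, future_obs):
    # predict_loss(observations, target) -> loss of predicting `target`
    # from `observations` (stands in for the learned predictor).
    loss_without = predict_loss(history, future_obs)         # o_{t+k} from o_{1:t-1}
    loss_with = predict_loss(history + [obs_t], future_obs)  # o_{t+k} from o_{1:t}
    # Reward = information gained about the future by receiving o_t.
    # Pure noise does not improve predictions of future observations,
    # so loss_with ~ loss_without and the reward stays near zero.
    return loss_without - loss_with
```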
Why it matters
Although reinforcement learning has achieved remarkable success, it is largely restricted to environments that provide a good external reward. There have been various efforts to encourage the agent to explore and discover on its own. However, these methods tend to suffer from the “noisy-TV problem,” where the agent mistakes random patterns in the world for novel observations (for more details, see the papers under External Resources). As a result, most existing algorithms do not generalize well to stochastic and partially observable environments. The authors show that NDIGO can handle such environments.
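To see why plain prediction-error curiosity falls into this trap, consider a toy example (all names and numbers here are illustrative): when the intrinsic reward is the forward model's prediction error, an unpredictable noise source keeps the reward permanently high, so the agent never loses interest:

```python
import numpy as np

rng = np.random.default_rng(0)

def prediction_error_reward(predicted_next_obs, next_obs):
    # Classic curiosity bonus: the forward model's squared prediction error.
    return float(np.mean((predicted_next_obs - next_obs) ** 2))

# A "noisy TV" emits unpredictable frames: the best possible prediction
# is the noise mean, so the error (and the bonus) never shrinks.
best_prediction = np.zeros(16)  # optimal guess for zero-mean Gaussian noise
for step in range(5):
    frame = rng.normal(size=16)
    print(step, prediction_error_reward(best_prediction, frame))
# The bonus hovers around 1.0 forever, so the agent keeps "watching" the noise.
```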
Read more
- World Discovery Models (ArXiv Preprint)
External Resources
- Curiosity-driven Exploration by Self-supervised Prediction (ArXiv Preprint)
- Large-Scale Study of Curiosity-Driven Learning (ArXiv Preprint)
MuJoCo Soccer Environment
What it is
DeepMind open-sourced a new 2-vs-2 multi-agent environment where agents play soccer in a simulated physics environment (MuJoCo). The authors also provide a baseline that uses population-based training, reward shaping, recurrent policies, and a decomposed action-value function.
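Per the repository's README (at the time of writing), loading the 2-vs-2 environment and stepping through an episode with random actions looks roughly like this:

```python
import numpy as np
from dm_control.locomotion import soccer as dm_soccer

# Load the 2-vs-2 soccer task with 10-second episodes.
env = dm_soccer.load(team_size=2, time_limit=10.0)

# One action spec per player (four in total for 2-vs-2).
action_specs = env.action_spec()

# Step through one episode with uniformly random actions.
timestep = env.reset()
while not timestep.last():
    actions = [np.random.uniform(spec.minimum, spec.maximum, size=spec.shape)
               for spec in action_specs]
    timestep = env.step(actions)
```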
Why it matters
The traditional reinforcement learning benchmark has been the Atari 2600 games in the Arcade Learning Environment. However, as of 2019, it is safe to say that this benchmark has served its main purpose: reinforcement learning agents have achieved superhuman scores on almost all of the games. Thus, a more challenging testbed is needed to accelerate progress. DeepMind appears to be interested in multi-agent reinforcement learning environments, having open-sourced the Capture the Flag, Hanabi, and Soccer environments.
Read more
- Emergent Coordination Through Competition (ArXiv Preprint)
- Emergent Coordination through Competition Sample Gameplay (Google Sites)
- DeepMind MuJoCo Multi-Agent Soccer Environment (GitHub)
External Resources
- Arcade Learning Environment (GitHub)
- Capture the Flag: the emergence of complex cooperative agents (DeepMind Blog)
- The Hanabi Challenge: A New Frontier for AI Research (ArXiv Preprint)
PlaNet: A Deep Planning Network for RL
What it is
The Deep Planning Network (PlaNet) agent is a model-based agent that learns a latent dynamics model from image observations. Instead of learning a policy, it selects actions by planning: candidate action sequences are evaluated through fast rollouts in the learned latent space. PlaNet trained on 2K episodes outperforms model-free A3C trained on 100K episodes on all tasks, and achieves performance similar to D4PG trained on 100K episodes.
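The planner used in the paper is the cross-entropy method (CEM). Below is a simplified sketch of CEM planning in latent space; `dynamics_fn` and `reward_fn` are hypothetical stand-ins for the agent's learned latent transition and reward models:

```python
import numpy as np

def cem_plan(latent_state, dynamics_fn, reward_fn, action_dim,
             horizon=12, iterations=10, candidates=1000, top_k=100):
    # Belief over action sequences, refined over a few CEM iterations.
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(iterations):
        # Sample candidate action sequences from the current belief.
        plans = mean + std * np.random.randn(candidates, horizon, action_dim)
        returns = np.zeros(candidates)
        for i, plan in enumerate(plans):
            z = latent_state
            for action in plan:
                z = dynamics_fn(z, action)  # imagined latent transition
                returns[i] += reward_fn(z)  # predicted reward
        # Refit the belief to the best-performing action sequences.
        elite = plans[np.argsort(returns)[-top_k:]]
        mean, std = elite.mean(axis=0), elite.std(axis=0)
    return mean[0]  # execute only the first action, then replan (MPC-style)
```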
Read more
- Introducing PlaNet: A Deep Planning Network for Reinforcement Learning (Google Blog)
- Learning Latent Dynamics for Planning from Pixels (ArXiv Preprint)
- Learning Latent Dynamics for Planning from Pixels (YouTube Video)
Here are some other exciting news items in RL:
- Google Brain achieved a new state of the art on weakly-supervised semantic parsing with Meta Reward Learning (MeRL).
- Researchers at the University of Oxford devised a new gradient-free meta-learning method for hyperparameter tuning called Hyperparameter Optimisation On the Fly (HOOF).