Atari Environments
Overview
The Atari 2600 is a video game console from Atari that was released in 1977. The console included popular games such as Breakout, Ms. Pacman and Space Invaders. Since Deep Q-Networks were introduced by Mnih et al. in 2013, the Atari 2600 has been the standard environment for testing new Reinforcement Learning algorithms. It has been a challenging testbed due to its high-dimensional video input (210 x 160 pixels at 60 Hz) and the diversity of tasks across games.
The Atari 2600 environments were originally provided through the Arcade Learning Environment (ALE) and have been wrapped by OpenAI Gym to create a more standardized interface. OpenAI Gym provides 59 Atari 2600 games as environments.
State of the Art
Note: Most papers evaluate on a set of 57 Atari 2600 games, a couple of which are not supported by OpenAI Gym.
These are the published state-of-the-art results for the Atari 2600 testbed. To test the robustness of the agent, most papers use one or both of two settings: no-op starts and human starts, both devised to provide nondeterministic starting positions. In the no-op starts setting, the agent selects the “do nothing” action for a random number of frames (up to 30) at the start of each episode, providing random starting positions. This setting originates from the DQN2015 paper by Mnih et al. (2015). In the human starts setting, the agent starts from one of 100 starting points sampled from a human professional’s gameplay. The human starts setting originates from the GorilaDQN paper by Nair et al. (2015).
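As an illustration, here is a minimal sketch of the no-op starts protocol (a hypothetical helper, not code from either paper; in ALE, action 0 is NOOP):

import gym
import numpy as np

# Minimal sketch of no-op starts (hypothetical helper): before the agent
# takes over, a random number of "do nothing" actions (at most 30) is taken.
def noop_start(env, max_noops=30):
    obs = env.reset()
    for _ in range(np.random.randint(1, max_noops + 1)):
        obs, _, done, _ = env.step(0)  # action 0 is NOOP in ALE
        if done:
            obs = env.reset()
    return obs

env = gym.make('BreakoutNoFrameskip-v4')
obs = noop_start(env)
env.close()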
Median
One popular method of checking an agent’s overall performance is the median human-normalized score: each game’s raw score is normalized against random and human baselines, and the median is taken across games. You can read more about the choice of this metric in the Rainbow paper. For better comparison of algorithms, we only used results that were tested on the majority of the available games.
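As a sketch of how this metric is computed (the per-game numbers below are purely illustrative):

import numpy as np

# Median human-normalized score: normalize each game's raw score against
# random and human baselines, then take the median across games.
def human_normalized(agent, random_score, human_score):
    return 100.0 * (agent - random_score) / (human_score - random_score)

# Illustrative per-game scores: {game: (agent, random, human)}
scores = {
    'Breakout': (320.0, 1.7, 30.5),
    'Pong': (19.0, -20.7, 14.6),
    'Seaquest': (4200.0, 68.4, 42054.7),
}

normalized = [human_normalized(a, r, h) for a, r, h in scores.values()]
print('Median human-normalized score: %.0f%%' % np.median(normalized))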
No-op starts
[1] Ape-X DQN used roughly 100 times more environment frames than the other results, although its training time was half that of the other DQN results.
[2] Hyperparameters were tuned per game.
[3] Only evaluated on 49 games.
Human starts
Median | Method | Score from |
---|---|---|
358% | Ape-X DQN [1] | Distributed Prioritized Experience Replay |
250% | UNREAL [2] | Distributed Prioritized Experience Replay |
153% | Rainbow DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
128% | Prioritized DDQN | Dueling Network Architectures for Deep Reinforcement Learning |
125% | C51 | Distributed Prioritized Experience Replay |
125% | Distributional DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
117% | Dueling DDQN | Dueling Network Architectures for Deep Reinforcement Learning |
116% | A3C | Dueling Network Architectures for Deep Reinforcement Learning |
110% | DDQN | Dueling Network Architectures for Deep Reinforcement Learning |
102% | Noisy DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
78% | Gorila DQN | Distributed Prioritized Experience Replay |
68% | DQN [3] | Rainbow: Combining Improvements in Deep Reinforcement Learning |
[1] Ape-X DQN used roughly 100 times more environment frames than the other results, although its training time was half that of the other DQN results.
[2] Hyperparameters were tuned per game.
[3] Only evaluated on 49 games.
Individual Environments
Although the metric above is a valuable way of comparing the general effectiveness of an algorithm, different algorithms have different strengths. Thus, we also included the state-of-the-art results for each game.
If you want to see how other methods performed in each Atari 2600 game, you can check the results of all methods by clicking the name of the game in the table below.
No-op Starts
Human Starts
Installation
Prerequisites
To install the Atari 2600 environment, you need the OpenAI Gym toolkit. Read this page to learn how to install OpenAI Gym.
Installation via pip
If you did a full install of OpenAI Gym, the Atari 2600 environments should already be installed. Otherwise, you can install the Atari 2600 environments with a single pip command:
pip3 install gym[atari]
Test Installation
You can run a simple random agent to make sure the Atari 2600 environment was correctly installed.
import gym

# Create the Pong environment and run one episode with a random agent.
env = gym.make('Pong-v0')
env.reset()
done = False
while not done:
    # Sample a random action, step the environment, and render the frame.
    _, _, done, _ = env.step(env.action_space.sample())
    env.render()
env.close()
Variants
In OpenAI Gym, each game has a few variants, distinguished by their suffixes. Through these variants, you can configure frame skipping and sticky actions. Frame skipping is a technique of using only every $k$-th frame: the agent selects an action every $k$ frames, and the same action is performed for the intervening $k$ frames. Sticky actions set a nonzero probability $p$ that the previous action is repeated regardless of the action the agent chose, which adds stochasticity to the otherwise deterministic Atari 2600 environments.
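To make these two techniques concrete, here is a minimal wrapper sketch (not Gym’s internal implementation) that applies frame skipping and sticky actions on top of a NoFrameskip variant:

import gym
import numpy as np

# Minimal sketch of frame skipping and sticky actions (not Gym's internals).
class SkipAndSticky(gym.Wrapper):
    def __init__(self, env, k=4, p=0.25):
        super().__init__(env)
        self.k = k            # each chosen action is held for k frames
        self.p = p            # probability that the previous action "sticks"
        self.last_action = 0  # NOOP

    def step(self, action):
        total_reward, done, info, obs = 0.0, False, {}, None
        for _ in range(self.k):
            # With probability p, the previously executed action is repeated
            # instead of the agent's chosen action.
            if np.random.rand() < self.p:
                action = self.last_action
            obs, reward, done, info = self.env.step(action)
            self.last_action = action
            total_reward += reward
            if done:
                break
        return obs, total_reward, done, info

env = SkipAndSticky(gym.make('PongNoFrameskip-v4'), k=4, p=0.25)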
For example, there are six variants for the Pong environment.
Name | Frame Skip $k$ | Repeat action probability $p$ |
---|---|---|
Pong-v0 | 2~4 [1] | 0.25 |
Pong-v4 | 2~4 [1] | 0 |
PongDeterministic-v0 | 4 [2] | 0.25 |
PongDeterministic-v4 [3] | 4 [2] | 0 |
PongNoFrameskip-v0 | 1 | 0.25 |
PongNoFrameskip-v4 | 1 | 0 |
[1] $k$ is chosen randomly at every step from $\{2, 3, 4\}$.
[2] For Space Invaders, the Deterministic variants use $k=3$, because with $k=4$ the lasers are invisible: the frame skip coincides with the blinking frequency of the lasers.
[3] Deterministic-v4 is the configuration used to assess Deep Q-Networks.
For more details about frame skipping and sticky actions, check Sections 2 and 5 of the ALE whitepaper: Revisiting the Arcade Learning Environment.
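To check which settings a given variant actually uses, you can inspect the underlying environment (a sketch assuming the classic Gym Atari API, where the underlying AtariEnv stores its frame-skip setting as frameskip, either a fixed integer or a range sampled at every step):

import gym

# Compare the frame-skip settings of a few Pong variants.
for env_id in ['Pong-v0', 'PongDeterministic-v4', 'PongNoFrameskip-v4']:
    env = gym.make(env_id)
    print(env_id, env.unwrapped.frameskip)
    env.close()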
Also, there are RAM environments such as Pong-ram-v0, where the observation is the RAM of the Atari machine instead of the 210 x 160 visual input. You can also add suffixes to RAM environments.
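For example, the RAM observation is a vector of 128 bytes, while the default observation is the 210 x 160 RGB image (a small check, assuming the standard Gym Atari observation spaces):

import gym

# Compare the observation spaces of the RAM and image variants of Pong.
ram_env = gym.make('Pong-ram-v0')
img_env = gym.make('Pong-v0')
print(ram_env.observation_space.shape)  # expected: (128,)
print(img_env.observation_space.shape)  # expected: (210, 160, 3)
ram_env.close()
img_env.close()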