Atari Environments
Overview
The Atari 2600 is a video game console from Atari that was released in 1977. The console included popular games such as Breakout, Ms. Pacman and Space Invaders. Since Deep Q-Networks were introduced by Mnih et al. in 2013, the Atari 2600 has been the standard environment for testing new Reinforcement Learning algorithms. It has been a challenging testbed due to its high-dimensional video input (210 x 160 pixels at 60 Hz) and the diversity of tasks across games.
The Atari 2600 environments were originally provided through the Arcade Learning Environment (ALE) and have been wrapped by OpenAI Gym to create a more standardized interface. OpenAI Gym provides 59 Atari 2600 games as environments.
State of the Art
Note: Most papers evaluate on a set of 57 Atari 2600 games, a couple of which are not supported by OpenAI Gym.
These are the published state-of-the-art results for the Atari 2600 testbed. To test the robustness of the agent, most papers use one or both of two settings: no-op starts and human starts, both devised to provide nondeterministic starting positions. In the no-op starts setting, the agent selects the “do nothing” action for a random number of frames (up to 30) at the start of each episode, providing random starting positions. This setting originates from the DQN2015 paper by Mnih et al. (2015). In the human starts setting, the agent starts from one of 100 starting points sampled from a human professional’s gameplay. The human starts setting originates from the GorilaDQN paper by Nair et al. (2015).
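As an illustration, here is a minimal sketch of the no-op starts protocol (a hypothetical helper, not code from either paper; in ALE, action 0 is NOOP):

import gym
import numpy as np

# Minimal sketch of no-op starts (hypothetical helper): before the agent
# takes over, a random number of "do nothing" actions (at most 30) is taken.
def noop_start(env, max_noops=30):
    obs = env.reset()
    for _ in range(np.random.randint(1, max_noops + 1)):
        obs, _, done, _ = env.step(0)  # action 0 is NOOP in ALE
        if done:
            obs = env.reset()
    return obs

env = gym.make('BreakoutNoFrameskip-v4')
obs = noop_start(env)
env.close()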
Median
One popular method of checking an agent’s overall performance is the median human-normalized score: each game’s raw score is normalized against random and human baselines, and the median is taken across games. You can read more about the choice of this metric in the Rainbow paper. For better comparison of algorithms, we only used results that were tested on the majority of the available games.
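As a sketch of how this metric is computed (the per-game numbers below are purely illustrative):

import numpy as np

# Median human-normalized score: normalize each game's raw score against
# random and human baselines, then take the median across games.
def human_normalized(agent, random_score, human_score):
    return 100.0 * (agent - random_score) / (human_score - random_score)

# Illustrative per-game scores: {game: (agent, random, human)}
scores = {
    'Breakout': (320.0, 1.7, 30.5),
    'Pong': (19.0, -20.7, 14.6),
    'Seaquest': (4200.0, 68.4, 42054.7),
}

normalized = [human_normalized(a, r, h) for a, r, h in scores.values()]
print('Median human-normalized score: %.0f%%' % np.median(normalized))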
No-op starts
[1] Ape-X DQN used roughly 100 times more environment frames than the other results, although its training time was half that of the other DQN results.
[2] Hyperparameters were tuned per game.
[3] Only evaluated on 49 games.
Human starts
Median | Method | Score from |
---|---|---|
358% | Ape-X DQN [1] | Distributed Prioritized Experience Replay |
250% | UNREAL [2] | Distributed Prioritized Experience Replay |
153% | Rainbow DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
128% | Prioritized DDQN | Dueling Network Architectures for Deep Reinforcement Learning |
125% | C51 | Distributed Prioritized Experience Replay |
125% | Distributional DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
117% | Dueling DDQN | Dueling Network Architectures for Deep Reinforcement Learning |
116% | A3C | Dueling Network Architectures for Deep Reinforcement Learning |
110% | DDQN | Dueling Network Architectures for Deep Reinforcement Learning |
102% | Noisy DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
78% | Gorila DQN | Distributed Prioritized Experience Replay |
68% | DQN [3] | Rainbow: Combining Improvements in Deep Reinforcement Learning |
[1] Ape-X DQN used roughly 100 times more environment frames than the other results, although its training time was half that of the other DQN results.
[2] Hyperparameters were tuned per game.
[3] Only evaluated on 49 games.
Individual Environments
Although the metric above is a valuable way of comparing the general effectiveness of an algorithm, different algorithms have different strengths. Thus, we also included the state-of-the-art results for each game.
If you want to see how other methods performed in each Atari 2600 game, you can check the results of all methods by clicking the name of the game in the table below.
No-op Starts
Human Starts
Installation
Prerequisites
To install the Atari 2600 environment, you need the OpenAI Gym toolkit. Read this page to learn how to install OpenAI Gym.
Installation via pip
If you did a full install of OpenAI Gym, the Atari 2600 environments should already be installed. Otherwise, you can install the Atari 2600 environments with a single pip command:
pip3 install gym[atari]
Test Installation
You can run a simple random agent to make sure the Atari 2600 environment was correctly installed.
import gym

# Create the Pong environment and run one episode with a random agent.
env = gym.make('Pong-v0')
env.reset()
done = False
while not done:
    # Sample a random action, step the environment, and render the frame.
    _, _, done, _ = env.step(env.action_space.sample())
    env.render()
env.close()
Variants
In OpenAI Gym, each game has a few variants, distinguished by their suffixes. Through these variants, you can configure frame skipping and sticky actions. Frame skipping is a technique of using only every $k$-th frame: the agent selects an action every $k$ frames, and the same action is performed for the intervening $k$ frames. Sticky actions set a nonzero probability $p$ that the previous action is repeated regardless of the action the agent chose, which adds stochasticity to the otherwise deterministic Atari 2600 environments.
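To make these two techniques concrete, here is a minimal wrapper sketch (not Gym’s internal implementation) that applies frame skipping and sticky actions on top of a NoFrameskip variant:

import gym
import numpy as np

# Minimal sketch of frame skipping and sticky actions (not Gym's internals).
class SkipAndSticky(gym.Wrapper):
    def __init__(self, env, k=4, p=0.25):
        super().__init__(env)
        self.k = k            # each chosen action is held for k frames
        self.p = p            # probability that the previous action "sticks"
        self.last_action = 0  # NOOP

    def step(self, action):
        total_reward, done, info, obs = 0.0, False, {}, None
        for _ in range(self.k):
            # With probability p, the previously executed action is repeated
            # instead of the agent's chosen action.
            if np.random.rand() < self.p:
                action = self.last_action
            obs, reward, done, info = self.env.step(action)
            self.last_action = action
            total_reward += reward
            if done:
                break
        return obs, total_reward, done, info

env = SkipAndSticky(gym.make('PongNoFrameskip-v4'), k=4, p=0.25)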
For example, there are six variants for the Pong environment.
Name | Frame Skip $k$ | Repeat action probability $p$ |
---|---|---|
Pong-v0 | 2~4 [1] | 0.25 |
Pong-v4 | 2~4 [1] | 0 |
PongDeterministic-v0 | 4 [2] | 0.25 |
PongDeterministic-v4 [3] | 4 [2] | 0 |
PongNoFrameskip-v0 | 1 | 0.25 |
PongNoFrameskip-v4 | 1 | 0 |
[1] $k$ is chosen randomly at every step from $\{2, 3, 4\}$.
[2] For Space Invaders, the Deterministic variants use $k=3$, because with $k=4$ the lasers are invisible: the frame skip coincides with the blinking frequency of the lasers.
[3] Deterministic-v4 is the configuration used to assess Deep Q-Networks.
For more details about frame skipping and sticky actions, check Sections 2 and 5 of the ALE whitepaper: Revisiting the Arcade Learning Environment.
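To check which settings a given variant actually uses, you can inspect the underlying environment (a sketch assuming the classic Gym Atari API, where the underlying AtariEnv stores its frame-skip setting as frameskip, either a fixed integer or a range sampled at every step):

import gym

# Compare the frame-skip settings of a few Pong variants.
for env_id in ['Pong-v0', 'PongDeterministic-v4', 'PongNoFrameskip-v4']:
    env = gym.make(env_id)
    print(env_id, env.unwrapped.frameskip)
    env.close()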
Also, there are RAM environments such as Pong-ram-v0, where the observation is the RAM of the Atari machine instead of the 210 x 160 visual input. You can also add suffixes to RAM environments.
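For example, the RAM observation is a vector of 128 bytes, while the default observation is the 210 x 160 RGB image (a small check, assuming the standard Gym Atari observation spaces):

import gym

# Compare the observation spaces of the RAM and image variants of Pong.
ram_env = gym.make('Pong-ram-v0')
img_env = gym.make('Pong-v0')
print(ram_env.observation_space.shape)  # expected: (128,)
print(img_env.observation_space.shape)  # expected: (210, 160, 3)
ram_env.close()
img_env.close()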