Atari Asteroids Environment
Overview
The objective of Asteroids is to destroy asteroids and saucers. The player controls a triangular ship that can rotate left and right, fire shots straight forward, and thrust forward.[3] Once the ship begins moving in a direction, it will continue in that direction for a time without player intervention unless the player applies thrust in a different direction. The ship eventually comes to a stop when not thrusting. The player can also send the ship into hyperspace, causing it to disappear and reappear in a random location on the screen, at the risk of self-destructing or appearing on top of an asteroid.[4]
Each level starts with a few large asteroids drifting in various directions on the screen. Objects wrap around screen edges – for instance, an asteroid that drifts off the top edge of the screen reappears at the bottom and continues moving in the same direction.[5] As the player shoots asteroids, they break into smaller asteroids that move faster and are more difficult to hit; smaller asteroids are also worth more points. Two flying saucers appear periodically on the screen: the “big saucer” shoots randomly and poorly, while the “small saucer” fires frequently at the ship. After the score reaches 40,000, only the small saucer appears, and as the score increases further, the angle range of its shots narrows until the saucer fires extremely accurately.[6] Once the screen has been cleared of all asteroids and flying saucers, a new set of large asteroids appears, starting the next level. The game gets harder as the number of asteroids increases, until the score reaches a range between 40,000 and 60,000, after which the difficulty no longer increases.[7] The player starts with three lives after a coin is inserted and gains an extra life every 10,000 points.[8] When the player loses all their lives, the game ends.
Description from Wikipedia
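For context, below is a minimal sketch of interacting with this environment through Gymnasium's ALE interface. It assumes the `gymnasium` and `ale-py` packages are installed; the id `ALE/Asteroids-v5` follows the current ALE naming scheme (older setups use ids like `AsteroidsNoFrameskip-v4`), and the random policy is just a placeholder for a learned agent.

```python
# Minimal random-agent rollout in Asteroids via Gymnasium + ALE.
# Assumes `gymnasium` and `ale-py` are installed; older ale-py versions
# auto-register their environments on import instead of via register_envs.
import gymnasium as gym
import ale_py

gym.register_envs(ale_py)  # expose the ALE/* environment ids to Gymnasium

env = gym.make("ALE/Asteroids-v5")
obs, info = env.reset(seed=0)
print(env.action_space)  # discrete actions: rotate, thrust, fire, and combinations

terminated = truncated = False
episode_return = 0.0
while not (terminated or truncated):
    action = env.action_space.sample()  # stand-in for a trained policy
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward

env.close()
print(f"random-policy episode return: {episode_return}")
```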
Performance of RL Agents
We list various reinforcement learning algorithms that were tested in this environment. These results are from RL Database. If this page was helpful, please consider giving a star!
Normal Starts
Under this protocol, each evaluation episode begins from the environment's standard initial state, rather than from random no-op actions or from states sampled from human play.
| Result | Algorithm | Source |
|---|---|---|
| 2389.3 | ACER | Proximal Policy Optimization Algorithms |
| 2097.5 | PPO | Proximal Policy Optimization Algorithms |
| 1653.3 | A2C | Proximal Policy Optimization Algorithms |
| 1070 | DQN (authors' implementation) | Deep Recurrent Q-Learning for Partially Observable MDPs |
| 1032 | DRQN | Deep Recurrent Q-Learning for Partially Observable MDPs |
| 1020 | DRQN | Deep Recurrent Q-Learning for Partially Observable MDPs |
| 1010 | DQN (authors' implementation) | Deep Recurrent Q-Learning for Partially Observable MDPs |
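As a rough illustration of how an agent like PPO can be trained on this environment (not the exact configuration behind the numbers above), here is a sketch using the Stable-Baselines3 library with standard Atari preprocessing. All hyperparameters are library defaults, and the environment id may vary with your gym/ale-py versions.

```python
# Illustrative PPO training on Asteroids with Stable-Baselines3.
# This is a sketch with default hyperparameters, not a reproduction of
# the PPO-paper setup that produced the scores in the table above.
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack

# make_atari_env applies the standard DeepMind-style preprocessing
# (frame skip, grayscale, resize); VecFrameStack stacks 4 frames so the
# policy can infer velocities. The legacy id below may need to be swapped
# for "ALE/Asteroids-v5" depending on your installed ale-py version.
env = make_atari_env("AsteroidsNoFrameskip-v4", n_envs=8, seed=0)
env = VecFrameStack(env, n_stack=4)

model = PPO("CnnPolicy", env, verbose=1)
model.learn(total_timesteps=1_000_000)  # illustrative training budget
model.save("ppo_asteroids")
```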