Atari Amidar Environment
Overview
As in Pac-Man, the player is opposed by enemies who kill on contact. The enemies gradually increase in number as the player advances from one level to the next, and their speed also increases. On odd-numbered levels, the player controls an ape (in some versions labeled “Copier”), and must collect coconuts while avoiding headhunters (labeled “Police” and “Thief”). On even-numbered levels, the player controls a paint roller (labeled “Rustler”), and must paint over each spot of the board while avoiding pigs (labeled “Cattle” and “Thief”). Each level is followed by a short bonus stage.
Whenever a rectangular portion of the board is cleared (either by collecting all the surrounding coconuts or by painting all the surrounding edges), the rectangle is colored in. On even-numbered levels, this also awards bonus points; on odd-numbered levels, the player earns points for each coconut eaten. When the player clears all four corners of the board, he is briefly able to kill the enemies by touching them, just as when Pac-Man eats a “power pill.” Enemies killed this way fall to the bottom of the screen and revive after a few moments.
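The fill and power-up rules can be made concrete with a minimal Python sketch. It rests on loudly stated assumptions: the `Board` class, its edge labels, and the 2x2 layout are hypothetical illustrations, not the game's or ALE's internal representation (the real Amidar board has irregular rectangles). It models the painting case, where a rectangle counts as filled once every surrounding edge is painted; a coconut level could be modeled the same way by treating an edge as done once its coconuts are collected. Scoring is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Board:
    """Hypothetical, simplified Amidar board: each rectangle is a set of edge IDs."""

    rectangles: dict[str, frozenset[str]]  # rectangle name -> surrounding edges
    corners: tuple[str, ...]               # names of the four corner rectangles
    painted: set[str] = field(default_factory=set)

    def filled(self) -> set[str]:
        # A rectangle is colored in once every surrounding edge has been painted.
        return {name for name, edges in self.rectangles.items() if edges <= self.painted}

    def paint(self, edge: str) -> list[str]:
        """Paint one edge and return the rectangles that this stroke completed."""
        before = self.filled()
        self.painted.add(edge)
        return sorted(self.filled() - before)

    def power_mode(self) -> bool:
        # Enemies become killable once all four corner rectangles are filled.
        filled = self.filled()
        return all(corner in filled for corner in self.corners)


# Hypothetical 2x2 board in which every rectangle is a corner; interior edges are shared.
board = Board(
    rectangles={
        "NW": frozenset({"h00", "h10", "v00", "v01"}),
        "NE": frozenset({"h01", "h11", "v01", "v02"}),
        "SW": frozenset({"h10", "h20", "v10", "v11"}),
        "SE": frozenset({"h11", "h21", "v11", "v12"}),
    },
    corners=("NW", "NE", "SW", "SE"),
)

completed = []
for edge in ["h00", "h10", "v00", "v01"]:
    completed += board.paint(edge)
print(completed)            # ['NW']
print(board.power_mode())   # False
```

Representing rectangles as edge sets makes shared edges count toward both neighbors, so painting an interior edge can progress two rectangles at once.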
The controls consist of a joystick and a single button labeled “Jump,” which can be used up to three times; the count resets when a level is cleared or the player loses a life. Pressing the button does not make the player jump; instead, all the enemies jump, letting the player walk under them.
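The jump rule amounts to a small counter. The sketch below is a hypothetical illustration (the `JumpCounter` class and its method names are invented for this example): each press consumes one of three jumps, presses with none left do nothing, and `reset()` stands in for clearing a level or losing a life.

```python
class JumpCounter:
    """Hypothetical tracker for Amidar's limited 'Jump' button (three uses)."""

    MAX_JUMPS = 3

    def __init__(self) -> None:
        self.remaining = self.MAX_JUMPS

    def press(self) -> bool:
        """Return True if this press makes all enemies jump, False if none are left."""
        if self.remaining == 0:
            return False
        self.remaining -= 1
        return True

    def reset(self) -> None:
        # Called when a level is cleared or the player loses a life.
        self.remaining = self.MAX_JUMPS


jumps = JumpCounter()
print([jumps.press() for _ in range(4)])  # [True, True, True, False]
jumps.reset()                             # e.g. the player lost a life
print(jumps.remaining)                    # 3
```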
Extra lives are awarded at 50,000 points and then every 80,000 points thereafter, up to 930,000; no further extra lives are awarded after that.
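The extra-life thresholds can be enumerated with a short Python sketch. It assumes that "per 80,000 scored" means every 80,000 points after the first extra life at 50,000, a reading consistent with the cap, since 50,000 + 11 × 80,000 = 930,000. The function names are hypothetical.

```python
def extra_life_thresholds(first: int = 50_000, step: int = 80_000, cap: int = 930_000) -> list[int]:
    """Scores at which an extra life is awarded (assumed: first, then every `step` up to `cap`)."""
    return list(range(first, cap + 1, step))


def extra_lives_earned(score: int) -> int:
    # Count thresholds already reached; nothing is awarded past the cap.
    return sum(score >= t for t in extra_life_thresholds())


thresholds = extra_life_thresholds()
print(len(thresholds), thresholds[:3], thresholds[-1])  # 12 [50000, 130000, 210000] 930000
print(extra_lives_earned(200_000))                      # 2
```

Under this reading, a game yields at most 12 extra lives from score.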
Description from Wikipedia
Performance of RL Agents
We list various reinforcement learning algorithms that were tested in this environment. These results are from RL Database.
Normal Starts
| Score | Algorithm | Source |
|---:|---|---|
| 827.6 | ACER | Proximal Policy Optimization Algorithms |
| 674.6 | PPO | Proximal Policy Optimization Algorithms |
| 380.8 | A2C | Proximal Policy Optimization Algorithms |