Atari Kung-Fu Master Environment

Overview

The player controls Thomas with a four-way joystick and two attack buttons for punching and kicking. Unlike in more conventional side-scrolling games, the joystick is used not only to crouch but also to jump. Punches and kicks can be performed from a standing, crouching, or jumping position. Punches award more points than kicks and do more damage, but their range is shorter.
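
For reference, this discrete control set (movement directions combined with an attack button) is what the Arcade Learning Environment exposes to an agent. Below is a minimal sketch, assuming the Gymnasium and ale-py packages and the standard `ALE/KungFuMaster-v5` environment id, that prints the available actions:

```python
# Minimal sketch: inspect the discrete action set of the ALE Kung-Fu Master
# environment (assumes `gymnasium` and `ale-py` are installed).
import gymnasium as gym
import ale_py

gym.register_envs(ale_py)  # registers the ALE/... ids (needed on recent versions)

env = gym.make("ALE/KungFuMaster-v5")
print(env.action_space)                     # discrete action space for this game
print(env.unwrapped.get_action_meanings())  # NOOP, directions, and FIRE (attack) combos
env.close()
```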

Underlings encountered by the player include Grippers, who can grab Thomas and drain his energy until shaken off; Knife Throwers, who can throw at two different heights and must be hit twice; and Tom Toms, short fighters who can either grab Thomas or somersault to strike his head when he is crouching. On even-numbered floors, the player must also deal with falling balls and pots, snakes, poisonous moths, fire-breathing dragons, and exploding confetti balls.

The temple has five floors, each ending with a different boss, the "Sons of the Devil": the Stick Fighter on the first floor, the Boomerang Fighter on the second, the Strongman on the third, the Black Magician on the fourth, and Mr. X on the fifth and final floor. Each must be defeated before Thomas can climb the stairs to the next floor and, ultimately, rescue Silvia. Thomas must complete each floor within a fixed time; if time runs out or his energy is completely drained, he loses one life and must replay the entire floor. If a boss defeats Thomas, the boss laughs. Although there are five bosses, the game uses only two different synthesized laughs. (The NES version uses a third, high-pitched synthesized laugh for the Black Magician, the fourth boss.)
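
From an RL agent's point of view, these per-life mechanics are visible through the environment's info dictionary. A minimal sketch, under the same Gymnasium/ale-py assumptions as above, that detects a lost life during a random rollout:

```python
# Minimal sketch: watch the "lives" counter drop when Thomas runs out of
# time or energy (random actions stand in for a trained policy).
import gymnasium as gym
import ale_py

gym.register_envs(ale_py)

env = gym.make("ALE/KungFuMaster-v5")
obs, info = env.reset(seed=0)
lives = info.get("lives", 0)

terminated = truncated = False
while not (terminated or truncated):
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    if info.get("lives", lives) < lives:   # a life was lost; the floor restarts
        print("Life lost, remaining:", info["lives"])
        lives = info["lives"]
env.close()
```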

Once the player has completed all five floors, the game restarts with a more demanding version of the Devil’s Temple, although the essential details remain unchanged. A visual indication of the current house is displayed on the screen. For each series of five completed floors, a dragon symbol appears in the upper-right corner of the screen. After three dragons have been added, the dragon symbols blink.

Description from Wikipedia

Performance of RL Agents

We list the scores of various reinforcement learning algorithms tested in this environment. These results are taken from the RL Database. If this page was helpful, please consider giving it a star!
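
The headings below refer to the two standard Atari evaluation protocols: under Human Starts, episodes begin from start states sampled from human play, while under No-op Starts the agent takes over after up to 30 random no-op actions. A minimal sketch of the no-op-starts protocol, under the same Gymnasium/ale-py assumptions as above (random actions again stand in for a trained policy):

```python
# Minimal sketch of the "no-op starts" evaluation protocol: up to `max_noops`
# NOOP actions (action index 0) are executed before the agent takes control.
import random
import gymnasium as gym
import ale_py

gym.register_envs(ale_py)

def evaluate_noop_starts(env_id="ALE/KungFuMaster-v5", episodes=2, max_noops=30, seed=0):
    env = gym.make(env_id)
    rng = random.Random(seed)
    returns = []
    for ep in range(episodes):
        obs, info = env.reset(seed=seed + ep)
        terminated = truncated = False
        for _ in range(rng.randint(1, max_noops)):      # random number of NOOPs
            obs, _, terminated, truncated, info = env.step(0)
        episode_return = 0.0
        while not (terminated or truncated):
            # A trained agent's policy would choose the action here.
            obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
            episode_return += reward
        returns.append(episode_return)
    env.close()
    return sum(returns) / len(returns)

print(evaluate_noop_starts())
```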

Human Starts

| Result | Algorithm | Source |
| --- | --- | --- |
| 40835.0 | A3C LSTM | Asynchronous Methods for Deep Reinforcement Learning |
| 37484.0 | PDD DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 33890.0 | Distributional DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 31676.0 | Prioritized DDQN (rank, tuned) | Prioritized Experience Replay |
| 31244.0 | Prioritized DDQN (prop, tuned) | Prioritized Experience Replay |
| 30207.0 | DDQN (tuned) | Deep Reinforcement Learning with Double Q-learning |
| 28999.8 | Rainbow | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 28819.0 | A3C FF | Asynchronous Methods for Deep Reinforcement Learning |
| 24288.0 | DuDQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 22771.0 | DDQN | Deep Reinforcement Learning with Double Q-learning |
| 20786.8 | Human | Massively Parallel Methods for Deep Reinforcement Learning |
| 20620.0 | Gorila DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 20181.0 | Prioritized DQN (rank) | Prioritized Experience Replay |
| 11875.0 | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 3046.0 | A3C FF 1 day | Asynchronous Methods for Deep Reinforcement Learning |
| 304.0 | Random | Massively Parallel Methods for Deep Reinforcement Learning |

No-op Starts

| Result | Algorithm | Source |
| --- | --- | --- |
| 76642 | QR-DQN-1 | Distributional Reinforcement Learning with Quantile Regression |
| 73512 | IQN | Implicit Quantile Networks for Distributional Reinforcement Learning |
| 71514 | QR-DQN-0 | Distributional Reinforcement Learning with Quantile Regression |
| 65836.5 | Reactor | The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning |
| 61621.5 | Reactor ND | The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning |
| 59799.5 | Reactor | The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning |
| 55790 | NoisyNet A3C | Noisy Networks for Exploration |
| 52181.0 | Rainbow | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 48375.0 | PDD DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 48192 | C51 | A Distributional Perspective on Reinforcement Learning |
| 43375.5 | IMPALA (deep) | IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures |
| 43009.0 | Distributional DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 42259.0 | IMPALA (shallow) | IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures |
| 41905.0 | IMPALA (deep, multitask) | IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures |
| 41672 | NoisyNet DuDQN | Noisy Networks for Exploration |
| 37422 | A3C | Noisy Networks for Exploration |
| 36310 | NoisyNet DQN | Noisy Networks for Exploration |
| 34954.0 | ACKTR | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 34294.0 | DuDQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 30444 | DQN | Noisy Networks for Exploration |
| 30316 | DuDQN | Noisy Networks for Exploration |
| 29710.0 | DDQN | A Distributional Perspective on Reinforcement Learning |
| 29486.0 | DDQN | Deep Reinforcement Learning with Double Q-learning |
| 29151 | Contingency | Human-level control through deep reinforcement learning |
| 27543.33 | Gorila DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 26059.0 | DQN | A Distributional Perspective on Reinforcement Learning |
| 23270 | DQN | Human-level control through deep reinforcement learning |
| 22736.3 | Human | Dueling Network Architectures for Deep Reinforcement Learning |
| 22736.2 | Human | Human-level control through deep reinforcement learning |
| 19544 | Linear | Human-level control through deep reinforcement learning |
| 258.5 | Random | Human-level control through deep reinforcement learning |
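
Results like these are often summarized as human-normalized scores, using the Human and Random rows as anchors: 0 corresponds to random play and 1 to the human baseline. A worked example using the no-op-starts baselines from the table above (22736.2 for the human, 258.5 for random play):

```python
# Human-normalized score: 0.0 matches random play, 1.0 matches the human
# baseline. Anchor values are taken from the No-op Starts table above.
HUMAN_SCORE = 22736.2
RANDOM_SCORE = 258.5

def human_normalized(score: float) -> float:
    return (score - RANDOM_SCORE) / (HUMAN_SCORE - RANDOM_SCORE)

print(round(human_normalized(76642), 2))   # QR-DQN-1: roughly 3.40x human level
print(round(human_normalized(23270), 2))   # DQN: roughly 1.02x human level
```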

Normal Starts

| Result | Algorithm | Source |
| --- | --- | --- |
| 27599.3 | ACER | Proximal Policy Optimization Algorithms |
| 24900.3 | A2C | Proximal Policy Optimization Algorithms |
| 23310.3 | PPO | Proximal Policy Optimization Algorithms |