Observational Overfitting in Reinforcement Learning

ICLR 2020, Reinforcement Learning

Observational overfitting: The agent overfits to properties of the observation that are irrelevant to the latent dynamics of the MDP.
Effect: This can hinder generalization to unseen (test) environments.
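A toy illustration of the idea, under loudly hypothetical assumptions: the two-feature observation, the `fit_stump` learner and all values are invented here, not taken from the paper. Each observation pairs an irrelevant scoreboard value with the obstacle distance that actually determines the optimal action. On the few training levels both features separate the actions perfectly, so a learner that only maximizes training accuracy can latch onto the scoreboard and then fail on test levels where that coincidence no longer holds.

```python
# Hypothetical toy setup (not from the paper). Each observation is
# (scoreboard_value, obstacle_distance); only obstacle_distance matters
# for the latent dynamics.

def optimal_action(obs):
    """Ground truth: jump (True) when the obstacle is closer than 2 units."""
    _, obstacle_distance = obs
    return obstacle_distance < 2


def accuracy(feature, threshold, data):
    """Fraction of (obs, label) pairs where `obs[feature] < threshold` matches the label."""
    return sum((obs[feature] < threshold) == label for obs, label in data) / len(data)


def fit_stump(data):
    """Return (train_accuracy, feature, threshold) for the best rule obs[feature] < threshold.

    Ties keep the first candidate found, i.e. the lowest feature index.
    """
    best = None
    for feature in range(len(data[0][0])):
        values = sorted({obs[feature] for obs, _ in data})
        for threshold in values + [values[-1] + 1]:
            acc = accuracy(feature, threshold, data)
            if best is None or acc > best[0]:
                best = (acc, feature, threshold)
    return best


# Training levels: by coincidence, a low score always co-occurs with a near obstacle.
train = [(o, optimal_action(o)) for o in [(0, 1), (1, 0), (5, 3), (7, 4)]]
# Test levels: the scoreboard no longer correlates with the obstacle.
test = [(o, optimal_action(o)) for o in [(8, 1), (9, 0), (0, 3), (2, 4)]]

acc, feature, threshold = fit_stump(train)
print("chosen feature:", feature, "train acc:", acc,
      "test acc:", accuracy(feature, threshold, test))
```

Here `fit_stump` picks the scoreboard feature (index 0) with perfect training accuracy but 0.0 test accuracy, while the rule based on the obstacle distance, `accuracy(1, 3, test)`, scores 1.0 on the same test levels.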
Evidence 1: The scoreboard and background objects are highlighted (in red) in the agent's saliency map, even though they are irrelevant to the dynamics.
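The note does not say how the saliency map was computed. One common choice is gradient-magnitude saliency, the absolute derivative of a policy output with respect to each input pixel. The sketch below estimates it with central finite differences (a stand-in for autodiff) on a hypothetical four-pixel observation and a hypothetical linear-tanh policy (`policy_logit`) whose weights lean on the first two "scoreboard" pixels; the pixels with the largest values are the ones a visualization would highlight.

```python
import math


def policy_logit(pixels, weights):
    """Hypothetical toy policy output: tanh of a weighted sum of pixel intensities."""
    return math.tanh(sum(w * p for w, p in zip(weights, pixels)))


def saliency_map(f, pixels, eps=1e-5):
    """Gradient-magnitude saliency |df/dp_i|, estimated with central differences."""
    saliency = []
    for i in range(len(pixels)):
        up = list(pixels)
        up[i] += eps
        down = list(pixels)
        down[i] -= eps
        saliency.append(abs(f(up) - f(down)) / (2 * eps))
    return saliency


# Hypothetical 4-pixel observation: pixels 0-1 play the "scoreboard",
# pixels 2-3 the level geometry.
pixels = [0.2, 0.1, 0.5, 0.3]
weights = [2.0, 1.5, 0.1, 0.0]  # a policy that leans on the scoreboard pixels

sal = saliency_map(lambda p: policy_logit(p, weights), pixels)
most_salient = max(range(len(sal)), key=sal.__getitem__)
print("saliency:", [round(s, 3) for s in sal], "most salient pixel:", most_salient)
```

Pixel 0 comes out most salient, and the zero-weight pixel 3 gets zero saliency, mirroring how a scoreboard-dependent policy would light up the scoreboard region.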
Evidence 2: Covering the scoreboard with a black rectangle during training resulted in a 10% increase in test performance.
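Masking of this kind is a simple observation transform applied before frames reach the agent. A minimal sketch, assuming frames are lists of pixel rows; `mask_rectangle`, `masked_observations` and the `SCOREBOARD_BOX` coordinates are hypothetical placeholders, since the note gives neither the scoreboard's location nor the environment's frame format.

```python
def mask_rectangle(frame, top, left, height, width, fill=0):
    """Return a copy of `frame` with a height x width rectangle set to `fill`.

    `frame` is a list of pixel rows (grayscale ints here; pass fill=(0, 0, 0)
    for RGB tuples). The rectangle is clipped to the frame bounds and the
    input frame is not modified.
    """
    masked = [list(row) for row in frame]
    for r in range(max(top, 0), min(top + height, len(masked))):
        for c in range(max(left, 0), min(left + width, len(masked[r]))):
            masked[r][c] = fill
    return masked


# Hypothetical scoreboard location as (top, left, height, width).
SCOREBOARD_BOX = (0, 0, 2, 3)


def masked_observations(frames, box=SCOREBOARD_BOX):
    """Yield each training frame with the scoreboard region blacked out."""
    for frame in frames:
        yield mask_rectangle(frame, *box)


frame = [[255] * 5 for _ in range(4)]  # 4x5 all-white grayscale frame
masked = next(masked_observations([frame]))
for row in masked:
    print(row)
```

Returning a copy rather than editing in place keeps the raw frame available, for example for logging or evaluation.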
Solution?: Overparametrization can help by acting as a form of “implicit regularization”, improving generalization to the test set.
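The note does not spell out the mechanism. The simplest well-known instance of implicit regularization is gradient descent on an overparameterized linear least-squares problem: started from zero, it converges to the minimum-norm solution among the infinitely many that fit the data exactly, with no explicit penalty term. This is a generic illustration of the concept, not the paper's experiment (which concerns neural network policies); `gradient_descent` and the data below are hypothetical.

```python
def gradient_descent(X, y, lr=0.1, steps=500):
    """Full-batch gradient descent on 0.5 * sum_i (w . x_i - y_i)^2, starting at w = 0."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x_i, y_i in zip(X, y):
            residual = sum(wj * xj for wj, xj in zip(w, x_i)) - y_i
            for j, xj in enumerate(x_i):
                grad[j] += residual * xj
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


# Two equations, three unknowns: infinitely many weight vectors fit exactly.
X = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 1.0]]
y = [2.0, 4.0]

w = gradient_descent(X, y)
print([round(v, 6) for v in w])
```

Both [2, 2, 2] and [2, 4, 0] fit the data exactly, but gradient descent from w = 0 converges to [2, 2, 2], which has the smaller norm (√12 vs √20). The zero initialization matters: every gradient is a combination of the rows of X, so the iterates never leave the row space, and the only interpolating solution there is the minimum-norm one.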