PPO agent for Super Mario Bros
This repository provides a PyTorch implementation of the Proximal Policy Optimization (PPO) algorithm, specifically tailored for training an agent to play Super Mario Bros. It aims to achieve higher performance than previous A3C implementations, with the goal of completing a significant majority of the game's levels. The target audience includes researchers and developers interested in reinforcement learning, particularly those exploring policy gradient methods for game environments.
How It Works
The project leverages the PPO algorithm, a policy gradient method known for its stability and sample efficiency, as described in the OpenAI paper "Proximal Policy Optimization Algorithms." This approach balances exploration and exploitation by constraining policy updates, preventing drastic changes that could destabilize training. The implementation is designed to be modular, allowing for training and testing of agents across different Super Mario Bros. levels.
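The constrained policy update at the heart of PPO can be illustrated with a minimal sketch of the clipped surrogate objective. This is an illustrative helper written for this summary (using NumPy for brevity), not code from the repository:

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate objective in the style of the PPO paper.

    ratio:     pi_new(a|s) / pi_old(a|s) for each sampled action
    advantage: advantage estimates for the same samples
    eps:       clipping range (0.2 is the paper's default)
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # The elementwise minimum removes the incentive to move the policy
    # far from the one that collected the data, stabilizing training.
    return np.mean(np.minimum(unclipped, clipped))

# With eps=0.2, a ratio of 1.5 is clipped to 1.2 before weighting.
obj = ppo_clip_objective(np.array([1.5, 0.9]), np.array([2.0, -1.0]))
```

In practice the repository maximizes this objective (equivalently, minimizes its negative) alongside a value loss and an entropy bonus, per the standard PPO recipe.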
Quick Start & Requirements
Train an agent: python train.py --world <world_num> --stage <stage_num>
Test a trained agent: python test.py --world <world_num> --stage <stage_num>
A Dockerfile is provided for environment setup.
Maintenance & Community
No specific information on contributors, sponsorships, or community channels (like Discord/Slack) is provided in the README.
Licensing & Compatibility
The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The agent has not solved level 8-4, whose puzzle-like design requires choosing specific paths rather than simply progressing rightward. A known rendering bug when running under Docker requires commenting out env.render(), which disables visualization during training and testing.
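One way to avoid editing source to work around the Docker rendering bug is to guard the render call behind a flag. This is a hypothetical sketch for this summary; the repository's scripts do not necessarily expose such an option:

```python
class FakeEnv:
    """Stand-in environment used only to demonstrate the guard."""
    def __init__(self):
        self.render_calls = 0

    def render(self):
        self.render_calls += 1

def maybe_render(env, enabled):
    # Skip rendering when no display is available (e.g. inside Docker),
    # instead of commenting the call out in source.
    if enabled:
        env.render()

env = FakeEnv()
maybe_render(env, False)  # headless run: render is never called
maybe_render(env, True)   # local run with a display attached
```

The same pattern could be driven by a command-line flag or an environment variable so the same script runs both locally and inside a container.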
Last commit: 4 years ago; the project appears inactive.