PPO agent for Super Mario Bros
This repository provides a PyTorch implementation of the Proximal Policy Optimization (PPO) algorithm, specifically tailored for training an agent to play Super Mario Bros. It aims to achieve higher performance than previous A3C implementations, with the goal of completing a significant majority of the game's levels. The target audience includes researchers and developers interested in reinforcement learning, particularly those exploring policy gradient methods for game environments.
How It Works
The project leverages the PPO algorithm, a policy gradient method known for its stability and sample efficiency, as described in the OpenAI paper "Proximal Policy Optimization Algorithms." This approach balances exploration and exploitation by constraining policy updates, preventing drastic changes that could destabilize training. The implementation is designed to be modular, allowing for training and testing of agents across different Super Mario Bros. levels.
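The constrained policy update at the heart of PPO can be illustrated with a minimal sketch of the clipped surrogate objective. This is an illustrative helper written for this summary (using NumPy for brevity), not code from the repository:

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate objective in the style of the PPO paper.

    ratio:     pi_new(a|s) / pi_old(a|s) for each sampled action
    advantage: advantage estimates for the same samples
    eps:       clipping range (0.2 is the paper's default)
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # The elementwise minimum removes the incentive to move the policy
    # far from the one that collected the data, stabilizing training.
    return np.mean(np.minimum(unclipped, clipped))

# With eps=0.2, a ratio of 1.5 is clipped to 1.2 before weighting.
obj = ppo_clip_objective(np.array([1.5, 0.9]), np.array([2.0, -1.0]))
```

In practice the repository maximizes this objective (equivalently, minimizes its negative) alongside a value loss and an entropy bonus, per the standard PPO recipe.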
Quick Start & Requirements
Train an agent: python train.py --world <world_num> --stage <stage_num>
Test a trained agent: python test.py --world <world_num> --stage <stage_num>
A Dockerfile is provided for environment setup.
Maintenance & Community
No specific information on contributors, sponsorships, or community channels (like Discord/Slack) is provided in the README.
Licensing & Compatibility
The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The agent has not solved level 8-4, whose puzzle-like design requires choosing specific paths rather than simply progressing rightward. A known rendering bug when running under Docker requires commenting out env.render(), which disables visualization during training and testing.
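One way to avoid editing source to work around the Docker rendering bug is to guard the render call behind a flag. This is a hypothetical sketch for this summary; the repository's scripts do not necessarily expose such an option:

```python
class FakeEnv:
    """Stand-in environment used only to demonstrate the guard."""
    def __init__(self):
        self.render_calls = 0

    def render(self):
        self.render_calls += 1

def maybe_render(env, enabled):
    # Skip rendering when no display is available (e.g. inside Docker),
    # instead of commenting the call out in source.
    if enabled:
        env.render()

env = FakeEnv()
maybe_render(env, False)  # headless run: render is never called
maybe_render(env, True)   # local run with a display attached
```

The same pattern could be driven by a command-line flag or an environment variable so the same script runs both locally and inside a container.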
Last commit: 4 years ago; the project appears inactive.