Super-mario-bros-PPO-pytorch by vietnh1009

PPO agent for Super Mario Bros

Created 6 years ago
1,244 stars

Top 31.6% on SourcePulse

View on GitHub
1 Expert Loves This Project
Project Summary

This repository provides a PyTorch implementation of the Proximal Policy Optimization (PPO) algorithm, tailored to training an agent to play Super Mario Bros. It aims to outperform the author's earlier A3C implementation, with the reported result of completing 31 of the game's 32 levels. The target audience includes researchers and developers interested in reinforcement learning, particularly those exploring policy gradient methods for game environments.

How It Works

The project implements the PPO algorithm, a policy gradient method known for its stability and sample efficiency, as described in the OpenAI paper "Proximal Policy Optimization Algorithms." PPO keeps training stable by clipping the ratio between the new and old policies, preventing drastic updates that could destabilize learning while still allowing steady improvement. The implementation is modular, so agents can be trained and tested on any individual Super Mario Bros. level.
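At the heart of this method is the clipped surrogate objective. As a rough PyTorch sketch (not the repository's exact code; the tensor names and the 0.2 clip threshold are assumptions), the per-batch loss can be computed like this:

```python
import torch

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped surrogate loss from 'Proximal Policy Optimization Algorithms'.

    All arguments are 1-D tensors over a batch of sampled (state, action) pairs.
    """
    # Probability ratio r_t(theta) = pi_theta(a|s) / pi_theta_old(a|s)
    ratio = torch.exp(new_log_probs - old_log_probs)

    # Unclipped and clipped surrogate terms
    surr1 = ratio * advantages
    surr2 = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages

    # PPO maximizes the minimum of the two terms, so the loss is its negation
    return -torch.min(surr1, surr2).mean()
```

Taking the elementwise minimum removes any incentive to push the probability ratio outside the [1 - eps, 1 + eps] band, which is what keeps each policy update small.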

Quick Start & Requirements

  • Install/Run: python train.py --world <world_num> --stage <stage_num> to train, or python test.py --world <world_num> --stage <stage_num> to evaluate a trained agent (see the environment sketch after this list)
  • Prerequisites: PyTorch, Python. GPU recommended for training.
  • Docker: A Dockerfile is provided for environment setup.
  • Docs: [PYTORCH] Proximal Policy Optimization (PPO) for playing Super Mario Bros
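For readers who want to reproduce the world/stage selection outside the provided scripts, the sketch below shows one minimal way an environment for a given level could be constructed. The gym-super-mario-bros and nes-py packages, the environment id pattern, and the old four-tuple Gym step API are assumptions based on the standard Super Mario Bros. Gym bindings; the repository's actual preprocessing wrappers are not reproduced here.

```python
import gym_super_mario_bros
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT
from nes_py.wrappers import JoypadSpace


def make_env(world: int, stage: int):
    # Environment ids follow the pattern "SuperMarioBros-<world>-<stage>-v0"
    env = gym_super_mario_bros.make(f"SuperMarioBros-{world}-{stage}-v0")
    # Restrict the raw NES controller to a small discrete action set
    return JoypadSpace(env, SIMPLE_MOVEMENT)


if __name__ == "__main__":
    env = make_env(world=1, stage=1)
    state = env.reset()
    state, reward, done, info = env.step(env.action_space.sample())
    env.close()
```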

Highlighted Details

  • Trained agent reportedly completes 31/32 Super Mario Bros. levels.
  • PPO is the algorithm used by OpenAI Five for Dota 2.
  • Offers flexibility in training by allowing specification of world and stage.
  • Docker support included for easier environment management.

Maintenance & Community

No specific information on contributors, sponsorships, or community channels (like Discord/Slack) is provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The agent has not been able to solve level 8-4, whose puzzle-like design requires specific path choices rather than straightforward progression. A known bug affects rendering under Docker: env.render() must be commented out, which disables visualization during training and testing. A possible workaround is sketched below.
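One way to avoid hand-editing the scripts when running headless under Docker is to gate the render call behind an environment variable. This is a hypothetical workaround sketch, not a fix present in the repository; the NO_RENDER variable name is an assumption.

```python
import os

# Hypothetical workaround: skip rendering in headless setups such as Docker.
# Set NO_RENDER=1 (an assumed, illustrative variable) to disable visualization.
RENDER = os.environ.get("NO_RENDER") != "1"


def maybe_render(env):
    # env is any Gym-style environment; rendering is skipped when disabled
    if RENDER:
        env.render()
```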

Health Check

  • Last Commit: 4 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 6 stars in the last 30 days

Explore Similar Projects

Starred by Hanlin Tang (CTO Neural Networks at Databricks; Cofounder of MosaicML), Amanpreet Singh (Cofounder of Contextual AI), and 2 more.

coach by IntelLabs (2k stars): Reinforcement learning framework for experimentation (discontinued). Created 8 years ago, updated 2 years ago.