PPO-for-Beginners  by ericyangyu

PyTorch PPO implementation for beginners

created 4 years ago
1,048 stars

Top 36.5% on sourcepulse

Project Summary

This repository provides a simplified, well-documented PyTorch implementation of Proximal Policy Optimization (PPO) for beginners in reinforcement learning. It aims to demystify PPO with a bare-bones, easy-to-follow codebase, and it accompanies a Medium article series that covers the theory.

How It Works

The implementation follows the PPO pseudocode from OpenAI's Spinning Up, with an emphasis on clarity and structure. It uses a feed-forward neural network architecture for both the actor and the critic, and it targets continuous observation and action spaces, though it can be adapted to discrete ones. The core logic resides in ppo.py; main.py handles environment initialization, model training, and testing.
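As a rough illustration of the Spinning Up pseudocode mentioned above, the clipped surrogate objective of PPO can be sketched in PyTorch as follows. This is a minimal sketch, not the code in ppo.py: the function name ppo_clip_loss, its argument names, and the default clip value of 0.2 are assumptions chosen for illustration.

```python
import torch


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip=0.2):
    """Clipped surrogate loss for PPO (hypothetical helper, not from ppo.py).

    new_log_probs: log pi_theta(a|s) under the current policy
    old_log_probs: log pi_theta_old(a|s) recorded when the data was collected
    advantages:    advantage estimates for the same (s, a) pairs
    clip:          clipping range epsilon (0.2 is an assumed default)
    """
    # Probability ratio pi_theta / pi_theta_old, computed in log space.
    ratios = torch.exp(new_log_probs - old_log_probs)
    # Unclipped and clipped surrogate terms.
    surr1 = ratios * advantages
    surr2 = torch.clamp(ratios, 1 - clip, 1 + clip) * advantages
    # Take the pessimistic (smaller) term; negate because optimizers minimize.
    return -torch.min(surr1, surr2).mean()


if __name__ == "__main__":
    log_probs = torch.zeros(3)
    advantages = torch.tensor([1.0, -1.0, 2.0])
    # With identical old and new log-probabilities the ratio is 1,
    # so the loss reduces to -mean(advantages).
    print(ppo_clip_loss(log_probs, log_probs, advantages))
```

The min over the two surrogate terms is what removes the incentive to move the policy far from the one that collected the data: once the ratio leaves the range [1 - clip, 1 + clip] in the direction that would improve the objective, the gradient of that term vanishes.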

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Run training: python main.py
  • Run testing: python main.py --mode test --actor_model ppo_actor.pth
  • Prerequisites: Python, PyTorch. Environments must use Box spaces for both observations and actions.
  • Setup: Create and activate a virtual environment before installing (python -m venv venv, then source venv/bin/activate).
  • Additional Resources: Medium Article Series, Spinning Up PPO.

Highlighted Details

  • Directly correlates code with a step-by-step Medium tutorial series.
  • Includes code for data collection and graph generation, with pre-existing data available.
  • Offers detailed comments and structure for pedagogical purposes.
  • Follows pseudocode from OpenAI's Spinning Up for PPO.

Maintenance & Community

The project is authored by Eric Yu. Contact information (Email, LinkedIn) is provided for questions.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README.

Limitations & Caveats

The implementation is primarily designed for continuous observation and action spaces. Generating all data for the Medium article's graphs takes approximately 10 hours on a standard computer.
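One way the continuous-only design could be adapted to discrete action spaces is to swap the action distribution while leaving the rest of the training loop unchanged. The sketch below assumes the actor outputs a mean vector in the continuous case and unnormalized logits in the discrete case; the function name sample_action, the discrete flag, and the fixed standard deviation of 0.5 are hypothetical and not taken from the repository.

```python
import torch
from torch.distributions import Categorical, MultivariateNormal


def sample_action(actor_output, discrete, std=0.5):
    """Sample an action and its log-probability (hypothetical helper).

    actor_output: 1-D tensor; action logits if discrete, action mean otherwise
    discrete:     True for a discrete action space, False for a continuous one
    std:          assumed fixed standard deviation for the continuous case
    """
    if discrete:
        # Treat the actor output as unnormalized log-probabilities.
        dist = Categorical(logits=actor_output)
    else:
        # Gaussian around the actor output with a fixed diagonal covariance.
        cov = torch.diag(torch.full((actor_output.shape[-1],), std ** 2))
        dist = MultivariateNormal(actor_output, covariance_matrix=cov)
    action = dist.sample()
    return action, dist.log_prob(action)


if __name__ == "__main__":
    # Discrete: three possible actions, returned as an integer index.
    print(sample_action(torch.zeros(3), discrete=True))
    # Continuous: a two-dimensional real-valued action.
    print(sample_action(torch.zeros(2), discrete=False))
```

Because both branches return an action together with its log-probability, a PPO update that works only with log-probability ratios would not need to know which kind of action space it is training on.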

Health Check

  • Last commit: 10 months ago
  • Responsiveness: 1+ week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 88 stars in the last 90 days
