PyTorch PPO implementation for beginners
This repository provides a simplified, well-documented PyTorch implementation of Proximal Policy Optimization (PPO), designed for beginners in Reinforcement Learning. It aims to demystify PPO with a bare-bones, easy-to-follow codebase that accompanies a Medium article series covering the theory.
How It Works
The implementation follows the pseudocode from OpenAI's Spinning Up, focusing on clarity and structure. It uses a feed-forward neural network for both the actor and the critic, and is designed for continuous observation and action spaces, though it can be adapted for discrete spaces. The core logic resides in `ppo.py`, with `main.py` orchestrating environment initialization, model training, and testing.
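For intuition, here is a minimal sketch of the kind of feed-forward network and PPO clipped surrogate loss this style of implementation is built around. This is illustrative, not the repository's exact code; class and function names are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative feed-forward network; the actor and critic in this
# style of implementation typically both use this architecture.
class FeedForwardNN(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.layer1 = nn.Linear(in_dim, 64)
        self.layer2 = nn.Linear(64, 64)
        self.layer3 = nn.Linear(64, out_dim)

    def forward(self, obs):
        x = F.relu(self.layer1(obs))
        x = F.relu(self.layer2(x))
        return self.layer3(x)

# PPO clipped surrogate objective, the core of the policy update:
# ratio = pi_new(a|s) / pi_old(a|s), clipped to [1 - eps, 1 + eps].
def ppo_clip_loss(curr_log_probs, old_log_probs, advantages, clip=0.2):
    ratios = torch.exp(curr_log_probs - old_log_probs)
    surr1 = ratios * advantages
    surr2 = torch.clamp(ratios, 1 - clip, 1 + clip) * advantages
    return (-torch.min(surr1, surr2)).mean()
```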
Quick Start & Requirements
```bash
# Install dependencies
pip install -r requirements.txt

# Train from scratch
python main.py

# Test a trained actor
python main.py --mode test --actor_model ppo_actor.pth
```
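To resume an interrupted training run from saved weights, a command along these lines should work, assuming `main.py` accepts a `--critic_model` flag analogous to the `--actor_model` flag shown above:

```bash
# Resume training from previously saved weights
# (--critic_model is assumed by analogy with --actor_model)
python main.py --actor_model ppo_actor.pth --critic_model ppo_critic.pth
```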
The target Gym environment must use `Box` for its observation and action spaces. Using a virtual environment is recommended (`python -m venv venv`, then `source venv/bin/activate`).
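A quick way to verify an environment is compatible; the environment id here (`Pendulum-v1`) is just an example and not taken from the repository:

```python
import gym
from gym.spaces import Box

# Pendulum-v1 is one example of a continuous-control task; any
# environment with Box observation and action spaces should work.
env = gym.make("Pendulum-v1")
assert isinstance(env.observation_space, Box), "observation space must be Box"
assert isinstance(env.action_space, Box), "action space must be Box"
```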
Maintenance & Community
The project is authored by Eric Yu, who provides contact details (email, LinkedIn) for questions.
Licensing & Compatibility
The repository does not explicitly state a license in the provided README.
Limitations & Caveats
The implementation is primarily designed for continuous observation and action spaces; supporting discrete actions requires swapping the action distribution, as sketched below. Generating all data for the Medium article's graphs takes approximately 10 hours on a standard computer.
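One possible adaptation for discrete action spaces (a sketch, not code from the repository) is to replace the Gaussian distribution typically used for continuous control with a `Categorical` over the actor's logits:

```python
import torch
from torch.distributions import Categorical

# Hypothetical helper: given an actor network that outputs one logit
# per discrete action, sample an action and return its log-probability.
def get_discrete_action(actor, obs):
    logits = actor(obs)                # shape: (num_actions,)
    dist = Categorical(logits=logits)  # replaces the Gaussian distribution
    action = dist.sample()
    return action.item(), dist.log_prob(action)
```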