ppo-implementation-details by vwxyzjn

Code for a Proximal Policy Optimization (PPO) blog post

Created 4 years ago

907 stars

Top 40.0% on SourcePulse

Project Summary

This repository provides the source code for a blog post detailing 37 implementation details of Proximal Policy Optimization (PPO). It targets reinforcement learning researchers and engineers who want to understand and reproduce PPO's reported performance. Each detail discussed in the post is backed by runnable code.

How It Works

The implementation builds on CleanRL, a lightweight RL library of single-file implementations designed for clarity and reproducibility. It runs PPO on Atari, PyBullet, Gym-Microrts, and Procgen environments, with the environment-specific configurations and optimizations discussed in the blog post. Because each variant lives in a single file, implementation details are easy to compare directly and to experiment with.
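As a rough illustration of the objective these scripts optimize, the sketch below computes PPO's clipped surrogate policy loss in plain Python. It is not taken from the repository: the function name ppo_clipped_loss, the parameter name clip_coef, and its default of 0.2 are assumptions chosen for this example, and the actual scripts may structure the computation differently.

```python
import math


def ppo_clipped_loss(new_logprobs, old_logprobs, advantages, clip_coef=0.2):
    """Mean clipped surrogate policy loss over a batch (a value to minimize).

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    losses = []
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probs.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - clip_coef, 1 + clip_coef].
        clipped_ratio = max(1.0 - clip_coef, min(ratio, 1.0 + clip_coef))
        # Pessimistic choice: the larger of the two negated objectives.
        losses.append(max(-adv * ratio, -adv * clipped_ratio))
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # The policy made the action 1.5x more likely and the advantage is positive,
    # so the ratio is clipped at 1.2 and the gain is capped.
    print(ppo_clipped_loss([math.log(1.5)], [0.0], [1.0]))
```

With a positive advantage the gain from raising an action's probability is capped once the ratio leaves the clip range, while with a negative advantage the unclipped (worse) term is kept, so the update stays conservative in both directions.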

Quick Start & Requirements

Install dependencies: poetry install
Train agents: poetry run python ppo.py
For specific environments (Atari, PyBullet, Gym-Microrts, Procgen, Envpool), install the matching Poetry extra: poetry install -E <env_name>
Prerequisites: Python 3.8+
Experiment tracking: Add --track flag. Video capture: Add --capture-video flag.
Blog post: https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/
Experiments: https://wandb.ai/vwxyzjn/ppo-details

Highlighted Details

Reproduces results from the blog post, with scripts for both OpenAI Baselines and custom implementations.
Demonstrates significant speedups (3-4x) using Envpool for Atari environments.
Achieves high performance, e.g., solving Pong-v5 in 5 minutes and reaching a game score of 400 in Breakout-v5 within an hour.
Includes implementations for invalid action masking in Gym-Microrts.
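Invalid action masking, mentioned in the last item above, can be sketched in a few lines: logits of invalid actions are replaced with a large negative number before the softmax, so those actions receive essentially zero probability. The helper name masked_action_probs and the mask value of -1e8 are assumptions for this example, not the repository's code.

```python
import math


def masked_action_probs(logits, valid_mask, mask_value=-1e8):
    """Softmax over logits after replacing invalid actions with mask_value.

    Hypothetical helper for illustration; the mask value is an assumption.
    """
    masked = [logit if ok else mask_value for logit, ok in zip(logits, valid_mask)]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(masked)
    exps = [math.exp(x - top) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # The second action is invalid, so its probability collapses to zero.
    print(masked_action_probs([1.0, 2.0, 3.0], [True, False, True]))
```

Masking the logits rather than zeroing probabilities after the softmax keeps the distribution normalized over the valid actions only.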

Maintenance & Community

The repository is maintained by vwxyzjn, the author of CleanRL, a popular RL library. Further details and community discussion can likely be found via the CleanRL GitHub repository.

Licensing & Compatibility

The README does not state a license for the repository itself. It builds on CleanRL, which is MIT licensed, so a permissive license is plausible but not confirmed. Verify the license terms before commercial use or integration into closed-source projects.

Limitations & Caveats

The repository is primarily a code companion to a blog post, not a standalone library. While it demonstrates PPO's implementation details, it may require adaptation for direct use in production systems. Reproduction of all results requires installing a specific fork of openai/baselines.
