ppo-implementation-details  by vwxyzjn

Code for a Proximal Policy Optimization (PPO) blog post

Created 3 years ago
842 stars

Top 42.3% on SourcePulse

Project Summary

This repository provides the source code for a blog post detailing 37 implementation details of Proximal Policy Optimization (PPO). It is aimed at reinforcement learning researchers and engineers who want to understand and reproduce PPO's performance in practice. It offers a clear, code-backed explanation of the implementation choices that most affect PPO's results.

How It Works

The implementation builds on CleanRL, a lightweight RL library of single-file implementations designed for clarity and reproducibility. It demonstrates PPO across several environments, including Atari, PyBullet, Gym-Microrts, and Procgen, with the specific configurations and optimizations discussed in the blog post. Because each variant lives in a single file, implementation details are easy to experiment with and compare directly.
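As a rough illustration of the kind of detail the blog post covers, the core of PPO is its clipped surrogate objective. The sketch below is a minimal, dependency-free version and is not taken from the repository; the function name `ppo_clip_loss` and the default `clip_coef=0.2` are assumptions chosen for illustration.

```python
import math


def ppo_clip_loss(new_logprobs, old_logprobs, advantages, clip_coef=0.2):
    """Mean clipped surrogate loss (to be minimized) over a batch.

    Hypothetical helper for illustration; clip_coef=0.2 is an assumed default.
    """
    losses = []
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        # Probability ratio between the updated policy and the rollout policy.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - clip_coef, 1 + clip_coef].
        clipped = max(1.0 - clip_coef, min(ratio, 1.0 + clip_coef))
        # Pessimistic bound: take the smaller of the two objectives, then negate.
        losses.append(-min(ratio * adv, clipped * adv))
    return sum(losses) / len(losses)
```

With a positive advantage, the objective stops growing once the ratio exceeds `1 + clip_coef`, which removes the incentive to move the policy far from the one that collected the data.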

Quick Start & Requirements

Highlighted Details

  • Reproduces results from the blog post, with scripts for both OpenAI Baselines and custom implementations.
  • Demonstrates significant speedups (3-4x) using Envpool for Atari environments.
  • Achieves high performance, e.g., solving Pong-v5 in 5 minutes and reaching a game score of 400 in Breakout-v5 within an hour.
  • Includes implementations for invalid action masking in Gym-Microrts.
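The invalid action masking mentioned in the last item can be sketched as follows: logits of invalid actions are replaced with a large negative number before the softmax, so those actions receive effectively zero probability. This is a minimal standalone sketch, not the repository's code; the name `masked_action_probs` and the `mask_value=-1e8` default are assumptions.

```python
import math


def masked_action_probs(logits, valid_mask, mask_value=-1e8):
    """Softmax over logits with invalid actions masked out.

    Hypothetical helper for illustration; mask_value is an assumed constant.
    """
    # Replace the logit of every invalid action with a large negative value.
    masked = [l if ok else mask_value for l, ok in zip(logits, valid_mask)]
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(masked)
    exps = [math.exp(x - m) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]


# Example: the second action is invalid in the current state.
probs = masked_action_probs([1.0, 2.0, 3.0], [True, False, True])
```

Sampling from `probs` then never selects the masked action, and the remaining probability mass is renormalized over the valid actions.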

Maintenance & Community

The repository is associated with the author of CleanRL (vwxyzjn), a popular RL library. Further details and community interaction can likely be found via the CleanRL GitHub repository.

Licensing & Compatibility

The README does not state an explicit license for the repository. It builds on CleanRL, which is MIT licensed, but that does not by itself license this code. Confirm the terms with the author before commercial use or integration into closed-source projects.

Limitations & Caveats

The repository is primarily a code companion to a blog post, not a standalone library. While it demonstrates PPO's implementation details, it may require adaptation for direct use in production systems. Reproduction of all results requires installing a specific fork of openai/baselines.

Health Check
Last Commit: 1 year ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History
20 stars in the last 30 days

Explore Similar Projects

Starred by Jerry Tworek (VP Research at OpenAI), Jianwei Yang (Research Scientist at Meta Superintelligence Lab), and 1 more.

pytorch-rl by jingweiz

0%
801
Deep RL research with PyTorch and Visdom
Created 8 years ago
Updated 5 years ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Gabriel Almeida (Cofounder of Langflow), and 5 more.

stable-baselines3 by DLR-RM

0.4%
12k
PyTorch library for reinforcement learning algorithm implementations
Created 5 years ago
Updated 5 days ago