PPO-PyTorch by nikhilbarhate99

Minimal PPO implementation in PyTorch for OpenAI Gym environments

created 6 years ago
2,128 stars

Top 21.6% on sourcepulse

Project Summary

This repository offers a minimal PyTorch implementation of Proximal Policy Optimization (PPO) with a clipped objective, designed for beginners in Reinforcement Learning to understand the algorithm. It supports both discrete and continuous action spaces and provides utilities for logging, plotting, and creating GIFs from trained models.
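To illustrate the clipped objective mentioned above, here is a minimal sketch of how the surrogate loss is typically formed in PyTorch. The function name and variable names (logprobs, old_logprobs, advantages, eps_clip) are illustrative and not taken from the repository's code.

    import torch

    def clipped_surrogate_loss(logprobs, old_logprobs, advantages, eps_clip=0.2):
        # Probability ratio pi_theta(a|s) / pi_theta_old(a|s), computed in log space.
        ratios = torch.exp(logprobs - old_logprobs.detach())

        # Unclipped and clipped surrogate objectives.
        surr1 = ratios * advantages
        surr2 = torch.clamp(ratios, 1.0 - eps_clip, 1.0 + eps_clip) * advantages

        # PPO maximizes the minimum of the two surrogates; return the negated
        # value so it can be minimized with a standard optimizer.
        return -torch.min(surr1, surr2).mean()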

How It Works

The implementation uses a constant, linearly decaying standard deviation for continuous action spaces, simplifying hyperparameter tuning. It employs a Monte Carlo estimate for advantages rather than Generalized Advantage Estimation, and it is a single-threaded implementation for clarity. This approach prioritizes a concise and understandable codebase for educational purposes.
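The two design choices described above can be sketched as follows. This is a hedged approximation, not the repository's actual code: the function names, the normalization step, and the decay schedule parameters are assumptions made for illustration.

    import torch

    def monte_carlo_returns(rewards, is_terminals, gamma=0.99):
        # Discounted reward-to-go, computed by iterating over the rollout in reverse.
        returns, discounted = [], 0.0
        for reward, done in zip(reversed(rewards), reversed(is_terminals)):
            if done:
                discounted = 0.0
            discounted = reward + gamma * discounted
            returns.insert(0, discounted)
        returns = torch.tensor(returns, dtype=torch.float32)
        # Normalizing the returns helps stabilize the policy update.
        return (returns - returns.mean()) / (returns.std() + 1e-7)

    def decayed_action_std(initial_std, decay_rate, min_std, timestep, decay_freq):
        # Linearly decay the state-independent action standard deviation every
        # decay_freq timesteps, clamped from below at min_std.
        std = initial_std - decay_rate * (timestep // decay_freq)
        return max(std, min_std)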

Quick Start & Requirements

  • Install: pip install -r requirements.txt (implied; the README does not give an explicit install command)
  • Prerequisites: Python 3, PyTorch, NumPy, OpenAI Gym. For graphs/GIFs: pandas, matplotlib, Pillow.
  • Environments: Box2D, Roboschool, PyBullet.
  • Notes: CPU training is recommended for Box2D/Roboschool, since frequent CPU-GPU data transfers on small per-step batches degrade performance (see the device-selection sketch after this list).
  • Colab: A comprehensive PPO_colab.ipynb notebook is available for training, testing, plotting, and GIF creation.
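A minimal sketch of the device choice implied by the note above. The use_gpu flag is an illustrative parameter, not one defined by the repository.

    import torch

    # For lightweight Box2D / Roboschool environments, small per-step batches make
    # frequent CPU-GPU transfers the bottleneck, so CPU training is often faster.
    use_gpu = False  # illustrative flag; set True only for heavier workloads

    device = torch.device("cuda" if use_gpu and torch.cuda.is_available() else "cpu")
    print("Training on:", device)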

Highlighted Details

  • Supports both discrete and continuous action spaces.
  • Includes linear decaying action standard deviation for continuous environments.
  • Logs training progress (episodes, timesteps, rewards) to CSV files.
  • Provides utilities for plotting training graphs and generating GIFs from trained networks (see the plotting sketch after this list).
  • Offers a convenient Jupyter notebook for end-to-end workflow on Google Colab.
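Since the log is a plain CSV of episodes, timesteps, and rewards, a training curve can be drawn with pandas and matplotlib along these lines. The file path and column names below are assumptions for illustration; the repository's actual log layout may differ.

    import pandas as pd
    import matplotlib.pyplot as plt

    # Illustrative path and column names; adjust to the actual log file.
    log = pd.read_csv("PPO_logs/CartPole-v1/PPO_CartPole-v1_log_0.csv")

    # Smooth the per-episode reward with a rolling mean before plotting.
    log["reward_smooth"] = log["reward"].rolling(window=50, min_periods=1).mean()

    plt.plot(log["timestep"], log["reward_smooth"])
    plt.xlabel("timestep")
    plt.ylabel("average reward (rolling mean)")
    plt.title("PPO training progress")
    plt.savefig("training_curve.png")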

Maintenance & Community

The repository was last updated in April 2021. No specific community channels or active maintenance signals are mentioned.

Licensing & Compatibility

The repository does not explicitly state a license. The presence of a BibTeX entry for citation suggests it is intended for research and academic use; compatibility with commercial or closed-source projects is not specified.

Limitations & Caveats

The implementation uses a simple Monte Carlo advantage estimate and single-threaded experience collection, which may limit performance on complex environments compared to more advanced PPO variants that use GAE or parallel rollouts. Hyperparameter tuning may be needed for good results in challenging scenarios.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 135 stars in the last 90 days
