PPO-PyTorch by nikhilbarhate99

Minimal PPO implementation in PyTorch for OpenAI Gym environments

created 6 years ago
2,128 stars

Top 21.6% on sourcepulse

Project Summary

This repository offers a minimal PyTorch implementation of Proximal Policy Optimization (PPO) with a clipped objective, designed for beginners in Reinforcement Learning to understand the algorithm. It supports both discrete and continuous action spaces and provides utilities for logging, plotting, and creating GIFs from trained models.
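To illustrate the clipped objective mentioned above, here is a minimal sketch of how the surrogate loss is typically formed in PyTorch. The function name and variable names (logprobs, old_logprobs, advantages, eps_clip) are illustrative and not taken from the repository's code.

    import torch

    def clipped_surrogate_loss(logprobs, old_logprobs, advantages, eps_clip=0.2):
        # Probability ratio pi_theta(a|s) / pi_theta_old(a|s), computed in log space.
        ratios = torch.exp(logprobs - old_logprobs.detach())

        # Unclipped and clipped surrogate objectives.
        surr1 = ratios * advantages
        surr2 = torch.clamp(ratios, 1.0 - eps_clip, 1.0 + eps_clip) * advantages

        # PPO maximizes the minimum of the two surrogates; return the negated
        # value so it can be minimized with a standard optimizer.
        return -torch.min(surr1, surr2).mean()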

How It Works

The implementation uses a constant, linearly decaying standard deviation for continuous action spaces, simplifying hyperparameter tuning. It employs a Monte Carlo estimate for advantages rather than Generalized Advantage Estimation, and it is a single-threaded implementation for clarity. This approach prioritizes a concise and understandable codebase for educational purposes.
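The two design choices described above can be sketched as follows. This is a hedged approximation, not the repository's actual code: the function names, the normalization step, and the decay schedule parameters are assumptions made for illustration.

    import torch

    def monte_carlo_returns(rewards, is_terminals, gamma=0.99):
        # Discounted reward-to-go, computed by iterating over the rollout in reverse.
        returns, discounted = [], 0.0
        for reward, done in zip(reversed(rewards), reversed(is_terminals)):
            if done:
                discounted = 0.0
            discounted = reward + gamma * discounted
            returns.insert(0, discounted)
        returns = torch.tensor(returns, dtype=torch.float32)
        # Normalizing the returns helps stabilize the policy update.
        return (returns - returns.mean()) / (returns.std() + 1e-7)

    def decayed_action_std(initial_std, decay_rate, min_std, timestep, decay_freq):
        # Linearly decay the state-independent action standard deviation every
        # decay_freq timesteps, clamped from below at min_std.
        std = initial_std - decay_rate * (timestep // decay_freq)
        return max(std, min_std)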

Quick Start & Requirements

  • Install: pip install -r requirements.txt (implied; the README does not give an explicit install command)
  • Prerequisites: Python 3, PyTorch, NumPy, OpenAI Gym. For graphs/GIFs: pandas, matplotlib, Pillow.
  • Environments: Box2D, Roboschool, PyBullet.
  • Notes: CPU training is recommended for Box2D/Roboschool, since frequent CPU-GPU data transfers on small per-step batches degrade performance (see the device-selection sketch after this list).
  • Colab: A comprehensive PPO_colab.ipynb notebook is available for training, testing, plotting, and GIF creation.
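A minimal sketch of the device choice implied by the note above. The use_gpu flag is an illustrative parameter, not one defined by the repository.

    import torch

    # For lightweight Box2D / Roboschool environments, small per-step batches make
    # frequent CPU-GPU transfers the bottleneck, so CPU training is often faster.
    use_gpu = False  # illustrative flag; set True only for heavier workloads

    device = torch.device("cuda" if use_gpu and torch.cuda.is_available() else "cpu")
    print("Training on:", device)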

Highlighted Details

  • Supports both discrete and continuous action spaces.
  • Includes linear decaying action standard deviation for continuous environments.
  • Logs training progress (episodes, timesteps, rewards) to CSV files.
  • Provides utilities for plotting training graphs and generating GIFs from trained networks (see the plotting sketch after this list).
  • Offers a convenient Jupyter notebook for end-to-end workflow on Google Colab.
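Since the log is a plain CSV of episodes, timesteps, and rewards, a training curve can be drawn with pandas and matplotlib along these lines. The file path and column names below are assumptions for illustration; the repository's actual log layout may differ.

    import pandas as pd
    import matplotlib.pyplot as plt

    # Illustrative path and column names; adjust to the actual log file.
    log = pd.read_csv("PPO_logs/CartPole-v1/PPO_CartPole-v1_log_0.csv")

    # Smooth the per-episode reward with a rolling mean before plotting.
    log["reward_smooth"] = log["reward"].rolling(window=50, min_periods=1).mean()

    plt.plot(log["timestep"], log["reward_smooth"])
    plt.xlabel("timestep")
    plt.ylabel("average reward (rolling mean)")
    plt.title("PPO training progress")
    plt.savefig("training_curve.png")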

Maintenance & Community

The repository was last updated in April 2021. No specific community channels or active maintenance signals are mentioned.

Licensing & Compatibility

The repository does not explicitly state a license. The presence of a BibTeX entry for citation suggests it is intended for research and academic use; compatibility with commercial or closed-source projects is not specified.

Limitations & Caveats

The implementation uses a simple Monte Carlo advantage estimate and single-threaded experience collection, which may limit performance on complex environments compared to more advanced PPO variants that use GAE or parallel rollouts. Hyperparameter tuning may be needed for good results in challenging scenarios.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 135 stars in the last 90 days
