Minimal PPO implementation in PyTorch for OpenAI Gym environments
This repository offers a minimal PyTorch implementation of Proximal Policy Optimization (PPO) with a clipped objective, designed for beginners in Reinforcement Learning to understand the algorithm. It supports both discrete and continuous action spaces and provides utilities for logging, plotting, and creating GIFs from trained models.
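The core of such an implementation is PPO's clipped surrogate objective. Below is a minimal sketch of that loss in PyTorch; the function and variable names are illustrative and not taken from this repository's code.

```python
# Minimal sketch of PPO's clipped surrogate objective (illustrative names).
import torch

def ppo_clipped_loss(log_probs, old_log_probs, advantages, eps_clip=0.2):
    # Probability ratio between the current policy and the policy that collected the data.
    ratios = torch.exp(log_probs - old_log_probs)
    # Unclipped and clipped surrogate objectives.
    surr1 = ratios * advantages
    surr2 = torch.clamp(ratios, 1 - eps_clip, 1 + eps_clip) * advantages
    # PPO maximizes the minimum of the two; negate to use it as a loss to minimize.
    return -torch.min(surr1, surr2).mean()
```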
How It Works
For continuous action spaces, the implementation uses a single fixed standard deviation for the action distribution (a hyperparameter that is decayed linearly over the course of training), which simplifies tuning. Advantages are estimated with a Monte Carlo return rather than Generalized Advantage Estimation, and experience collection is single-threaded. These choices prioritize a concise, understandable codebase for educational purposes.
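The sketch below illustrates the two simplifications described above, Monte Carlo reward-to-go returns instead of GAE and a linearly decayed action standard deviation. It is an assumption-based outline of the technique, not the repository's exact code.

```python
# Sketch of Monte Carlo returns and linear action-std decay (illustrative, not the repo's code).
import torch

def monte_carlo_returns(rewards, is_terminals, gamma=0.99):
    # Discounted reward-to-go computed backwards over one batch of experience.
    returns, discounted = [], 0.0
    for reward, done in zip(reversed(rewards), reversed(is_terminals)):
        if done:
            discounted = 0.0
        discounted = reward + gamma * discounted
        returns.insert(0, discounted)
    returns = torch.tensor(returns, dtype=torch.float32)
    # Normalizing the returns keeps the resulting advantage estimates well-scaled.
    return (returns - returns.mean()) / (returns.std() + 1e-7)

def decay_action_std(action_std, decay_rate=0.05, min_std=0.1):
    # Linear decay of the exploration noise used for continuous action spaces.
    return max(min_std, action_std - decay_rate)
```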
Quick Start & Requirements
Install dependencies with pip install -r requirements.txt (the exact command is implied rather than explicitly documented). A PPO_colab.ipynb notebook is available for training, testing, plotting, and GIF creation.
Maintenance & Community
The repository was last updated in April 2021. No specific community channels or active maintenance signals are mentioned.
Licensing & Compatibility
The repository does not explicitly state a license. The presence of a BibTeX citation entry suggests it is intended for research and academic use; compatibility with commercial or closed-source projects is not specified.
Limitations & Caveats
The simplified Monte Carlo advantage estimation and single-threaded experience collection may limit performance on complex environments compared to more advanced PPO variants. Hyperparameter tuning may be necessary to obtain good results in challenging scenarios.