PyTorch implementations of Policy Gradient reinforcement learning algorithms
This repository provides PyTorch implementations of key Policy Gradient (PG) reinforcement learning algorithms, including REINFORCE, NPG, TRPO, and PPO. It targets researchers and practitioners in reinforcement learning, offering a unified framework for experimenting with and comparing these advanced PG methods on standard benchmarks.
How It Works
The project implements four distinct PG algorithms: Vanilla Policy Gradient (REINFORCE), Truncated Natural Policy Gradient, Trust Region Policy Optimization (TRPO), and Proximal Policy Optimization (PPO). It leverages PyTorch for model definition and training. The implementations are modular, making it easy to switch between algorithms and tune hyperparameters. The use of standard RL benchmarks, namely MuJoCo and Unity ml-agents environments, facilitates reproducible research and direct comparison of algorithm performance.
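As one concrete illustration, PPO optimizes a clipped surrogate objective. The sketch below is a minimal PyTorch version of that loss, written for this summary rather than taken from the repository; the function name and arguments are assumptions.

```python
import torch

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """Minimal sketch of the PPO clipped surrogate objective.

    log_probs_new / log_probs_old: log pi(a|s) under the current and old policies.
    advantages: estimated advantages A(s, a), e.g. from GAE.
    """
    # Probability ratio r = pi_new(a|s) / pi_old(a|s)
    ratio = torch.exp(log_probs_new - log_probs_old)
    # Unclipped and clipped surrogate terms
    surr1 = ratio * advantages
    surr2 = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the minimum of the two terms; return a loss to minimize
    return -torch.min(surr1, surr2).mean()
```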
Quick Start & Requirements
- Install dependencies: `pip install -r requirements.txt` (within `pg_travel/mujoco`).
- Requirements: `mujoco-py` (requires a MuJoCo license) and Python 3.x; a quick sanity check follows below.
- Train on MuJoCo: `python main.py` (defaults to PPO on Hopper-v2).
- Train on Unity: place the environment files in `pg_travel/unity/env`, then run `python main.py --train` (within `pg_travel/unity`).
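Before training, a short sanity check can confirm that the `gym` + `mujoco-py` stack is working. This snippet is an assumption-laden illustration (it presumes `gym` with MuJoCo support is installed and licensed), not part of the repository:

```python
import gym  # requires gym with mujoco-py installed and a valid MuJoCo license

# Verify that the Hopper-v2 benchmark environment loads correctly
env = gym.make("Hopper-v2")
obs = env.reset()
print("observation shape:", env.observation_space.shape)
print("action shape:", env.action_space.shape)
env.close()
```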
Highlighted Details
Supports both `mujoco-py` and custom Unity ml-agents environments.
Maintenance & Community
The repository is maintained by reinforcement-learning-kr. Further community engagement details are not explicitly provided in the README.
Licensing & Compatibility
The repository's license is not explicitly stated in the README. Compatibility for commercial use or closed-source linking would require clarification of the licensing terms.
Limitations & Caveats
The project uses PyTorch v0.4.0, which is an older version and may have compatibility issues with newer PyTorch releases or libraries. Trained agents and Unity ml-agent environment source files are noted as "soon to be available," indicating potential incompleteness.
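For instance, 0.4-era PyTorch code often wraps tensors in `torch.autograd.Variable`, which is a deprecated no-op in current releases. The snippet below is an illustration of the modern equivalent plus a version check, not code from the repository:

```python
import torch

print("PyTorch version:", torch.__version__)

# 0.4-era style: x = Variable(torch.ones(3), requires_grad=True)
# Modern style: tensors carry requires_grad directly; Variable is a no-op.
x = torch.ones(3, requires_grad=True)
y = (x * 2).sum()
y.backward()
print(x.grad)  # tensor([2., 2., 2.])
```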