RL research paper implementation
This repository provides the implementation for Phasic Policy Gradient (PPG), a reinforcement learning algorithm designed to improve upon Proximal Policy Optimization (PPO) by decoupling policy and value function optimization. It is intended for researchers and practitioners in reinforcement learning.
How It Works
PPG alternates between two phases. In the policy phase, the policy and value function are trained much as in PPO. In the auxiliary phase, value-function features are distilled into the policy network through an auxiliary loss, while a KL term keeps the policy close to the version it had before the phase began, which stabilizes training. The implementation uses MPI for distributed training.
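As a rough illustration of the phased schedule and the "stay close to the previous policy" term described above, here is a minimal pure-Python sketch. It is not the repository's code: the function names (categorical_kl, joint_aux_loss, phase_schedule), the n_pi and beta_clone parameters, and the default values are assumptions chosen for illustration, and real training would operate on batched tensors with automatic differentiation.

```python
import math


def categorical_kl(p_old, p_new, eps=1e-12):
    """KL(p_old || p_new) for two discrete action distributions (lists of probabilities)."""
    return sum(
        po * (math.log(po + eps) - math.log(pn + eps))
        for po, pn in zip(p_old, p_new)
        if po > 0.0
    )


def joint_aux_loss(aux_value_pred, value_target, p_old, p_new, beta_clone=1.0):
    """Auxiliary value regression plus a KL penalty toward the old policy.

    beta_clone (hypothetical name) weights how strongly the policy is
    held near the distribution it had before the auxiliary phase.
    """
    aux_value_loss = 0.5 * (aux_value_pred - value_target) ** 2
    return aux_value_loss + beta_clone * categorical_kl(p_old, p_new)


def phase_schedule(n_pi=32, n_aux_epochs=6):
    """Yield the order of updates in one phase: policy iterations, then auxiliary epochs.

    n_pi and the default values are illustrative assumptions.
    """
    for i in range(n_pi):
        yield ("policy_phase", i)
    for e in range(n_aux_epochs):
        yield ("aux_phase", e)


if __name__ == "__main__":
    old = [0.5, 0.5]
    new = [0.9, 0.1]
    print("KL(old || new):", categorical_kl(old, new))
    print("joint loss:", joint_aux_loss(1.0, 3.0, old, new))
    print("updates per phase:", len(list(phase_schedule())))
```

The KL term is zero when the policy is unchanged and grows as the new action distribution drifts from the old one, so the auxiliary value loss can reshape shared features without freely moving the policy.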
Quick Start & Requirements
conda env update --name phasic-policy-gradient --file phasic-policy-gradient/environment.yml
conda activate phasic-policy-gradient
pip install -e phasic-policy-gradient
Highlighted Details
Experiment variations include changing n_epoch_pi and n_aux_epochs, and using KL divergence instead of clipping.
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The project is archived and will not receive further updates. It supports only older platforms (macOS 10.14, Ubuntu 16.04) and Python 3.7, which may limit its use with modern hardware and software stacks.