phasic-policy-gradient  by openai

RL research paper implementation

created 4 years ago
262 stars

Top 97.8% on sourcepulse

Project Summary

This repository provides the implementation for Phasic Policy Gradient (PPG), a reinforcement learning algorithm designed to improve upon Proximal Policy Optimization (PPO) by decoupling policy and value function optimization. It is intended for researchers and practitioners in reinforcement learning.

How It Works

PPG alternates between two training phases. In the policy phase, the policy is trained with a PPO-style objective while the value function is trained separately. In the periodic auxiliary phase, value information is distilled into the policy network through an auxiliary loss, combined with a term that keeps the policy close to its pre-phase behavior so that the distillation does not change it. The implementation uses MPI for distributed training.
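
The phased training described above can be sketched in a few lines of plain Python. This is a minimal illustration based on the general description of PPG, not code from the repository: the function names (`ppg_schedule`, `joint_aux_loss`, `categorical_kl`), the parameter names (`n_pi`, `beta_clone`), and the default values are all assumptions chosen for readability.

```python
import math


def categorical_kl(p_old, p_new, eps=1e-8):
    """KL(p_old || p_new) for one categorical distribution (lists of probabilities)."""
    return sum(
        p * (math.log(p + eps) - math.log(q + eps))
        for p, q in zip(p_old, p_new)
    )


def joint_aux_loss(aux_value_preds, value_targets, old_probs, new_probs, beta_clone=1.0):
    """Auxiliary value regression plus a KL term that holds the policy in place.

    All names here are hypothetical; the real implementation operates on
    network outputs rather than Python lists.
    """
    n = len(value_targets)
    # Squared-error loss on the auxiliary value predictions.
    aux = sum(0.5 * (v - t) ** 2 for v, t in zip(aux_value_preds, value_targets)) / n
    # Penalty for drifting away from the policy recorded before the auxiliary phase.
    clone = sum(categorical_kl(p, q) for p, q in zip(old_probs, new_probs)) / n
    return aux + beta_clone * clone


def ppg_schedule(n_phases, n_pi=32):
    """Yield phase labels: n_pi policy iterations, then one auxiliary phase."""
    for _ in range(n_phases):
        for _ in range(n_pi):
            yield "policy"
        yield "aux"
```

When the new policy matches the old one, the KL term is zero and only the value regression contributes, which is why the auxiliary phase can reshape shared features without moving the policy.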

Quick Start & Requirements

  • Install dependencies: conda env update --name phasic-policy-gradient --file phasic-policy-gradient/environment.yml
  • Activate environment: conda activate phasic-policy-gradient
  • Install package: pip install -e phasic-policy-gradient
  • Supported platforms: macOS 10.14 (Mojave), Ubuntu 16.04
  • Supported Python: 3.7 (64-bit)
  • Requires MPI for distributed training.
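
Because the requirements above are narrow, a quick preflight check can save time before training. The sketch below only reports facts about the current interpreter. The `preflight` function is hypothetical, and checking for the `mpi4py` package is an assumption: the listing says only that MPI is required, not which Python binding is used.

```python
import importlib.util
import platform
import struct
import sys


def preflight():
    """Collect facts relevant to the listed requirements; purely informational."""
    return {
        "python_version": "%d.%d" % sys.version_info[:2],
        "is_python_3_7": sys.version_info[:2] == (3, 7),
        # Pointer size in bits distinguishes 32-bit from 64-bit interpreters.
        "is_64_bit": struct.calcsize("P") * 8 == 64,
        "os": platform.system(),
        # Assumption: MPI is accessed from Python through mpi4py.
        "has_mpi4py": importlib.util.find_spec("mpi4py") is not None,
    }


if __name__ == "__main__":
    for key, value in preflight().items():
        print(f"{key}: {value}")
```

Running the script inside the activated conda environment prints one line per check.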

Highlighted Details

  • Codebase for the "Phasic Policy Gradient" paper.
  • Includes scripts to reproduce and visualize results for PPG and PPO baselines.
  • Supports PPG variants through hyperparameters such as n_epoch_pi and n_aux_epochs, and through a KL-divergence penalty used in place of clipping.
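
To make the last variant concrete, the sketch below contrasts the standard PPO clipped surrogate with an unclipped surrogate that subtracts a KL penalty. It is a per-sample illustration of the two general techniques, not the repository's code; the function names and the `clip_param` and `kl_coef` parameters are assumptions.

```python
def clipped_surrogate(ratio, advantage, clip_param=0.2):
    """PPO clipped objective for one sample (a quantity to maximize).

    ratio is pi_new(a|s) / pi_old(a|s).
    """
    clipped_ratio = max(1.0 - clip_param, min(ratio, 1.0 + clip_param))
    # Taking the minimum removes any incentive to push the ratio outside the clip range.
    return min(ratio * advantage, clipped_ratio * advantage)


def kl_penalized_surrogate(ratio, advantage, kl, kl_coef=1.0):
    """Unclipped objective with a KL penalty for one sample (a quantity to maximize).

    kl is the divergence between the old and new policies at this state.
    """
    return ratio * advantage - kl_coef * kl
```

Clipping caps the benefit of large policy changes at a hard threshold, while the penalty form discourages them smoothly in proportion to the measured divergence.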

Maintenance & Community

  • Status: Archived (code provided as-is, no updates expected).
  • Developed by OpenAI.

Licensing & Compatibility

  • License: Not explicitly stated in the README. It may follow the MIT license that OpenAI commonly uses, but verify this against the repository before reuse.
  • Compatibility: Targets macOS 10.14 (Mojave), Ubuntu 16.04, and Python 3.7 (64-bit).

Limitations & Caveats

The project is archived and will not receive further updates. Support is listed only for older operating systems (macOS 10.14, Ubuntu 16.04) and Python 3.7, which may limit its usability with modern hardware and software stacks.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1+ week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 1 star in the last 90 days
