phasic-policy-gradient by openai

RL research paper implementation

Created 5 years ago

267 stars

Top 96.1% on SourcePulse

Project Summary

This repository provides the implementation for Phasic Policy Gradient (PPG), a reinforcement learning algorithm designed to improve upon Proximal Policy Optimization (PPO) by decoupling policy and value function optimization. It is intended for researchers and practitioners in reinforcement learning.

How It Works

PPG alternates between two phases. In the policy phase, the policy and the value function are trained with PPO-style updates. In the auxiliary phase, the value function is trained further and its features are distilled into the policy network through an auxiliary value loss, while a KL (behavioral cloning) term keeps the policy close to the version it had before the phase began, so the extra training does not distort it. The implementation uses MPI for distributed training.
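The auxiliary objective can be made concrete with a small sketch. The PPG paper describes a joint loss that adds a policy-cloning KL term to an auxiliary value loss. The version below is a minimal pure-Python illustration of that idea, not the repository's code: the function names, the argument names, and the default weight of 1.0 are assumptions chosen for this example.

```python
import math


def categorical_kl(p_old, p_new):
    """KL(p_old || p_new) for two discrete action distributions.

    Assumes every entry of p_new is strictly positive.
    """
    return sum(po * math.log(po / pn) for po, pn in zip(p_old, p_new) if po > 0.0)


def joint_aux_loss(aux_values, value_targets, old_probs, new_probs, beta_clone=1.0):
    """Auxiliary value loss plus a cloning penalty, averaged over a batch.

    aux_values    -- value predictions from the policy network's auxiliary head
    value_targets -- value targets for the same states
    old_probs     -- action distributions recorded before the auxiliary phase
    new_probs     -- action distributions of the policy being trained
    beta_clone    -- weight of the KL term (hypothetical name and default)
    """
    n = len(value_targets)
    aux = sum(0.5 * (v - t) ** 2 for v, t in zip(aux_values, value_targets)) / n
    clone = sum(categorical_kl(po, pn) for po, pn in zip(old_probs, new_probs)) / n
    return aux + beta_clone * clone


# With an unchanged policy the KL term is zero, so only the value error remains.
same = [[0.5, 0.5], [0.9, 0.1]]
print(joint_aux_loss([1.0, 2.0], [1.0, 3.0], same, same))  # prints 0.25
```

Any drift of the new action distributions away from the recorded ones raises the loss, which is what discourages the auxiliary phase from changing the policy's behavior.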

Quick Start & Requirements

Install dependencies: conda env update --name phasic-policy-gradient --file phasic-policy-gradient/environment.yml
Activate environment: conda activate phasic-policy-gradient
Install package: pip install -e phasic-policy-gradient
Supported platforms: macOS 10.14 (Mojave), Ubuntu 16.04
Supported Python: 3.7 (64-bit)
Requires MPI for distributed training.
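Because the supported versions are narrow, it can help to check an environment against the requirements listed above before training. The following is a hypothetical helper, not part of the repository. It assumes MPI is accessed through the mpi4py bindings, which the list above does not state.

```python
import importlib.util
import struct
import sys


def _module_available(name):
    """True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


def check_environment(version_info=sys.version_info,
                      pointer_bits=struct.calcsize("P") * 8,
                      has_module=_module_available):
    """Return a list of mismatches against the stated requirements."""
    problems = []
    if tuple(version_info[:2]) != (3, 7):
        problems.append(
            f"Python {version_info[0]}.{version_info[1]} found, 3.7 expected"
        )
    if pointer_bits != 64:
        problems.append(f"{pointer_bits}-bit interpreter found, 64-bit expected")
    # Assumption: the MPI dependency is provided by the mpi4py package.
    if not has_module("mpi4py"):
        problems.append("mpi4py is not importable")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment matches the stated requirements"]:
        print(line)
```

The arguments default to the running interpreter, and they can be overridden to test the check itself.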

Highlighted Details

Codebase for the "Phasic Policy Gradient" paper.
Includes scripts to reproduce and visualize results for PPG and PPO baselines.
Supports variations of PPG by adjusting hyperparameters like n_epoch_pi, n_aux_epochs, and using KL divergence instead of clipping.
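To show what these hyperparameters control, the sketch below lists the order of updates in a PPG-style run as the paper describes it. It is an illustration only and does not reflect how the repository structures its training loop. The names n_epoch_pi and n_aux_epochs come from the list above, while n_phases, n_pi, and n_epoch_vf are hypothetical names for the number of phases, the policy iterations per phase, and the value epochs per iteration.

```python
from collections import Counter


def ppg_schedule(n_phases, n_pi, n_epoch_pi, n_epoch_vf, n_aux_epochs):
    """List the update steps of a PPG-style run, in order.

    Each phase runs n_pi policy iterations (one rollout, then n_epoch_pi
    policy epochs and n_epoch_vf value epochs), followed by n_aux_epochs
    auxiliary epochs over the data collected during that phase.
    """
    steps = []
    for phase in range(n_phases):
        for _ in range(n_pi):
            steps.append((phase, "rollout"))
            steps += [(phase, "policy_epoch")] * n_epoch_pi
            steps += [(phase, "value_epoch")] * n_epoch_vf
        steps += [(phase, "aux_epoch")] * n_aux_epochs
    return steps


schedule = ppg_schedule(n_phases=1, n_pi=2, n_epoch_pi=1, n_epoch_vf=1, n_aux_epochs=3)
counts = Counter(kind for _, kind in schedule)
print(len(schedule), counts["policy_epoch"], counts["aux_epoch"])  # prints 9 2 3
```

Raising n_epoch_pi increases how often each rollout is reused for policy updates, and raising n_aux_epochs lengthens the auxiliary phase that follows each group of policy iterations.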

Maintenance & Community

Status: Archived (code provided as-is, no updates expected).
Developed by OpenAI.

Licensing & Compatibility

License: Not stated in the README. Other OpenAI research repositories commonly use the MIT license, but this should be confirmed against the repository itself before reuse.
Compatibility: Supported only on macOS 10.14 (Mojave) and Ubuntu 16.04 with 64-bit Python 3.7.

Limitations & Caveats

The project is archived and will not receive further updates. Support is stated only for macOS 10.14, Ubuntu 16.04, and Python 3.7, so running it on current hardware and software stacks may require changes to the environment or dependencies.

Explore Similar Projects

LLM-with-RL-papers by floodsung

MARL-papers-with-code by TimeBreaker

LlamaGym by KhoomeiK

coinrun by openai

pytorch-cpp-rl by Omegastick

PPO-for-Beginners by ericyangyu

epymarl by uoe-agents

David-Silver-Reinforcement-learning by dalmia

stable-baselines3-contrib by Stable-Baselines-Team

all-rl-algorithms by FareedKhan-dev

coach by IntelLabs

baselines by openai