safety-starter-agents  by openai

RL algorithms for safe exploration research paper

created 5 years ago
435 stars

Top 69.5% on sourcepulse

Project Summary

This repository provides implementations of constrained and unconstrained Reinforcement Learning (RL) algorithms, specifically PPO, TRPO, PPO-Lagrangian, TRPO-Lagrangian, and CPO. It serves as a companion to the paper "Benchmarking Safe Exploration in Deep Reinforcement Learning" and is intended for researchers and practitioners in safe RL.
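As a rough illustration of what the "-Lagrangian" variants add on top of PPO and TRPO, the sketch below shows a generic Lagrange-multiplier update for a cost constraint. It is not taken from the repository: the function names, the learning rate, and the plain projected-ascent update are all assumptions made for illustration, and the actual code may parameterize the multiplier differently.

```python
def update_multiplier(lam, avg_cost, cost_limit, lr=0.05):
    """One projected gradient-ascent step on the Lagrange multiplier.

    Hypothetical helper: the multiplier grows while the average episode
    cost exceeds the limit and shrinks (never below zero) otherwise.
    """
    return max(0.0, lam + lr * (avg_cost - cost_limit))


def penalized_objective(reward_surrogate, cost_surrogate, lam):
    """Policy objective to maximize: reward term minus weighted cost term."""
    return reward_surrogate - lam * cost_surrogate


if __name__ == "__main__":
    lam = 0.0
    # Illustrative numbers only: average cost 30 against a limit of 25.
    lam = update_multiplier(lam, avg_cost=30.0, cost_limit=25.0, lr=0.1)
    print("multiplier after one step:", lam)
    print("penalized objective:", penalized_objective(1.0, 2.0, lam))
```

The unconstrained algorithms correspond to keeping the multiplier fixed at zero, in which case the penalized objective reduces to the reward term alone.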

How It Works

The PPO implementation differs from common implementations such as Baselines: it omits observation and reward normalization and the clipped value loss, but includes an early stopping trick. This keeps comparisons among the included algorithms straightforward in the context of the paper's experiments, at the cost of sample efficiency for any single algorithm.
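The PPO choices described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the repository's code: the function names, the `target_kl` and `kl_margin` values, and the exact stopping rule (stop policy updates once an approximate KL divergence passes a threshold) are assumptions based on one common form of the early stopping trick.

```python
import math


def clipped_surrogate(logp_new, logp_old, advantages, clip_ratio=0.2):
    """PPO-clip policy objective (to be maximized), averaged over samples."""
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)
        clipped = max(1.0 - clip_ratio, min(ratio, 1.0 + clip_ratio))
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


def approx_kl(logp_new, logp_old):
    """Sample estimate of KL(old || new) from log-probabilities."""
    return sum(lo - ln for ln, lo in zip(logp_new, logp_old)) / len(logp_old)


def value_loss(values, returns):
    """Plain mean squared error, i.e. without value clipping."""
    return sum((v - r) ** 2 for v, r in zip(values, returns)) / len(returns)


def run_policy_updates(step_fn, logp_old, max_iters=80,
                       target_kl=0.01, kl_margin=1.5):
    """Call step_fn (one gradient step, returns new log-probs) until the
    approximate KL exceeds kl_margin * target_kl. Returns steps taken."""
    for i in range(max_iters):
        logp_new = step_fn()
        if approx_kl(logp_new, logp_old) > kl_margin * target_kl:
            return i + 1
    return max_iters


if __name__ == "__main__":
    # Toy stand-in for an optimizer: each step shifts the log-prob by 0.01.
    state = {"logp": 0.0}

    def fake_step():
        state["logp"] -= 0.01
        return [state["logp"]]

    print("steps before early stop:", run_policy_updates(fake_step, [0.0]))
```

Observation and reward normalization would sit outside these functions, in the data pipeline; the sketch leaves them out, matching the description above.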

Quick Start & Requirements

  • Install via pip install -e . after cloning the repository.
  • Requires Python 3.6+.
  • Tested on Mac OS Mojave and Ubuntu 16.04 LTS.
  • Note: Does not include Safety Gym; it must be installed separately.

Highlighted Details

  • Implements algorithms used in the "Benchmarking Safe Exploration in Deep Reinforcement Learning" paper.
  • Includes experimental implementations of SAC and SAC-Lagrangian.
  • Provides scripts for reproducing paper experiments, plotting results, and testing trained policies.

Maintenance & Community

  • Status: Archived (code provided as-is, no updates expected).
  • Developed by OpenAI.

Licensing & Compatibility

  • License: Not explicitly stated in the README.
  • Compatibility: Intended for research use; compatibility with commercial or closed-source projects is not specified.

Limitations & Caveats

The repository is archived, so no further updates or support are expected. The PPO implementation is not tuned for sample efficiency and may underperform other common implementations on that measure. Results may not reproduce exactly across different machines.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 15 stars in the last 90 days
