safety-starter-agents  by openai

RL algorithms from a research paper on safe exploration

Created 5 years ago
443 stars

Top 67.6% on SourcePulse

Project Summary

This repository provides implementations of constrained and unconstrained Reinforcement Learning (RL) algorithms, specifically PPO, TRPO, PPO-Lagrangian, TRPO-Lagrangian, and CPO. It serves as a companion to the paper "Benchmarking Safe Exploration in Deep Reinforcement Learning" and is intended for researchers and practitioners in safe RL.
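The Lagrangian variants named above are usually described as turning a constrained problem into a penalized one, with a multiplier that grows while the cost constraint is violated. The following is a minimal sketch of that general idea, not the repository's code: the function names, the learning rate, and the cost limit are all hypothetical and chosen only for illustration.

```python
def update_multiplier(lam, avg_cost, cost_limit, lr=0.05):
    """One projected gradient-ascent step on the Lagrange multiplier.

    The multiplier rises when the measured average cost exceeds the limit,
    falls otherwise, and is clamped so it never goes negative.
    (lr and cost_limit are illustrative assumptions.)
    """
    return max(0.0, lam + lr * (avg_cost - cost_limit))


def penalized_objective(reward_objective, cost_objective, lam):
    """Lagrangian-style objective: reward term minus the weighted cost term."""
    return reward_objective - lam * cost_objective


# Example: the cost is above the limit, so the multiplier increases.
lam = update_multiplier(0.0, avg_cost=30.0, cost_limit=25.0, lr=0.1)
score = penalized_objective(reward_objective=1.0, cost_objective=2.0, lam=lam)
```

In this sketch an unconstrained algorithm corresponds to keeping the multiplier fixed at zero, which reduces the objective to the reward term alone.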

How It Works

The PPO implementation differs from common ones such as Baselines: it omits observation and reward normalization and the clipped value loss, but includes an early stopping trick. These choices prioritize straightforward comparison among the included algorithms in the paper's experiments over maximum sample efficiency for any single algorithm.
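To make the early stopping trick concrete, here is a minimal sketch in plain Python. It assumes one common form of the trick: policy updates halt once a sample estimate of the KL divergence between the old and new policy exceeds a threshold. Whether the repository uses exactly this rule is an assumption, and the function names and the `target_kl` value of 0.01 are hypothetical.

```python
import math


def clipped_surrogate(logp_new, logp_old, adv, clip_ratio=0.2):
    """PPO clipped surrogate objective (to be maximized), averaged over samples."""
    total = 0.0
    for ln, lo, a in zip(logp_new, logp_old, adv):
        ratio = math.exp(ln - lo)  # probability ratio pi_new / pi_old
        clipped = min(max(ratio, 1.0 - clip_ratio), 1.0 + clip_ratio)
        total += min(ratio * a, clipped * a)
    return total / len(adv)


def approx_kl(logp_new, logp_old):
    """Sample estimate of KL(old || new) using actions drawn from the old policy."""
    return sum(lo - ln for ln, lo in zip(logp_new, logp_old)) / len(logp_old)


def steps_before_early_stop(logp_old, logp_new_per_step, target_kl=0.01):
    """Count the policy steps accepted before the KL estimate exceeds target_kl.

    logp_new_per_step holds the log-probabilities the policy would assign
    after each successive gradient step (supplied directly in this toy setup).
    """
    steps = 0
    for logp_new in logp_new_per_step:
        if approx_kl(logp_new, logp_old) > target_kl:
            break  # early stop: the policy has moved too far from the old one
        steps += 1
    return steps
```

The value function in this sketch would be trained with an ordinary regression loss, since the text states that the clipped value loss is omitted.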

Quick Start & Requirements

  • Install via pip install -e . after cloning the repository.
  • Requires Python 3.6+.
  • Tested on Mac OS Mojave and Ubuntu 16.04 LTS.
  • Note: Does not include Safety Gym; it must be installed separately.
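The requirements above can be checked before installing with a short script. This is a minimal sketch: the helper name `check_requirements` is hypothetical, and it assumes that Safety Gym is importable under the module name `safety_gym`.

```python
import importlib.util
import sys


def check_requirements(min_python=(3, 6), module_name="safety_gym"):
    """Report whether the interpreter is new enough and a module is installed.

    module_name defaults to "safety_gym", assumed here to be the import
    name of Safety Gym. Only top-level module names are supported.
    """
    python_ok = sys.version_info[:2] >= min_python
    module_found = importlib.util.find_spec(module_name) is not None
    return {"python_ok": python_ok, "module_found": module_found}
```

Calling `check_requirements()` returns a dictionary with two booleans; if `module_found` is false, Safety Gym still needs to be installed separately.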

Highlighted Details

  • Implements algorithms used in the "Benchmarking Safe Exploration in Deep Reinforcement Learning" paper.
  • Includes experimental implementations of SAC and SAC-Lagrangian.
  • Provides scripts for reproducing paper experiments, plotting results, and testing trained policies.

Maintenance & Community

  • Status: Archived (code provided as-is, no updates expected).
  • Developed by OpenAI.

Licensing & Compatibility

  • License: Not explicitly stated in the README.
  • Compatibility: Intended for research use; compatibility with commercial or closed-source projects is not specified.

Limitations & Caveats

The repository is archived, so no further updates or support are expected. The PPO implementation is not tuned for maximum sample efficiency, unlike some other common implementations. Reproduced results may differ across machines because runs are not perfectly deterministic.

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 30 days
