RL algorithms for safe exploration (research paper companion)
This repository provides implementations of constrained and unconstrained Reinforcement Learning (RL) algorithms, specifically PPO, TRPO, PPO-Lagrangian, TRPO-Lagrangian, and CPO. It serves as a companion to the paper "Benchmarking Safe Exploration in Deep Reinforcement Learning" and is intended for researchers and practitioners in safe RL.
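For context on the constrained variants: Lagrangian methods relax the cost constraint into a penalty whose weight, a Lagrange multiplier, is adapted during training. Below is a minimal sketch of that dual update under assumed names (`penalty`, `avg_episode_cost`, and `penalty_lr` are illustrative, not the repository's API):

```python
# Sketch of the dual-variable update behind PPO/TRPO-Lagrangian-style methods
# (illustrative, not the repository's code). The multiplier grows while the
# measured episode cost exceeds the allowed limit, and shrinks otherwise.
def update_penalty(penalty: float, avg_episode_cost: float,
                   cost_limit: float, penalty_lr: float = 5e-2) -> float:
    """One gradient-ascent step on the Lagrange multiplier, kept non-negative."""
    penalty += penalty_lr * (avg_episode_cost - cost_limit)
    return max(penalty, 0.0)
```

The policy objective then becomes reward minus `penalty` times cost, so persistently unsafe behavior is penalized more strongly over time.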
How It Works
The agents are implemented with a PPO variant that differs from common implementations such as OpenAI Baselines: it omits observation/reward normalization and the clipped value loss, but adds an early-stopping trick for policy updates. These choices prioritize straightforward comparison among the included algorithms in the paper's experiments over maximizing the sample efficiency of any single algorithm.
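As a concrete illustration of the early-stopping trick, here is a minimal PyTorch sketch (an assumption about the mechanism, not the repository's code): gradient steps on the clipped surrogate loss halt once a sample-based estimate of the KL divergence from the data-collecting policy exceeds a threshold. `policy` is assumed to be a module that returns a `torch.distributions` object.

```python
import torch

def ppo_policy_update(policy, optimizer, obs, actions, old_logp, advantages,
                      clip_ratio=0.2, train_iters=80, target_kl=0.01):
    """Take up to train_iters gradient steps on PPO's clipped surrogate loss,
    stopping early once the updated policy drifts too far from the policy
    that collected the data."""
    for _ in range(train_iters):
        dist = policy(obs)                  # assumed to return a Distribution
        logp = dist.log_prob(actions)
        ratio = torch.exp(logp - old_logp)  # importance ratio pi_new / pi_old
        clipped = torch.clamp(ratio, 1 - clip_ratio, 1 + clip_ratio)
        loss = -(torch.min(ratio * advantages, clipped * advantages)).mean()

        approx_kl = (old_logp - logp).mean().item()  # cheap KL estimate
        if approx_kl > 1.5 * target_kl:
            break  # early stop: further updates would be too off-policy

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```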
Quick Start & Requirements
After cloning the repository, install with:

```bash
pip install -e .
```
Limitations & Caveats
The repository is archived, so no further updates or support should be expected. The PPO implementation is not tuned for maximum sample efficiency relative to other common implementations, and reproduced results may not be perfectly deterministic across different machines.