trpo by pat-coady

Reinforcement learning implementation with OpenAI Gym

created 8 years ago
360 stars

Top 78.9% on sourcepulse

Project Summary

This repository provides an implementation of Trust Region Policy Optimization (TRPO) with Generalized Advantage Estimation (GAE) for reinforcement learning tasks. It targets researchers and practitioners in AI and robotics, offering a robust solution for continuous control problems, demonstrated by its success on MuJoCo benchmarks and its updated compatibility with TensorFlow 2.0 and PyBullet.

How It Works

The core approach is TRPO, a policy gradient method that aims for monotonic policy improvement by constraining each policy update to a trust region. Both the policy and the value function are approximated by 3-hidden-layer neural networks with tanh activations. Generalized Advantage Estimation (GAE) reduces the variance of the gradient estimates, and the implementation dynamically adjusts the KL loss factor and learning rate during training for stability.
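
For reference, GAE builds each advantage from temporal-difference residuals accumulated backward over a trajectory. The following is a minimal NumPy sketch of the standard GAE(gamma, lambda) recursion; the function name, array layout, and parameter defaults are illustrative, not taken from this repository.

    import numpy as np

    def gae_advantages(rewards, values, gamma=0.995, lam=0.98):
        # Standard GAE(gamma, lambda), computed backward over one trajectory.
        # rewards: shape (T,), per-step rewards
        # values:  shape (T+1,), V(s_0)..V(s_T), with V(s_T) the bootstrap value
        T = len(rewards)
        advantages = np.zeros(T)
        gae = 0.0
        for t in reversed(range(T)):
            # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
            delta = rewards[t] + gamma * values[t + 1] - values[t]
            # A_t = delta_t + gamma * lam * A_{t+1}
            gae = delta + gamma * lam * gae
            advantages[t] = gae
        return advantages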

Quick Start & Requirements

  • Primary install/run command: python train.py <EnvironmentName> (e.g., python train.py InvertedPendulumBulletEnv-v0)
  • Prerequisites: Python 3.6, TensorFlow 2.x, NumPy, Matplotlib, SciPy, OpenAI Gym, PyBullet (a quick smoke test of the Gym/PyBullet setup follows this list).
  • Links: GitHub Repository
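
Before launching a long training run, it can help to confirm that Gym and PyBullet are wired together. Here is a minimal smoke test, assuming the classic (pre-0.26) Gym step/reset API that this project's era of dependencies used:

    import gym
    import pybullet_envs  # side-effect import: registers the *BulletEnv-v0 environments

    env = gym.make('InvertedPendulumBulletEnv-v0')
    obs = env.reset()
    for _ in range(10):
        # Random actions, just to verify the simulation steps without errors.
        obs, reward, done, _ = env.step(env.action_space.sample())
        if done:
            obs = env.reset()
    env.close()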

Highlighted Details

  • Achieved top spots on OpenAI Gym MuJoCo leaderboards without hyperparameter tuning.
  • Updated to TensorFlow 2.0 and PyBullet (replacing MuJoCo).
  • Uses a 3-hidden-layer NN for policy and value function approximation.
  • Implements Generalized Advantage Estimation (GAE) with configurable gamma (discount) and lambda (smoothing) parameters.
  • Dynamically adjusts the KL loss factor and learning rate during training (see the sketch after this list).
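
The adaptive KL mechanism is worth spelling out. A common heuristic of this kind (the PPO-style adaptive KL penalty) scales a penalty coefficient beta up when the measured KL divergence overshoots a target and down when it undershoots, backing off the learning rate when beta saturates. The constants below are illustrative and may differ from the repository's:

    def adapt_kl_and_lr(kl, beta, lr_mult, kl_target=0.003):
        # Illustrative PPO-style adaptive-KL heuristic; all constants are
        # assumptions, not necessarily this repository's values.
        if kl > 2.0 * kl_target:
            beta = min(35.0, 1.5 * beta)        # tighten the KL penalty
            if beta > 30.0:
                lr_mult /= 1.5                  # penalty saturating: slow down
        elif kl < 0.5 * kl_target:
            beta = max(1.0 / 35.0, beta / 1.5)  # loosen the KL penalty
            if beta < 1.0 / 30.0:
                lr_mult *= 1.5                  # penalty near floor: speed up
        return beta, lr_mult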

Maintenance & Community

The project is maintained by Patrick Coady. No specific community channels (like Discord/Slack) or roadmap are mentioned in the README.

Licensing & Compatibility

The README does not state a license. Absent an explicit license, users should assume all rights are reserved and contact the author for clarification. Suitability for commercial use or closed-source linking is therefore unspecified.

Limitations & Caveats

The project is a direct implementation of the TRPO algorithm and may require substantial compute and training time on complex environments. The README does not report benchmark results for the PyBullet version, nor does it discuss scalability to very high-dimensional state/action spaces.

Health Check

  • Last commit: 5 years ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Stars (last 90 days): 0
