trpo by pat-coady

Reinforcement learning implementation with OpenAI Gym

created 8 years ago
360 stars

Top 78.9% on sourcepulse

Project Summary

This repository provides an implementation of Trust Region Policy Optimization (TRPO) with Generalized Advantage Estimation (GAE) for reinforcement learning tasks. It targets researchers and practitioners in AI and robotics, offering a robust solution for continuous control problems, demonstrated by its success on MuJoCo benchmarks and its updated compatibility with TensorFlow 2.0 and PyBullet.

How It Works

The core approach is TRPO, a policy gradient method that aims for monotonic policy improvement by constraining each policy update to a trust region. Both the policy and the value function are approximated by 3-hidden-layer neural networks with tanh activations. Generalized Advantage Estimation (GAE) reduces the variance of the gradient estimates, and the implementation dynamically adjusts the KL loss factor and learning rate during training for stability.
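
For reference, GAE builds each advantage from temporal-difference residuals accumulated backward over a trajectory. The following is a minimal NumPy sketch of the standard GAE(gamma, lambda) recursion; the function name, array layout, and parameter defaults are illustrative, not taken from this repository.

    import numpy as np

    def gae_advantages(rewards, values, gamma=0.995, lam=0.98):
        # Standard GAE(gamma, lambda), computed backward over one trajectory.
        # rewards: shape (T,), per-step rewards
        # values:  shape (T+1,), V(s_0)..V(s_T), with V(s_T) the bootstrap value
        T = len(rewards)
        advantages = np.zeros(T)
        gae = 0.0
        for t in reversed(range(T)):
            # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
            delta = rewards[t] + gamma * values[t + 1] - values[t]
            # A_t = delta_t + gamma * lam * A_{t+1}
            gae = delta + gamma * lam * gae
            advantages[t] = gae
        return advantages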

Quick Start & Requirements

  • Primary install/run command: python train.py <EnvironmentName> (e.g., python train.py InvertedPendulumBulletEnv-v0)
  • Prerequisites: Python 3.6, TensorFlow 2.x, NumPy, Matplotlib, SciPy, OpenAI Gym, PyBullet (a quick smoke test of the Gym/PyBullet setup follows this list).
  • Links: GitHub Repository
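
Before launching a long training run, it can help to confirm that Gym and PyBullet are wired together. Here is a minimal smoke test, assuming the classic (pre-0.26) Gym step/reset API that this project's era of dependencies used:

    import gym
    import pybullet_envs  # side-effect import: registers the *BulletEnv-v0 environments

    env = gym.make('InvertedPendulumBulletEnv-v0')
    obs = env.reset()
    for _ in range(10):
        # Random actions, just to verify the simulation steps without errors.
        obs, reward, done, _ = env.step(env.action_space.sample())
        if done:
            obs = env.reset()
    env.close()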

Highlighted Details

  • Achieved top spots on OpenAI Gym MuJoCo leaderboards without hyperparameter tuning.
  • Updated to TensorFlow 2.0 and PyBullet (replacing MuJoCo).
  • Uses a 3-hidden-layer NN for policy and value function approximation.
  • Implements Generalized Advantage Estimation (GAE) with configurable gamma (discount) and lambda (smoothing) parameters.
  • Dynamically adjusts the KL loss factor and learning rate during training (see the sketch after this list).
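
The adaptive KL mechanism is worth spelling out. A common heuristic of this kind (the PPO-style adaptive KL penalty) scales a penalty coefficient beta up when the measured KL divergence overshoots a target and down when it undershoots, backing off the learning rate when beta saturates. The constants below are illustrative and may differ from the repository's:

    def adapt_kl_and_lr(kl, beta, lr_mult, kl_target=0.003):
        # Illustrative PPO-style adaptive-KL heuristic; all constants are
        # assumptions, not necessarily this repository's values.
        if kl > 2.0 * kl_target:
            beta = min(35.0, 1.5 * beta)        # tighten the KL penalty
            if beta > 30.0:
                lr_mult /= 1.5                  # penalty saturating: slow down
        elif kl < 0.5 * kl_target:
            beta = max(1.0 / 35.0, beta / 1.5)  # loosen the KL penalty
            if beta < 1.0 / 30.0:
                lr_mult *= 1.5                  # penalty near floor: speed up
        return beta, lr_mult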

Maintenance & Community

The project is maintained by Patrick Coady. No specific community channels (like Discord/Slack) or roadmap are mentioned in the README.

Licensing & Compatibility

The README does not state a license. Absent an explicit license, users should assume all rights are reserved and contact the author for clarification. Suitability for commercial use or closed-source linking is therefore unspecified.

Limitations & Caveats

The project is a direct implementation of the TRPO algorithm and may require substantial compute and training time on complex environments. The README does not report benchmark results for the PyBullet version, nor does it discuss scalability to very high-dimensional state/action spaces.

Health Check

  • Last commit: 5 years ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Stars (last 90 days): 0
