Reinforcement learning implementation with OpenAI Gym
This repository provides an implementation of Trust Region Policy Optimization (TRPO) with Generalized Advantage Estimation (GAE) for reinforcement learning tasks. It targets researchers and practitioners in AI and robotics, offering a robust solution for continuous control problems, demonstrated by its success on MuJoCo benchmarks and its updated compatibility with TensorFlow 2.0 and PyBullet.
How It Works
The core approach is TRPO, a policy gradient method that enforces approximately monotonic policy improvement by constraining each policy update to a trust region around the current policy. The value function and the policy are each approximated by a three-hidden-layer neural network with tanh activations. Generalized Advantage Estimation (GAE) reduces the variance of the policy gradient estimates. For stability, the implementation adapts the KL penalty coefficient and the learning rate during training.
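GAE blends multi-step TD residuals into a single low-variance advantage estimate. The following NumPy sketch is illustrative only; the function name and the γ/λ defaults are assumptions, not values taken from this repository:

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.995, lam=0.98):
    """Illustrative Generalized Advantage Estimation for one episode.

    rewards: per-step rewards r_t
    values:  value-function estimates V(s_t), same length as rewards
    Returns the GAE advantage estimate for each timestep.
    """
    # TD residuals: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t),
    # treating the value after the terminal state as 0.
    next_values = np.append(values[1:], 0.0)
    deltas = rewards + gamma * next_values - values

    # Discounted sum of residuals: A_t = sum_l (gamma*lam)^l * delta_{t+l},
    # computed with a single backward pass.
    advantages = np.zeros_like(deltas)
    running = 0.0
    for t in reversed(range(len(deltas))):
        running = deltas[t] + gamma * lam * running
        advantages[t] = running
    return advantages
```

Setting λ = 1 recovers the plain Monte Carlo advantage, while λ = 0 gives the one-step TD residual; intermediate values trade bias against variance.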
Quick Start & Requirements
python train.py <EnvironmentName>
(e.g., python train.py InvertedPendulumBulletEnv-v0)
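Under the hood, train.py drives a standard Gym-style interaction loop. The sketch below shows one episode of experience collection against that interface; StubEnv and run_episode are illustrative stand-ins, not names from the repository, and a real run would use a Gym/PyBullet environment instead of the stub:

```python
import numpy as np

class StubEnv:
    """Minimal stand-in for a Gym environment (illustration only)."""
    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return np.zeros(3)  # dummy observation

    def step(self, action):
        # Gym's classic API: (observation, reward, done, info)
        self.t += 1
        return np.zeros(3), 1.0, self.t >= self.horizon, {}

def run_episode(env, policy):
    """Collect one trajectory of (observation, action, reward) tuples."""
    obs = env.reset()
    trajectory = []
    done = False
    while not done:
        action = policy(obs)
        next_obs, reward, done, _ = env.step(action)
        trajectory.append((obs, action, reward))
        obs = next_obs
    return trajectory
```

The collected trajectories feed the value-function fit and the GAE advantage computation on each training iteration.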
Maintenance & Community
The project is maintained by Patrick Coady. No specific community channels (like Discord/Slack) or roadmap are mentioned in the README.
Licensing & Compatibility
The README does not explicitly state a license. Given the lack of explicit mention, users should assume all rights reserved or contact the author for clarification. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The project is a direct implementation of the TRPO algorithm and may require significant computational resources and time for training complex environments. The README does not detail specific performance benchmarks for the PyBullet version or discuss potential limitations regarding scalability to extremely high-dimensional state/action spaces.