PyTorch implementations of reinforcement learning algorithms
This repository provides PyTorch implementations of popular deep reinforcement learning algorithms: Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), the scalable trust-region method using Kronecker-factored approximation (ACKTR), and Generative Adversarial Imitation Learning (GAIL). It is targeted at researchers and practitioners in reinforcement learning who need well-tuned reference implementations for Atari, MuJoCo, PyBullet, and DeepMind Control Suite environments. The primary benefit is access to validated, high-performance implementations with hyperparameter settings derived from OpenAI Baselines.
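To make the PPO objective named above concrete, here is a minimal sketch of the clipped surrogate loss in PyTorch. This is an illustration of the technique, not the repository's exact code; the function name and argument layout are hypothetical.

```python
import torch

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_param=0.2):
    """Clipped surrogate policy loss from the PPO paper (Schulman et al., 2017)."""
    # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space.
    ratio = torch.exp(new_log_probs - old_log_probs)
    surr1 = ratio * advantages
    # Clipping the ratio removes the incentive to move the policy too far
    # from the one that collected the data.
    surr2 = torch.clamp(ratio, 1.0 - clip_param, 1.0 + clip_param) * advantages
    # We maximize the clipped surrogate, so the loss is its negation.
    return -torch.min(surr1, surr2).mean()
```

In practice this term is combined with a value-function loss and an entropy bonus, which is also how the implementations described here structure their updates.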
How It Works
The implementation is directly inspired by OpenAI baselines, utilizing the same well-tuned hyperparameters and model architectures for Atari games. It supports synchronous A2C, PPO with GAE, and ACKTR, a method that leverages Kronecker-factored approximations for efficient trust-region updates in deep RL. GAIL is also included for imitation learning tasks. The code is designed for compatibility across various Gym-like environments, including Atari, MuJoCo, PyBullet, and DeepMind Control Suite, facilitating direct comparison and experimentation.
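The Generalized Advantage Estimation (GAE) used with PPO can be sketched as a backward recursion over the rollout. This is a minimal illustration under assumed tensor shapes, not the repository's exact implementation:

```python
import torch

def compute_gae(rewards, values, masks, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation (Schulman et al., 2016).

    rewards: tensor of shape [T]
    values:  tensor of shape [T + 1] (bootstrap value for the last state appended)
    masks:   tensor of shape [T], 0.0 where the episode terminated, else 1.0
    """
    T = rewards.shape[0]
    advantages = torch.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) * mask_t - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] * masks[t] - values[t]
        # Exponentially weighted sum of future residuals, cut at episode ends.
        gae = delta + gamma * lam * masks[t] * gae
        advantages[t] = gae
    # Returns used as value-function regression targets.
    returns = advantages + values[:-1]
    return advantages, returns
```

The `lam` parameter trades bias against variance: `lam=0` reduces to one-step TD residuals, while `lam=1` recovers full Monte Carlo returns minus the baseline.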
Quick Start & Requirements
Install dependencies with pip install -r requirements.txt (after installing PyTorch and Gym Atari via conda).
Highlighted Details
Training results can be plotted with the included visualize.ipynb notebook.
Maintenance & Community
The repository was last updated on April 12th, 2021. The author notes that Soft Actor Critic (SAC) might be superior for continuous control and directs users to a new JAX repository. Contributions are welcome via issues and pull requests.
Licensing & Compatibility
The repository does not explicitly state a license in the README. However, the code is a PyTorch implementation inspired by OpenAI baselines, which are typically released under permissive licenses. Users should verify the license for commercial use.
Limitations & Caveats
The author notes that reproducing RL results is difficult, and that minor differences between TensorFlow and PyTorch can cause performance variations. ACKTR support for MuJoCo requires specific modifications that have not yet been unified into the main codebase. The project's last update was in April 2021, and the author recommends a newer JAX repository for continuous control tasks.