pytorch-a2c-ppo-acktr-gail by ikostrikov

PyTorch implementations of reinforcement learning algorithms

Created 8 years ago · 3,808 stars · Top 13.1% on SourcePulse

Project Summary

This repository provides PyTorch implementations of popular deep reinforcement learning algorithms: Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method using Kronecker-factored approximation (ACKTR), and Generative Adversarial Imitation Learning (GAIL). It is targeted at researchers and practitioners in reinforcement learning who need well-tuned, reference implementations for Atari, MuJoCo, PyBullet, and DeepMind Control Suite environments. The primary benefit is access to validated, high-performance algorithms with hyperparameter settings derived from OpenAI's successful benchmarks.

How It Works

The implementation is directly inspired by OpenAI Baselines, reusing the same well-tuned hyperparameters and model architectures for Atari games. It supports synchronous A2C, PPO with Generalized Advantage Estimation (GAE), and ACKTR, a method that leverages Kronecker-factored approximations for efficient trust-region updates in deep RL. GAIL is also included for imitation-learning tasks. The code is designed for compatibility across Gym-like environments, including Atari, MuJoCo, PyBullet, and the DeepMind Control Suite, facilitating direct comparison and experimentation.
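To make the two core ideas above concrete, here is a minimal, hedged sketch of GAE and PPO's clipped surrogate objective. This is an illustration of the general technique, not the repository's actual code: the real implementation operates on batched torch tensors across vectorized environments, and the function names below are hypothetical.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for a single rollout.

    rewards: rewards r_0 .. r_{T-1}
    values:  value estimates V(s_0) .. V(s_T), length T+1; the last
             entry is the bootstrap value for the truncated rollout.
    """
    T = len(rewards)
    advantages = [0.0] * T
    gae = 0.0
    for t in reversed(range(T)):
        # One-step TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of TD errors (the GAE recursion)
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages


def ppo_clip_objective(ratio, advantage, clip_eps=0.2):
    """PPO's clipped surrogate for one sample.

    ratio = pi_new(a|s) / pi_old(a|s). The training loss is the
    negative mean of this quantity over a minibatch.
    """
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    # Pessimistic bound: take the smaller of the clipped and
    # unclipped surrogates, removing the incentive for large updates.
    return min(ratio * advantage, clipped * advantage)
```

Note the limiting cases: with `lam=1.0` GAE reduces to the Monte Carlo advantage, while `lam=0.0` gives the one-step TD error, trading variance against bias.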

Quick Start & Requirements

  • Install: pip install -r requirements.txt (after installing PyTorch and Gym Atari via conda).
  • Prerequisites: Python 3, PyTorch, Stable Baselines3, OpenAI Gym (with Atari support).
  • Environments: Atari Learning Environment, MuJoCo, PyBullet, DeepMind Control Suite.
  • Docs: OpenAI posts for A2C/ACKTR and PPO.

Highlighted Details

  • Implements A2C, PPO, ACKTR, and GAIL.
  • Tested on Atari, MuJoCo, PyBullet, and DeepMind Control Suite environments.
  • Uses OpenAI's well-tuned hyperparameters and model architectures.
  • Includes scripts for training and visualization (visualize.ipynb).

Maintenance & Community

The repository was last updated on April 12th, 2021. The author notes that Soft Actor Critic (SAC) might be superior for continuous control and directs users to a new JAX repository. Contributions are welcome via issues and pull requests.

Licensing & Compatibility

The repository does not explicitly state a license in the README. However, the code is a PyTorch implementation inspired by OpenAI baselines, which are typically released under permissive licenses. Users should verify the license for commercial use.

Limitations & Caveats

The author notes that reproducing RL results is difficult, and that minor differences between TensorFlow and PyTorch can cause performance variations. ACKTR support for MuJoCo requires code changes that have not yet been unified into the main implementation. The project's last update was in April 2021, and the author recommends a newer JAX repository for continuous-control tasks.

Health Check

  • Last commit: 3 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 80 stars in the last 90 days
