pytorch-a2c-ppo-acktr-gail by ikostrikov

PyTorch implementations of reinforcement learning algorithms

Created 8 years ago · 3,808 stars · Top 13.1% on SourcePulse

Project Summary

This repository provides PyTorch implementations of popular deep reinforcement learning algorithms: Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method using Kronecker-factored approximation (ACKTR), and Generative Adversarial Imitation Learning (GAIL). It is targeted at researchers and practitioners in reinforcement learning who need well-tuned, reference implementations for Atari, MuJoCo, PyBullet, and DeepMind Control Suite environments. The primary benefit is access to validated, high-performance algorithms with hyperparameter settings derived from OpenAI's successful benchmarks.

How It Works

The implementation is directly inspired by OpenAI Baselines, reusing the same well-tuned hyperparameters and model architectures for Atari games. It supports synchronous A2C, PPO with Generalized Advantage Estimation (GAE), and ACKTR, a method that leverages Kronecker-factored approximations for efficient trust-region updates in deep RL. GAIL is also included for imitation-learning tasks. The code is designed for compatibility across Gym-like environments, including Atari, MuJoCo, PyBullet, and the DeepMind Control Suite, facilitating direct comparison and experimentation.
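To make the two core ideas above concrete, here is a minimal, hedged sketch of GAE and PPO's clipped surrogate objective. This is an illustration of the general technique, not the repository's actual code: the real implementation operates on batched torch tensors across vectorized environments, and the function names below are hypothetical.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for a single rollout.

    rewards: rewards r_0 .. r_{T-1}
    values:  value estimates V(s_0) .. V(s_T), length T+1; the last
             entry is the bootstrap value for the truncated rollout.
    """
    T = len(rewards)
    advantages = [0.0] * T
    gae = 0.0
    for t in reversed(range(T)):
        # One-step TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of TD errors (the GAE recursion)
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages


def ppo_clip_objective(ratio, advantage, clip_eps=0.2):
    """PPO's clipped surrogate for one sample.

    ratio = pi_new(a|s) / pi_old(a|s). The training loss is the
    negative mean of this quantity over a minibatch.
    """
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    # Pessimistic bound: take the smaller of the clipped and
    # unclipped surrogates, removing the incentive for large updates.
    return min(ratio * advantage, clipped * advantage)
```

Note the limiting cases: with `lam=1.0` GAE reduces to the Monte Carlo advantage, while `lam=0.0` gives the one-step TD error, trading variance against bias.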

Quick Start & Requirements

  • Install: pip install -r requirements.txt (after installing PyTorch and Gym Atari via conda).
  • Prerequisites: Python 3, PyTorch, Stable Baselines3, OpenAI Gym (with Atari support).
  • Environments: Atari Learning Environment, MuJoCo, PyBullet, DeepMind Control Suite.
  • Docs: OpenAI posts for A2C/ACKTR and PPO.

Highlighted Details

  • Implements A2C, PPO, ACKTR, and GAIL.
  • Tested on Atari, MuJoCo, PyBullet, and DeepMind Control Suite environments.
  • Uses OpenAI's well-tuned hyperparameters and model architectures.
  • Includes scripts for training and visualization (visualize.ipynb).

Maintenance & Community

The repository was last updated on April 12th, 2021. The author notes that Soft Actor Critic (SAC) might be superior for continuous control and directs users to a new JAX repository. Contributions are welcome via issues and pull requests.

Licensing & Compatibility

The repository does not explicitly state a license in the README. However, the code is a PyTorch implementation inspired by OpenAI baselines, which are typically released under permissive licenses. Users should verify the license for commercial use.

Limitations & Caveats

The author notes that reproducing RL results is difficult, and that minor differences between TensorFlow and PyTorch can cause performance variations. ACKTR support for MuJoCo requires code changes that have not yet been unified into the main implementation. The project's last update was in April 2021, and the author recommends a newer JAX repository for continuous-control tasks.

Health Check

  • Last commit: 3 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 80 stars in the last 90 days
