rlpyt by astooke

PyTorch library for deep reinforcement learning research

created 6 years ago
2,266 stars

Top 20.4% on sourcepulse

View on GitHub
Project Summary

This repository provides modular, optimized implementations of common deep reinforcement learning algorithms in PyTorch, aimed at researchers and practitioners running small- to medium-scale experiments. It offers a unified infrastructure for policy gradient, deep Q-learning, and Q-function policy gradient methods, with flexible parallelization and multi-GPU support for high-throughput research.

How It Works

The library is built around a modular design, separating concerns into components such as Runner, Sampler, Collector, Agent, Model, and Algorithm. It uses a custom namedarraytuple data structure to organize and manipulate NumPy arrays and PyTorch tensors, which simplifies handling of multi-modal observations and actions. This separation lets components be swapped or combined independently and makes algorithms straightforward to modify and extend.
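The idea behind namedarraytuple can be illustrated with a small, self-contained sketch. This is a conceptual stand-in, not rlpyt's own implementation (the class name and methods below are invented for illustration): indexing or slicing the structure is forwarded to every contained array, so a whole batch of multi-modal data can be sliced in one expression.

    import numpy as np

    class NamedArrayTuple:
        # Conceptual stand-in for rlpyt's namedarraytuple: a fixed set of named
        # array fields where indexing/slicing applies to every field at once.

        def __init__(self, **fields):
            self._fields = dict(fields)

        def __getattr__(self, name):
            try:
                return self._fields[name]
            except KeyError:
                raise AttributeError(name)

        def __getitem__(self, index):
            # Forward the index to every contained array, keeping the structure.
            return NamedArrayTuple(**{k: v[index] for k, v in self._fields.items()})

    # A batch of multi-modal observations: an image plus a low-dimensional state.
    batch = NamedArrayTuple(image=np.zeros((32, 84, 84)), state=np.zeros((32, 5)))
    first_four = batch[:4]           # one slice applies to both fields
    print(first_four.image.shape)    # (4, 84, 84)
    print(first_four.state.shape)    # (4, 5)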

Quick Start & Requirements

  • Installation: Clone the repository and create a conda environment using provided YAML files (linux_[cpu|cuda9|cuda10].yml). Then, either add the rlpyt directory to PYTHONPATH or install it as an editable package (pip install -e .).
  • Prerequisites: PyTorch, OpenAI Gym compatibility, and environment-specific packages (e.g., Atari).
  • Documentation: Extended documentation is available at https://rlpyt.readthedocs.io.
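Once installed, a minimal training run wires a Sampler, Agent, Algorithm, and Runner together. The sketch below is modeled on the repository's Atari DQN example script; the exact module paths and keyword arguments are assumptions here and should be checked against the examples directory of the repository.

    # Minimal serial-sampling DQN run, modeled on rlpyt's example scripts.
    # Module paths and arguments are assumptions; verify against examples/.
    from rlpyt.samplers.serial.sampler import SerialSampler
    from rlpyt.envs.atari.atari_env import AtariEnv, AtariTrajInfo
    from rlpyt.algos.dqn.dqn import DQN
    from rlpyt.agents.dqn.atari.atari_dqn_agent import AtariDqnAgent
    from rlpyt.runners.minibatch_rl import MinibatchRlEval
    from rlpyt.utils.logging.context import logger_context

    def build_and_train(game="pong", run_ID=0, cuda_idx=None):
        # Sampler collects environment interactions (serially, in-process here).
        sampler = SerialSampler(
            EnvCls=AtariEnv,
            TrajInfoCls=AtariTrajInfo,
            env_kwargs=dict(game=game),
            eval_env_kwargs=dict(game=game),
            batch_T=4,        # time steps per sampling iteration
            batch_B=1,        # parallel environment instances
            eval_n_envs=2,
            eval_max_steps=int(10e3),
        )
        algo = DQN(min_steps_learn=1e3)   # Algorithm: loss and optimization
        agent = AtariDqnAgent()           # Agent: wraps the model for acting/training
        runner = MinibatchRlEval(         # Runner: coordinates sampling and training
            algo=algo,
            agent=agent,
            sampler=sampler,
            n_steps=50e3,
            log_interval_steps=1e3,
            affinity=dict(cuda_idx=cuda_idx),
        )
        with logger_context("example", run_ID, "dqn_" + game, dict(game=game)):
            runner.train()

    if __name__ == "__main__":
        build_and_train()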

Highlighted Details

  • Supports policy gradient (A2C, PPO), deep Q-learning (DQN variants including R2D2-style recurrent), and Q-function policy gradient (DDPG, TD3, SAC).
  • Features flexible sampling and optimization parallelism, including asynchronous updates and multi-GPU training via PyTorch's DistributedDataParallel (the general pattern is sketched after this list).
  • Implements various replay buffer types (uniform, prioritized, sequence, frame-based) and supports recurrent agents.
  • Utilizes a namedarraytuple for efficient data handling, supporting multi-modal observations/actions.
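For the multi-GPU case, rlpyt relies on PyTorch's standard DistributedDataParallel wrapper. The sketch below shows that generic PyTorch pattern rather than rlpyt's internal code; the process-group setup values (address, port, backend) are illustrative assumptions.

    # Generic PyTorch DistributedDataParallel pattern (not rlpyt's internal code):
    # one process per GPU, each wrapping its model so gradients are all-reduced.
    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    def setup_ddp_model(model, rank, world_size):
        # Rendezvous settings; the values here are only illustrative.
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
        os.environ.setdefault("MASTER_PORT", "29500")
        dist.init_process_group("nccl", rank=rank, world_size=world_size)
        torch.cuda.set_device(rank)
        model = model.to(rank)
        # Gradients computed in backward() are synchronized across processes.
        return DDP(model, device_ids=[rank])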

Maintenance & Community

The project acknowledges support from Pieter Abbeel, the Fannie & John Hertz Foundation, NVIDIA, Max Jaderberg, OpenAI, and the BAIR community. Contributions are welcome.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Compatibility with commercial or closed-source linking is not specified.

Limitations & Caveats

The README describes the code as stable but still under development, so future changes are possible. Some algorithms are listed as "Coming soon." The project does not include its own visualization tools and recommends https://github.com/vitchyr/viskit instead.

Health Check

  • Last commit: 4 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

17 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Nathan Lambert (AI researcher at AI2), and 1 more.

tianshou by thu-ml

PyTorch RL library for algorithm development and application

created 7 years ago, updated 1 day ago
9k stars
Top 0.1% on sourcepulse