pg-is-all-you-need by MrSyee

Step-by-step tutorial for policy gradient algorithms

Created 6 years ago

987 stars

Top 37.5% on SourcePulse

1 Expert Loves This Project

Benjamin Bolte

Cofounder of K-Scale Labs

Project Summary

This repository provides a step-by-step tutorial for various Policy Gradient (PG) reinforcement learning algorithms, including A2C, PPO, DDPG, TD3, and SAC, with extensions for learning from demonstrations. It targets researchers and practitioners seeking to understand and implement these methods, offering both theoretical explanations and object-oriented code examples executable in Colab.

How It Works

The project implements well-known PG algorithms in clear, object-oriented code. Each chapter pairs the theoretical background with runnable code, so readers can pick individual topics without working through the whole tutorial. The code is written to be easy to follow and to run directly, which supports both learning and experimentation.
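As an illustration of the object-oriented style described above, here is a minimal sketch of the one-step actor-critic losses that A2C-style methods build on. It is not taken from the repository: the names Transition and A2CLossSketch, and the default gamma, are hypothetical, and a real implementation would compute these quantities on tensors with automatic differentiation.

```python
import math
from typing import NamedTuple, Tuple


class Transition(NamedTuple):
    """One environment step (all field names are illustrative)."""
    log_prob: float    # log pi(a|s) of the action that was taken
    value: float       # critic estimate V(s)
    next_value: float  # critic estimate V(s')
    reward: float
    done: bool


class A2CLossSketch:
    """Hypothetical helper showing one-step actor-critic losses."""

    def __init__(self, gamma: float = 0.99) -> None:
        self.gamma = gamma

    def td_target(self, t: Transition) -> float:
        # Bootstrap from V(s') unless the episode ended at this step.
        mask = 0.0 if t.done else 1.0
        return t.reward + self.gamma * t.next_value * mask

    def losses(self, t: Transition) -> Tuple[float, float]:
        advantage = self.td_target(t) - t.value
        actor_loss = -advantage * t.log_prob  # policy-gradient term
        critic_loss = advantage ** 2          # squared TD error
        return actor_loss, critic_loss


step = Transition(log_prob=math.log(0.5), value=1.0,
                  next_value=2.0, reward=1.0, done=False)
actor_loss, critic_loss = A2CLossSketch(gamma=0.5).losses(step)
print(actor_loss, critic_loss)
```

Keeping the loss logic in a small class mirrors the agent-per-algorithm layout the summary describes, but the exact class structure in the repository may differ.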

Quick Start & Requirements

Install: clone the repository, then run make dep to install dependencies.
Prerequisites: Python 3.6.1+; an Anaconda virtual environment is recommended.
Links: Colab Notebooks, NBViewer

Highlighted Details

Covers a range of PG algorithms from A2C to SAC.
Includes methods for learning acceleration using demonstrations (DDPGfD, Behavior Cloning).
Provides both theoretical background and object-oriented implementations.
Executable directly in Google Colab, accessible even on mobile devices.
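
The demonstration-based methods listed above rely on some form of imitation term. As a rough sketch only, a behavior cloning loss for continuous actions can be written as the mean squared error between the policy's actions and the demonstrated actions. The function name behavior_cloning_loss, the toy linear_policy, and the choice of mean squared error are assumptions for illustration, not the repository's code.

```python
from typing import Callable, List, Sequence, Tuple

State = Sequence[float]
Action = Sequence[float]


def behavior_cloning_loss(
    policy: Callable[[State], Action],
    demos: List[Tuple[State, Action]],
) -> float:
    """Mean squared error between policy actions and demonstrated actions."""
    if not demos:
        raise ValueError("demos must not be empty")
    total = 0.0
    count = 0
    for state, demo_action in demos:
        predicted = policy(state)
        # Accumulate the squared error per action dimension.
        for p, d in zip(predicted, demo_action):
            total += (p - d) ** 2
            count += 1
    return total / count


def linear_policy(state: State) -> Action:
    # Hypothetical toy policy: doubles every state component.
    return [2.0 * s for s in state]


demos = [([1.0], [2.0]), ([2.0], [3.0])]
loss = behavior_cloning_loss(linear_policy, demos)
print(loss)
```

In practice a term like this is usually added to the regular actor loss with a weighting coefficient, so the demonstrations guide early learning without fully overriding the reinforcement signal.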

Maintenance & Community

The project lists several contributors and welcomes issues and pull requests for improvements.

Licensing & Compatibility

The repository does not explicitly state a license.

Limitations & Caveats

The stated prerequisite is Python 3.6.1+; compatibility with recent Python versions is not guaranteed. Because no license is stated, reuse of the code, especially commercial use, may be restricted.

Health Check

Last Commit: 7 months ago

Responsiveness: 1 day

Star History

12 stars in the last 30 days