TD3 by sfujim

PyTorch implementation of TD3 for OpenAI gym tasks

Created 7 years ago

2,014 stars

Top 21.8% on SourcePulse


3 Experts Love This Project

Nathan Lambert

Research Scientist at AI2

Phil Wang

Prolific Research Paper Implementer

Joshua Achiam

Head of Mission Alignment at OpenAI

Project Summary

This repository provides a PyTorch implementation of Twin Delayed Deep Deterministic Policy Gradients (TD3), an actor-critic algorithm designed to address function approximation error in reinforcement learning. It targets researchers and practitioners working on continuous control tasks and serves as a baseline for benchmarking and experimentation.

How It Works

TD3 improves upon DDPG with three modifications: clipped double Q-learning (two critics are trained, and the smaller of their two estimates forms the learning target), target policy smoothing (clipped noise is added to the target action), and delayed policy updates (the actor and the target networks are updated less often than the critics). Together, these techniques reduce the overestimation bias in Q-value estimates and make policy learning more stable. The implementation uses PyTorch for automatic differentiation and GPU acceleration.
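The three modifications can be illustrated with a minimal, framework-free sketch. This is not the repository's code: it handles a single transition with a scalar action, the names `td3_target`, `actor_update_due`, and `clip` are hypothetical, and the default values (`discount=0.99`, `policy_noise=0.2`, `noise_clip=0.5`, `policy_freq=2`) are assumed here for illustration. The target networks are passed in as plain callables.

```python
import random


def clip(x, low, high):
    """Clamp a scalar to the closed interval [low, high]."""
    return max(low, min(high, x))


def td3_target(reward, done, next_state, target_actor, target_q1, target_q2,
               max_action=1.0, discount=0.99, policy_noise=0.2,
               noise_clip=0.5, rng=random):
    """Return the critic target for one transition (scalar action, illustrative)."""
    # Target policy smoothing: perturb the target action with clipped
    # Gaussian noise, then keep the action inside the valid range.
    noise = clip(rng.gauss(0.0, policy_noise), -noise_clip, noise_clip)
    next_action = clip(target_actor(next_state) + noise, -max_action, max_action)

    # Clipped double Q-learning: take the smaller of the two target
    # critic estimates to counter overestimation.
    target_q = min(target_q1(next_state, next_action),
                   target_q2(next_state, next_action))

    # Terminal transitions do not bootstrap from the next state.
    return reward + (0.0 if done else discount * target_q)


def actor_update_due(iteration, policy_freq=2):
    """Delayed policy updates: True on every `policy_freq`-th critic update."""
    return iteration % policy_freq == 0
```

In a training loop, both critics would be regressed toward `td3_target(...)` on every iteration, while the actor and the target networks would be updated only when `actor_update_due(iteration)` is true.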

Quick Start & Requirements

Run commands: ./run_experiments.sh (reproduces the paper's results) or python main.py --env HalfCheetah-v2 (runs a single environment)
Prerequisites: PyTorch 1.2, Python 3.7, MuJoCo, OpenAI Gym.
Links: Learning Curves, Video

Highlighted Details

Implements TD3 and DDPG, and includes scripts for reproducing the paper's results.
Tested on MuJoCo continuous control tasks.
Learning curves are provided as NumPy arrays of average evaluation rewards, recorded over 1 million time steps.

Maintenance & Community

The code is no longer exactly representative of the implementation used in the paper due to minor adjustments for performance.
Bibtex citation provided for the original paper.

Licensing & Compatibility

The repository does not explicitly state a license.

Limitations & Caveats

The code differs slightly from the version used to generate the paper's results: minor hyperparameter adjustments were made to improve performance.

Health Check

Last Commit

2 years ago

Responsiveness

Inactive

Star History

16 stars in the last 30 days