TD3 by sfujim

PyTorch implementation of TD3 for OpenAI gym tasks

Created 7 years ago
1,958 stars

Top 22.5% on SourcePulse

Project Summary

This repository provides a PyTorch implementation of Twin Delayed Deep Deterministic Policy Gradients (TD3), an actor-critic algorithm designed to address function approximation errors in reinforcement learning. It is targeted at researchers and practitioners working with continuous control tasks, offering a robust baseline for benchmarking and experimentation.

How It Works

TD3 improves on DDPG with three modifications: clipped double Q-learning (the critic target uses the smaller of two critics' estimates), target policy smoothing (clipped noise is added to the target action), and delayed policy updates (the actor is updated less often than the critics). Together, these techniques reduce the overestimation bias in Q-value estimates, which leads to more stable policy learning. The implementation is built on PyTorch and uses its automatic differentiation and GPU acceleration.
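The three modifications can be illustrated with a minimal NumPy sketch. This is not the repository's code: the function names, the callable-based interface, and the toy networks are assumptions made for illustration, and the default hyperparameters are only illustrative values.

```python
import numpy as np


def td3_target(reward, not_done, next_state, actor_target, critic1_target,
               critic2_target, rng, discount=0.99, policy_noise=0.2,
               noise_clip=0.5, max_action=1.0):
    """Compute a TD3-style critic target for a batch of transitions.

    The actor and critic arguments are plain callables standing in for
    target networks (hypothetical interface, for illustration only).
    """
    next_action = actor_target(next_state)
    # Target policy smoothing: perturb the target action with clipped noise.
    noise = rng.normal(0.0, policy_noise, size=next_action.shape)
    noise = np.clip(noise, -noise_clip, noise_clip)
    next_action = np.clip(next_action + noise, -max_action, max_action)
    # Clipped double Q-learning: take the smaller of the two estimates.
    q1 = critic1_target(next_state, next_action)
    q2 = critic2_target(next_state, next_action)
    target_q = np.minimum(q1, q2)
    return reward + not_done * discount * target_q


def should_update_actor(iteration, policy_freq=2):
    """Delayed policy updates: act only every `policy_freq` critic updates."""
    return iteration % policy_freq == 0


# Toy usage: constant critics make the minimum easy to see.
rng = np.random.default_rng(0)
state = np.zeros((4, 3))
actor = lambda s: np.zeros((s.shape[0], 1))
q_low = lambda s, a: np.full((s.shape[0], 1), 1.0)
q_high = lambda s, a: np.full((s.shape[0], 1), 2.0)
y = td3_target(np.ones((4, 1)), np.ones((4, 1)), state, actor, q_low, q_high, rng)
```

With these toy critics the target is built from the lower estimate (1.0) rather than the higher one (2.0), which is the mechanism that counteracts overestimation.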

Quick Start & Requirements

  • Primary install / run command: ./run_experiments.sh or python main.py --env HalfCheetah-v2
  • Prerequisites: PyTorch 1.2, Python 3.7, MuJoCo, OpenAI Gym.
  • Links: Learning Curves, Video

Highlighted Details

  • Implements TD3 and DDPG, and includes scripts for reproducing the paper's results.
  • Tested on MuJoCo continuous control tasks.
  • Learning curves are provided as NumPy arrays, representing average rewards over 1 million time steps.
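As a rough sketch of how such a learning curve could be inspected, the snippet below loads a saved NumPy array and pairs each evaluation with a time step. The helper name, the file name, and the assumption that evaluations are evenly spaced from step 0 to the final step are all hypothetical; check the repository for the actual layout.

```python
import os
import tempfile

import numpy as np


def load_learning_curve(path, total_steps=1_000_000):
    """Load a saved learning curve and pair each evaluation with a time step."""
    rewards = np.load(path)
    # Assumption: evaluations are evenly spaced from step 0 to `total_steps`.
    steps = np.linspace(0, total_steps, num=len(rewards))
    return steps, rewards


# Usage with a synthetic five-point curve; the file name is hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_curve.npy")
    np.save(path, np.arange(5, dtype=float))
    steps, rewards = load_learning_curve(path)
```

The returned arrays can be passed directly to a plotting library, with `steps` on the x-axis and `rewards` on the y-axis.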

Maintenance & Community

  • The code is no longer exactly representative of the implementation used in the paper due to minor adjustments for performance.
  • Bibtex citation provided for the original paper.

Licensing & Compatibility

  • The repository does not explicitly state a license.

Limitations & Caveats

The code differs slightly from the version used to generate the paper's results: minor hyperparameter adjustments were made to improve performance.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 28 stars in the last 30 days
