PyTorch implementation of TD3 for OpenAI gym tasks
This repository provides a PyTorch implementation of Twin Delayed Deep Deterministic Policy Gradients (TD3), an actor-critic algorithm designed to address function approximation error in value estimates. It targets researchers and practitioners working on continuous control tasks and serves as a baseline for benchmarking and experimentation.
How It Works
TD3 improves upon DDPG with three modifications: clipped double Q-learning, which trains two critics and uses the smaller of their estimates when forming the learning target; target policy smoothing, which adds clipped noise to the target action; and delayed policy updates, which update the actor less often than the critics. Together these reduce overestimation bias in the Q-value estimates, which leads to more stable policy learning. The implementation is built on PyTorch and uses its automatic differentiation and GPU acceleration.
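The three modifications can be sketched with plain Python scalars, so the arithmetic is visible without any tensor code. This is a minimal illustration and not the repository's code: the function names (td3_target, soft_update, should_update_policy), the callables standing in for the target networks, and all default values are assumptions chosen for the example.

```python
import random


def clip(x, low, high):
    """Clamp x to the closed interval [low, high]."""
    return max(low, min(high, x))


def td3_target(reward, done, next_state, actor_target, q1_target, q2_target,
               discount=0.99, policy_noise=0.2, noise_clip=0.5,
               max_action=1.0, rng=random):
    """Learning target for both critics (hypothetical helper, scalar case).

    actor_target, q1_target and q2_target are plain callables standing in
    for the target networks; the default values are illustrative.
    """
    # Target policy smoothing: perturb the target action with clipped
    # Gaussian noise, then keep the action inside the valid range.
    noise = clip(rng.gauss(0.0, policy_noise), -noise_clip, noise_clip)
    next_action = clip(actor_target(next_state) + noise,
                       -max_action, max_action)

    # Clipped double Q-learning: bootstrap from the smaller of the two
    # target critics to counteract overestimation.
    q_next = min(q1_target(next_state, next_action),
                 q2_target(next_state, next_action))

    # Do not bootstrap past a terminal transition.
    return reward + (0.0 if done else discount * q_next)


def soft_update(target_params, params, tau=0.005):
    """Move each target parameter a fraction tau toward the online one."""
    return [tau * p + (1.0 - tau) * t for p, t in zip(params, target_params)]


def should_update_policy(iteration, policy_freq=2):
    """Delayed policy updates: act only every policy_freq critic updates."""
    return iteration % policy_freq == 0
```

In a training loop under these assumptions, both critics would regress toward td3_target on every iteration, while the actor update and the soft_update of the target networks would run only when should_update_policy returns True.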
Quick Start & Requirements
To run the experiment script:
./run_experiments.sh
To train on a single environment:
python main.py --env HalfCheetah-v2
Limitations & Caveats
The code differs slightly from the version used to generate the paper's results: minor hyperparameter adjustments were made to improve performance.