PyTorch implementation of TD3+BC, an offline RL method
This repository provides a minimalist PyTorch implementation of TD3+BC, a simple yet effective variant of the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm for offline reinforcement learning. It is designed for researchers and practitioners seeking a straightforward approach to offline RL without complex architectural changes or hyperparameter tuning.
How It Works
TD3+BC extends standard TD3 with two modifications: a weighted behavior cloning term is added to the policy update, and states are normalized using statistics of the offline dataset. The behavior cloning term regularizes the policy toward the actions in the dataset, countering the value overestimation on out-of-distribution actions that hampers offline RL, while TD3 itself supplies a stable learner. No changes to the underlying network architecture or hyperparameters are required.
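As a rough illustration, the sketch below shows both changes in PyTorch. The names (`actor`, `critic`) and the weighting scheme with `alpha = 2.5` follow the TD3+BC paper; they are not necessarily this repository's exact API.

```python
import torch
import torch.nn.functional as F

def normalize_states(states, eps=1e-3):
    # Normalize each state feature with the offline dataset's mean and std.
    mean = states.mean(0, keepdim=True)
    std = states.std(0, keepdim=True) + eps
    return (states - mean) / std

def td3_bc_actor_loss(actor, critic, states, actions, alpha=2.5):
    # TD3+BC policy loss: maximize Q on the policy's actions while
    # regressing those actions toward the dataset actions (behavior cloning).
    pi = actor(states)
    q = critic(states, pi)
    # lambda rescales the Q term so it stays on the same scale as the BC term.
    lmbda = alpha / q.abs().mean().detach()
    return -lmbda * q.mean() + F.mse_loss(pi, actions)
```

Everything else (twin critics, target-policy smoothing, delayed policy updates) is unchanged from TD3.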
Quick Start & Requirements
Run all experiments with the provided script:

```bash
./run_experiments.sh
```
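To launch a single experiment, the original TD3+BC release exposes a `main.py` entry point; the flags below are assumptions based on that code rather than a documented interface of this repository:

```bash
# Hypothetical single run; entry point and flags assumed from the original TD3+BC code.
python main.py --env halfcheetah-medium-v0 --seed 0
```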
Limitations & Caveats
The implementation pins old dependency versions (MuJoCo 1.50, mujoco-py 1.50.1.1, OpenAI gym 0.17.0, PyTorch 1.4.0, Python 3.6), which may make setup difficult on newer toolchains.