TD3_BC  by sfujim

PyTorch implementation of TD3+BC, an offline RL method

created 4 years ago
366 stars

Top 78.1% on sourcepulse

Project Summary

This repository provides a minimalist PyTorch implementation of TD3+BC, a simple yet effective variant of the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm for offline reinforcement learning. It is designed for researchers and practitioners seeking a straightforward approach to offline RL without complex architectural changes or hyperparameter tuning.

How It Works

TD3+BC makes two changes to the standard TD3 algorithm: it adds a weighted behavior cloning loss to the policy update, and it normalizes states. The behavior cloning term keeps the learned policy close to the actions found in the dataset, while the rest of TD3 is left intact. No changes to the underlying network architecture or hyperparameters are required.
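
As an illustration of the two modifications, here is a minimal NumPy sketch; it is not the repository's code. It assumes the formulation described in the TD3+BC paper: the Q term is scaled by lambda = alpha / mean(|Q|), the behavior cloning term is a mean squared error between policy and dataset actions, and the defaults alpha = 2.5 and eps = 1e-3 are taken as assumptions. The names normalize_states and td3_bc_actor_loss are hypothetical. The sketch only evaluates the loss value; a real implementation would compute it in PyTorch and treat lambda as a constant during backpropagation.

```python
import numpy as np


def normalize_states(states, eps=1e-3):
    """Standardize each state dimension with dataset-wide statistics.

    eps (assumed default) keeps the division stable for near-constant dimensions.
    """
    mean = states.mean(axis=0, keepdims=True)
    std = states.std(axis=0, keepdims=True) + eps
    return (states - mean) / std, mean, std


def td3_bc_actor_loss(q_values, policy_actions, dataset_actions, alpha=2.5):
    """Value of the weighted policy objective for one batch.

    q_values:        Q(s, pi(s)) for each state in the batch
    policy_actions:  pi(s), the actions proposed by the current policy
    dataset_actions: the actions actually stored in the offline dataset
    alpha:           assumed weighting hyperparameter
    """
    # Scale the Q term so it is comparable in magnitude to the cloning term.
    lam = alpha / np.abs(q_values).mean()
    # Behavior cloning term: stay close to the dataset actions.
    bc_term = np.mean((policy_actions - dataset_actions) ** 2)
    # Maximize Q (hence the minus sign) while minimizing the cloning error.
    return -lam * q_values.mean() + bc_term


# Toy batch with made-up numbers, for illustration only.
states = np.array([[0.0, 10.0], [2.0, 30.0]])
normalized, state_mean, state_std = normalize_states(states)

q_values = np.array([1.0, 3.0])
policy_actions = np.array([[0.5], [0.0]])
dataset_actions = np.array([[0.0], [0.0]])
loss = td3_bc_actor_loss(q_values, policy_actions, dataset_actions)
```

The same mean and standard deviation computed from the dataset would also have to be applied to states at evaluation time, which is why the sketch returns them alongside the normalized array.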

Quick Start & Requirements

  • Primary install/run command: ./run_experiments.sh
  • Prerequisites: MuJoCo 1.50, mujoco-py 1.50.1.1, OpenAI gym 0.17.0, PyTorch 1.4.0, Python 3.6.
  • Datasets: D4RL datasets.

Highlighted Details

  • The provided script (run_experiments.sh) reproduces the paper's results.
  • Keeps algorithmic changes to TD3 to a minimum.
  • Implements TD3 with a weighted behavior cloning loss and state normalization.

Maintenance & Community

  • Developed by Scott Fujimoto and Shixiang Shane Gu.
  • The accompanying paper was published at NeurIPS 2021.

Licensing & Compatibility

  • The repository is marked as "not an official Google product."
  • Licensing details are not explicitly stated in the README.

Limitations & Caveats

The implementation is pinned to older dependency versions (MuJoCo 1.50, mujoco-py 1.50.1.1, OpenAI gym 0.17.0, PyTorch 1.4.0, Python 3.6), so running it on a current setup likely means either recreating that stack or adapting the code to newer releases.

Health Check

  • Last commit: 3 years ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 9 stars in the last 90 days
