TD3_BC by sfujim

PyTorch implementation of TD3+BC, an offline RL method

Created 4 years ago
373 stars

Top 76.0% on SourcePulse

Project Summary

This repository provides a minimalist PyTorch implementation of TD3+BC, a simple yet effective variant of the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm for offline reinforcement learning. It is designed for researchers and practitioners seeking a straightforward approach to offline RL without complex architectural changes or hyperparameter tuning.

How It Works

TD3+BC makes two changes to standard TD3: it adds a weighted behavior cloning term to the policy update, and it normalizes the states in the dataset. The behavior cloning term keeps the learned policy close to the actions in the dataset, which improves performance in the offline setting. The rest of TD3, including its network architecture and existing hyperparameters, is left unchanged.
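The weighted policy objective can be illustrated numerically. The sketch below is a dependency-free approximation, not the repository's PyTorch code. It assumes the actor loss has the form -λ · mean(Q) + MSE(π(s), a), with λ = α / mean(|Q|), and it uses α = 2.5 as the default, which is the value reported in the paper. The function name td3_bc_actor_loss and its list-based arguments are hypothetical.

```python
from statistics import fmean


def td3_bc_actor_loss(q_values, policy_actions, dataset_actions, alpha=2.5):
    """Scalar TD3+BC-style actor loss for one mini-batch (no autograd).

    q_values:        critic estimates Q(s, pi(s)), one float per sample
    policy_actions:  actions proposed by the policy, one list per sample
    dataset_actions: actions stored in the offline dataset, same shape
    """
    # lambda rescales the Q term so its magnitude is comparable to the
    # behavior cloning term, whatever the reward scale of the task.
    # (Assumes mean(|Q|) is non-zero.)
    lmbda = alpha / fmean([abs(q) for q in q_values])

    # Behavior cloning term: mean squared error over every action dimension.
    squared_errors = [
        (p - a) ** 2
        for pi, act in zip(policy_actions, dataset_actions)
        for p, a in zip(pi, act)
    ]
    bc_loss = fmean(squared_errors)

    # Maximize the rescaled Q value while staying close to dataset actions.
    return -lmbda * fmean(q_values) + bc_loss


loss = td3_bc_actor_loss(
    q_values=[1.0, 3.0],
    policy_actions=[[0.0, 1.0], [1.0, 0.0]],
    dataset_actions=[[0.0, 0.0], [1.0, 0.0]],
)
```

With all Q values positive, the first term reduces to -α, so the balance between the two terms is set by α alone. In a real training loop the same expression would be built from tensors so that gradients flow through the policy's actions and the Q values, with λ treated as a constant.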

Quick Start & Requirements

  • Run command: ./run_experiments.sh (the provided script for reproducing the paper's experiments)
  • Prerequisites: MuJoCo 1.50, mujoco-py 1.50.1.1, OpenAI gym 0.17.0, PyTorch 1.4.0, Python 3.6.
  • Datasets: D4RL datasets.

Highlighted Details

  • Reproduces paper results using the provided script.
  • Keeps algorithmic changes to TD3 to a minimum.
  • Implements TD3 with a weighted behavior cloning loss and state normalization.
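The state normalization listed above can be sketched without any dependencies. The example assumes per-dimension standardization over the whole dataset, (s - mean) / (std + eps), with a small eps (1e-3 here) to avoid dividing by zero for constant features. The function name normalize_states is hypothetical and this is not the repository's code.

```python
from statistics import fmean, pstdev


def normalize_states(states, eps=1e-3):
    """Standardize each state dimension using dataset-wide statistics.

    states: list of equal-length lists, one per transition.
    Returns the normalized states plus the mean and std used, so the
    same statistics can be applied to states seen at evaluation time.
    """
    columns = list(zip(*states))  # one tuple per state dimension
    mean = [fmean(col) for col in columns]
    # Population standard deviation, offset by eps so constant
    # dimensions do not cause a division by zero.
    std = [pstdev(col) + eps for col in columns]
    normalized = [
        [(x - m) / s for x, m, s in zip(row, mean, std)]
        for row in states
    ]
    return normalized, mean, std


normalized, mean, std = normalize_states([[0.0, 10.0], [2.0, 10.0]])
```

The returned mean and std matter as much as the normalized data: a policy trained on normalized states needs its evaluation-time observations transformed with the same statistics.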

Maintenance & Community

  • Developed by Scott Fujimoto and Shixiang Shane Gu.
  • The accompanying paper appeared at NeurIPS 2021.

Licensing & Compatibility

  • The repository is marked as "not an official Google product."
  • Licensing details are not explicitly stated in the README.

Limitations & Caveats

The implementation is tied to specific older versions of its dependencies (MuJoCo 1.50, mujoco-py 1.50.1.1, OpenAI gym 0.17.0, PyTorch 1.4.0, Python 3.6), so it may not install or run in a current environment without changes.
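One way to see how far an existing environment is from these pins is to compare installed package versions against them. The sketch below is an assumption-laden helper, not part of the repository: the dictionary keys are assumed PyPI distribution names, the function names are hypothetical, and importlib.metadata requires Python 3.8 or later, so the script is meant for auditing a newer environment rather than the pinned Python 3.6 one.

```python
from importlib import metadata

# Versions listed in the prerequisites; keys are assumed PyPI names.
PINNED = {"mujoco-py": "1.50.1.1", "gym": "0.17.0", "torch": "1.4.0"}


def installed_versions(names):
    """Map each distribution name to its installed version, or None."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(pinned, installed):
    """Return {name: (wanted, found)} for missing or differing versions."""
    return {
        name: (wanted, installed.get(name))
        for name, wanted in pinned.items()
        if installed.get(name) != wanted
    }


# Report every pinned package that is absent or at another version.
for name, (wanted, found) in find_mismatches(
    PINNED, installed_versions(PINNED)
).items():
    print(f"{name}: expected {wanted}, found {found}")
```

The comparison is an exact string match, so a build with a local suffix (for example a CUDA-tagged PyTorch wheel) would be reported as a mismatch even when the base version agrees.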

Health Check

  • Last commit: 3 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 6 stars in the last 30 days
