trajectory-transformer by jannerm

Offline RL research paper code release

Created 3 years ago · 509 stars · Top 62.1% on sourcepulse

Project Summary

This repository provides the code for the Trajectory Transformer, a model that frames offline reinforcement learning (RL) as a sequence modeling problem. It targets RL researchers and practitioners who want to apply transformer-based sequence models to decision-making tasks, and it reports state-of-the-art performance on several offline RL benchmarks.

How It Works

The Trajectory Transformer discretizes states, actions, and rewards and concatenates them into a single token sequence, which a GPT-style transformer models autoregressively; attention over the full context lets it capture long-range dependencies within a trajectory. Planning then becomes a decoding problem: a beam search generates candidate trajectories and ranks them by the model's predicted cumulative reward rather than by likelihood alone, steering generation toward high-return action sequences.
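A minimal sketch of the tokenization step, assuming uniform per-dimension binning; the function names, bin count, and `bounds` layout are illustrative rather than the repo's actual API:

```python
import numpy as np

def discretize(x, low, high, n_bins=100):
    """Map continuous values to integer bin indices via uniform binning."""
    x = np.clip(np.asarray(x, dtype=float), low, high)
    return ((x - low) / (high - low) * (n_bins - 1)).astype(int)

def trajectory_to_tokens(states, actions, rewards, bounds, n_bins=100):
    """Flatten a trajectory into one token stream: state dims, then action
    dims, then reward, repeated per timestep. Offsets keep the three
    vocabularies disjoint, so a GPT-style model can treat the whole
    trajectory like a sentence."""
    tokens = []
    for s, a, r in zip(states, actions, rewards):
        tokens.extend(discretize(s, *bounds['state'], n_bins))
        tokens.extend(discretize(a, *bounds['action'], n_bins) + n_bins)
        tokens.append(int(discretize(r, *bounds['reward'], n_bins)) + 2 * n_bins)
    return np.array(tokens)
```

Once trajectories are serialized this way, training reduces to standard next-token prediction with a cross-entropy loss.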

Quick Start & Requirements

  • Install dependencies with `conda env create -f environment.yml`, activate with `conda activate trajectory`, then run `pip install -e .`.
  • Requires MuJoCo and a MuJoCo license key (a quick import check follows this list).
  • Pretrained models for 16 datasets are available via `./pretrained.sh`.
  • Official documentation and paper reference: https://janner.github.io/trajectory-transformer/
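A quick way to confirm the MuJoCo stack installed correctly; the `d4rl` import and the `halfcheetah-medium-v2` task name are assumptions based on the D4RL benchmarks the project reports results on:

```python
# Import check for the offline-RL stack; halfcheetah-medium-v2 is a
# standard D4RL task (assumed here for illustration).
import gym
import d4rl  # importing registers the offline datasets with gym

env = gym.make('halfcheetah-medium-v2')
dataset = env.get_dataset()  # D4RL's standard dataset accessor
print(dataset['observations'].shape, dataset['actions'].shape)
```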

Highlighted Details

  • Achieves state-of-the-art results on the D4RL offline RL benchmarks.
  • Plans via beam search over the model's own predictions, ranking candidate trajectories by predicted return rather than likelihood (see the sketch after this list).
  • The GPT implementation is based on Andrej Karpathy's minGPT.
  • Includes scripts for training, planning, plotting results, and Docker deployment.
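A minimal sketch of that reward-guided decoding loop; `model.logits` and `model.predicted_return` are hypothetical stand-ins for the repo's GPT wrapper and its reward/value estimates:

```python
import numpy as np

def beam_search_plan(model, prefix_tokens, horizon, beam_width=32):
    """Expand candidate token sequences step by step, ranking beams by
    the model's predicted cumulative reward instead of log-likelihood,
    which is the change that turns a sequence model into a planner."""
    beams = [list(prefix_tokens)]
    for _ in range(horizon):
        candidates = []
        for tokens in beams:
            logits = model.logits(tokens)            # next-token scores
            top = np.argsort(logits)[-beam_width:]   # likely continuations
            candidates += [tokens + [int(t)] for t in top]
        # keep the sequences the model expects to pay off the most
        candidates.sort(key=model.predicted_return, reverse=True)
        beams = candidates[:beam_width]
    return beams[0]
```

Expanding with high-likelihood tokens first keeps the search on plausible dynamics, while the return-based ranking steers it toward reward.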

Maintenance & Community

The project accompanies the paper "Offline Reinforcement Learning as One Big Sequence Modeling Problem" by Michael Janner, Qiyang Li, and Sergey Levine. The GPT implementation is derived from minGPT, and the README also mentions a fork that adds attention caching and vectorized rollouts.

Licensing & Compatibility

The repository does not state a license in its README, which needs clarification before commercial use or integration into closed-source projects.

Limitations & Caveats

The README notes that some hyperparameters differ from those in the paper because the discretization strategy changed, with a paper update planned. The absence of an explicit license is a significant caveat for adoption.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 17 stars in the last 90 days
