trajectory-transformer by jannerm

Offline RL research paper code release

Created 4 years ago
516 stars

Top 60.8% on SourcePulse

View on GitHub
1 Expert Loves This Project
Project Summary

This repository provides the code for the Trajectory Transformer, a model that frames offline reinforcement learning (RL) as a sequence modeling problem. It targets researchers and practitioners in RL seeking to leverage large language model architectures for decision-making tasks, offering state-of-the-art performance on several benchmarks.

How It Works

The Trajectory Transformer treats sequences of states, actions, and rewards as a single sequence, modeling the conditional distribution of future actions given past trajectories. It utilizes a GPT-like transformer architecture, enabling it to capture long-range dependencies within the data. This approach allows for effective "planning" by conditioning the model on a desired return and generating a sequence of actions that are likely to achieve it.
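
As a concrete illustration of this flattening step, the sketch below shows how a trajectory might be turned into a single token stream for next-token prediction. It is a minimal, hypothetical example rather than the repository's actual code: the discretize and flatten_trajectory helpers, the bounds argument, and the fixed 100-bin vocabulary are assumptions made for illustration (the repository's own discretization strategy differs, as noted under Limitations & Caveats).

```python
import numpy as np

def discretize(x, low, high, n_bins=100):
    # Map each continuous dimension to an integer token in [0, n_bins).
    x = np.clip(np.asarray(x, dtype=np.float64), low, high)
    return ((x - low) / (high - low) * (n_bins - 1)).astype(np.int64)

def flatten_trajectory(states, actions, rewards, returns_to_go, bounds, n_bins=100):
    # Interleave per-timestep tokens into one flat sequence:
    #   [ s_t ..., a_t ..., r_t, R_t, s_{t+1} ..., a_{t+1} ..., ... ]
    # which a GPT-style model can then fit with a standard next-token loss.
    tokens = []
    for s, a, r, R in zip(states, actions, rewards, returns_to_go):
        tokens.extend(discretize(s, *bounds["state"], n_bins=n_bins))
        tokens.extend(discretize(a, *bounds["action"], n_bins=n_bins))
        tokens.append(int(discretize(r, *bounds["reward"], n_bins=n_bins)))
        tokens.append(int(discretize(R, *bounds["return"], n_bins=n_bins)))
    return np.asarray(tokens, dtype=np.int64)

# Example: a 2-step trajectory with 3-dim states and 1-dim actions.
bounds = {"state": (-1.0, 1.0), "action": (-1.0, 1.0),
          "reward": (0.0, 10.0), "return": (0.0, 100.0)}
seq = flatten_trajectory(
    states=[[0.1, -0.2, 0.3], [0.0, 0.5, -0.4]],
    actions=[[0.7], [-0.3]],
    rewards=[1.0, 2.0],
    returns_to_go=[3.0, 2.0],
    bounds=bounds,
)
print(seq.shape)  # (12,): 3 state + 1 action + 1 reward + 1 return token per step
```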

Quick Start & Requirements

  • Install dependencies with conda env create -f environment.yml, activate the environment with conda activate trajectory, then install the package with pip install -e .
  • Requires MuJoCo and a MuJoCo key.
  • Pretrained models for 16 datasets are available via ./pretrained.sh.
  • Official documentation and paper reference: https://janner.github.io/trajectory-transformer/

Highlighted Details

  • Achieves state-of-the-art results on D4RL benchmarks for offline RL.
  • Enables "planning" by conditioning on desired returns (see the sketch after this list).
  • GPT implementation is based on Andrej Karpathy's minGPT.
  • Includes scripts for training, planning, plotting results, and Docker deployment.
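
To make the return-conditioned planning idea concrete, the sketch below shows receding-horizon planning over model samples. It is a simplified, sampling-based stand-in for the repository's planner (which uses beam search over trajectory tokens), and the model object with its sample, predicted_return, and first_action methods is a hypothetical interface, not the project's API.

```python
def plan_next_action(model, history_tokens, horizon=15, n_candidates=64):
    # Receding-horizon planning sketch (hypothetical model API):
    # sample several candidate continuations of the token history, score each
    # by the return the model itself predicts, and keep only the first
    # action of the best-scoring candidate.
    best_action, best_score = None, float("-inf")
    for _ in range(n_candidates):
        candidate = model.sample(history_tokens, steps=horizon)   # hypothetical
        score = model.predicted_return(candidate)                 # hypothetical
        if score > best_score:
            best_score, best_action = score, candidate.first_action()  # hypothetical
    return best_action

# In a control loop, the chosen action is executed, the new observation is
# appended to the token history, and planning repeats at the next step.
```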

Maintenance & Community

The project is associated with Michael Janner and Sergey Levine, with a GPT implementation derived from minGPT. The README also points to a fork that adds attention caching and vectorized rollouts.

Licensing & Compatibility

The repository's README does not explicitly state a license, so licensing should be clarified before commercial use or integration into closed-source projects.

Limitations & Caveats

The README notes that some hyperparameters differ from those in the paper due to a change in discretization strategy, and that the paper will be updated accordingly. The lack of an explicitly stated license is a significant caveat for adoption.

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 30 days
