trajectory-transformer by jannerm

Offline RL research paper code release

Created 4 years ago
516 stars

Top 60.8% on SourcePulse

View on GitHub
1 Expert Loves This Project
Project Summary

This repository provides the code for the Trajectory Transformer, a model that frames offline reinforcement learning (RL) as a sequence modeling problem. It targets researchers and practitioners in RL seeking to leverage large language model architectures for decision-making tasks, offering state-of-the-art performance on several benchmarks.

How It Works

The Trajectory Transformer treats sequences of states, actions, and rewards as a single sequence, modeling the conditional distribution of future actions given past trajectories. It utilizes a GPT-like transformer architecture, enabling it to capture long-range dependencies within the data. This approach allows for effective "planning" by conditioning the model on a desired return and generating a sequence of actions that are likely to achieve it.
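
As a concrete illustration of this flattening step, the sketch below shows how a trajectory might be turned into a single token stream for next-token prediction. It is a minimal, hypothetical example rather than the repository's actual code: the discretize and flatten_trajectory helpers, the bounds argument, and the fixed 100-bin vocabulary are assumptions made for illustration (the repository's own discretization strategy differs, as noted under Limitations & Caveats).

```python
import numpy as np

def discretize(x, low, high, n_bins=100):
    # Map each continuous dimension to an integer token in [0, n_bins).
    x = np.clip(np.asarray(x, dtype=np.float64), low, high)
    return ((x - low) / (high - low) * (n_bins - 1)).astype(np.int64)

def flatten_trajectory(states, actions, rewards, returns_to_go, bounds, n_bins=100):
    # Interleave per-timestep tokens into one flat sequence:
    #   [ s_t ..., a_t ..., r_t, R_t, s_{t+1} ..., a_{t+1} ..., ... ]
    # which a GPT-style model can then fit with a standard next-token loss.
    tokens = []
    for s, a, r, R in zip(states, actions, rewards, returns_to_go):
        tokens.extend(discretize(s, *bounds["state"], n_bins=n_bins))
        tokens.extend(discretize(a, *bounds["action"], n_bins=n_bins))
        tokens.append(int(discretize(r, *bounds["reward"], n_bins=n_bins)))
        tokens.append(int(discretize(R, *bounds["return"], n_bins=n_bins)))
    return np.asarray(tokens, dtype=np.int64)

# Example: a 2-step trajectory with 3-dim states and 1-dim actions.
bounds = {"state": (-1.0, 1.0), "action": (-1.0, 1.0),
          "reward": (0.0, 10.0), "return": (0.0, 100.0)}
seq = flatten_trajectory(
    states=[[0.1, -0.2, 0.3], [0.0, 0.5, -0.4]],
    actions=[[0.7], [-0.3]],
    rewards=[1.0, 2.0],
    returns_to_go=[3.0, 2.0],
    bounds=bounds,
)
print(seq.shape)  # (12,): 3 state + 1 action + 1 reward + 1 return token per step
```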

Quick Start & Requirements

  • Install dependencies with conda env create -f environment.yml, activate the environment with conda activate trajectory, then install the package with pip install -e .
  • Requires MuJoCo and a MuJoCo key.
  • Pretrained models for 16 datasets are available via ./pretrained.sh.
  • Official documentation and paper reference: https://janner.github.io/trajectory-transformer/

Highlighted Details

  • Achieves state-of-the-art results on D4RL benchmarks for offline RL.
  • Enables "planning" by conditioning on desired returns (see the sketch after this list).
  • GPT implementation is based on Andrej Karpathy's minGPT.
  • Includes scripts for training, planning, plotting results, and Docker deployment.
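
To make the return-conditioned planning idea concrete, the sketch below shows receding-horizon planning over model samples. It is a simplified, sampling-based stand-in for the repository's planner (which uses beam search over trajectory tokens), and the model object with its sample, predicted_return, and first_action methods is a hypothetical interface, not the project's API.

```python
def plan_next_action(model, history_tokens, horizon=15, n_candidates=64):
    # Receding-horizon planning sketch (hypothetical model API):
    # sample several candidate continuations of the token history, score each
    # by the return the model itself predicts, and keep only the first
    # action of the best-scoring candidate.
    best_action, best_score = None, float("-inf")
    for _ in range(n_candidates):
        candidate = model.sample(history_tokens, steps=horizon)   # hypothetical
        score = model.predicted_return(candidate)                 # hypothetical
        if score > best_score:
            best_score, best_action = score, candidate.first_action()  # hypothetical
    return best_action

# In a control loop, the chosen action is executed, the new observation is
# appended to the token history, and planning repeats at the next step.
```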

Maintenance & Community

The project is associated with Michael Janner and Sergey Levine, with a GPT implementation derived from minGPT. The README also points to a fork that adds attention caching and vectorized rollouts.

Licensing & Compatibility

The repository's README does not explicitly state a license, so licensing should be clarified before commercial use or integration into closed-source projects.

Limitations & Caveats

The README notes that some hyperparameters differ from those in the paper due to a change in discretization strategy, and that the paper will be updated accordingly. The lack of an explicitly stated license is a significant caveat for adoption.

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 30 days
