Offline RL research paper code release
This repository provides the code for the Trajectory Transformer, a model that frames offline reinforcement learning (RL) as a sequence modeling problem. It targets researchers and practitioners in RL seeking to leverage large language model architectures for decision-making tasks, offering state-of-the-art performance on several benchmarks.
How It Works
The Trajectory Transformer treats sequences of states, actions, and rewards as a single token sequence: continuous values are discretized into tokens, and a GPT-like transformer is trained with a standard next-token objective to model the distribution of future tokens given the trajectory so far, capturing long-range dependencies within the data. Planning then becomes a decoding problem: candidate action sequences are generated from the model via beam search and scored by its predicted rewards and returns, and the highest-scoring sequence is executed.
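To make the sequence-modeling view concrete, the sketch below shows one way a trajectory could be flattened into a single discretized token stream. It is an illustrative assumption, not the repository's implementation: the vocabulary size, uniform binning, and the (state, action, reward, return-to-go) ordering within each timestep are hypothetical choices.

```python
import numpy as np

# Illustrative sketch only, not code from the repository.
# Assumptions: per-dimension uniform binning into a fixed vocabulary and a
# (state, action, reward, return-to-go) token ordering within each timestep.
VOCAB_SIZE = 100

def discretize(x, low, high, vocab_size=VOCAB_SIZE):
    """Map continuous values in [low, high] to integer tokens by uniform binning."""
    x = np.clip(np.asarray(x, dtype=np.float64), low, high)
    return ((x - low) / (high - low + 1e-8) * (vocab_size - 1)).astype(np.int64)

def tokenize_trajectory(states, actions, rewards, low=-1.0, high=1.0):
    """Flatten one trajectory into a single token sequence:
    [s_0..., a_0..., r_0, R_0, s_1..., a_1..., r_1, R_1, ...]
    where R_t is the return-to-go. A GPT-style model can then be trained on
    such sequences with an ordinary next-token prediction objective."""
    rewards = np.asarray(rewards, dtype=np.float64)
    returns_to_go = np.cumsum(rewards[::-1])[::-1]
    tokens = []
    for s, a, r, R in zip(states, actions, rewards, returns_to_go):
        tokens.extend(discretize(s, low, high).tolist())
        tokens.extend(discretize(a, low, high).tolist())
        tokens.append(int(discretize(r, low, high)))
        tokens.append(int(discretize(R, low, high)))
    return np.array(tokens, dtype=np.int64)

# Toy usage: a 3-step trajectory with 2-D states and 1-D actions.
states = np.random.uniform(-1, 1, size=(3, 2))
actions = np.random.uniform(-1, 1, size=(3, 1))
rewards = [0.1, 0.2, 0.3]
print(tokenize_trajectory(states, actions, rewards))
```

Once trajectories are in this form, training reduces to standard language-model training, and acting in the environment reduces to decoding promising continuations of the current partial sequence.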
Quick Start & Requirements
Create and activate the conda environment, then install the package:

conda env create -f environment.yml
conda activate trajectory
pip install -e .

Pretrained models can be downloaded with the provided ./pretrained.sh script.
Highlighted Details
Maintenance & Community
The project is associated with Michael Janner and Sergey Levine. The GPT implementation is derived from minGPT, and the README also points to a fork that adds attention caching and vectorized rollouts.
Licensing & Compatibility
The repository does not explicitly state a license in the README. This requires clarification for commercial use or integration into closed-source projects.
Limitations & Caveats
The README notes that some hyperparameters differ from the paper due to discretization strategy changes, with plans to update the paper. The lack of an explicit license is a significant caveat for adoption.
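To illustrate what the discretization change noted above can mean in practice, the sketch below contrasts uniform-width bins with quantile (equal probability mass) bins; it is illustrative only and does not reproduce the repository's tokenization code.

```python
import numpy as np

# Illustrative contrast of two discretization strategies; not repository code.
def uniform_tokens(x, low, high, n_bins=100):
    """Uniform-width bins between fixed bounds."""
    edges = np.linspace(low, high, n_bins + 1)[1:-1]
    return np.digitize(x, edges)

def quantile_tokens(x, reference_data, n_bins=100):
    """Bins holding equal probability mass under the offline dataset."""
    edges = np.quantile(reference_data, np.linspace(0, 1, n_bins + 1))[1:-1]
    return np.digitize(x, edges)

# The same values land in different tokens under the two schemes, which is why
# changing the strategy can shift other hyperparameters as well.
data = np.random.randn(10_000) ** 3   # heavy-tailed stand-in for one state dimension
x = np.array([-0.5, 0.0, 0.5])
print(uniform_tokens(x, data.min(), data.max()))
print(quantile_tokens(x, data))
```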