ttt-lm-jax by test-time-training

JAX implementation of the Test-Time Training (TTT) RNN research paper

created 1 year ago
418 stars

Top 71.2% on sourcepulse

Project Summary

This repository provides the official JAX implementation for "Learning to (Learn at Test Time): RNNs with Expressive Hidden States." It addresses the limitations of traditional RNNs in long-context modeling by introducing a novel "Test-Time Training" (TTT) layer where the hidden state is a self-supervised learning model itself. This allows for linear complexity with expressive hidden states, benefiting researchers and practitioners working with long sequence data.

How It Works

The core innovation lies in the TTT layers (TTT-Linear and TTT-MLP), which replace standard RNN hidden states. Instead of a fixed representation, the hidden state is a trainable machine learning model (a linear model or a two-layer MLP). The update rule for this hidden state is a step of self-supervised learning, allowing it to adapt and learn from test sequences. This approach aims to achieve the linear complexity of RNNs while overcoming their expressive power limitations in long contexts.
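The update rule described above can be sketched in a few lines of JAX. This is an illustrative simplification, not the repository's actual implementation: the hidden state is a d×d linear model `W`, the self-supervised inner loss is plain self-reconstruction (the paper uses learned low-rank projections for the training views), and one SGD step on that loss per token serves as the RNN state transition. A `jax.lax.scan` over tokens makes the linear-in-sequence-length complexity explicit.

```python
import jax
import jax.numpy as jnp

def ttt_linear_layer(tokens, lr=0.1):
    """Hypothetical TTT-Linear sketch: the hidden state is a linear
    model W, updated by one gradient step of a self-supervised
    reconstruction loss per token."""
    d = tokens.shape[-1]

    def inner_loss(W, x):
        # Self-supervised task (simplified): reconstruct the token.
        return jnp.mean((x @ W - x) ** 2)

    def step(W, x):
        # One SGD step on the inner loss *is* the state update.
        W = W - lr * jax.grad(inner_loss)(W, x)
        # The output token is computed with the updated state.
        return W, x @ W

    W0 = jnp.zeros((d, d))
    _, outputs = jax.lax.scan(step, W0, tokens)  # linear in seq length
    return outputs

seq = jax.random.normal(jax.random.PRNGKey(0), (16, 8))
out = ttt_linear_layer(seq)
print(out.shape)  # (16, 8)
```

Because the per-token cost is a fixed-size gradient step rather than attention over all previous tokens, cost grows linearly with context length while the state itself remains a trainable model (swap the linear `W` for a two-layer MLP to get the TTT-MLP variant).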

Quick Start & Requirements

  • Installation: Install GPU requirements via pip install -r requirements/gpu_requirements.txt. For TPU, use pip install -r requirements/tpu_requirements.txt.
  • Prerequisites: Python 3.11, JAX, and WandB for logging. Datasets (Llama-2 tokenized) must be downloaded from Google Cloud buckets.
  • Setup: Requires downloading large datasets and configuring dataset_path.
  • Links: Paper, PyTorch Codebase, Model Docs

Highlighted Details

  • Official JAX implementation of TTT layers.
  • Achieves linear complexity with expressive hidden states for long-context modeling.
  • Built on EasyLM; uses the FlashAttention dataloader.
  • Scripts provided for replicating paper experiments.

Maintenance & Community

  • No specific contributors, sponsorships, or community links (Discord/Slack) are mentioned in the README.

Licensing & Compatibility

  • The README does not explicitly state a license. Compatibility for commercial or closed-source use is not specified.

Limitations & Caveats

The repository requires significant setup, including downloading large datasets and potentially configuring model sharding for larger models. The lack of explicit licensing information and community channels may pose adoption challenges.

Health Check

  • Last commit: 11 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 12 stars in the last 90 days
