test-time-training: Learning to discover at test time
Top 65.9% on SourcePulse
Summary
TTT-Discover introduces a novel approach to enhance Large Language Models (LLMs) by performing reinforcement learning (RL) at test time. This allows models to adapt and train on experience specific to the problem at hand, achieving new state-of-the-art results across challenging domains like mathematics, GPU kernel engineering, algorithm design, and biological data processing. It targets researchers and engineers seeking to push LLM capabilities beyond pre-training.
How It Works
The core innovation lies in applying RL during the inference (test) phase. Instead of relying solely on pre-trained knowledge, TTT-Discover lets the LLM learn from its own attempts and their outcomes within a specific task context. This adaptive learning process, built on RL recipes from the Tinker framework, yields targeted performance improvements on novel or complex problems where general pre-training falls short.
Quick Start & Requirements
Install the requirements for your domain, e.g. `pip install -r requirements/requirements-math.txt`; separate requirements files cover GPU kernels (`requirements-gpumode.txt`), AtCoder (`requirements-ale.txt`), and denoising (`requirements-denoising.txt`). Set the environment variables `TINKER_API_KEY`, `WANDB_API_KEY`, and `WANDB_ENTITY`. Jobs are launched through SLURM; a sample command:

```shell
python main_tinker_submitit.py --nodes 4 --partition default --cpus-per-task 100 \
  env=ac1 model_name="openai/gpt-oss-120b" sampler_type=puct_backprop \
  initial_exp_type=random num_epochs=50 \
  wandb_project="my-project" wandb_name="ac1-run-1"
```

Further details are in `docs/launching.md` and `docs/intro.md`.
Maintenance & Community
The project is under active development, with an upcoming refactor and API simplification announced. While specific community channels are not listed, acknowledgments point to contributions and inspirations from projects like Tinker, ALE-Bench, AlphaEvolve, and OpenEvolve.
Licensing & Compatibility
The project is licensed under the MIT License, which permits broad use, including commercial applications, with minimal restrictions.
Limitations & Caveats
The project is in a self-described "transition period" with an impending API refactor, so breaking changes to the current codebase are likely. Job execution also requires a SLURM cluster environment.