discover by test-time-training

Learning to discover at test time

Created 1 month ago
459 stars

Top 65.9% on SourcePulse

View on GitHub
1 Expert Loves This Project
Project Summary

TTT-Discover introduces a novel approach to enhance Large Language Models (LLMs) by performing reinforcement learning (RL) at test time. This allows models to adapt and train on experience specific to the problem at hand, achieving new state-of-the-art results across challenging domains like mathematics, GPU kernel engineering, algorithm design, and biological data processing. It targets researchers and engineers seeking to push LLM capabilities beyond pre-training.

How It Works

The core innovation lies in applying RL during the inference or testing phase. Instead of relying solely on pre-trained knowledge, TTT-Discover enables the LLM to learn from its interactions and outcomes within a specific task context. This adaptive learning process, leveraging frameworks like Tinker for RL recipes, allows for fine-tuned performance improvements on novel or complex problems where general pre-training might fall short.

Quick Start & Requirements

  • Installation: pip install -r requirements/requirements-math.txt. Additional requirements files cover GPU kernels (requirements-gpumode.txt), AtCoder (requirements-ale.txt), and denoising (requirements-denoising.txt).
  • Environment: set TINKER_API_KEY, WANDB_API_KEY, and WANDB_ENTITY.
  • Launching: jobs require SLURM. Sample command: python main_tinker_submitit.py --nodes 4 --partition default --cpus-per-task 100 env=ac1 model_name="openai/gpt-oss-120b" sampler_type=puct_backprop initial_exp_type=random num_epochs=50 wandb_project="my-project" wandb_name="ac1-run-1".
  • Further details: docs/launching.md and docs/intro.md.
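Assembled from the requirements above, a minimal session might look like the following (the credential values, W&B entity, and partition name are placeholders; the launch command itself is the project's sample):

```shell
# Install the math-track dependencies (other tracks have their own files).
pip install -r requirements/requirements-math.txt

# Required credentials and logging configuration (placeholder values).
export TINKER_API_KEY="..."
export WANDB_API_KEY="..."
export WANDB_ENTITY="my-team"

# Submit a run via SLURM (sample command from the project docs).
python main_tinker_submitit.py --nodes 4 --partition default --cpus-per-task 100 \
    env=ac1 model_name="openai/gpt-oss-120b" sampler_type=puct_backprop \
    initial_exp_type=random num_epochs=50 \
    wandb_project="my-project" wandb_name="ac1-run-1"
```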

Highlighted Details

  • Mathematics: Achieved state-of-the-art Erdős Overlap score of 0.380876, surpassing previous AI bests.
  • Kernel Engineering: Set new benchmarks for GPU kernel TriMul performance on A100 (2198 μs) and H100 (1161 μs) GPUs, outperforming human bests.
  • Algorithm Engineering: Established state-of-the-art on AtCoder AHC39 (Geometry) with 567,062 points.
  • Biology: Demonstrated superior performance in single-cell RNA-seq denoising, achieving 0.71 on PBMC and 0.73 on Tabula benchmarks.

Maintenance & Community

The project is under active development, with an upcoming refactor and API simplification announced. While specific community channels are not listed, acknowledgments point to contributions and inspirations from projects like Tinker, ALE-Bench, AlphaEvolve, and OpenEvolve.

Licensing & Compatibility

The project is licensed under the MIT License, which permits broad use, including commercial applications, with minimal restrictions.

Limitations & Caveats

The project is currently in a "transition period" with an impending API refactor, suggesting potential instability or breaking changes in the current codebase. Job execution relies on a SLURM cluster environment.

Health Check

  • Last Commit: 3 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 2
  • Issues (30d): 3
  • Star History: 232 stars in the last 30 days

Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 1 more.

Explore Similar Projects

LMaaS-Papers by txsun1997

544 stars
Curated list of LMaaS research papers
Created 3 years ago · Updated 1 year ago
Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Michael Han (Cofounder of Unsloth), and 18 more.

llm-course by mlabonne

76k stars
LLM course with roadmaps and notebooks
Created 2 years ago · Updated 2 weeks ago