compute-optimal-tts by RyanLiu112

Research paper code for compute-optimal test-time scaling of LLMs

created 5 months ago
268 stars

Top 96.5% on sourcepulse

View on GitHub
Project Summary

This repository provides the official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling." It enables researchers and practitioners to explore and implement test-time scaling strategies for Large Language Models (LLMs) in mathematical reasoning tasks, aiming to improve performance without retraining.

How It Works

The project implements several test-time scaling (TTS) methods: Chain-of-Thought (CoT), Best-of-N (BoN), Beam Search, and Diverse Verifier Tree Search (DVTS). These methods pair a policy model (the LLM that generates solutions) with a process reward model (PRM) that scores intermediate reasoning steps. The core idea is to scale computation at inference time: generate multiple candidate solutions, score them with the PRM, and select the best one, maximizing performance within a given compute budget.
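The Best-of-N loop described above can be sketched as follows. This is a minimal illustration, not the repository's actual API: `generate_candidates` and `prm_score` are hypothetical placeholders standing in for the policy model and the process reward model.

```python
import random


def generate_candidates(prompt: str, n: int) -> list[str]:
    # Placeholder for sampling n chain-of-thought completions from a policy LLM
    # (in practice: temperature sampling from a model such as Llama or Qwen).
    return [f"{prompt} :: candidate {i}" for i in range(n)]


def prm_score(solution: str, rng: random.Random) -> float:
    # Placeholder for a process reward model: score each reasoning step,
    # then aggregate. Taking the minimum step score is one common choice,
    # so one bad step sinks the whole candidate.
    step_scores = [rng.random() for _ in range(4)]
    return min(step_scores)


def best_of_n(prompt: str, n: int = 8, seed: int = 0) -> str:
    """Sample n candidates, score each with the PRM, return the best one."""
    rng = random.Random(seed)
    candidates = generate_candidates(prompt, n)
    return max(candidates, key=lambda c: prm_score(c, rng))
```

Beam Search and DVTS follow the same pattern but apply the PRM at each reasoning step rather than only to finished solutions, pruning or diversifying partial candidates as generation proceeds.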

Quick Start & Requirements

  • Installation: Clone the repository, create a conda environment (conda create -n tts python=3.10), activate it, and install dependencies (pip install -r requirements.txt, flash-attn, ray[default]==2.38.0, fschat[model_worker,webui], sympy==1.12, and the latex2sympy package).
  • Prerequisites: Python 3.10, CUDA (implied by flash-attn), tmux, and specific versions of ray and fschat. GPU configurations range from 1x A100 80GB to 4x A100 80GB depending on model sizes.
  • Resources: Requires significant GPU resources, with specific configurations detailed for various model sizes.
  • Links: Project Page, arXiv, HuggingFace.

Highlighted Details

  • Supports a wide range of LLMs, including Llama series (up to 3.1 8B), Qwen series (up to 72B), and DeepSeek-R1-Distill.
  • Integrates with various PRMs like Math-Shepherd, RLHFlow, Skywork, and Qwen2.5-Math.
  • Implements multiple TTS methods (CoT, BoN, Beam Search, DVTS) for flexible evaluation.
  • Codebase is largely based on OpenR, with mathematical evaluation code from Qwen2.5-Math.

Maintenance & Community

The project is associated with authors from multiple institutions and has received media coverage from QbitAI and AI Era. It is actively maintained, with code released in February 2025.

Licensing & Compatibility

The repository is released under the Apache-2.0 license, permitting commercial use and linking with closed-source projects.

Limitations & Caveats

The mathematical expression evaluation is based on Qwen2.5-Math; for more advanced evaluation, users are directed to the Math-Verify repository. The README notes that for BoN and DVTS, average results are not computed by default and require post-processing.

Health Check
Last commit

5 months ago

Responsiveness

1 day

Pull Requests (30d)
0
Issues (30d)
0
Star History
18 stars in the last 90 days

Explore Similar Projects

Starred by Ying Sheng (author of SGLang) and Stas Bekman (author of the Machine Learning Engineering Open Book; Research Engineer at Snowflake).

llm-analysis by cli99

0%
441
CLI tool for LLM latency/memory analysis during training/inference
created 2 years ago
updated 3 months ago