compute-optimal-tts by RyanLiu112

Research paper code for compute-optimal test-time scaling of LLMs

Created 7 months ago
271 stars

Top 95.0% on SourcePulse

Project Summary

This repository provides the official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling." It enables researchers and practitioners to explore and implement test-time scaling strategies for Large Language Models (LLMs) in mathematical reasoning tasks, aiming to improve performance without retraining.

How It Works

The project implements several test-time scaling (TTS) methods: Chain-of-Thought (CoT), Best-of-N (BoN), Beam Search, and Diverse Verifier Tree Search (DVTS). These methods pair a policy model (the LLM that generates solutions) with a process reward model (PRM) that scores intermediate reasoning steps. The core idea is to scale computation at inference time: generate multiple candidate solutions, rank them with the PRM, and select the best one, maximizing accuracy for a given compute budget.
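To make the selection step concrete, here is a minimal Best-of-N sketch. It is illustrative only: generate_candidates and prm_score are hypothetical stand-ins for the repository's policy-model sampling and PRM scoring, not its actual API, and min-aggregation of step scores is one common choice among several.

```python
import math
from typing import Callable, List

def best_of_n(
    problem: str,
    generate_candidates: Callable[[str, int], List[str]],  # hypothetical: sample N full solutions from the policy model
    prm_score: Callable[[str, str], List[float]],          # hypothetical: per-step scores from a process reward model
    n: int = 8,
) -> str:
    """Sample N candidate solutions and return the one the PRM ranks highest."""
    candidates = generate_candidates(problem, n)
    best, best_score = "", -math.inf
    for cand in candidates:
        step_scores = prm_score(problem, cand)
        # One common aggregation: judge a solution by its weakest step (min),
        # so a single bad reasoning step sinks the whole candidate.
        score = min(step_scores) if step_scores else -math.inf
        if score > best_score:
            best, best_score = cand, score
    return best
```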

Quick Start & Requirements

  • Installation: Clone the repository, create and activate a conda environment (conda create -n tts python=3.10), and install the dependencies: pip install -r requirements.txt, plus flash-attn, ray[default]==2.38.0, fschat[model_worker,webui], sympy==1.12, and the latex2sympy package.
  • Prerequisites: Python 3.10, CUDA (implied by flash-attn), tmux, and the pinned ray and fschat versions above.
  • Resources: Significant GPU resources are required; recommended configurations range from 1x to 4x A100 80GB depending on model size (a minimal sampling sketch follows this list).
  • Links: Project Page, arXiv, HuggingFace.
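Once set up, the kind of candidate sampling these TTS methods rely on looks roughly like the following, here written against Hugging Face transformers directly as a self-contained illustration. The model name and generation parameters are examples, not the repository's defaults.

```python
# Minimal sampling sketch (assumes transformers, torch, and accelerate are
# installed, as they are in the repo's environment).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.2-1B-Instruct"  # example policy model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Solve step by step: what is 17 * 24?"
inputs = tok(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    do_sample=True,          # stochastic sampling so the N candidates differ
    temperature=0.8,
    num_return_sequences=8,  # N candidates for Best-of-N
    max_new_tokens=256,
)
# Strip the prompt tokens and keep only the generated continuations.
candidates = tok.batch_decode(
    outputs[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
```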

Highlighted Details

  • Supports a wide range of LLMs, including Llama series (up to 3.1 8B), Qwen series (up to 72B), and DeepSeek-R1-Distill.
  • Integrates with various PRMs like Math-Shepherd, RLHFlow, Skywork, and Qwen2.5-Math.
  • Implements multiple TTS methods (CoT, BoN, Beam Search, DVTS) for flexible evaluation; a PRM-guided beam-search sketch follows this list.
  • Codebase is largely based on OpenR, with mathematical evaluation code from Qwen2.5-Math.
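To illustrate how the search-based methods differ from BoN: instead of scoring only complete solutions, beam search keeps the top-k partial solutions at each step, extending and re-scoring them with the PRM. The helpers extend_step and prm_step_score below are hypothetical placeholders, not the repository's API, and the stopping test is illustrative. DVTS, roughly, runs several such searches over independent subtrees to increase diversity.

```python
from typing import Callable, List

def prm_beam_search(
    problem: str,
    extend_step: Callable[[str, str, int], List[str]],  # hypothetical: propose k next reasoning steps
    prm_step_score: Callable[[str, str], float],        # hypothetical: PRM score for a partial solution
    beam_width: int = 4,
    expand_per_beam: int = 4,
    max_steps: int = 10,
) -> str:
    """Keep the top `beam_width` partial solutions at each step, ranked by PRM score."""
    beams = [""]  # start from an empty partial solution
    for _ in range(max_steps):
        expansions = [
            partial + step
            for partial in beams
            for step in extend_step(problem, partial, expand_per_beam)
        ]
        if not expansions:
            break
        # Rank all expansions with the PRM and keep the best beam_width of them.
        expansions.sort(key=lambda sol: prm_step_score(problem, sol), reverse=True)
        beams = expansions[:beam_width]
        if all("\\boxed" in b for b in beams):  # illustrative stop: every beam has a final answer
            break
    return beams[0]
```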

Maintenance & Community

The project is associated with authors from multiple institutions and has received media coverage from QbitAI and AI Era. The code was released in February 2025; the health check below, however, shows little activity since then.

Licensing & Compatibility

The repository is released under the Apache-2.0 license, permitting commercial use and linking with closed-source projects.

Limitations & Caveats

The mathematical expression evaluation is based on Qwen2.5-Math; for more advanced evaluation, users are directed to the Math-Verify repository. The README notes that for BoN and DVTS, average results are not computed by default and require post-processing.
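For the BoN/DVTS averaging noted above, post-processing could look like the following sketch. The file pattern and the "accuracy" field are assumptions for illustration, not the repository's actual output schema.

```python
import json
from glob import glob

# Hypothetical post-processing: average per-run accuracy across BoN/DVTS
# result files. Adjust the pattern and field name to the real outputs.
def average_accuracy(pattern: str = "outputs/bon_run*.json") -> float:
    accs = []
    for path in glob(pattern):
        with open(path) as f:
            accs.append(json.load(f)["accuracy"])
    return sum(accs) / len(accs) if accs else 0.0

print(f"mean accuracy: {average_accuracy():.4f}")
```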

Health Check

  • Last Commit: 7 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 3 stars in the last 30 days
