Research paper code for compute-optimal test-time scaling of LLMs
This repository provides the official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling." It enables researchers and practitioners to explore and implement test-time scaling strategies for Large Language Models (LLMs) in mathematical reasoning tasks, aiming to improve performance without retraining.
How It Works
The project implements several test-time scaling (TTS) methods, including Chain-of-Thought (CoT), Best-of-N (BoN), Beam Search, and Diverse Verifier Tree Search (DVTS). These methods combine policy models (LLMs) with process reward models (PRMs) to enhance reasoning. The core idea is to scale computation at inference time: generate multiple candidate solutions and select the best one, optimizing performance for a given compute budget.
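For intuition, here is a minimal Best-of-N sketch: the policy model samples N candidate solutions and the highest-scoring one is returned. The model name and the `score_with_prm` helper are illustrative placeholders standing in for the repository's policy/PRM interfaces, not the actual code.

```python
# Minimal Best-of-N (BoN) sketch, assuming a Hugging Face policy model.
# The model name and score_with_prm() are placeholders, not this repo's API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

POLICY_NAME = "Qwen/Qwen2.5-Math-1.5B-Instruct"  # placeholder policy model

tokenizer = AutoTokenizer.from_pretrained(POLICY_NAME)
policy = AutoModelForCausalLM.from_pretrained(
    POLICY_NAME, torch_dtype=torch.bfloat16, device_map="auto"
)

def score_with_prm(question: str, solution: str) -> float:
    """Stand-in for a process reward model (PRM).

    A real PRM scores each reasoning step and aggregates the rewards;
    here a dummy value keeps the Best-of-N control flow visible."""
    return float(len(solution))  # dummy heuristic, NOT a real reward

def best_of_n(question: str, n: int = 8, max_new_tokens: int = 512) -> str:
    inputs = tokenizer(question, return_tensors="pt").to(policy.device)
    outputs = policy.generate(
        **inputs,
        do_sample=True,
        temperature=0.7,
        num_return_sequences=n,  # draw N independent samples
        max_new_tokens=max_new_tokens,
    )
    # Strip the prompt tokens, keep only the generated continuations.
    completions = tokenizer.batch_decode(
        outputs[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    # Return the candidate the (placeholder) PRM scores highest.
    return max(completions, key=lambda c: score_with_prm(question, c))

print(best_of_n("What is 12 * 13? Reason step by step."))
```

Beam Search and DVTS follow the same pattern but apply the PRM to intermediate steps, pruning or diversifying partial solutions rather than scoring only completed ones.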
Quick Start & Requirements
Create a conda environment (`conda create -n tts python=3.10`), activate it, and install the dependencies (`pip install -r requirements.txt`, plus `flash-attn`, `ray[default]==2.38.0`, `fschat[model_worker,webui]`, `sympy==1.12`, and the `latex2sympy` package). Key requirements include `flash-attn`, `tmux`, and the pinned versions of `ray` and `fschat`. GPU configurations range from 1x A100 80GB to 4x A100 80GB, depending on model size.
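As a quick sanity check before launching experiments, the snippet below verifies the pinned package versions and counts visible GPUs. It is an illustrative helper, not part of the repository; the package names follow the install list above and may need adjusting to your setup.

```python
# Illustrative environment check for the pinned dependencies listed above.
# Not part of the repository; distribution names may differ from import names.
from importlib.metadata import PackageNotFoundError, version

import torch

PINNED = {"ray": "2.38.0", "sympy": "1.12"}
EXTRAS = ["fschat", "flash-attn", "latex2sympy"]

for pkg, want in PINNED.items():
    try:
        got = version(pkg)
        print(f"{pkg}: {got}" + ("" if got == want else f" (expected {want})"))
    except PackageNotFoundError:
        print(f"{pkg}: NOT INSTALLED (expected {want})")

for pkg in EXTRAS:
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: NOT INSTALLED")

# The README's GPU guidance ranges from 1x to 4x A100 80GB by model size.
print(f"visible CUDA devices: {torch.cuda.device_count()}")
```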
Maintenance & Community
The project is associated with authors from multiple institutions and has received media coverage from QbitAI and AI Era. It is actively maintained, with code released in February 2025.
Licensing & Compatibility
The repository is released under the Apache-2.0 license, permitting commercial use and linking with closed-source projects.
Limitations & Caveats
The mathematical expression evaluation is based on Qwen2.5-Math; for more advanced evaluation, users are directed to the Math-Verify repository. The README notes that for BoN and DVTS, average results are not computed by default and require post-processing.
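For a rough sense of what symbolic answer checking involves, the sketch below compares a predicted and a gold LaTeX answer with sympy. This is not the repository's Qwen2.5-Math-based evaluator (nor Math-Verify); `parse_latex` additionally requires the `antlr4-python3-runtime` package.

```python
# Illustrative-only equivalence check between predicted and gold LaTeX answers.
# Not the repository's evaluator; it only shows the kind of symbolic comparison
# such evaluators perform.
from sympy import simplify
from sympy.parsing.latex import parse_latex  # needs antlr4-python3-runtime

def answers_match(pred_latex: str, gold_latex: str) -> bool:
    """True if the two LaTeX expressions simplify to the same value."""
    try:
        diff = simplify(parse_latex(pred_latex) - parse_latex(gold_latex))
        return diff == 0
    except Exception:
        # Fall back to exact string comparison if parsing fails.
        return pred_latex.strip() == gold_latex.strip()

print(answers_match(r"\frac{1}{2}", "0.5"))  # True
```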