simplescaling/s1: Test-time scaling recipe for strong reasoning performance
Top 7.7% on SourcePulse
This repository provides the artifacts for "s1: Simple test-time scaling," a method for enhancing Large Language Model (LLM) reasoning performance with minimal data. It targets researchers and practitioners seeking to improve LLM capabilities through efficient fine-tuning and inference techniques, offering strong reasoning performance with a small dataset and budget.
How It Works
The core innovation lies in "test-time scaling," a technique that involves fine-tuning an LLM on a small, curated dataset (s1K) of reasoning examples. This approach leverages budget forcing during inference, where the model's generation is constrained by a token limit for its "thinking" process. This encourages more focused and efficient reasoning, leading to improved accuracy on complex tasks.
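The budget-forcing loop can be sketched in model-free Python. This is a minimal illustration of the idea, not the repository's implementation: the `step` callback, the `</think>` marker, and the `Final Answer:` delimiter are illustrative stand-ins for the real tokenizer and chat-template tokens.

```python
def budget_forced_generate(step, prompt, think_budget, answer_budget=8,
                           end_think="</think>", force="Final Answer:"):
    """Cap the model's 'thinking' tokens, then steer it to answer.

    `step(tokens)` returns the next token (plain strings here for
    simplicity) or None to stop generating.
    """
    tokens = list(prompt)
    for _ in range(think_budget):
        nxt = step(tokens)
        tokens.append(nxt)
        if nxt == end_think:      # model finished thinking on its own
            break
    else:
        tokens.append(end_think)  # budget exhausted: cut thinking short
    tokens.append(force)          # force the answer delimiter
    for _ in range(answer_budget):
        nxt = step(tokens)
        if nxt is None:
            break
        tokens.append(nxt)
    return tokens


def toy_step(tokens):
    """Stand-in for a real decoder: rambles until forced to answer."""
    if tokens[-1] == "Final Answer:":
        return "42"
    if tokens[-1] == "42":
        return None
    return "hmm"


print(budget_forced_generate(toy_step, ["Q:"], think_budget=3))
# → ['Q:', 'hmm', 'hmm', 'hmm', '</think>', 'Final Answer:', '42']
```

The key property is visible in the toy run: the model would "think" indefinitely, but the budget cuts it off after three tokens and the forced delimiter elicits an answer.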
Quick Start & Requirements
Requires either the vLLM or transformers library for inference.
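With vLLM installed, inference can be sketched as follows. The checkpoint name matches the released s1.1-32B model, but the sampling values (and the idea of capping thinking via `max_tokens`) are illustrative assumptions; running the guarded block requires a GPU and downloads the weights.

```python
MODEL = "simplescaling/s1.1-32B"  # released checkpoint (name from the repo)


def budget_params(max_thinking_tokens: int) -> dict:
    """SamplingParams kwargs that cap generation length (illustrative)."""
    return {"temperature": 0.0, "max_tokens": max_thinking_tokens}


if __name__ == "__main__":
    # Heavy path: requires a GPU and fetches ~32B-parameter weights.
    from vllm import LLM, SamplingParams

    llm = LLM(model=MODEL)
    params = SamplingParams(**budget_params(1000))
    outputs = llm.generate(["How many r's are in 'raspberry'?"], params)
    print(outputs[0].outputs[0].text)
```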
- vLLM example: `pip install vllm transformers`, then run the provided Python code.
- transformers example: `pip install transformers torch`, then run the provided Python code.
- Training: install the requirements (`pip3 install -r requirements.txt`) and run `bash train/sft.sh`; gradient checkpointing can be enabled to work around OOM issues.
- Evaluation: install lm-evaluation-harness (`cd eval/lm-evaluation-harness && pip install -e .[math,vllm]`).

Highlighted Details
Releases the s1.1-32B model and the s1K-1.1 dataset, built from r1 reasoning traces. Evaluation runs through lm-evaluation-harness.

Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
vLLM can raise a ValueError during budget forcing with specific token IDs; a suggested workaround is to uncomment a line in vllm/engine/llm_engine.py.