VibeThinker by WeiboAI

Small model, big logic: diversity-driven optimization for advanced reasoning

Created 3 weeks ago

519 stars

Top 60.3% on SourcePulse

View on GitHub
Project Summary

VibeThinker-1.5B is a 1.5-billion-parameter model demonstrating that small models can achieve robust reasoning capabilities. It targets engineers and researchers seeking highly efficient, cost-effective reasoning models, and it reports state-of-the-art results on mathematical and coding benchmarks at a fraction of the parameter count and training cost of leading models.

How It Works

The model uses the "Spectrum-to-Signal Principle" (SSP) post-training methodology: "Two-Stage Diversity-Exploring Distillation" during supervised fine-tuning (SFT) generates a diverse spectrum of candidate solutions, and "MaxEnt-Guided Policy Optimization" (MGPO) during reinforcement learning (RL) then amplifies the correct reasoning signals within that spectrum. The combination yields strong logical deduction in a compact model.
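
The README does not include reference code for MGPO, but the core idea can be sketched: weight each training problem by the entropy of its current pass rate, so the RL signal concentrates on problems the policy is most uncertain about. The sketch below is a hypothetical illustration, not the authors' implementation; the function names and the binary-entropy weighting are assumptions.

```python
import math

def correctness_entropy(pass_rate: float) -> float:
    """Binary entropy of a problem's pass rate; maximal at 0.5."""
    p = min(max(pass_rate, 1e-6), 1.0 - 1e-6)  # clamp to avoid log(0)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

def maxent_weight(pass_rate: float) -> float:
    """Hypothetical MaxEnt-style weight in [0, 1]: problems the policy
    solves about half the time dominate the optimization signal."""
    return correctness_entropy(pass_rate) / math.log(2.0)

# Problems the model nearly always or never solves contribute little;
# borderline problems (pass rate near 0.5) get weight close to 1.
for rate in (0.05, 0.50, 0.95):
    print(f"pass_rate={rate:.2f} -> weight={maxent_weight(rate):.3f}")
```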

Quick Start & Requirements

Requires transformers>=4.54.0; vLLM==0.10.1 or SGLang>=0.4.9.post6 is recommended for inference. Model checkpoints are available via Hugging Face and ModelScope, and evaluation scripts with sample responses are provided. Recommended sampling parameters: temperature 0.6 or 1.0, max generation length 40960 tokens, top_p 0.95, top_k -1 (for vLLM/SGLang). Direct links to these resources are not provided in the README.
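
As a concrete starting point, a minimal vLLM script using those parameters might look like the following. The Hugging Face model ID WeiboAI/VibeThinker-1.5B is assumed from the project name; check the model card for the exact path.

```python
from vllm import LLM, SamplingParams

# Model ID assumed from the project name; verify on Hugging Face or ModelScope.
llm = LLM(model="WeiboAI/VibeThinker-1.5B")

# Sampling parameters recommended in the README:
# temperature 0.6 (or 1.0), top_p 0.95, top_k disabled (-1),
# and a generous generation budget for long chains of reasoning.
params = SamplingParams(temperature=0.6, top_p=0.95, top_k=-1, max_tokens=40960)

outputs = llm.generate(["Prove that the sum of two even integers is even."], params)
print(outputs[0].outputs[0].text)
```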

Highlighted Details

  • Ultra-Efficient: 1.5B parameters, 100x-600x smaller than models like Kimi K2 (1000B+) and DeepSeek R1 (671B).
  • Superior Performance: Outperforms DeepSeek R1 on AIME24, AIME25, HMMT25 benchmarks; rivals MiniMax-M1; surpasses Magistral Medium and Claude Opus 4.
  • Cost-Effective Development: Post-training cost of $7,800, a 30x-60x reduction from DeepSeek R1 ($294K) and MiniMax-M1 ($535K); see the quick check below.
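
Taking the quoted cost figures at face value, the ratios are straightforward to verify (a back-of-envelope calculation, not from the README):

```python
# Quick check of the quoted post-training costs (USD).
vibethinker_cost = 7_800
for name, cost in {"DeepSeek R1": 294_000, "MiniMax-M1": 535_000}.items():
    print(f"{name}: ~{cost / vibethinker_cost:.0f}x VibeThinker's cost")
# -> DeepSeek R1: ~38x, MiniMax-M1: ~69x
```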

Maintenance & Community

The README does not specify community channels or provide details on ongoing maintenance or active contributors beyond the paper authors.

Licensing & Compatibility

Licensed under the MIT License, which is generally permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

The authors explicitly recommend the model for competitive math and coding problems; its performance and limitations on other task types are not documented. No information is provided on known bugs or unsupported platforms.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 9

Star History
523 stars in the last 27 days

Starred by Michael Han (Cofounder of Unsloth), Sebastian Raschka (Author of "Build a Large Language Model (From Scratch)"), and 19 more.

Explore Similar Projects

DeepSeek-R1 by deepseek-ai

Top 0.1% on SourcePulse
92k stars
Reasoning models research paper
Created 10 months ago
Updated 5 months ago