Qwen2.5-Math by QwenLM

Math LLM for solving math problems in Chinese and English

created 1 year ago
977 stars

Top 38.6% on sourcepulse

View on GitHub
Project Summary

Qwen2.5-Math is a series of large language models specifically designed for solving mathematical problems in both English and Chinese. Targeting researchers and developers working on mathematical AI, it offers significant performance improvements over its predecessor, Qwen2-Math, by supporting both Chain-of-Thought (CoT) and Tool-Integrated Reasoning (TIR) methods.

How It Works

Qwen2.5-Math models leverage advanced reasoning techniques, including CoT for step-by-step problem-solving and TIR for integrating external tools like code interpreters. This dual approach allows for more robust and accurate solutions to complex mathematical tasks, outperforming previous models on various benchmarks. The models are available in base and instruction-tuned variants, with a dedicated reward model (RM) for enhanced performance.
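In practice, the CoT and TIR modes are selected through the system prompt. A minimal sketch of building chat messages for either mode; the prompt wording below follows the pattern shown in the project README, but treat the exact strings as illustrative:

```python
# Sketch: building chat messages for Qwen2.5-Math in CoT vs. TIR mode.
# The system prompts follow the pattern shown in the project README;
# treat the exact wording as illustrative rather than canonical.

COT_SYSTEM = (
    "Please reason step by step, and put your final answer within \\boxed{}."
)
TIR_SYSTEM = (
    "Please integrate natural language reasoning with programs to solve "
    "the problem above, and put your final answer within \\boxed{}."
)

def build_messages(question: str, mode: str = "cot") -> list[dict]:
    """Return chat messages selecting CoT or TIR reasoning via the system prompt."""
    if mode not in ("cot", "tir"):
        raise ValueError(f"unknown mode: {mode!r}")
    system = COT_SYSTEM if mode == "cot" else TIR_SYSTEM
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]
```

In TIR mode the model emits code blocks meant to be executed by an external interpreter, so the surrounding harness must run the code and feed results back; CoT mode needs only plain text generation.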

Quick Start & Requirements

  • Installation: Use Hugging Face transformers library (version >= 4.37.0).
  • Dependencies: transformers, torch, vllm (for evaluation). Specific versions are crucial for reproducing results.
  • Resources: GPU memory requirements are comparable to Qwen2. See speed benchmark.
  • Documentation: Qwen Chat and the official Qwen documentation.
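The steps above can be sketched as a single generation call with transformers (>= 4.37.0). This assumes the Qwen/Qwen2.5-Math-7B-Instruct checkpoint on Hugging Face; running it requires a GPU and downloads the weights on first use:

```python
# Sketch: generating a CoT solution with transformers >= 4.37.0.
# Assumes the Qwen/Qwen2.5-Math-7B-Instruct checkpoint; requires a GPU
# and downloads the model weights on first run.

def solve(question: str, model_name: str = "Qwen/Qwen2.5-Math-7B-Instruct") -> str:
    # Heavy imports kept local so they are only paid for when the model is used.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype="auto", device_map="auto"
    )
    messages = [
        {"role": "system",
         "content": "Please reason step by step, and put your final answer "
                    "within \\boxed{}."},
        {"role": "user", "content": question},
    ]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=512)
    # Strip the prompt tokens and decode only the newly generated completion.
    completion = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(completion, skip_special_tokens=True)
```

For batch evaluation the README recommends vllm instead of plain transformers; the prompt construction stays the same.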

Highlighted Details

  • Supports both English and Chinese mathematical problem-solving.
  • Achieves state-of-the-art performance on benchmarks like GSM8K, MATH, and GaoKao Math QA.
  • Qwen2.5-Math-72B-Instruct demonstrates strong capabilities on challenging exams like AIME 2024 and AMC 2023.
  • Offers a dedicated Qwen2.5-Math-RM-72B reward model for further performance tuning.
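A reward model is typically applied via best-of-n sampling: draw several candidate solutions, score each with the RM, and keep the highest-scoring one. A minimal, model-agnostic sketch; the `score` callable standing in for an actual Qwen2.5-Math-RM-72B forward pass is hypothetical:

```python
# Sketch: best-of-n selection with a reward model.
# `score(question, answer)` is a hypothetical stand-in for a real RM
# forward pass (e.g. Qwen2.5-Math-RM-72B); any callable works here.
from typing import Callable

def best_of_n(question: str,
              candidates: list[str],
              score: Callable[[str, str], float]) -> str:
    """Return the candidate solution the reward model scores highest."""
    if not candidates:
        raise ValueError("need at least one candidate solution")
    return max(candidates, key=lambda ans: score(question, ans))
```

The same scoring loop also supports rejection sampling during fine-tuning, where low-scoring candidates are discarded rather than merely unranked.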

Maintenance & Community

  • Developed by the Qwen team.
  • Community support available via Discord and WeChat.

Licensing & Compatibility

  • The license is not stated in the README. Qwen releases have used both permissive and custom licenses depending on model size, so check the individual model cards on Hugging Face before commercial use or closed-source linking.

Limitations & Caveats

  • Primarily designed for mathematical tasks; use on general-purpose tasks is not recommended.
  • Reproducing evaluation results requires strict adherence to specified dependency versions.
Health Check

  • Last commit: 6 months ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 3
  • Star History: 69 stars in the last 90 days

Starred by George Hotz (author of tinygrad; founder of the tiny corp, comma.ai), Andrej Karpathy (founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), and 5 more.
