DeepMath by zwhe99

Dataset for advancing LLM mathematical reasoning

Created 10 months ago

282 stars

Top 92.7% on SourcePulse

Project Summary

DeepMath provides DeepMath-103K, a large-scale, challenging, decontaminated, and verifiable mathematical dataset designed to advance reasoning in language models. It targets researchers and practitioners working on mathematical reasoning in AI, particularly those using Reinforcement Learning (RL) and Supervised Fine-Tuning (SFT), and offers a resource for improving and evaluating model capabilities.

How It Works

The core is the DeepMath-103K dataset, which features difficult problems (Levels 5-9) across diverse mathematical subjects such as Algebra, Calculus, and Number Theory. The problems are rigorously decontaminated against common benchmarks to minimize test set leakage. Each record carries a verifiable final answer (usable in RL reward functions), a difficulty score, a hierarchical topic classification, and multiple reasoning paths for SFT or distillation, so the same data supports varied research applications.
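To make the record format described above concrete, here is a minimal sketch of how such records might be filtered by difficulty and expanded into SFT pairs. The `Problem` class, its field names, and the toy records are hypothetical and chosen for illustration; they are not taken from the dataset's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Problem:
    # Hypothetical field names; the real dataset schema may differ.
    question: str
    final_answer: str          # verifiable answer, e.g. for an RL reward
    difficulty: float          # difficulty score
    topic: str                 # hierarchical topic, written here as "A -> B"
    solutions: list[str] = field(default_factory=list)  # reasoning paths


def filter_by_difficulty(problems, low, high):
    """Keep problems whose difficulty lies in the closed range [low, high]."""
    return [p for p in problems if low <= p.difficulty <= high]


def to_sft_pairs(problems):
    """Expand each problem into one (prompt, response) pair per reasoning path."""
    return [(p.question, s) for p in problems for s in p.solutions]


def top_level_topic(problem, sep=" -> "):
    """Return the first component of the hierarchical topic string."""
    return problem.topic.split(sep)[0]


# Toy records, invented for this example only.
sample = [
    Problem("Toy question A", "1", 5.5, "Calculus -> Limits", ["path A1", "path A2"]),
    Problem("Toy question B", "6", 3.0, "Number Theory -> Divisibility", ["path B1"]),
]

hard = filter_by_difficulty(sample, 5, 9)   # keeps only "Toy question A"
pairs = to_sft_pairs(hard)                  # one pair per reasoning path
```

Keeping one pair per reasoning path is one possible design choice; a distillation setup might instead sample a single path per problem.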

Quick Start & Requirements

Setup involves cloning the repository (git clone --recurse-submodules), creating a Python 3.12.2 Conda environment, and installing numerous packages including PyTorch 2.5.1 with CUDA 12.4, flash-attn, vllm, and Ray. Significant GPU resources are implied. Key resources include the dataset on Hugging Face, model weights, the code repository, and the accompanying paper. Data preparation scripts and evaluation examples are available within the repository.
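Because the setup depends on several heavy, version-specific packages, a quick sanity check of the environment can save time before launching any training or evaluation. The following is a small sketch, not part of the repository: the helper names are hypothetical, and the module list assumes the usual import names (`torch`, `flash_attn`, `vllm`, `ray`) for the packages mentioned above.

```python
import sys
from importlib.util import find_spec

# Assumed import names for the packages listed in the setup description.
REQUIRED_MODULES = ("torch", "flash_attn", "vllm", "ray")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if find_spec(name) is None]


def python_matches(expected=(3, 12), actual=None):
    """Check that the (major, minor) Python version equals the expected one."""
    if actual is None:
        actual = sys.version_info
    return tuple(actual[:2]) == tuple(expected)


# Report the state of the current interpreter without importing heavy packages.
print("Python 3.12:", python_matches())
print("Missing modules:", missing_modules())
```

`find_spec` only locates a module and does not import it, so the check stays fast even when PyTorch or vLLM are installed. It does not verify package versions or CUDA availability.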

Highlighted Details

Scale & Complexity: The DeepMath-103K dataset (about 103K problems) emphasizes difficult math tasks (Levels 5-9).
Data Integrity: Novel problems with rigorous decontamination to prevent test set leakage.
Verifiable Answers: Enables robust RL reward functions.
Performance: DeepMath models are reported to achieve state-of-the-art (SOTA) results on math benchmarks.
Model Availability: Pre-trained weights like DeepMath-Zero-7B are provided.
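
The role of verifiable answers in an RL reward can be illustrated with a rule-based check. The sketch below is an assumption-laden simplification, not the project's reward function: it assumes the model writes its final answer inside `\boxed{...}` and compares it to the reference by normalized string equality, whereas a practical verifier would usually also test mathematical equivalence.

```python
import re


def extract_boxed(text):
    """Return the content of the last \\boxed{...} in text, or None if absent."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    content = []
    # Walk forward, tracking brace depth so nested braces are kept intact.
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(content)
        content.append(ch)
    return None  # braces never closed


def normalize(answer):
    """Drop surrounding whitespace and $ signs, a trailing period, and inner spaces."""
    answer = answer.strip().strip("$").strip().rstrip(".")
    return re.sub(r"\s+", "", answer)


def reward(response, reference):
    """Binary reward: 1.0 if the boxed answer matches the reference, else 0.0."""
    predicted = extract_boxed(response)
    if predicted is None:
        return 0.0
    return 1.0 if normalize(predicted) == normalize(reference) else 0.0
```

A binary signal like this is straightforward to plug into an RL loop, but string matching treats equivalent forms such as `0.5` and `\frac{1}{2}` as different answers.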

Maintenance & Community

The project appears actively maintained, with recent news indicating updates to the dataset. However, the README lacks explicit community channel links or detailed contributor/sponsorship information.

Licensing & Compatibility

The README does not specify a software license. While hosted on GitHub and Hugging Face, suggesting open-source availability, users should verify terms for commercial use or integration into closed-source projects.

Limitations & Caveats

The maintainers recently identified and revised 48 samples that contained answer hints. The issue has been addressed, but it shows that residual data-quality problems are possible. The extensive, version-specific dependencies may also complicate setup.

