DeepMath by zwhe99

Dataset for advancing LLM mathematical reasoning

Created 5 months ago
251 stars

Top 99.8% on SourcePulse

Project Summary

DeepMath provides DeepMath-103K, a large-scale, challenging, decontaminated, and verifiable mathematical dataset designed to advance reasoning in language models. It targets researchers and practitioners in AI mathematics, particularly those using Reinforcement Learning (RL) and Supervised Fine-Tuning (SFT), offering a robust benchmark for evaluating and improving model capabilities.

How It Works

The core is the DeepMath-103K dataset, which features difficult problems (difficulty levels 5-9) across mathematical subjects such as Algebra, Calculus, and Number Theory. The problems are rigorously decontaminated against common benchmarks to minimize test set leakage. Each record includes a verifiable final answer (usable in RL reward functions), a difficulty score, a hierarchical topic classification, and multiple reasoning paths for SFT or distillation, so the same data supports several kinds of research.
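To illustrate why a verifiable final answer matters for RL, here is a minimal sketch of a rule-based reward: it pulls the last \boxed{...} expression out of a model response and compares it to the reference answer. The function names, the \boxed convention, and the whitespace-only normalization are assumptions made for illustration; they are not taken from the DeepMath code, which may verify answers differently (for example, by checking mathematical equivalence).

```python
import re
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    # Return the contents of the last \boxed{...} in text, allowing nested braces.
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # braces never closed


def normalize(answer: str) -> str:
    # Hypothetical normalization: drop surrounding $ signs and all whitespace.
    return re.sub(r"\s+", "", answer.strip().strip("$"))


def reward(model_output: str, final_answer: str) -> float:
    # Binary reward: 1.0 on an exact match after normalization, else 0.0.
    predicted = extract_boxed(model_output)
    if predicted is None:
        return 0.0
    return 1.0 if normalize(predicted) == normalize(final_answer) else 0.0
```

String matching of this kind treats equivalent forms such as 0.5 and \frac{1}{2} as different, so a practical reward function would likely need a symbolic or numeric equivalence check on top of it.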

Quick Start & Requirements

Setup involves cloning the repository (git clone --recurse-submodules), creating a Python 3.12.2 Conda environment, and installing numerous packages including PyTorch 2.5.1 with CUDA 12.4, flash-attn, vllm, and Ray. Significant GPU resources are implied. Key resources include the dataset on Hugging Face, model weights, the code repository, and the accompanying paper. Data preparation scripts and evaluation examples are available within the repository.
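Because the setup pins specific versions, a quick check of the active environment can save debugging time. The sketch below uses only the standard library and the pins stated above (Python 3.12.2, PyTorch 2.5.1); the distribution names passed to importlib.metadata and the helper names are assumptions, and no versions are assumed for flash-attn, vllm, or Ray because the summary does not give them.

```python
import sys
from importlib import metadata

# Versions taken from the setup description; None means "any installed version".
EXPECTED_PYTHON = (3, 12, 2)
EXPECTED_PACKAGES = {"torch": "2.5.1", "flash-attn": None, "vllm": None, "ray": None}


def installed_version(dist_name):
    # Return the installed version string, or None if the package is missing.
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_environment(expected=EXPECTED_PACKAGES):
    # Collect human-readable mismatches instead of failing on the first one.
    problems = []
    if tuple(sys.version_info[:3]) != EXPECTED_PYTHON:
        found = ".".join(map(str, sys.version_info[:3]))
        problems.append(f"python: found {found}, expected 3.12.2")
    for name, pinned in expected.items():
        found = installed_version(name)
        if found is None:
            problems.append(f"{name}: not installed")
        elif pinned is not None and not found.startswith(pinned):
            # startswith tolerates local build tags such as "2.5.1+cu124"
            problems.append(f"{name}: found {found}, expected {pinned}")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment matches the stated pins"]:
        print(line)
```

This only compares version strings; it does not confirm that the CUDA 12.4 build of PyTorch is installed or that a GPU is visible.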

Highlighted Details

  • Scale & Complexity: DeepMath-103K dataset (>103K problems) emphasizes difficult math tasks (Levels 5-9).
  • Data Integrity: Novel problems with rigorous decontamination to prevent test set leakage.
  • Verifiable Answers: Enables robust RL reward functions.
  • Performance: DeepMath models are reported to achieve state-of-the-art (SOTA) results on math benchmarks.
  • Model Availability: Pre-trained weights like DeepMath-Zero-7B are provided.
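The difficulty scores and hierarchical topics make it straightforward to carve out subsets, for instance for curriculum-style training. The following sketch works on plain dictionaries; the field names ("question", "difficulty", "topic"), the "A -> B -> C" topic format, and the placeholder records are assumptions for illustration and should be checked against the actual dataset schema.

```python
from collections import defaultdict


def top_level_subject(topic, sep="->"):
    # Split a hierarchical topic string and return its most general useful level.
    parts = [p.strip() for p in topic.split(sep) if p.strip()]
    # Skip a generic root such as "Mathematics" when a more specific level exists.
    if len(parts) > 1 and parts[0].lower() == "mathematics":
        return parts[1]
    return parts[0] if parts else "Unknown"


def select_by_difficulty(records, low, high):
    # Keep records whose difficulty lies in the closed interval [low, high].
    return [r for r in records if low <= r["difficulty"] <= high]


def group_by_subject(records):
    # Map each top-level subject to the list of records that belong to it.
    groups = defaultdict(list)
    for r in records:
        groups[top_level_subject(r["topic"])].append(r)
    return dict(groups)


# Placeholder records, not real dataset entries.
SAMPLE = [
    {"question": "q1", "difficulty": 5.0, "topic": "Mathematics -> Algebra -> Abstract Algebra"},
    {"question": "q2", "difficulty": 8.0, "topic": "Mathematics -> Calculus -> Integral Calculus"},
    {"question": "q3", "difficulty": 9.0, "topic": "Mathematics -> Number Theory -> Congruences"},
]

hardest = select_by_difficulty(SAMPLE, 7, 9)
by_subject = group_by_subject(SAMPLE)
```

With a real copy of the dataset, the same helpers could be applied to its rows once the field names are confirmed.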

Maintenance & Community

The project appears actively maintained, with recent news indicating updates to the dataset. However, the README lacks explicit community channel links or detailed contributor/sponsorship information.

Licensing & Compatibility

The README does not specify a software license. While hosted on GitHub and Hugging Face, suggesting open-source availability, users should verify terms for commercial use or integration into closed-source projects.

Limitations & Caveats

Recently, 48 samples with answer hints were identified and revised, highlighting potential data integrity issues that have since been addressed. The extensive, version-specific dependencies may complicate setup.


  • Dataset: https://huggingface.co/datasets/zwhe99/DeepMath-103K
  • Models: https://huggingface.co/collections/zwhe99/deepmath-6816e139b7f467f21a459a9a
  • Code: https://github.com/zwhe99/DeepMath
  • Paper: https://arxiv.org/abs/2504.11456

Health Check

  • Last Commit: 3 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 7 stars in the last 30 days
