grade-school-math by openai

Dataset for grade school math word problems

Created 4 years ago

1,379 stars

Top 29.2% on SourcePulse



Project Summary

This repository provides the GSM8K dataset, a collection of 8,500 grade-school math word problems designed to evaluate and improve the multi-step reasoning of large language models. It is aimed at AI researchers and developers working on natural language understanding and mathematical reasoning.

How It Works

GSM8K targets a known weakness of LLMs, multi-step mathematical reasoning, with a curated set of linguistically diverse problems. Each solution carries calculation annotations (e.g., <<50*3=150>>) that an external calculator can parse, which mitigates the arithmetic errors LLMs commonly make. Because models can offload these calculations, accuracy improves on problems that require many steps.
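To make the annotation format concrete, here is a minimal sketch that extracts every <<expression=result>> annotation from a solution string and checks the claimed result. It assumes annotations contain only plain arithmetic (digits, + - * /, parentheses); the names ANNOTATION, SAFE_EXPR, and check_annotations are hypothetical and are not taken from the repository's calculator.py.

```python
import re
from typing import List, Tuple

# Hypothetical pattern for annotations of the form <<expression=result>>, e.g. <<50*3=150>>.
ANNOTATION = re.compile(r"<<([^=<>]+)=([^<>]+)>>")
# Assumption: annotations use only digits, + - * /, parentheses, dots, and spaces.
SAFE_EXPR = re.compile(r"[0-9+\-*/(). ]+")


def check_annotations(solution: str) -> List[Tuple[str, str, bool]]:
    """Return (expression, claimed_result, is_correct) for each annotation found."""
    results = []
    for expr, claimed in ANNOTATION.findall(solution):
        ok = False
        # Reject anything outside plain arithmetic, and '**' to avoid huge powers.
        if SAFE_EXPR.fullmatch(expr) and "**" not in expr:
            try:
                # Evaluate with no builtins available.
                value = eval(expr, {"__builtins__": {}}, {})
                ok = abs(value - float(claimed.replace(",", ""))) < 1e-6
            except (SyntaxError, ZeroDivisionError, ValueError):
                ok = False
        results.append((expr, claimed, ok))
    return results


# Illustrative solution string in the annotated format (not taken from the dataset).
sol = "Each box holds 50*3 = <<50*3=150>>150 pencils.\n#### 150"
print(check_annotations(sol))  # [('50*3', '150', True)]
```

The same check can flag annotations whose claimed result disagrees with the expression, which is one way to spot the kind of calculation errors mentioned under Limitations & Caveats.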

Quick Start & Requirements

Install: pip install -r requirements.txt
Prerequisites: Python 3.7+, PyTorch, Transformers, OpenAI API key (for generating Socratic questions, not for dataset usage).
Data: Download train.jsonl and test.jsonl from grade_school_math/data/.
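A minimal loading sketch, assuming each line of train.jsonl and test.jsonl is a JSON object with "question" and "answer" fields, and that the final numeric answer follows a "####" marker at the end of "answer". The helper names load_jsonl and final_answer are hypothetical, not part of the repository.

```python
import json
from typing import Dict, Iterator


def load_jsonl(path: str) -> Iterator[Dict[str, str]]:
    """Yield one problem dict per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def final_answer(answer: str) -> str:
    """Return the text after the last '####' marker, with thousands separators removed."""
    if "####" not in answer:
        raise ValueError("no '####' marker found in answer")
    return answer.rsplit("####", 1)[1].strip().replace(",", "")


if __name__ == "__main__":
    # Path assumes the repository layout described above.
    for ex in load_jsonl("grade_school_math/data/test.jsonl"):
        print(ex["question"][:60], "->", final_answer(ex["answer"]))
        break
```

Comparing the extracted final answer as a string keeps scoring simple; converting to a number is an alternative if formatting varies.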
View Model Solutions: python view_model_solutions.py
Docs: Blog Post, Paper

Highlighted Details

Dataset comprises 8.5K problems, split into 7.5K training and 1K testing sets.
Problems take between 2 and 8 steps to solve, using basic arithmetic operations.
Includes an example implementation (calculator.py) for integrating a calculator during sampling.
Offers an optional "Socratic Dataset" with automatically generated subquestions for step-by-step reasoning.
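The calculator-during-sampling idea can be sketched as follows: after each generated token, if the text ends inside an open "<<expression=" annotation, the expression is evaluated and the result plus ">>" is appended, so the model never has to produce the arithmetic itself. This is a minimal sketch under assumptions, not the repository's calculator.py: next_token is a hypothetical callable standing in for the model's sampler, and an empty string is assumed to signal end of sequence.

```python
import re
from typing import Callable, Optional

# Assumption: annotated expressions contain only plain arithmetic.
SAFE_EXPR = re.compile(r"[0-9+\-*/(). ]+")


def calculator_hook(text: str) -> Optional[str]:
    """If text ends with an open '<<expr=' annotation, return the computed result."""
    start = text.rfind("<<")
    # No annotation, annotation already closed, or '=' not yet generated.
    if start == -1 or ">>" in text[start:] or not text.endswith("="):
        return None
    expr = text[start + 2:-1]
    if not SAFE_EXPR.fullmatch(expr) or "**" in expr:
        return None
    try:
        value = eval(expr, {"__builtins__": {}}, {})
    except (SyntaxError, ZeroDivisionError):
        return None
    # Print whole-number floats without a trailing '.0' (e.g. 24 rather than 24.0).
    if isinstance(value, float) and value.is_integer():
        value = int(value)
    return str(value)


def sample_with_calculator(prompt: str, next_token: Callable[[str], str],
                           max_tokens: int = 256) -> str:
    """Generate token by token, injecting calculator results into annotations."""
    text = prompt
    for _ in range(max_tokens):
        token = next_token(text)  # hypothetical model call
        if token == "":  # assumed end-of-sequence signal
            break
        text += token
        result = calculator_hook(text)
        if result is not None:
            # Close the annotation with the computed value.
            text += result + ">>"
    return text


# Usage with a scripted stand-in for the model.
tokens = iter(["Total: <<", "50*3", "=", " pencils.", ""])
print(sample_with_calculator("Q ", lambda _text: next(tokens)))
# Q Total: <<50*3=150>> pencils.
```

Checking after every token is the simple, unbatched approach; a real sampler would more likely check only when an equals token is produced.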

Maintenance & Community

Status: Archived; no further updates expected.
Contributors: Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, John Schulman.

Licensing & Compatibility

License: MIT License.
Compatibility: Permissive license allows for commercial use and integration into closed-source projects.

Limitations & Caveats

The provided model-generated samples may contain occasional calculation errors caused by bugs in an earlier version of the calculator. These bugs have since been fixed in the codebase, but the samples were not regenerated. The example training and sampling code is for illustration only and is noted as inefficient (no batching, no activation caching).

Health Check

Last Commit

2 years ago

Responsiveness

Inactive

Star History

13 stars in the last 30 days