grade-school-math by openai

Dataset for grade school math word problems

created 3 years ago
1,302 stars

Top 31.3% on sourcepulse

Project Summary

This repository provides the GSM8K dataset, a collection of 8,500 grade-school level math word problems designed to evaluate and improve the multi-step reasoning of large language models. It is aimed at AI researchers and developers working on natural language understanding and mathematical reasoning.

How It Works

GSM8K targets a known weakness of LLMs, multi-step mathematical reasoning, with a curated set of linguistically diverse problems. Solutions carry calculation annotations (e.g., <<50*3=150>>) that an external calculator can parse and evaluate, which mitigates the arithmetic errors common in LLMs. This lets a model offload calculations during generation, improving accuracy on complex problems.
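Because the annotations have a fixed shape, they can be checked mechanically. The sketch below is a hypothetical helper, not code from the repository. It finds every <<expression=result>> annotation in a solution string, recomputes the expression with a restricted evaluator that supports only the four basic operators (used instead of Python's eval), and returns both values for comparison. It assumes Python 3.8 or newer, because it relies on ast.Constant. The names ANNOTATION, safe_eval and check_annotations are illustrative.

```python
import ast
import operator
import re

# Hypothetical pattern for annotations such as <<50*3=150>>:
# group 1 is the expression, group 2 is the annotated result.
ANNOTATION = re.compile(r"<<([^=<>]+)=([^<>]+)>>")

# Restrict evaluation to the basic arithmetic operators.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def safe_eval(expr):
    """Evaluate +, -, *, / and parentheses without calling eval()."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError("unsupported expression: " + repr(expr))

    # Drop thousands separators such as 1,000 before parsing.
    return _eval(ast.parse(expr.replace(",", ""), mode="eval"))


def check_annotations(solution):
    """Return (expression, recomputed value, annotated value) per annotation."""
    checks = []
    for expr, annotated in ANNOTATION.findall(solution):
        checks.append((expr, safe_eval(expr), float(annotated.replace(",", ""))))
    return checks


solution = "Natalia sold 48/2 = <<48/2=24>>24 clips in May.\n#### 72"
for expr, computed, annotated in check_annotations(solution):
    print(expr, computed, annotated, computed == annotated)
# prints: 48/2 24.0 24.0 True
```

Parsing with the ast module instead of eval keeps arbitrary code in a solution string from executing. Anything other than numbers and the four operators raises ValueError.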

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Prerequisites: Python 3.7+, PyTorch, Transformers, OpenAI API key (for generating Socratic questions, not for dataset usage).
  • Data: Download train.jsonl and test.jsonl from grade_school_math/data/.
  • View Model Solutions: python view_model_solutions.py
  • Docs: Blog Post, Paper
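Each line of train.jsonl and test.jsonl is a JSON object with a "question" field and an "answer" field. The answer holds the annotated step-by-step solution and ends with a line of the form "#### <final answer>". The following is a minimal loading sketch that assumes this layout. The function names are hypothetical and are not part of the repository.

```python
import json
import re

# The final answer follows "####"; commas act as thousands separators.
FINAL_ANSWER = re.compile(r"####\s*(-?[\d,]*\.?\d+)")


def parse_jsonl_lines(lines):
    """Parse JSON Lines text (one object per line), skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def load_jsonl(path):
    """Load a split such as grade_school_math/data/train.jsonl."""
    with open(path, encoding="utf-8") as f:
        return parse_jsonl_lines(f)


def final_answer(answer_text):
    """Return the number after '####' without commas, or None if absent."""
    match = FINAL_ANSWER.search(answer_text)
    return match.group(1).replace(",", "") if match else None


# Toy answer string for illustration (not taken from the dataset).
toy_answer = "She earns 12*100 = <<12*100=1200>>1,200 dollars.\n#### 1,200"
print(final_answer(toy_answer))  # 1200
```

With the data downloaded, records = load_jsonl("grade_school_math/data/train.jsonl") returns a list of dicts. final_answer(records[i]["answer"]) then yields a normalized string that can be compared against a model's extracted answer.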

Highlighted Details

  • Dataset comprises 8.5K problems, split into 7.5K training and 1K testing sets.
  • Problems require 2-8 steps involving basic arithmetic operations.
  • Includes an example implementation (calculator.py) for integrating a calculator during sampling.
  • Offers an optional "Socratic Dataset" with automatically generated subquestions for step-by-step reasoning.
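Calculator integration during sampling can be pictured as a hook in the generation loop. Whenever the generated text ends inside an open annotation such as "<<48/2=", the result is computed outside the model and appended, so the model never does the arithmetic itself. The sketch below is hypothetical and model-agnostic: next_token stands in for any function that returns the model's next text chunk. The helper names are illustrative and are not the API of calculator.py.

```python
import ast
import operator

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def _arith(node):
    """Evaluate a parsed arithmetic expression (+, -, *, /, unary minus)."""
    if isinstance(node, ast.Expression):
        return _arith(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_arith(node.left), _arith(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_arith(node.operand)
    raise ValueError("unsupported expression")


def calculator_hook(text):
    """If text ends with an open annotation like '<<48/2=', return '24>>'."""
    if "<<" not in text:
        return None
    tail = text.rsplit("<<", 1)[1]
    if ">>" in tail or not tail.endswith("="):
        return None
    try:
        value = _arith(ast.parse(tail[:-1].replace(",", ""), mode="eval"))
    except (SyntaxError, ValueError, ZeroDivisionError):
        return None  # leave malformed expressions to the model
    # Show whole numbers without a trailing ".0".
    shown = str(int(value)) if float(value).is_integer() else str(value)
    return shown + ">>"


def sample_with_calculator(next_token, prompt, max_steps=256, stop="\n\n"):
    """Generate text chunk by chunk, injecting calculator results."""
    text = prompt
    for _ in range(max_steps):
        text += next_token(text)
        injected = calculator_hook(text)
        if injected is not None:
            text += injected
        if text.endswith(stop):
            break
    return text[len(prompt):]


# Toy "model" that replays fixed chunks; it never writes the result itself.
chunks = iter(["Natalia sold 48/2 = <<48/2=", "24 clips in May.", "\n\n"])
print(sample_with_calculator(lambda _text: next(chunks), "A: "))
```

Here the toy model stops at "<<48/2=" and the hook supplies "24>>". A real sampler would also need batching and activation caching, which the repository's example code notes it lacks.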

Maintenance & Community

  • Status: Archived; no further updates expected.
  • Contributors: Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, John Schulman.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: Permissive license allows for commercial use and integration into closed-source projects.

Limitations & Caveats

The bundled model-generated samples may contain occasional calculation errors caused by bugs in an earlier version of the calculator. Those bugs have since been fixed in the codebase, but the samples were not regenerated. The example training and sampling code is illustrative only and is explicitly described as inefficient (no batching, no activation caching).

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 62 stars in the last 90 days
