Program-of-Thoughts by TIGER-AI-Lab

Research code and data for numerical reasoning via Program of Thoughts

Created 2 years ago
285 stars

Top 91.9% on SourcePulse

Project Summary

This repository provides the data and code for the "Program of Thoughts" (PoT) prompting technique, designed to improve numerical reasoning in Large Language Models (LLMs). PoT disentangles computation from reasoning by having the LLM generate Python code for calculations, which is then executed by an external interpreter, achieving state-of-the-art results on various math word problem benchmarks.

How It Works

PoT prompts LLMs to express their reasoning steps as executable Python code, separating symbolic reasoning from numerical computation. The LLM writes a sequence of Python statements that encode its reasoning, and a standard Python interpreter executes them to produce the answer. The model therefore relies on the interpreter for exact arithmetic instead of computing in free text, which improves performance on tasks that require precise calculations.
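A minimal sketch of the execution half of PoT follows. It assumes the model's program stores its final result in a variable named `ans` (an assumption about the prompt format); the word problem, its numbers, and the helper `execute_program` are hypothetical illustrations, not code from the repository. The generation step that calls the LLM is omitted.

```python
# A PoT-style program as an LLM might emit it for a GSM8K-like question.
# ASSUMPTION: the final result is bound to a variable named `ans`.
GENERATED_PROGRAM = """
eggs_per_day = 16
eaten = 3
baked = 4
price_per_egg = 2
sold = eggs_per_day - eaten - baked
ans = sold * price_per_egg
"""


def execute_program(program: str, answer_var: str = "ans"):
    """Run generated code in a fresh namespace and return the answer variable.

    Hypothetical helper. exec() is NOT a sandbox: untrusted model output
    should be isolated (separate process, timeouts) in real use.
    """
    namespace: dict = {}
    try:
        exec(program, namespace)
    except Exception:
        return None  # a program that crashes yields no answer
    return namespace.get(answer_var)  # None if the variable was never set


if __name__ == "__main__":
    print(execute_program(GENERATED_PROGRAM))  # 18
```

The model only has to get the relations right (`sold = eggs_per_day - eaten - baked`); the interpreter does the arithmetic.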

Quick Start & Requirements

  • Install/Run: Set the OPENAI_KEY environment variable, then run a dataset script such as python run_gsm8k.py --greedy for greedy decoding, or python run_gsm8k.py (without --greedy) for self-consistency decoding.
  • Prerequisites: OpenAI API key.
  • Evaluation: Navigate to the outputs/ directory and run python compute_score.py --inputs <your_prediction_file.jsonl>.
  • Paper: "Program of Thoughts Prompting" (TMLR 2023).
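The README does not document the prediction-file schema, so the following is only a hypothetical sketch of how per-problem scoring might work, not a reimplementation of `compute_score.py`. It assumes each JSONL line holds `prediction` and `answer` fields (hypothetical names), compares them numerically with a relative tolerance, and falls back to exact string matching for non-numeric answers such as multiple-choice letters.

```python
import json
import math
from typing import Iterable


def is_correct(prediction, answer, rel_tol: float = 1e-4) -> bool:
    """Compare a predicted value to the gold answer, numerically when possible."""
    try:
        return math.isclose(float(prediction), float(answer), rel_tol=rel_tol)
    except (TypeError, ValueError):
        # Non-numeric answers (e.g. option letters) fall back to string match.
        return str(prediction).strip() == str(answer).strip()


def score_jsonl(lines: Iterable[str]) -> float:
    """Return accuracy over JSONL records with hypothetical fields
    `prediction` and `answer`; blank lines are skipped."""
    total = correct = 0
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        total += 1
        correct += is_correct(record.get("prediction"), record.get("answer"))
    return correct / total if total else 0.0
```

With a predictions file opened as `f`, `score_jsonl(f)` returns accuracy as a fraction between 0 and 1. The tolerance matters because executed programs often return floats such as `0.30000000000000004`.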

Highlighted Details

  • Outperforms few-shot Chain-of-Thought (CoT) prompting by an average of about 12% across the evaluated datasets.
  • Achieves state-of-the-art performance with self-consistency decoding on GSM8K, AQuA, SVAMP, TabMWP, and MultiArith.
  • Includes evaluation scripts and detailed results for multiple datasets (GSM8K, AQuA, SVAMP, TabMWP, FinQA, ConvFinQA, TATQA, MultiArith).
  • Supports both greedy decoding and self-consistency decoding methods.
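Greedy decoding produces one program per problem; self-consistency samples several programs and takes a majority vote over their executed answers. The sketch below shows only the voting step, under stated assumptions: the sampled programs are hard-coded stand-ins for model outputs, each is assumed to bind its result to `ans`, and `run_program`, `normalize`, and `self_consistency` are hypothetical helpers. The README does not state the sample count or temperature the repository uses.

```python
from collections import Counter
from typing import Optional, Sequence


def run_program(program: str, answer_var: str = "ans"):
    """Execute one generated program; return its answer or None on failure."""
    namespace: dict = {}
    try:
        exec(program, namespace)  # not a sandbox: isolate untrusted code
    except Exception:
        return None
    return namespace.get(answer_var)


def normalize(value):
    """Round floats so 0.30000000000000004 and 0.3 count as the same vote."""
    if isinstance(value, float):
        return round(value, 6)
    return value


def self_consistency(programs: Sequence[str]) -> Optional[object]:
    """Majority vote over the (hashable) answers of several sampled programs."""
    votes = Counter()
    for program in programs:
        answer = run_program(program)
        if answer is not None:  # failed executions cast no vote
            votes[normalize(answer)] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


samples = [
    "ans = (16 - 3 - 4) * 2",
    "sold = 16 - 7\nans = sold * 2",
    "ans = 16 * 2 - 3",      # a faulty reasoning path
    "ans = undefined_name",  # crashes, so it is ignored
]
print(self_consistency(samples))  # 18
```

Discarding programs that crash and pooling answers that differ only by floating-point noise are design choices of this sketch; two correct paths outvote one faulty path here.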

Maintenance & Community

The project is associated with the TMLR 2023 paper "Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks" by Wenhu Chen, Xueguang Ma, Xinyi Wang, and William W. Cohen. No specific community channels or active maintenance indicators are present in the README.

Licensing & Compatibility

The README does not state a license. Given its academic origin, the code appears intended for research use; confirm the license with the authors before any commercial use.

Limitations & Caveats

The primary dependency is on the OpenAI API, which incurs costs and requires an API key. The performance gains are demonstrated on specific numerical reasoning benchmarks, and applicability to other domains may vary. The README does not detail specific Python versions or other system dependencies beyond the OpenAI key.

Health Check
Last Commit: 1 year ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 2 stars in the last 30 days

Explore Similar Projects

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 3 more.

Math-Verify by huggingface

0.8%
933 stars
Math evaluator for LLM outputs in mathematical tasks
Created 8 months ago
Updated 2 months ago