Program-of-Thoughts by TIGER-AI-Lab

Research paper code for numerical reasoning via Program of Thoughts

Created 3 years ago

303 stars

Top 88.4% on SourcePulse

3 Experts Love This Project

Yiran Wu

Coauthor of AutoGen

Shizhe Diao

Author of LMFlow; Research Scientist at NVIDIA

Junyang Lin

Core Maintainer at Alibaba Qwen

Project Summary

This repository provides the data and code for the "Program of Thoughts" (PoT) prompting technique, designed to improve numerical reasoning in Large Language Models (LLMs). PoT disentangles computation from reasoning by having the LLM generate Python code for calculations, which is then executed by an external interpreter, achieving state-of-the-art results on various math word problem benchmarks.

How It Works

PoT prompts LLMs to express their reasoning steps as executable Python code. This approach separates the symbolic reasoning process from the numerical computation. The LLM generates a sequence of Python statements that represent its thought process, and these statements are then executed by a standard Python interpreter. This allows the LLM to leverage the accuracy and robustness of a dedicated computation engine, leading to improved performance on tasks requiring precise calculations.
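The generate-then-execute split described above can be illustrated with a minimal sketch. It is not the repository's implementation: the string `generated_program` is a hypothetical stand-in for model output, the helper `run_program` is invented for illustration, and the convention that the final result is stored in a variable named `ans` is an assumption made here.

```python
import math

# Hypothetical stand-in for the Python text an LLM might return for a
# compound-interest word problem. No model is called in this sketch.
generated_program = """
principal = 1000
rate = 0.05
years = 3
ans = principal * (1 + rate) ** years
"""


def run_program(source, answer_var="ans"):
    """Execute generated code in a fresh namespace and return the answer variable.

    Returns None if the program raises or never assigns `answer_var`.
    Assumption: the answer lives in a variable called `ans`.
    Caution: exec runs arbitrary code, so real use needs sandboxing.
    """
    namespace = {}
    try:
        exec(source, namespace)
    except Exception:
        return None
    return namespace.get(answer_var)


result = run_program(generated_program)
print(result)
```

The model only has to write down the right expression; the interpreter performs the arithmetic, which is where the precision comes from.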

Quick Start & Requirements

Install/Run: Set the OPENAI_KEY environment variable. Run a script such as python run_gsm8k.py --greedy for greedy decoding, or python run_gsm8k.py (without --greedy) for self-consistency decoding.
Prerequisites: OpenAI API key.
Evaluation: Navigate to the outputs/ directory and run python compute_score.py --inputs <your_prediction_file.jsonl>.
Links: TMLR 2023 paper (inferred from the repository description).

Highlighted Details

Outperforms few-shot Chain-of-Thought (CoT) by an average of 12% on evaluated datasets.
Achieves state-of-the-art performance with self-consistency decoding on GSM8K, AQuA, SVAMP, TabMWP, and MultiArith.
Includes evaluation scripts and detailed results for multiple datasets (GSM8K, AQuA, SVAMP, TabMWP, FinQA, ConvFinQA, TATQA, MultiArith).
Supports both greedy decoding and self-consistency decoding methods.
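Self-consistency decoding, as generally understood, samples several programs for the same question, executes each one, and keeps the most frequent result. The sketch below shows only the voting step under that assumption; `majority_vote`, the rounding precision, and the sample values are hypothetical and not taken from the repository.

```python
from collections import Counter


def majority_vote(answers, ndigits=6):
    """Return the most common answer among executed samples.

    None entries (programs that failed to run) are skipped, and floats are
    rounded so that values differing only by floating-point noise agree.
    """
    normalized = []
    for answer in answers:
        if answer is None:
            continue
        if isinstance(answer, float):
            answer = round(answer, ndigits)
        normalized.append(answer)
    if not normalized:
        return None
    return Counter(normalized).most_common(1)[0][0]


# Hypothetical results from five sampled programs for one question;
# None marks a sample whose code raised an error.
sampled_answers = [18, 18.0, 20, None, 18]
final = majority_vote(sampled_answers)
print(final)
```

Greedy decoding corresponds to the single-sample case, where there is nothing to vote on and the one executed result is returned directly.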

Maintenance & Community

The project is associated with the TMLR 2023 paper "Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks" by Wenhu Chen, Xueguang Ma, Xinyi Wang, and William W. Cohen. No specific community channels or active maintenance indicators are present in the README.

Licensing & Compatibility

The README does not state a license. Because the code accompanies an academic paper, it is likely intended for research use, and commercial use would require explicit confirmation of the license.

Limitations & Caveats

The primary dependency is on the OpenAI API, which incurs costs and requires an API key. The performance gains are demonstrated on specific numerical reasoning benchmarks, and applicability to other domains may vary. The README does not detail specific Python versions or other system dependencies beyond the OpenAI key.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

4 stars in the last 30 days