Research paper code for numerical reasoning via Program of Thoughts
This repository provides the data and code for the "Program of Thoughts" (PoT) prompting technique, designed to improve numerical reasoning in Large Language Models (LLMs). PoT disentangles computation from reasoning by having the LLM generate Python code for calculations, which is then executed by an external interpreter, achieving state-of-the-art results on various math word problem benchmarks.
How It Works
PoT prompts LLMs to express their reasoning steps as executable Python code. This approach separates the symbolic reasoning process from the numerical computation. The LLM generates a sequence of Python statements that represent its thought process, and these statements are then executed by a standard Python interpreter. This allows the LLM to leverage the accuracy and robustness of a dedicated computation engine, leading to improved performance on tasks requiring precise calculations.
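The split between reasoning and computation described above can be illustrated with a minimal sketch. It is not the repository's code: the question, the generated program, the helper name run_program, and the convention of storing the result in a variable named ans are all assumptions made for illustration, and the model call is replaced by a hard-coded string.

```python
# Hypothetical model output for the question:
# "Pens cost $1.25 each. What do 12 pens cost after a 10% discount?"
# In a real pipeline this string would come from the LLM.
generated_program = """
price_per_pen = 1.25
num_pens = 12
subtotal = price_per_pen * num_pens
discount = subtotal * 0.10
ans = subtotal - discount
"""


def run_program(program: str, answer_var: str = "ans"):
    """Execute generated code in a fresh namespace and return the answer.

    The variable name `ans` is an assumed convention for this sketch.
    """
    namespace = {}
    # exec runs arbitrary code; model output should be sandboxed in practice.
    exec(program, namespace)
    return namespace.get(answer_var)


result = run_program(generated_program)
print(result)
```

The model only has to write down the correct sequence of operations. The interpreter performs the arithmetic, so the final number does not depend on the model's ability to compute it.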
Quick Start & Requirements
Set the OPENAI_KEY environment variable to your OpenAI API key.
Run scripts such as python run_gsm8k.py --greedy for greedy decoding, or python run_gsm8k.py (without --greedy) for self-consistency decoding.
To score predictions, go to the outputs/ directory and run python compute_score.py --inputs <your_prediction_file.jsonl>.
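The difference between the two decoding modes can be sketched as follows. Greedy decoding uses a single generated program, while self-consistency samples several programs, executes each one, and takes a majority vote over the results. This is an illustrative sketch only: the helper names execute and majority_vote, the ans variable, and the hard-coded sample programs are assumptions, and the repository's scripts may aggregate answers differently.

```python
from collections import Counter


def execute(program, answer_var="ans"):
    """Run one generated program; return its answer, or None if it fails."""
    namespace = {}
    try:
        exec(program, namespace)  # sandbox model output in real use
    except Exception:
        return None
    return namespace.get(answer_var)


def majority_vote(programs):
    """Execute every sampled program and return the most common answer."""
    answers = [execute(p) for p in programs]
    # Keep numeric answers only; round so near-equal floats vote together.
    valid = [round(a, 6) for a in answers if isinstance(a, (int, float))]
    if not valid:
        return None
    return Counter(valid).most_common(1)[0][0]


# Hypothetical samples for one question; a real run would draw these
# from the model with a non-zero sampling temperature.
sampled_programs = [
    "ans = 3 * 4 + 2",
    "ans = (3 * 4) + 2",
    "ans = 3 * (4 + 2)",  # a faulty reasoning path
    "ans = 3 * 4 + ",     # a syntactically broken sample
]

greedy_answer = execute(sampled_programs[0])      # single program
voted_answer = majority_vote(sampled_programs)    # self-consistency
print(greedy_answer, voted_answer)
```

Voting lets a few faulty or non-executable samples be outvoted, at the cost of more API calls per question.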
Highlighted Details
Maintenance & Community
The project is associated with the TMLR 2023 paper "Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks" by Wenhu Chen, Xueguang Ma, Xinyi Wang, and William W. Cohen. No specific community channels or active maintenance indicators are present in the README.
Licensing & Compatibility
The repository's license is not explicitly stated in the README. However, the nature of the code and its association with academic research suggests it is likely intended for research purposes. Commercial use compatibility would require explicit confirmation of the license.
Limitations & Caveats
The primary dependency is on the OpenAI API, which incurs costs and requires an API key. The performance gains are demonstrated on specific numerical reasoning benchmarks, and applicability to other domains may vary. The README does not detail specific Python versions or other system dependencies beyond the OpenAI key.