buffer-of-thought-llm by YangLing0818

Research paper implementation for thought-augmented LLM reasoning

created 1 year ago
649 stars

Top 52.4% on sourcepulse

Project Summary

This repository provides the official implementation for "Buffer of Thoughts" (BoT), a novel framework designed to enhance Large Language Model (LLM) reasoning capabilities. It targets researchers and developers working on improving LLM accuracy, efficiency, and robustness in complex reasoning tasks, offering significant performance gains over existing methods.

How It Works

BoT introduces a "meta-buffer" to store distilled "thought-templates" from problem-solving processes. For new problems, relevant templates are retrieved and adaptively instantiated for efficient reasoning. A "buffer-manager" dynamically updates the meta-buffer, ensuring scalability and stability as more tasks are solved. This approach aims to provide superior generalization and robustness while being significantly more cost-effective than multi-query prompting methods.
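The retrieve-instantiate-update loop described above can be sketched in a few lines of Python. Note this is an illustrative toy, not the repository's actual API: the class names, the keyword-overlap retrieval, and the string-template instantiation are all simplifying assumptions (the real implementation distills templates with an LLM and retrieves via embedding similarity).

```python
# Toy sketch of a BoT-style meta-buffer (illustrative; not the repo's API).
from dataclasses import dataclass, field

@dataclass
class ThoughtTemplate:
    name: str
    keywords: set   # terms characterizing the problem class (assumption: keyword retrieval)
    skeleton: str   # reusable reasoning outline with a {problem} slot

@dataclass
class MetaBuffer:
    templates: list = field(default_factory=list)

    def retrieve(self, problem: str):
        """Pick the stored template whose keywords best overlap the problem text."""
        words = set(problem.lower().split())
        scored = [(len(t.keywords & words), t) for t in self.templates]
        score, best = max(scored, key=lambda pair: pair[0])
        return best if score > 0 else None

    def instantiate(self, template: ThoughtTemplate, problem: str) -> str:
        """Fill the template skeleton with the concrete problem to form the prompt."""
        return template.skeleton.format(problem=problem)

    def update(self, template: ThoughtTemplate):
        """Buffer-manager step: store a newly distilled template if it is novel."""
        if all(t.name != template.name for t in self.templates):
            self.templates.append(template)

buffer = MetaBuffer()
buffer.update(ThoughtTemplate(
    name="arithmetic-game",
    keywords={"24", "numbers", "arithmetic"},
    skeleton="Combine the given numbers with +, -, *, / to reach the target.\nProblem: {problem}",
))
task = "Use the numbers 4 6 6 2 to make 24"
template = buffer.retrieve(task)
prompt = buffer.instantiate(template, task)
```

In the paper's framing, `retrieve` and `instantiate` replace the repeated querying of multi-query prompting with a single adaptively filled template, which is where the claimed cost savings come from.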

Quick Start & Requirements

  • Install: Clone the repository, navigate into the directory, create a conda environment (conda create -n BoT python==3.9), and install dependencies (pip install -r requirements.txt).
  • Prerequisites: Python 3.9, conda, and an OpenAI API key for default inference. Support for local LLMs is mentioned but requires specific configuration.
  • Inference:
    • For math problems: python inference.py --api_key 'YOUR_API_KEY'
    • For benchmarks (gameof24, checkmate, wordsorting): python run_benchmarks.py --task_name 'benchmark_name' --api_key 'YOUR_API_KEY' --model_id 'MODEL_ID'
  • Links: Official Implementation, ReasonFlux-F1

Highlighted Details

  • Achieves SOTA performance on benchmarks like Game of 24 (82.4%), Geometric Shapes (93.6%), and Checkmate-in-One (86.4%).
  • Demonstrates potential for smaller models (e.g., Llama3-8B + BoT) to surpass larger models (Llama3-70B).
  • Requires only 12% of the cost of multi-query prompting methods on average.
  • Released ReasonFlux-F1 models leverage BoT for SOTA reasoning capabilities.

Maintenance & Community

The project is affiliated with Peking University, UC Berkeley, and Stanford University. Recent updates include the release of the ReasonFlux-F1 models and SuperCorrect, a self-correction framework. The meta-buffer and buffer-manager implementation is based on LightRAG.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The current code primarily supports online (API-based) LLMs for math problems; support for local models is planned. The README does not detail hardware requirements beyond a standard Python environment.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 23 stars in the last 90 days
