mwp_ReFT by lqtrung1998

Research paper code for reasoning with reinforced fine-tuning (ReFT)

created 1 year ago
544 stars

Top 59.4% on sourcepulse

View on GitHub
Project Summary

This repository provides code and data for ReFT (Reasoning with Reinforced Fine-Tuning), a method designed to improve the reasoning capabilities of large language models. It targets researchers and practitioners in NLP and AI who are working on enhancing LLM performance on complex tasks like mathematical reasoning. The primary benefit is a significant accuracy gain over standard supervised fine-tuning on math word problem benchmarks such as GSM8k.

How It Works

ReFT employs a reinforcement-learning-based fine-tuning strategy that guides LLMs to generate more accurate reasoning chains. After a supervised warm-up stage, the model samples reasoning paths for each question and is updated with a reward derived from whether the final answer matches the ground truth. This contrasts with standard Supervised Fine-Tuning (SFT), which trains on a single annotated chain per question, and it leads to improved performance on benchmarks like GSM8k.
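
As a rough illustration of the reward idea, the sketch below scores a sampled reasoning chain by whether its extracted final answer matches the reference answer. The function names, the regex-based answer extraction, and the simple 0/1 reward are illustrative assumptions, not the repository's exact implementation.

```python
# Minimal sketch of a ReFT-style terminal reward: score a sampled reasoning
# chain by whether the final answer it produces matches the gold answer.
import re
from typing import Optional


def extract_final_answer(generated_chain: str) -> Optional[str]:
    """Pull the last number from a generated reasoning chain (illustrative heuristic)."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", generated_chain)
    return numbers[-1] if numbers else None


def terminal_reward(generated_chain: str, gold_answer: str) -> float:
    """Return 1.0 if the extracted final answer matches the reference, else 0.0."""
    predicted = extract_final_answer(generated_chain)
    return 1.0 if predicted is not None and predicted == gold_answer.strip() else 0.0


if __name__ == "__main__":
    chain = "Each box holds 12 eggs. 3 boxes hold 3 * 12 = 36 eggs. The answer is 36"
    print(terminal_reward(chain, "36"))  # 1.0
```

In the reinforcement-learning stage, rewards like this one are what allow the model to learn from many sampled reasoning paths per question instead of a single annotated chain.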

Quick Start & Requirements

  • Install: Clone the repository and install dependencies via pip install -r requirements.txt.
  • Prerequisites: Python 3.8+, PyTorch, Hugging Face Transformers. Specific model checkpoints (e.g., CodeLlama, Galactica) are required; a loading sketch follows this list.
  • Running: Example commands are provided for SFT, ReFT, Online-SL, and Offline-SL training and sampling using bash scripts (e.g., bash exps/paper_exps/SFT/gsm8k.sh).
  • Resources: Requires significant computational resources for fine-tuning LLMs, including GPUs.
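
To experiment with a released checkpoint locally, a minimal loading-and-sampling sketch with Hugging Face Transformers might look like the following. The checkpoint path is a placeholder, not a verified Hub identifier; substitute the download location given in the repository's README.

```python
# Hedged sketch: load a fine-tuned checkpoint and greedily decode one answer.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path: replace with the checkpoint location from the repository's README.
checkpoint = "path/to/ReFT-codellama-7b-gsm8k"

dtype = torch.float16 if torch.cuda.is_available() else torch.float32
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=dtype)
model = model.to("cuda" if torch.cuda.is_available() else "cpu")
model.eval()

prompt = "Question: A baker makes 12 loaves a day for 7 days. How many loaves in total?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```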

Highlighted Details

  • Achieves 75.28% Top-1 accuracy on GSM8k with CodeLlama-7b, a substantial improvement over SFT baselines.
  • Offers checkpoints for various stages of fine-tuning (warmup-SFT, SFT, ReFT) on CodeLlama and Galactica models.
  • Supports evaluation metrics including Top-1 accuracy, Voting@100, and Rerank@100 (see the voting sketch after this list).
  • Includes implementations for related techniques like Online-SL and Offline-SL.
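
The Voting@k metric referenced above is a form of self-consistency scoring: sample k solutions per question and take the majority final answer. The sketch below is a generic illustration of that idea, not the repository's evaluation code.

```python
# Illustrative Voting@k: a question counts as solved if the most frequent
# answer among k sampled solutions matches the gold answer.
from collections import Counter
from typing import List, Optional


def voting_at_k(sampled_answers: List[Optional[str]], gold_answer: str) -> bool:
    """Return True if the majority answer across samples equals the gold answer."""
    valid = [a for a in sampled_answers if a is not None]
    if not valid:
        return False
    majority_answer, _ = Counter(valid).most_common(1)[0]
    return majority_answer == gold_answer


if __name__ == "__main__":
    samples = ["36", "36", "42", "36", None]   # answers extracted from 5 sampled chains
    print(voting_at_k(samples, "36"))          # True
```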

Maintenance & Community

The project is associated with the ACL 2024 paper "ReFT: Reasoning with Reinforced Fine-Tuning". No specific community channels (Discord, Slack) or active maintenance signals are provided in the README.

Licensing & Compatibility

  • License: Apache 2.0 for the code.
  • Compatibility: The underlying models (CodeLlama, Galactica) have their own licenses (Llama license, CC BY-NC 4.0) which may restrict commercial use or require adherence to non-commercial terms.

Limitations & Caveats

The provided checkpoints are based on older base models (Galactica, CodeLlama) and may not transfer directly to the latest LLM architectures. The training and evaluation scripts target the GSM8k, SVAMP, and MathQA datasets, so applying the method to other domains will require additional adaptation.

Health Check

  • Last commit: 7 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 26 stars in the last 90 days
