mwp_ReFT by lqtrung1998

Research paper code for reasoning with reinforced fine-tuning (ReFT)

created 1 year ago
544 stars

Top 59.4% on sourcepulse

View on GitHub
Project Summary

This repository provides code and data for ReFT (Reasoning with Reinforced Fine-Tuning), a method designed to improve the reasoning capabilities of large language models. It targets researchers and practitioners in NLP and AI who are working on enhancing LLM performance on complex tasks like mathematical reasoning. The primary benefit is a significant accuracy gain over standard supervised fine-tuning on math word problem benchmarks such as GSM8k.

How It Works

ReFT employs a reinforcement-learning-based fine-tuning strategy that guides LLMs to generate more accurate reasoning chains. After a supervised warm-up stage, the model samples reasoning paths for each question and is updated with a reward derived from whether the final answer matches the ground truth. This contrasts with standard Supervised Fine-Tuning (SFT), which trains on a single annotated chain per question, and it leads to improved performance on benchmarks like GSM8k.
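
As a rough illustration of the reward idea, the sketch below scores a sampled reasoning chain by whether its extracted final answer matches the reference answer. The function names, the regex-based answer extraction, and the simple 0/1 reward are illustrative assumptions, not the repository's exact implementation.

```python
# Minimal sketch of a ReFT-style terminal reward: score a sampled reasoning
# chain by whether the final answer it produces matches the gold answer.
import re
from typing import Optional


def extract_final_answer(generated_chain: str) -> Optional[str]:
    """Pull the last number from a generated reasoning chain (illustrative heuristic)."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", generated_chain)
    return numbers[-1] if numbers else None


def terminal_reward(generated_chain: str, gold_answer: str) -> float:
    """Return 1.0 if the extracted final answer matches the reference, else 0.0."""
    predicted = extract_final_answer(generated_chain)
    return 1.0 if predicted is not None and predicted == gold_answer.strip() else 0.0


if __name__ == "__main__":
    chain = "Each box holds 12 eggs. 3 boxes hold 3 * 12 = 36 eggs. The answer is 36"
    print(terminal_reward(chain, "36"))  # 1.0
```

In the reinforcement-learning stage, rewards like this one are what allow the model to learn from many sampled reasoning paths per question instead of a single annotated chain.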

Quick Start & Requirements

  • Install: Clone the repository and install dependencies via pip install -r requirements.txt.
  • Prerequisites: Python 3.8+, PyTorch, Hugging Face Transformers. Specific model checkpoints (e.g., CodeLlama, Galactica) are required; a loading sketch follows this list.
  • Running: Example commands are provided for SFT, ReFT, Online-SL, and Offline-SL training and sampling using bash scripts (e.g., bash exps/paper_exps/SFT/gsm8k.sh).
  • Resources: Requires significant computational resources for fine-tuning LLMs, including GPUs.
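
To experiment with a released checkpoint locally, a minimal loading-and-sampling sketch with Hugging Face Transformers might look like the following. The checkpoint path is a placeholder, not a verified Hub identifier; substitute the download location given in the repository's README.

```python
# Hedged sketch: load a fine-tuned checkpoint and greedily decode one answer.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path: replace with the checkpoint location from the repository's README.
checkpoint = "path/to/ReFT-codellama-7b-gsm8k"

dtype = torch.float16 if torch.cuda.is_available() else torch.float32
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=dtype)
model = model.to("cuda" if torch.cuda.is_available() else "cpu")
model.eval()

prompt = "Question: A baker makes 12 loaves a day for 7 days. How many loaves in total?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```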

Highlighted Details

  • Achieves 75.28% Top-1 accuracy on GSM8k with CodeLlama-7b, a substantial improvement over SFT baselines.
  • Offers checkpoints for various stages of fine-tuning (warmup-SFT, SFT, ReFT) on CodeLlama and Galactica models.
  • Supports evaluation metrics including Top-1 accuracy, Voting@100, and Rerank@100 (see the voting sketch after this list).
  • Includes implementations for related techniques like Online-SL and Offline-SL.
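
The Voting@k metric referenced above is a form of self-consistency scoring: sample k solutions per question and take the majority final answer. The sketch below is a generic illustration of that idea, not the repository's evaluation code.

```python
# Illustrative Voting@k: a question counts as solved if the most frequent
# answer among k sampled solutions matches the gold answer.
from collections import Counter
from typing import List, Optional


def voting_at_k(sampled_answers: List[Optional[str]], gold_answer: str) -> bool:
    """Return True if the majority answer across samples equals the gold answer."""
    valid = [a for a in sampled_answers if a is not None]
    if not valid:
        return False
    majority_answer, _ = Counter(valid).most_common(1)[0]
    return majority_answer == gold_answer


if __name__ == "__main__":
    samples = ["36", "36", "42", "36", None]   # answers extracted from 5 sampled chains
    print(voting_at_k(samples, "36"))          # True
```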

Maintenance & Community

The project is associated with the ACL 2024 paper "ReFT: Reasoning with Reinforced Fine-Tuning". No specific community channels (Discord, Slack) or active maintenance signals are provided in the README.

Licensing & Compatibility

  • License: Apache 2.0 for the code.
  • Compatibility: The underlying models (CodeLlama, Galactica) have their own licenses (Llama license, CC BY-NC 4.0) which may restrict commercial use or require adherence to non-commercial terms.

Limitations & Caveats

The provided checkpoints are based on older base models (Galactica, CodeLlama) and may not transfer directly to the latest LLM architectures. The training and evaluation scripts target the GSM8k, SVAMP, and MathQA datasets, so applying the method to other domains will require additional adaptation.

Health Check

  • Last commit: 7 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 26 stars in the last 90 days
