mwp_ReFT by lqtrung1998

Research paper code for reasoning with reinforced fine-tuning (ReFT)

Created 1 year ago
547 stars

Top 58.5% on SourcePulse

Project Summary

This repository provides code and data for ReFT (Reasoning with Reinforced Fine-Tuning), a method designed to improve the reasoning capabilities of large language models. It targets researchers and practitioners in NLP and AI who are working on enhancing LLM performance on complex tasks like mathematical reasoning. The primary benefit is a significant boost in accuracy through a novel fine-tuning approach.

How It Works

ReFT employs a reinforcement learning-based fine-tuning strategy that guides LLMs to generate more accurate reasoning chains. It contrasts with standard Supervised Fine-Tuning (SFT) by incorporating a reward signal that directly optimizes for correct reasoning steps, leading to improved performance on benchmarks like GSM8k.
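
The reward signal can be sketched as follows. This is an illustrative reduction, not the repository's API: the function names are hypothetical, and the 0.1 partial reward for answers that parse but are wrong roughly follows the paper's reward design.

```python
# Minimal sketch of ReFT's terminal reward: sample a reasoning chain from the
# policy, extract its final answer, and score it against the gold answer.
# extract_answer/reward are hypothetical stand-ins, not the repo's code.

def extract_answer(chain: str) -> str:
    """Take the text after the last 'The answer is' marker as the final answer."""
    marker = "The answer is"
    if marker not in chain:
        return ""
    return chain.rsplit(marker, 1)[-1].strip().rstrip(".")

def reward(chain: str, gold: str) -> float:
    """Terminal reward on a sampled chain: 1.0 for a correct final answer,
    a small partial reward (0.1) if an answer was extractable but wrong,
    and 0.0 if no answer could be extracted at all."""
    pred = extract_answer(chain)
    if pred == gold:
        return 1.0
    return 0.1 if pred else 0.0
```

This reward is then used by a policy-gradient method (PPO in the paper) to fine-tune the SFT-warmed-up model on its own sampled chains.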

Quick Start & Requirements

  • Install: Clone the repository and install dependencies via pip install -r requirements.txt.
  • Prerequisites: Python 3.8+, PyTorch, Hugging Face Transformers. Specific model checkpoints (e.g., CodeLlama, Galactica) are required.
  • Running: Example commands are provided for SFT, ReFT, Online-SL, and Offline-SL training and sampling using bash scripts (e.g., bash exps/paper_exps/SFT/gsm8k.sh).
  • Resources: Requires significant computational resources for fine-tuning LLMs, including GPUs.

Highlighted Details

  • Achieves 75.28% Top-1 accuracy on GSM8k with CodeLlama-7b, a substantial improvement over SFT baselines.
  • Offers checkpoints for various stages of fine-tuning (warmup-SFT, SFT, ReFT) on CodeLlama and Galactica models.
  • Supports evaluation metrics including Top-1 accuracy, Voting@100, and Rerank@100.
  • Includes implementations for related techniques like Online-SL and Offline-SL.
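
Voting@100 is a majority vote over 100 sampled solutions per problem; a minimal sketch of the metric (the helper below is illustrative, not the repository's implementation):

```python
from collections import Counter

def voting_at_n(sampled_answers: list[str]) -> str:
    """Majority vote over N sampled final answers: the most frequent
    answer wins. Unparseable (empty) answers are ignored; ties break
    by first occurrence, since Counter preserves insertion order."""
    counts = Counter(a for a in sampled_answers if a)
    return counts.most_common(1)[0][0] if counts else ""
```

Rerank@100 instead scores the same 100 samples with a trained reranker and picks the highest-scoring one.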

Maintenance & Community

The project is associated with the ACL 2024 paper "ReFT: Reasoning with Reinforced Fine-Tuning". No specific community channels (Discord, Slack) or active maintenance signals are provided in the README.

Licensing & Compatibility

  • License: Apache 2.0 for the code.
  • Compatibility: The underlying models (CodeLlama, Galactica) have their own licenses (Llama license, CC BY-NC 4.0) which may restrict commercial use or require adherence to non-commercial terms.

Limitations & Caveats

The provided checkpoints are based on older models (Galactica, CodeLlama) and may not be directly compatible with the latest LLM architectures. The README implies a focus on specific datasets like GSM8k, Svamp, and MathQA, suggesting limited out-of-the-box support for other domains.

Health Check

  • Last Commit: 8 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 2 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Pawel Garbacki (cofounder of Fireworks AI), and 4 more.

alpaca_farm by tatsu-lab

0.1%
826
RLHF simulation framework for accessible instruction-following/alignment research
Created 2 years ago
Updated 1 year ago
Starred by Yaowei Zheng (author of LLaMA-Factory), Shizhe Diao (author of LMFlow; research scientist at NVIDIA), and 2 more.

rome by kmeng01

0.1%
668
Model editing research paper for GPT-2 and GPT-J
Created 3 years ago
Updated 1 year ago
Starred by Michael Han (cofounder of Unsloth), Sebastian Raschka (author of "Build a Large Language Model (From Scratch)"), and 19 more.

DeepSeek-R1 by deepseek-ai

0.1%
91k
Reasoning models research paper
Created 8 months ago
Updated 2 months ago