Research paper code for reasoning with reinforced fine-tuning (ReFT)
This repository provides code and data for ReFT (Reasoning with Reinforced Fine-Tuning), a method designed to improve the reasoning capabilities of large language models. It targets researchers and practitioners in NLP and AI who are working on enhancing LLM performance on complex tasks like mathematical reasoning. The primary benefit is higher reasoning accuracy than standard supervised fine-tuning while learning from the same training questions.
How It Works
ReFT employs a reinforcement learning-based fine-tuning strategy that guides LLMs to generate more accurate reasoning chains. It contrasts with standard Supervised Fine-Tuning (SFT): after an SFT warm-up stage, the model is further trained with online reinforcement learning, where the reward is derived automatically from the correctness of the final answer, leading to improved performance on benchmarks like GSM8K.
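As a rough illustration of that reward signal, the sketch below shows a terminal, answer-matching reward of the kind the RL stage relies on. This is a minimal sketch, not the repository's actual code; the function names and the answer-extraction heuristic are illustrative assumptions.
import re

def extract_answer(completion: str) -> str:
    # Illustrative heuristic: take the last number in the generated chain-of-thought.
    numbers = re.findall(r"-?\d+\.?\d*", completion)
    return numbers[-1] if numbers else ""

def terminal_reward(completion: str, gold_answer: str) -> float:
    # Reward the sampled reasoning chain only for final-answer correctness.
    return 1.0 if extract_answer(completion) == gold_answer.strip() else 0.0

# During the RL stage, this scalar reward would drive a policy-gradient (PPO-style)
# update of the model that was first warmed up with supervised fine-tuning.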
Quick Start & Requirements
pip install -r requirements.txt
bash exps/paper_exps/SFT/gsm8k.sh
Highlighted Details
Maintenance & Community
The project is associated with the ACL 2024 paper "ReFT: Reasoning with Reinforced Fine-Tuning". No specific community channels (Discord, Slack) or active maintenance signals are provided in the README.
Licensing & Compatibility
Limitations & Caveats
The provided checkpoints are based on older models (Galactica, CodeLlama) and may not be directly compatible with the latest LLM architectures. The README covers only the GSM8K, SVAMP, and MathQA math-reasoning datasets, suggesting limited out-of-the-box support for other domains.
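For context, the released checkpoints are ordinary causal-LM weights, so loading one with Hugging Face transformers would presumably look like the sketch below; the checkpoint identifier is a placeholder, not a verified model name, and the prompt is only an example.
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder identifier: substitute the actual checkpoint path or hub name from the README.
checkpoint = "path/to/ReFT-checkpoint"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# GSM8K-style question prompt (illustrative).
prompt = "Question: A farmer has 12 cows and buys 7 more. How many cows does he have? Answer:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))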