DFT by yongliang-wu

Improving SFT generalization with reward rectification

created 2 weeks ago


292 stars

Top 90.3% on SourcePulse

Project Summary

This repository introduces Dynamic Fine-Tuning (DFT), a method to improve the generalization of Supervised Fine-Tuning (SFT) for Large Language Models (LLMs). It addresses the limitations of standard SFT by proposing a theoretically motivated reward rectification technique, offering a simpler yet effective alternative to reinforcement learning for certain tasks. The target audience includes LLM researchers and practitioners seeking to enhance SFT performance.

How It Works

DFT modifies the SFT objective by dynamically rescaling each token's loss by the model's predicted probability of that token. This "reward rectification" stabilizes gradient updates and counteracts the problematic implicit reward structure that hinders generalization in standard SFT. The approach amounts to a single-line code change, making it easy to integrate into existing SFT pipelines.
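To make the rescaling concrete, here is a minimal PyTorch sketch of the idea. This is illustrative, not the repository's actual code: the function name dft_loss and the masking details are assumptions.

```python
import torch
import torch.nn.functional as F

def dft_loss(logits, labels, ignore_index=-100):
    """Minimal sketch of DFT's reward-rectified SFT loss (illustrative)."""
    # Per-token cross-entropy, kept unreduced so each token can be reweighted.
    ce = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        labels.reshape(-1),
        ignore_index=ignore_index,
        reduction="none",
    )
    # Rescale each token's loss by the (detached) probability the model
    # assigns to the target token; note exp(-CE) == p(target | context).
    weight = torch.exp(-ce).detach()
    mask = (labels.reshape(-1) != ignore_index).float()
    return (weight * ce * mask).sum() / mask.sum().clamp(min=1)
```

Relative to plain SFT, the only change is the weight factor; dropping it recovers the standard cross-entropy objective.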

Quick Start & Requirements

  • Installation: Clone the repository, create a Conda environment, and install dependencies with pip, including vllm, sglang, and mcore.
  • Prerequisites: Python 3.10.0, PyTorch 2.6.0+cu124, and H100 servers are recommended.
  • Getting Started: Includes scripts for data preparation, launching training with torchrun, and evaluation; a sketch of where the DFT loss slots into a training step follows this list. Links to the Qwen2.5-Math repository are provided for evaluation setup.
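For orientation, a hypothetical training step is sketched below, assuming the dft_loss helper from the earlier sketch. The batch layout and the omission of prompt/label masking are illustrative assumptions, not the repository's scripts.

```python
# Hypothetical SFT training step (model/optimizer setup and prompt/label
# masking omitted for brevity); assumes dft_loss from the sketch above.
def train_step(model, batch, optimizer):
    outputs = model(
        input_ids=batch["input_ids"],
        attention_mask=batch["attention_mask"],
    )
    # Shift so position t predicts token t+1, as in causal-LM fine-tuning.
    logits = outputs.logits[:, :-1, :]
    labels = batch["input_ids"][:, 1:]
    loss = dft_loss(logits, labels)  # plain SFT would use unweighted CE here
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```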

Highlighted Details

  • Significantly outperforms standard SFT across challenging benchmarks and base models.
  • Demonstrates improved generalization capabilities.
  • Shows competitive results in offline RL settings.
  • Integrates with ms-swift and has community reproductions.

Maintenance & Community

The project is associated with the volcengine/verl framework. Community reproductions and integration with ms-swift suggest active interest.

Licensing & Compatibility

The repository does not explicitly state a license. The associated volcengine/verl repository is Apache 2.0 licensed, but this specific project's licensing requires clarification for commercial use.

Limitations & Caveats

DFT performs best on tasks with non-deterministic solution trajectories (e.g., mathematical chain-of-thought reasoning, complex coding). It is weaker on tasks with a single, near-deterministic ground-truth answer and tightly constrained chains of thought.

Health Check

  • Last commit: 3 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 14
  • Star history: 292 stars in the last 15 days
