DFT by yongliang-wu

Improving SFT generalization with reward rectification

Created 2 months ago
467 stars

Top 65.1% on SourcePulse

View on GitHub
Project Summary

This repository introduces Dynamic Fine-Tuning (DFT), a method to improve the generalization of Supervised Fine-Tuning (SFT) for Large Language Models (LLMs). It addresses the limitations of standard SFT by proposing a theoretically motivated reward rectification technique, offering a simpler yet effective alternative to reinforcement learning for certain tasks. The target audience includes LLM researchers and practitioners seeking to enhance SFT performance.

How It Works

DFT modifies the SFT objective by dynamically rescaling each token's loss by the model's predicted probability for that token. This "reward rectification" stabilizes gradient updates and counteracts the problematic implicit reward structure that hinders generalization in standard SFT. The approach amounts to a single-line code change, making it easy to integrate into existing SFT pipelines.
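
A minimal sketch of the reweighting, assuming standard causal-LM logits and labels; the function name dft_loss and the masking details are illustrative, not the repository's actual implementation:

    import torch
    import torch.nn.functional as F

    def dft_loss(logits, labels, ignore_index=-100):
        # Shift so each position predicts the next token, as in causal LM training.
        logits = logits[:, :-1, :].contiguous()
        labels = labels[:, 1:].contiguous()

        # Standard per-token SFT loss (negative log-likelihood).
        ce = F.cross_entropy(
            logits.view(-1, logits.size(-1)),
            labels.view(-1),
            ignore_index=ignore_index,
            reduction="none",
        )

        # The "single-line" DFT change: weight each token's loss by its predicted
        # probability p(y_t | context) = exp(-ce), detached from the graph so it
        # acts as a fixed, reward-like scale rather than part of the gradient path.
        weights = torch.exp(-ce).detach()

        mask = (labels.view(-1) != ignore_index).float()
        return (weights * ce * mask).sum() / mask.sum().clamp(min=1.0)

Because the weight is detached, each token's update direction matches standard SFT; the scaling only cancels the inverse-probability factor implicit in the SFT gradient, which is the "reward rectification" described above.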

Quick Start & Requirements

  • Installation: Clone the repository and install dependencies with conda and pip; the setup creates a conda environment and installs vllm, sglang, and mcore.
  • Prerequisites: Python 3.10.0, PyTorch 2.6.0+cu124, and H100 servers are recommended.
  • Getting Started: Scripts are provided for data preparation, launching training with torchrun, and evaluation; links to the Qwen2.5-Math repository cover the evaluation setup. A sketch of how the loss change can be dropped into a generic training loop follows this list.
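
The repository is associated with the volcengine/verl stack, so the following is only a hypothetical illustration of where the single-line change would live in a generic Hugging Face SFT pipeline; DFTTrainer and its internals are assumptions for illustration, not the project's entry point:

    import torch
    import torch.nn.functional as F
    from transformers import Trainer

    class DFTTrainer(Trainer):
        """Hypothetical subclass: a standard SFT loop with the DFT reweighting."""

        def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
            labels = inputs.pop("labels")
            outputs = model(**inputs)

            # Align logits with next-token targets.
            logits = outputs.logits[:, :-1, :]
            targets = labels[:, 1:]

            ce = F.cross_entropy(
                logits.reshape(-1, logits.size(-1)),
                targets.reshape(-1),
                ignore_index=-100,
                reduction="none",
            )

            # DFT reweighting: scale each token's loss by its detached probability.
            mask = (targets.reshape(-1) != -100).float()
            loss = (torch.exp(-ce).detach() * ce * mask).sum() / mask.sum().clamp(min=1.0)

            return (loss, outputs) if return_outputs else loss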

Highlighted Details

  • Significantly outperforms standard SFT across challenging benchmarks and base models.
  • Demonstrates improved generalization capabilities.
  • Shows competitive results in offline RL settings.
  • Integrates with ms-swift and has community reproductions.

Maintenance & Community

The project is associated with the volcengine/verl framework. Community reproductions and integration with ms-swift suggest active interest.

Licensing & Compatibility

The repository does not explicitly state a license. The associated volcengine/verl repository is Apache 2.0 licensed, but this specific project's licensing requires clarification for commercial use.

Limitations & Caveats

DFT performs best on tasks with non-deterministic solution trajectories (e.g., mathematical CoT, complex coding). Its performance is weaker on tasks with single, near-deterministic ground-truth answers and constrained CoT.

Health Check

  • Last Commit: 2 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 2
  • Star History: 20 stars in the last 30 days
