SFTvsRL by LeslieTrue

Research paper comparing SFT and RL for foundation model post-training

Created 7 months ago
292 stars

Top 90.4% on SourcePulse

Project Summary

This repository provides the official implementation for the paper "SFT Memorizes, RL Generalizes," comparing Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) for post-training foundation models. It targets researchers and engineers working on LLM alignment and generalization, offering tools to reproduce study findings and evaluate API-based models.

How It Works

The project implements two primary post-training paradigms: SFT and RL (specifically PPO). It leverages Llama-3.2-Vision-Instruct as the base model, initializing RL experiments with SFT-tuned checkpoints to ensure baseline instruction-following. The codebase includes custom gym environments for evaluation and specific scripts for training and evaluating both "GeneralPoints" and "V-IRL" tasks, supporting both language-only and vision-language modalities.

Quick Start & Requirements

  • Install: Clone the repo, create a conda env (conda create -n SFTvsRL python=3.13), activate it, run pip install -r requirements.txt, then cd gym && pip install -e .
  • Prerequisites: Python 3.13, PyTorch 2.5.1+cu124, H800 servers (or equivalent 8x 80GB GPU nodes for training).
  • Checkpoints: Optional download of SFT-initialized checkpoints via huggingface-cli download.
  • Data: V-IRL requires downloading specific datasets and updating paths in shell scripts.
  • Docs: Paper
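
The install steps above can be collected into a single setup script. This is a sketch based on the summary only: the repository URL is inferred from the author name, and the Hugging Face checkpoint repo id is a placeholder since the summary does not state it.

```shell
# Sketch of the setup steps above; adjust versions and paths per the README.
git clone https://github.com/LeslieTrue/SFTvsRL.git   # URL inferred, verify
cd SFTvsRL
conda create -n SFTvsRL python=3.13 -y
conda activate SFTvsRL
pip install -r requirements.txt
(cd gym && pip install -e .)   # install the custom gym environments
# Optional: SFT-initialized checkpoints (repo id is a placeholder)
# huggingface-cli download <CHECKPOINT_REPO_ID> --local-dir checkpoints
```

For V-IRL experiments, remember to also download the required datasets and update the data paths in the provided shell scripts before launching training.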

Highlighted Details

  • Comparative study of SFT vs. RL post-training methodologies.
  • Supports both language-only and vision-language tasks.
  • Includes custom gym environments for evaluation.
  • Scripts are compatible with SLURM clusters.

Maintenance & Community

The project acknowledges contributions from RL4VLM, Llama-3.2-Vision-Instruct, Llama-3.2-Vision-Finetune, and V-IRL: Grounding Virtual Intelligence in Real Life. No specific community links (Discord/Slack) or roadmap are provided in the README.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Reproducing training experiments requires a high-end compute setup (8x 80GB GPUs). The project is based on Llama-3.2-Vision-Instruct, and performance with other models may vary. Some components are still being updated.

Health Check

  • Last Commit: 4 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 3 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Pawel Garbacki (Cofounder of Fireworks AI), and 4 more.

alpaca_farm by tatsu-lab

0.1%
826
RLHF simulation framework for accessible instruction-following/alignment research
Created 2 years ago
Updated 1 year ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Sebastian Raschka (Author of "Build a Large Language Model (From Scratch)"), and 14 more.

verifiers by willccbb

3.1%
3k
RL for LLMs in verifiable environments
Created 7 months ago
Updated 1 day ago
Starred by Luis Capelo (Cofounder of Lightning AI), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 7 more.

SkyThought by NovaSky-AI

0.1%
3k
Training recipes for Sky-T1 family of models
Created 8 months ago
Updated 2 months ago