Research paper comparing SFT and RL for foundation model post-training
This repository provides the official implementation for the paper "SFT Memorizes, RL Generalizes," comparing Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) for post-training foundation models. It targets researchers and engineers working on LLM alignment and generalization, offering tools to reproduce study findings and evaluate API-based models.
How It Works
The project implements two primary post-training paradigms: SFT and RL (specifically PPO). It leverages Llama-3.2-Vision-Instruct as the base model, initializing RL experiments with SFT-tuned checkpoints to ensure baseline instruction-following. The codebase includes custom gym environments for evaluation and specific scripts for training and evaluating both "GeneralPoints" and "V-IRL" tasks, supporting both language-only and vision-language modalities.
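For orientation, below is a minimal sketch of how an evaluation rollout over one of the bundled gym-style environments might look. The environment id ("gp-l" for a language-only GeneralPoints variant), the observation format, and the model.generate call are assumptions for illustration, not the repository's actual API; the classic gym reset/step signatures are also assumed.

```python
# Hypothetical sketch: rolling a model out in a gym-style GeneralPoints environment.
# Environment id, observation keys, and generate() are illustrative assumptions;
# consult the repository's evaluation scripts for the real interface.
import gym

def evaluate(model, env_id="gp-l", episodes=10):
    env = gym.make(env_id)  # custom environment registered by `pip install -e gym` (assumed id)
    solved = 0
    for _ in range(episodes):
        obs, done, reward = env.reset(), False, 0.0
        while not done:
            prompt = obs["text"]                    # language-only observation (assumed format)
            action = model.generate(prompt)         # model proposes the next solution step
            obs, reward, done, info = env.step(action)
        solved += int(reward > 0)                   # count episodes ending with positive reward
    return solved / episodes
```

The same loop structure would apply to the vision-language variants, with image observations passed to the model alongside the text prompt.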
Quick Start & Requirements
Create a conda environment with conda create -n SFTvsRL python==3.13 and activate it, then install dependencies with pip install -r requirements.txt. Install the bundled gym environments with cd gym && pip install -e ., and download the required model checkpoints using huggingface-cli download.
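If you prefer to script the checkpoint download rather than use the CLI, a minimal equivalent with huggingface_hub could look like the sketch below. The repo_id is an assumption based on the base model family named above (Llama-3.2-Vision-Instruct), and the local path is illustrative; neither is confirmed by the README.

```python
# Sketch: downloading base model weights programmatically instead of via
# `huggingface-cli download`. The repo_id and local_dir are assumptions;
# substitute the exact checkpoint the repository's scripts expect.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="meta-llama/Llama-3.2-11B-Vision-Instruct",   # assumed repo id
    local_dir="checkpoints/llama-3.2-vision-instruct",    # illustrative local path
)
```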
Highlighted Details
Maintenance & Community
The project acknowledges and builds on RL4VLM, Llama-3.2-Vision-Instruct, Llama-3.2-Vision-Finetune, and V-IRL: Grounding Virtual Intelligence in Real Life. No community links (Discord/Slack) or roadmap are provided in the README.
Licensing & Compatibility
The repository's license is not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
Reproducing training experiments requires a high-end compute setup (8x 80GB GPUs). The project is based on Llama-3.2-Vision-Instruct, and performance with other models may vary. Some components are still being updated.
The repository was last updated about 3 months ago and is currently marked inactive.