LLM post-training framework for RL scaling
Slime is a post-training framework designed to scale Reinforcement Learning (RL) for Large Language Models (LLMs). It targets researchers and engineers focused on RLHF and similar fine-tuning methodologies, offering high-performance training and flexible data generation capabilities.
How It Works
Slime integrates Megatron-LM for efficient, distributed training and SGLang for data generation and rollout. The architecture has three parts: a training module (Megatron) that handles parameter updates, a rollout module (SGLang + router) that generates new training data and rewards, and a data buffer that acts as a bridge, managing prompts, custom data, and rollout outputs. This modular design allows for scalable RL training by decoupling data generation from the core training loop.
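As a rough illustration of this decoupling, here is a minimal sketch of the control flow; all class and method names are hypothetical, not slime's actual API:

```python
# Hypothetical sketch of slime's decoupled RL loop: a rollout engine
# (SGLang-like) fills a data buffer that a trainer (Megatron-like) consumes.
# All names here are illustrative, not slime's real API.
from collections import deque

class DataBuffer:
    """Bridge between rollout and training: holds prompts, custom data, and rollouts."""

    def __init__(self):
        self.rollouts = deque()

    def put(self, samples):
        self.rollouts.extend(samples)

    def get_batch(self, n):
        return [self.rollouts.popleft() for _ in range(min(n, len(self.rollouts)))]

def rl_loop(rollout_engine, trainer, buffer, prompts, num_steps, batch_size=64):
    """One possible control flow: generation and training stay decoupled via the buffer."""
    for _ in range(num_steps):
        # Rollout module: generate responses and attach reward scores.
        buffer.put(rollout_engine.generate_and_score(prompts))
        # Training module: update parameters on a batch of buffered rollouts.
        trainer.train_step(buffer.get_batch(batch_size))
        # Push updated weights back to the rollout engine before the next round.
        rollout_engine.update_weights(trainer.get_weights())
```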
Quick Start & Requirements
Run the prebuilt Docker image:

```bash
docker run --rm --gpus all --ipc=host --shm-size=16g \
  --ulimit memlock=-1 --ulimit stack=67108864 \
  -it zhuzilin/slime:latest /bin/bash
```
Inside the container:
```bash
git clone https://github.com/THUDM/slime.git
cd slime
pip install -e .
```
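To sanity-check the editable install (this assumes the package exposes a top-level `slime` module, which may differ):

```bash
python -c "import slime"
```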
`--shm-size=16g` is recommended.

Highlighted Details

Supports converting checkpoints from Hugging Face to Megatron's `torch_dist` format and vice-versa.
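For context, Megatron's `torch_dist` checkpoints build on PyTorch's distributed checkpointing API; the sketch below shows that underlying save/load mechanism with a toy model and path (assumptions for illustration, not slime's actual conversion utility; requires PyTorch >= 2.2):

```python
# Illustrative only: Megatron's torch_dist format builds on PyTorch's
# distributed checkpointing API (torch.distributed.checkpoint).
# The toy model and path are assumptions, not slime's conversion code.
import torch
import torch.distributed.checkpoint as dcp

model = torch.nn.Linear(16, 16)  # stand-in for a real LLM

# Save a (sharded) state dict; in a single process this writes everything locally.
dcp.save(model.state_dict(), checkpoint_id="ckpt/torch_dist_demo")

# Load the shards back into an existing state dict in place, then apply it.
state_dict = model.state_dict()
dcp.load(state_dict, checkpoint_id="ckpt/torch_dist_demo")
model.load_state_dict(state_dict)
```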
Maintenance & Community

Licensing & Compatibility
Limitations & Caveats
Checkpoint conversion from Megatron's `torch_dist` format back to Hugging Face is currently not supported due to missing `args` in saved checkpoints. The project relies on specific versions of Megatron and SGLang, and users may need to manage these dependencies carefully if not using the provided Docker image.