slime by THUDM

LLM post-training framework for RL scaling

Created 3 months ago
1,798 stars

Top 23.9% on SourcePulse

View on GitHub
Project Summary

Slime is a post-training framework designed to scale Reinforcement Learning (RL) for Large Language Models (LLMs). It targets researchers and engineers focused on RLHF and similar fine-tuning methodologies, offering high-performance training and flexible data generation capabilities.

How It Works

Slime integrates Megatron-LM for efficient, distributed training and SGLang for data generation and rollout. The architecture features a training module (Megatron) that handles parameter updates, a rollout module (SGLang + router) that generates new training data and rewards, and a data buffer acting as a bridge to manage prompts, custom data, and rollout outputs. This modular design allows for scalable RL training by decoupling data generation from the core training loop.
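
To make the decoupling concrete, here is a minimal, purely illustrative Python sketch of that loop. The names (Sample, DataBuffer, RolloutEngine, Trainer) are hypothetical stand-ins for the SGLang and Megatron components, not slime's actual classes or API:

    # Conceptual sketch only, not slime's actual API: a toy loop showing how a
    # rollout engine, a data buffer, and a trainer can be decoupled.
    import random
    from dataclasses import dataclass


    @dataclass
    class Sample:
        prompt: str
        response: str
        reward: float


    class DataBuffer:
        """Bridge between rollout and training: holds prompts and generated samples."""
        def __init__(self, prompts):
            self.prompts = prompts
            self.samples = []

        def add(self, batch):
            self.samples.extend(batch)

        def drain(self):
            batch, self.samples = self.samples, []
            return batch


    class RolloutEngine:
        """Stand-in for the SGLang rollout module: generates responses and rewards."""
        def generate(self, prompts):
            return [Sample(p, f"<response to {p!r}>", random.random()) for p in prompts]

        def update_weights(self, version):
            print(f"rollout engine now serving weights v{version}")


    class Trainer:
        """Stand-in for the Megatron training module: consumes samples, updates parameters."""
        version = 0

        def train_step(self, batch):
            self.version += 1  # pretend this ran an RL update on `batch`
            return self.version


    buffer = DataBuffer(prompts=["What is RL?", "Explain KV cache."])
    rollout, trainer = RolloutEngine(), Trainer()

    for step in range(3):
        buffer.add(rollout.generate(buffer.prompts))      # rollout: generate data + rewards
        new_version = trainer.train_step(buffer.drain())  # training: parameter update
        rollout.update_weights(new_version)               # sync updated weights back to rollout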

Quick Start & Requirements

  • Install: Use the provided Docker image:
    docker run --rm --gpus all --ipc=host --shm-size=16g \
      --ulimit memlock=-1 --ulimit stack=67108864 \
      -it zhuzilin/slime:latest /bin/bash
    
    Inside the container:
    git clone https://github.com/THUDM/slime.git
    cd slime
    pip install -e .
    
  • Prerequisites: NVIDIA GPUs (required for Docker image), CUDA (implied by Docker image), Python.
  • Resources: the Docker image bundles SGLang 0.4.7 and Megatron; --shm-size=16g is recommended.
  • Docs: Usage Documentation

Highlighted Details

  • Supports efficient training by connecting Megatron with SGLang.
  • Enables arbitrary training data generation workflows via custom interfaces (a hypothetical sketch follows this list).
  • Provides examples for GLM-4-9B, Qwen3-4B, and Qwen3-30B-A3B (MoE).
  • Includes tools for converting Hugging Face checkpoints to Megatron's torch_dist format; the reverse conversion is not yet supported (see Limitations & Caveats).
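
As referenced above, data generation workflows are customized through user-defined interfaces. The sketch below is only a hypothetical illustration of the general shape such a hook could take; the function name, arguments, and return format are assumptions rather than slime's real signature, which is described in the usage documentation:

    # Hypothetical custom data-generation hook; names and signature are illustrative
    # assumptions, not slime's API. See the usage documentation for the real interface.
    def generate_rollout(prompts, generate_fn, reward_fn):
        """Turn a batch of prompts into reward-labelled samples for RL training.

        prompts:     prompt strings pulled from the data buffer
        generate_fn: callable that queries the inference engine (e.g. SGLang)
        reward_fn:   callable that scores a (prompt, response) pair
        """
        samples = []
        for prompt in prompts:
            response = generate_fn(prompt)
            samples.append(
                {"prompt": prompt, "response": response, "reward": reward_fn(prompt, response)}
            )
        return samples


    # Toy usage with stubbed generation and reward functions:
    batch = generate_rollout(
        prompts=["2 + 2 = ?"],
        generate_fn=lambda p: "4",
        reward_fn=lambda p, r: 1.0 if r.strip() == "4" else 0.0,
    )
    print(batch)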

Maintenance & Community

  • Collaborating with the SGLang community.
  • Contributions welcome via Issues or PRs.
  • Pre-commit hooks available for code style consistency.
  • Debugging Guide and FAQ available.

Licensing & Compatibility

  • License not explicitly stated in the README.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Checkpoint conversion from Megatron torch_dist back to Hugging Face format is currently not supported due to missing args in saved checkpoints. The project relies on specific versions of Megatron and SGLang, and users may need to manage these dependencies carefully if not using the provided Docker image.

Health Check

  • Last Commit: 16 hours ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 111
  • Issues (30d): 48
  • Star History: 430 stars in the last 30 days

Explore Similar Projects

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI) and Jiayi Pan (Author of SWE-Gym; MTS at xAI).

Pai-Megatron-Patch by alibaba

0.7% · 1k stars
Training toolkit for LLMs & VLMs using Megatron
Created 2 years ago · Updated 1 day ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Lewis Tunstall (Research Engineer at Hugging Face), and 13 more.

torchtitan by pytorch

0.7% · 4k stars
PyTorch platform for generative AI model training research
Created 1 year ago · Updated 23 hours ago
Starred by Clement Delangue (Cofounder of Hugging Face), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 20 more.

accelerate by huggingface

0.3% · 9k stars
PyTorch training helper for distributed execution
Created 4 years ago · Updated 1 day ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Stefan van der Walt (Core Contributor to scientific Python ecosystem), and 12 more.

litgpt by Lightning-AI

0.1% · 13k stars
LLM SDK for pretraining, finetuning, and deploying 20+ high-performance LLMs
Created 2 years ago · Updated 6 days ago