slime by THUDM

LLM post-training framework for RL scaling

created 1 month ago
959 stars

Top 39.2% on sourcepulse

Project Summary

Slime is a post-training framework designed to scale Reinforcement Learning (RL) for Large Language Models (LLMs). It targets researchers and engineers focused on RLHF and similar fine-tuning methodologies, offering high-performance training and flexible data generation capabilities.

How It Works

Slime integrates Megatron-LM for efficient, distributed training and SGLang for data generation and rollout. The architecture features a training module (Megatron) that handles parameter updates, a rollout module (SGLang + router) that generates new training data and rewards, and a data buffer acting as a bridge to manage prompts, custom data, and rollout outputs. This modular design allows for scalable RL training by decoupling data generation from the core training loop.
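The decoupling described above can be pictured as three small pieces: a rollout producer, a training consumer, and a buffer bridging them. This is a minimal toy sketch with hypothetical names (`DataBuffer`, `rollout_step`, `train_step` are illustrative, not slime's actual API; the real framework wires Megatron-LM and SGLang together):

```python
from collections import deque

class DataBuffer:
    """Bridge between rollout and training: holds prompts and rollout outputs."""
    def __init__(self):
        self.prompts = deque()
        self.samples = deque()

    def add_prompts(self, prompts):
        self.prompts.extend(prompts)

    def put_samples(self, samples):
        self.samples.extend(samples)

    def get_batch(self, n):
        return [self.samples.popleft() for _ in range(min(n, len(self.samples)))]

def rollout_step(buffer, generate, reward):
    """Rollout module: generate a response and reward for each buffered prompt."""
    while buffer.prompts:
        prompt = buffer.prompts.popleft()
        response = generate(prompt)  # stands in for SGLang inference
        buffer.put_samples([(prompt, response, reward(prompt, response))])

def train_step(buffer, update, batch_size=2):
    """Training module: consume rollout samples and update parameters."""
    batch = buffer.get_batch(batch_size)
    if batch:
        update(batch)  # stands in for a Megatron RL parameter update
    return len(batch)

# Toy usage: uppercasing as "generation", response length as the reward.
buf = DataBuffer()
buf.add_prompts(["hello", "world"])
rollout_step(buf, generate=str.upper, reward=lambda p, r: len(r))
consumed = train_step(buf, update=lambda batch: None)
```

Because the buffer is the only shared state, the rollout and training sides can run on separate workers and scale independently, which is the point of the modular design.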

Quick Start & Requirements

  • Install: Use the provided Docker image:
    docker run --rm --gpus all --ipc=host --shm-size=16g \
      --ulimit memlock=-1 --ulimit stack=67108864 \
      -it zhuzilin/slime:latest /bin/bash
    
    Inside the container:
    git clone https://github.com/THUDM/slime.git
    cd slime
    pip install -e .
    
  • Prerequisites: NVIDIA GPUs with a CUDA-capable driver (the Docker image bundles the CUDA toolchain), Python.
  • Resources: the Docker image includes SGLang 0.4.7 and Megatron; --shm-size=16g is recommended.
  • Docs: Usage Documentation

Highlighted Details

  • Supports efficient training by connecting Megatron with SGLang.
  • Enables arbitrary training data generation workflows via custom interfaces.
  • Provides examples for GLM-4-9B, Qwen3-4B, and Qwen3-30B-A3B (MoE).
  • Includes tools for converting Hugging Face checkpoints to Megatron's torch_dist format and vice-versa.
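The "arbitrary training data generation workflows" hook above amounts to a user-supplied function mapping prompts to scored samples. The sketch below is hypothetical (the function name, signature, and sample schema are illustrative; slime's actual interface is defined in its usage documentation):

```python
# Hypothetical sketch of a custom rollout-generation hook for slime;
# the real interface (signature and registration) lives in the usage docs.
def custom_generate(prompts, sampling_params):
    """Map each prompt to a rollout sample carrying a reward."""
    samples = []
    for prompt in prompts:
        # Stand-in for a real inference call (e.g. an SGLang endpoint).
        response = prompt[::-1]        # toy "generation": reverse the prompt
        reward = float(len(response))  # toy reward: response length
        samples.append({"prompt": prompt, "response": response, "reward": reward})
    return samples

batch = custom_generate(["solve 1+1"], {"temperature": 0.8})
```

A real hook would replace the toy generation with model inference and the toy reward with a reward model or verifier, while keeping the same prompts-in, scored-samples-out shape.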

Maintenance & Community

  • Collaborating with the SGLang community.
  • Contributions welcome via Issues or PRs.
  • Pre-commit hooks available for code style consistency.
  • Debugging Guide and FAQ available.

Licensing & Compatibility

  • License not explicitly stated in the README.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Checkpoint conversion from Megatron's torch_dist format back to Hugging Face format is currently not supported because required args are missing from saved checkpoints. The project depends on specific versions of Megatron and SGLang, so users not on the provided Docker image may need to manage these dependencies carefully.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 81
  • Issues (30d): 24
  • Star History: 1,025 stars in the last 90 days

Explore Similar Projects

HALOs by ContextualAI

  • Top 0.2% · 873 stars
  • Library for aligning LLMs using human-aware loss functions
  • created 1 year ago, updated 2 weeks ago
  • Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake)

lingua by facebookresearch

  • Top 0.1% · 5k stars
  • LLM research codebase for training and inference
  • created 9 months ago, updated 2 weeks ago
  • Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake) and Travis Fischer (Founder of Agentic)

open-r1 by huggingface

  • Top 0.2% · 25k stars
  • SDK for reproducing DeepSeek-R1
  • created 6 months ago, updated 3 days ago
  • Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (Cofounder of Cloudera), and 10 more