LLM post-training framework for RL scaling
Slime is a post-training framework designed to scale Reinforcement Learning (RL) for Large Language Models (LLMs). It targets researchers and engineers focused on RLHF and similar fine-tuning methodologies, offering high-performance training and flexible data generation capabilities.
How It Works
Slime integrates Megatron-LM for efficient, distributed training and SGLang for data generation and rollout. The architecture has three parts: a training module (Megatron) that handles parameter updates, a rollout module (SGLang + router) that generates new training data and rewards, and a data buffer that acts as a bridge, managing prompts, custom data, and rollout outputs. This modular design allows for scalable RL training by decoupling data generation from the core training loop.
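As a rough illustration of this decoupling, here is a minimal sketch of the control flow; all class and method names are hypothetical, not slime's actual API:

```python
# Hypothetical sketch of slime's decoupled RL loop: a rollout engine
# (SGLang-like) fills a data buffer that a trainer (Megatron-like) consumes.
# All names here are illustrative, not slime's real API.
from collections import deque

class DataBuffer:
    """Bridge between rollout and training: holds prompts, custom data, and rollouts."""

    def __init__(self):
        self.rollouts = deque()

    def put(self, samples):
        self.rollouts.extend(samples)

    def get_batch(self, n):
        return [self.rollouts.popleft() for _ in range(min(n, len(self.rollouts)))]

def rl_loop(rollout_engine, trainer, buffer, prompts, num_steps, batch_size=64):
    """One possible control flow: generation and training stay decoupled via the buffer."""
    for _ in range(num_steps):
        # Rollout module: generate responses and attach reward scores.
        buffer.put(rollout_engine.generate_and_score(prompts))
        # Training module: update parameters on a batch of buffered rollouts.
        trainer.train_step(buffer.get_batch(batch_size))
        # Push updated weights back to the rollout engine before the next round.
        rollout_engine.update_weights(trainer.get_weights())
```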
Quick Start & Requirements
Run the prebuilt Docker image:

```bash
docker run --rm --gpus all --ipc=host --shm-size=16g \
  --ulimit memlock=-1 --ulimit stack=67108864 \
  -it zhuzilin/slime:latest /bin/bash
```
Inside the container:
```bash
git clone https://github.com/THUDM/slime.git
cd slime
pip install -e .
```
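To sanity-check the editable install (this assumes the package exposes a top-level `slime` module, which may differ):

```bash
python -c "import slime"
```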
`--shm-size=16g` is recommended.

Highlighted Details

Supports converting checkpoints from Hugging Face to Megatron's `torch_dist` format and vice-versa.
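For context, Megatron's `torch_dist` checkpoints build on PyTorch's distributed checkpointing API; the sketch below shows that underlying save/load mechanism with a toy model and path (assumptions for illustration, not slime's actual conversion utility; requires PyTorch >= 2.2):

```python
# Illustrative only: Megatron's torch_dist format builds on PyTorch's
# distributed checkpointing API (torch.distributed.checkpoint).
# The toy model and path are assumptions, not slime's conversion code.
import torch
import torch.distributed.checkpoint as dcp

model = torch.nn.Linear(16, 16)  # stand-in for a real LLM

# Save a (sharded) state dict; in a single process this writes everything locally.
dcp.save(model.state_dict(), checkpoint_id="ckpt/torch_dist_demo")

# Load the shards back into an existing state dict in place, then apply it.
state_dict = model.state_dict()
dcp.load(state_dict, checkpoint_id="ckpt/torch_dist_demo")
model.load_state_dict(state_dict)
```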
Maintenance & Community

Licensing & Compatibility
Limitations & Caveats
Checkpoint conversion from Megatron's `torch_dist` format back to Hugging Face is currently not supported due to missing `args` in saved checkpoints. The project relies on specific versions of Megatron and SGLang, and users may need to manage these dependencies carefully if not using the provided Docker image.