LightRFT by opendilab

Efficient RL framework for multimodal models

Created 6 months ago

402 stars

Top 71.7% on SourcePulse

Project Summary

Summary

LightRFT is a reinforcement learning fine-tuning framework for Large Language Models (LLMs) and Vision-Language Models (VLMs). It provides RLHF (Reinforcement Learning from Human Feedback) and RLVR (Reinforcement Learning with Verifiable Rewards) training, with support for multiple RL algorithms and distributed training strategies, aimed at efficient, scalable training and better use of GPU resources.

How It Works

The framework integrates the vLLM and SGLang inference engines, with FP8 optimization to reduce latency and memory footprint. It supports RL algorithms including GRPO, GSPO, and DAPO, alongside training strategies such as FSDP v2 and DeepSpeed ZeRO. A resource-sharing technique called "Colocate Anything" raises GPU utilization by co-locating reward models, and multimodal support covers native VLM training and joint optimization of vision-language tasks.
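To make the GRPO entry above more concrete, here is a minimal sketch of the group-relative advantage step as GRPO is commonly described: several responses are sampled for the same prompt, and each response's reward is normalized against the mean and standard deviation of its group. This is not LightRFT's implementation. The function name `group_relative_advantages`, the `eps` value, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses to the same prompt.

    Hypothetical helper for illustration only (not part of LightRFT).
    Returns (r - group_mean) / (group_std + eps) for each reward.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled responses to one prompt, scored by a verifiable 0/1 reward.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Responses that score above the group average get a positive advantage and those below get a negative one, so no separate value model is needed to form a baseline. When every response in a group receives the same reward, all advantages are zero and that group contributes no learning signal.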

Quick Start & Requirements

Prerequisites: Python >= 3.12, CUDA >= 12.8, PyTorch >= 2.9.1.
Installation: Pre-built Docker images are available (docker pull opendilab/lightrft:v0.1.0). For a standard installation, clone the repo and run pip install -e . from the repository root. The vLLM backend can be installed with pip install ".[vllm]".
Setup: Building custom Docker images requires Docker and NVIDIA Container Toolkit (make dbuild).
Documentation: Full documentation is accessible locally after building (make docs) or via a live preview (make docs-live).
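The version prerequisites listed above can be checked before installing. The following is a rough sketch, assuming PyTorch may or may not already be present in the environment. The helper names (`parse_version`, `meets_minimum`, `check_environment`) are hypothetical and are not part of LightRFT. Note that `torch.version.cuda` reports the CUDA version PyTorch was built against, which may differ from the installed driver or toolkit.

```python
import re
import sys


def parse_version(text):
    """Turn a string such as '2.9.1+cu128' into (2, 9, 1), ignoring suffixes."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", str(text))
    if match is None:
        raise ValueError(f"cannot parse version: {text!r}")
    # Missing components (e.g. '3.12') are treated as zero.
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def meets_minimum(found, minimum):
    """Return True if version string `found` is at least `minimum`."""
    return parse_version(found) >= parse_version(minimum)


def check_environment():
    """Compare the local environment with the minimums stated in the text."""
    python_version = "%d.%d.%d" % sys.version_info[:3]
    report = {"python": meets_minimum(python_version, "3.12")}
    try:
        import torch
    except ImportError:
        # Without PyTorch neither the torch nor the CUDA check can pass.
        report["torch"] = False
        report["cuda"] = False
        return report
    report["torch"] = meets_minimum(torch.__version__, "2.9.1")
    cuda = torch.version.cuda  # None on CPU-only builds
    report["cuda"] = cuda is not None and meets_minimum(cuda, "12.8")
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'below minimum or missing'}")
```

Comparing integer tuples rather than raw strings avoids the usual pitfall where "2.10.0" sorts before "2.9.1" lexicographically.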

Highlighted Details

Inference: Integrated vLLM/SGLang with FP8 optimization for high-performance, low-latency inference.
Algorithms: Comprehensive suite including GRPO, GSPO, GMPO (WIP), Dr.GRPO, DAPO, REINFORCE++, CPGD, and FIRE Sampling.
Multimodality: Native support for Vision-Language Model (VLM) training, multimodal reward modeling, and joint vision-language alignment.
Distributed Training: Robust support for FSDP v2, DeepSpeed ZeRO (Stages 1-3), gradient checkpointing, and mixed precision.

Maintenance & Community

Developed in collaboration with Shanghai AI Laboratory and based on OpenRLHF. Community support is available via GitHub Issues and email (opendilab@pjlab.org.cn). Contributions are welcome and should follow Conventional Commits and the project's code standards.

Licensing & Compatibility

Licensed under the permissive Apache 2.0 License, allowing for broad compatibility with commercial and closed-source applications.

Limitations & Caveats

The "Balance Anything" intelligent load balancing system is still under development, and some algorithms, such as GMPO, are marked as Work In Progress (WIP). Installing Flash-Attention may require specific configurations or pre-built wheels because of CUDA compatibility. Troubleshooting steps are provided for common issues such as Out-of-Memory (OOM) errors.

