LightRFT  by opendilab

Efficient RL framework for multimodal models

Created 5 months ago
318 stars

Top 84.9% on SourcePulse

Summary

LightRFT is a reinforcement learning fine-tuning framework for Large Language Models (LLMs) and Vision-Language Models (VLMs). It provides efficient and scalable RLHF (Reinforcement Learning from Human Feedback) and RLVR (Reinforcement Learning with Verifiable Rewards) training, supports multiple RL algorithms and distributed training strategies, and aims to improve both model performance and GPU resource utilization.

How It Works

The framework integrates the vLLM and SGLang inference engines, with FP8 optimization to reduce latency and memory footprint. It supports RL algorithms including GRPO, GSPO, and DAPO, alongside training strategies such as FSDP v2 and DeepSpeed ZeRO. Its "Colocate Anything" resource-sharing technique raises GPU utilization by co-locating reward models, and its multimodal support enables native VLM training and joint optimization of vision-language tasks.
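To make the GRPO entry concrete, the sketch below shows the group-relative advantage that gives the algorithm its name: several responses are sampled for one prompt, and each reward is standardized against the group's mean and standard deviation. This is a generic illustration and not LightRFT's code; the function name, the `eps` default, and the use of the population standard deviation are assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of responses to the same prompt.

    Hypothetical helper for illustration only. `eps` keeps the division
    finite when every response in the group gets the same reward.
    """
    if not rewards:
        raise ValueError("rewards must contain at least one value")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one prompt, scored 1 (correct) or 0 (incorrect).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Responses scored above the group mean get a positive advantage and those below get a negative one, so no separate value network is needed to form a baseline.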

Quick Start & Requirements

  • Prerequisites: Python >= 3.12, CUDA >= 12.8, PyTorch >= 2.9.1.
  • Installation: Pre-built Docker images are available (docker pull opendilab/lightrft:v0.1.0). Standard installation involves cloning the repo and running pip install -e . from the repository root. The vLLM backend can be installed with pip install ".[vllm]".
  • Setup: Custom Docker images can be built with make dbuild, which requires Docker and the NVIDIA Container Toolkit.
  • Documentation: Full documentation is accessible locally after building (make docs) or via a live preview (make docs-live).
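
Before installing, it can help to confirm that the local environment meets the version minimums listed above. The following is a minimal sketch of such a check; `MINIMUMS`, `parse_version`, `meets_minimum`, and `check_environment` are hypothetical names for this example and are not part of LightRFT. It reads the CUDA version from `torch.version.cuda`, which reports the CUDA version PyTorch was built against, not the driver version.

```python
import re
import sys

# Minimums copied from the prerequisites list above.
MINIMUMS = {"python": "3.12", "cuda": "12.8", "torch": "2.9.1"}


def parse_version(text):
    """Turn '2.9.1+cu128' into (2, 9, 1), ignoring any build suffix."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no numeric version in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(found, minimum):
    """Compare two version strings after padding to equal length."""
    found_t, min_t = parse_version(found), parse_version(minimum)
    width = max(len(found_t), len(min_t))
    found_t += (0,) * (width - len(found_t))
    min_t += (0,) * (width - len(min_t))
    return found_t >= min_t


def check_environment():
    """Return {name: (found_version, ok)} for Python, PyTorch, and CUDA."""
    py = ".".join(str(n) for n in sys.version_info[:3])
    report = {"python": (py, meets_minimum(py, MINIMUMS["python"]))}
    try:
        import torch
    except ImportError:
        report["torch"] = (None, False)
        report["cuda"] = (None, False)
        return report
    tv = str(torch.__version__)
    report["torch"] = (tv, meets_minimum(tv, MINIMUMS["torch"]))
    cuda = torch.version.cuda  # None on CPU-only builds
    report["cuda"] = (cuda, cuda is not None and meets_minimum(cuda, MINIMUMS["cuda"]))
    return report
```

Calling `check_environment()` returns one entry per requirement, so a failing item can be reported by name before any installation step runs.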

Highlighted Details

  • Inference: Integrated vLLM/SGLang with FP8 optimization for high-performance, low-latency inference.
  • Algorithms: Comprehensive suite including GRPO, GSPO, GMPO (WIP), Dr.GRPO, DAPO, REINFORCE++, CPGD, and FIRE Sampling.
  • Multimodality: Native support for Vision-Language Model (VLM) training, multimodal reward modeling, and joint vision-language alignment.
  • Distributed Training: Robust support for FSDP v2, DeepSpeed ZeRO (Stages 1-3), gradient checkpointing, and mixed precision.
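
As a rough guide to what the three DeepSpeed ZeRO stages buy, the sketch below estimates per-GPU memory for model states only, using the accounting from the ZeRO paper for mixed-precision Adam: 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and 12 for optimizer states. Stage 1 shards the optimizer states, stage 2 also shards gradients, and stage 3 also shards parameters. The function name is hypothetical, and the estimate ignores activations, buffers, and fragmentation, so real usage will be higher.

```python
def zero_model_state_bytes(num_params, num_gpus, stage):
    """Estimate per-GPU bytes for model states under a given ZeRO stage.

    Hypothetical helper for illustration; stage 0 means no sharding.
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    param_bytes, grad_bytes, optim_bytes = 2.0, 2.0, 12.0
    if stage >= 1:
        optim_bytes /= num_gpus  # optimizer states sharded
    if stage >= 2:
        grad_bytes /= num_gpus   # gradients sharded
    if stage >= 3:
        param_bytes /= num_gpus  # parameters sharded
    return num_params * (param_bytes + grad_bytes + optim_bytes)


# A 7.5B-parameter model on 64 GPUs, reported in GB (1e9 bytes).
for stage in (0, 1, 2, 3):
    gb = zero_model_state_bytes(7.5e9, 64, stage) / 1e9
    print(f"ZeRO stage {stage}: {gb:.1f} GB per GPU")
```

Under these assumptions the model states shrink from 120 GB per GPU without sharding to under 2 GB at stage 3, which is why the higher stages matter most for large models.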

Maintenance & Community

Developed in collaboration with Shanghai AI Laboratory and based on OpenRLHF. Community support is available via GitHub Issues and email (opendilab@pjlab.org.cn). Contributions are welcome and should follow Conventional Commits and the project's code standards.

Licensing & Compatibility

Licensed under the permissive Apache 2.0 License, which allows use in commercial and closed-source applications.

Limitations & Caveats

The "Balance Anything" intelligent load balancing system is still under development, and some algorithms, such as GMPO, are marked as Work In Progress (WIP). Installing Flash-Attention may require specific configurations or pre-built wheels because of CUDA compatibility constraints. Common issues such as Out-of-Memory (OOM) errors are covered by the project's troubleshooting steps.

Health Check

  • Last Commit: 4 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 30 stars in the last 30 days
