LightRFT by opendilab

Efficient RL framework for multimodal models

Created 3 months ago
271 stars

Top 94.9% on SourcePulse

Summary

LightRFT is a reinforcement learning fine-tuning framework for Large Language Models (LLMs) and Vision-Language Models (VLMs). It provides efficient, scalable RLHF (Reinforcement Learning from Human Feedback) and RLVR (Reinforcement Learning with Verifiable Rewards) training, supporting multiple state-of-the-art algorithms and distributed training strategies to improve both model performance and hardware utilization.

How It Works

The framework integrates high-performance inference engines like vLLM and SGLang, featuring FP8 optimization for reduced latency and memory footprint. It supports a rich ecosystem of RL algorithms, including GRPO, GSPO, and DAPO, alongside flexible training strategies such as FSDP v2 and DeepSpeed ZeRO. Innovative resource collaboration techniques like "Colocate Anything" maximize GPU utilization by co-locating reward models, while comprehensive multimodal support enables native VLM training and joint optimization of vision-language tasks.
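To make the algorithm suite more concrete: GRPO's central idea is to compute advantages relative to a group of sampled responses for the same prompt, instead of using a learned value function. The sketch below is illustrative only and is not LightRFT's actual implementation:

```python
import statistics

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages in the spirit of GRPO: normalize each
    sampled response's reward by the mean and std of its group.
    (Illustrative sketch; not LightRFT's code.)"""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps guards against a zero std when all rewards in the group tie.
    return [(r - mean) / (std + eps) for r in rewards]

# One prompt, a group of 4 sampled responses scored by a reward model:
advs = grpo_advantages([1.0, 0.5, 0.0, 0.5])
print(advs)  # above-mean responses get positive advantage, below-mean negative
```

The normalized advantages then weight the policy-gradient update for each response's tokens, which is what lets GRPO-family methods skip training a separate critic.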

Quick Start & Requirements

  • Prerequisites: Python >= 3.12, CUDA >= 12.8, PyTorch >= 2.9.1.
  • Installation: Pre-built Docker images are available (docker pull opendilab/lightrft:v0.1.0). Standard installation involves cloning the repo and running pip install -e . from the repo root. The vLLM backend can be installed with pip install ".[vllm]".
  • Setup: Building custom Docker images requires Docker and NVIDIA Container Toolkit (make dbuild).
  • Documentation: Full documentation is accessible locally after building (make docs) or via a live preview (make docs-live).

Highlighted Details

  • Inference: Integrated vLLM/SGLang with FP8 optimization for high-performance, low-latency inference.
  • Algorithms: Comprehensive suite including GRPO, GSPO, GMPO (WIP), Dr.GRPO, DAPO, REINFORCE++, CPGD, and FIRE Sampling.
  • Multimodality: Native support for Vision-Language Model (VLM) training, multimodal reward modeling, and joint vision-language alignment.
  • Distributed Training: Robust support for FSDP v2, DeepSpeed ZeRO (Stages 1-3), gradient checkpointing, and mixed precision.
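For the DeepSpeed ZeRO side of the list above, a typical Stage 2 configuration looks roughly like the following. The keys shown are generic DeepSpeed options; the specific values are placeholders, not LightRFT's shipped defaults:

```python
# Generic DeepSpeed ZeRO Stage 2 config dict (illustrative values, not
# LightRFT's defaults).
ds_config = {
    "train_micro_batch_size_per_gpu": 4,  # per-GPU batch size
    "gradient_accumulation_steps": 8,     # effective batch = 4 * 8 * world_size
    "bf16": {"enabled": True},            # mixed-precision training
    "gradient_clipping": 1.0,
    "zero_optimization": {
        "stage": 2,                       # shard optimizer state and gradients
        "overlap_comm": True,             # overlap reduction with backward pass
        "contiguous_gradients": True,     # reduce memory fragmentation
    },
}
print(ds_config["zero_optimization"]["stage"])
```

Stage 3 would additionally shard the model parameters themselves, trading communication for memory; gradient checkpointing and mixed precision compose with any ZeRO stage.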

Maintenance & Community

Developed in collaboration with Shanghai AI Laboratory and based on OpenRLHF. Community support is available via GitHub Issues and email (opendilab@pjlab.org.cn). Contributions are welcomed following Conventional Commits and code standards.

Licensing & Compatibility

Licensed under the permissive Apache 2.0 License, allowing for broad compatibility with commercial and closed-source applications.

Limitations & Caveats

The "Balance Anything" intelligent load-balancing system is still under development, and some algorithms, such as GMPO, are marked Work In Progress (WIP). Flash-Attention installation may require specific configurations or pre-built wheels due to CUDA compatibility constraints. The documentation provides troubleshooting steps for common issues such as out-of-memory (OOM) errors.

Health Check
Last Commit

1 week ago

Responsiveness

Inactive

Pull Requests (30d)
5
Issues (30d)
0
Star History
76 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering", "Designing Machine Learning Systems"), Wing Lian (founder of Axolotl AI), and 10 more.

open_flamingo by mlfoundations

Top 0.1% on SourcePulse
4k stars
Open-source framework for training large multimodal models
Created 3 years ago
Updated 1 year ago
Starred by Shizhe Diao (author of LMFlow; research scientist at NVIDIA) and Alex Chen (cofounder of Nexa AI).

EasyR1 by hiyouga

Top 0.6% on SourcePulse
5k stars
RL training framework for multi-modality models
Created 1 year ago
Updated 5 days ago
Starred by Andrej Karpathy (founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Woosuk Kwon (coauthor of vLLM), and 15 more.

torchtitan by pytorch

Top 0.3% on SourcePulse
5k stars
PyTorch platform for generative AI model training research
Created 2 years ago
Updated 23 hours ago