LightRFT (opendilab): Efficient RL framework for multimodal models
Summary
LightRFT is a reinforcement learning fine-tuning framework for Large Language Models (LLMs) and Vision-Language Models (VLMs). It provides efficient, scalable training for RLHF (Reinforcement Learning from Human Feedback) and RLVR (Reinforcement Learning with Verifiable Rewards), and supports multiple state-of-the-art algorithms and distributed training strategies to improve model performance and resource utilization.
How It Works
The framework integrates high-performance inference engines like vLLM and SGLang, featuring FP8 optimization for reduced latency and memory footprint. It supports a rich ecosystem of RL algorithms, including GRPO, GSPO, and DAPO, alongside flexible training strategies such as FSDP v2 and DeepSpeed ZeRO. Innovative resource collaboration techniques like "Colocate Anything" maximize GPU utilization by co-locating reward models, while comprehensive multimodal support enables native VLM training and joint optimization of vision-language tasks.
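To make the algorithm list more concrete, here is a minimal sketch of the group-relative advantage step that GRPO is known for: the rewards of several responses sampled for the same prompt are normalized against that group's mean and standard deviation, so no separate value model is needed. This is an illustration only, not LightRFT's implementation. The function name, the eps parameter, and the use of population standard deviation are assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses to the same prompt.

    Hypothetical helper for illustration; not part of LightRFT's API.
    eps guards against division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to one prompt, scored by a verifiable 0/1 reward.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# Responses above the group mean get positive advantages, the rest negative.
for r, a in zip(rewards, advantages):
    print(f"reward={r:.1f} advantage={a:+.3f}")
```

When every response in a group receives the same reward, all advantages are zero and that group contributes no learning signal.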
Quick Start & Requirements
A Docker image is available (docker pull opendilab/lightrft:v0.1.0). Standard installation involves cloning the repo and running pip install -e . from the repository root. The vLLM backend can be installed with pip install ".[vllm]".
A make dbuild target is also provided.
Documentation can be built locally (make docs) or served as a live preview (make docs-live).
Highlighted Details
Maintenance & Community
Developed in collaboration with Shanghai AI Laboratory and based on OpenRLHF. Community support is available via GitHub Issues and email (opendilab@pjlab.org.cn). Contributions are welcome and should follow Conventional Commits and the project's code standards.
Licensing & Compatibility
Licensed under the permissive Apache 2.0 License, allowing for broad compatibility with commercial and closed-source applications.
Limitations & Caveats
The "Balance Anything" intelligent load balancing system is currently under development. Some algorithms, like GMPO, are marked as Work In Progress (WIP). Flash-Attention installation may require specific configurations or pre-built wheels due to CUDA compatibility. Common issues like Out-of-Memory (OOM) errors are addressed with provided troubleshooting steps.