RL-Factory by Simple-Efficient

RL post-training framework for agentic learning

Created 7 months ago

1,686 stars

Top 24.9% on SourcePulse


1 Expert Loves This Project

Yineng Zhang

Inference Lead at SGLang; Research Scientist at Together AI

Project Summary

RL-Factory is an open-source framework for efficient reinforcement learning (RL) post-training of agent models, particularly in tool-use scenarios. It is aimed at researchers and developers who want to simplify training agents that interact with tools. The project reports roughly 2x faster training than Search-R1 and emphasizes ease of use by decoupling environment and reward function definitions from the training process.

How It Works

RL-Factory decouples the environment from the RL training process, so users can define tool configurations and reward functions with minimal code. It supports asynchronous, parallel tool-calling for better training efficiency, along with multi-turn tool-calling, model-based rewards, and native support for training with Qwen3 models and MCP tools. This architecture is meant to let most users focus on reward logic and tool setup, while developers who want deeper control can optimize training efficiency and model performance.
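To make the idea of asynchronous, parallel tool-calling concrete, here is a minimal sketch using Python's standard asyncio module. It is not RL-Factory's implementation: the tool functions, the TOOLS registry, and run_tool_calls are hypothetical names invented for illustration, and the sleeps stand in for real tool latency.

```python
import asyncio

# Hypothetical tools (assumptions for illustration, not RL-Factory APIs).
async def search_tool(query: str) -> str:
    await asyncio.sleep(0.05)  # stand-in for network latency
    return f"results for {query!r}"

async def calculator_tool(expression: str) -> str:
    await asyncio.sleep(0.05)  # stand-in for a remote call
    return str(sum(int(term) for term in expression.split("+")))

# Hypothetical registry mapping tool names to coroutine functions.
TOOLS = {"search": search_tool, "calculator": calculator_tool}

async def run_tool_calls(calls):
    """Start every (tool_name, argument) call at once and wait for all of them.

    asyncio.gather returns results in the same order as the inputs,
    so each result can be matched back to the call that produced it.
    """
    pending = [TOOLS[name](argument) for name, argument in calls]
    return await asyncio.gather(*pending)

def demo():
    calls = [("search", "Qwen3"), ("calculator", "2+3"), ("search", "MCP")]
    return asyncio.run(run_tool_calls(calls))
```

Because the three calls wait concurrently, the total wait is close to that of the slowest call instead of the sum of all of them, which is the general reason parallel tool-calling can shorten rollouts.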

Quick Start & Requirements

Install: pip3 install -e . --no-deps (after installing other dependencies).
Prerequisites: CUDA >= 12.0 (12.4 recommended), Python >= 3.10 (3.10 recommended), vllm >= 0.8.3 (0.8.5 recommended for Qwen3).
Dependencies: Includes accelerate, bitsandbytes, deepspeed, flash-attn, peft, ray, torch, transformers, vllm, qwen-agent, llama_index, faiss-gpu-cu12 (optional), nvidia-cublas-cu12 (optional).
Setup: The dependency list is long; install the dependencies listed above before installing the package itself.
Docs: Tutorial, Installation, Framework Design.
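As a rough aid for the prerequisites above, the following sketch checks the Python and vllm version floors using only the standard library. The helper names parse_version and check_requirements are hypothetical and not part of RL-Factory; the CUDA requirement is not checked here.

```python
import re
import sys
from importlib import metadata

def parse_version(text: str) -> tuple:
    """Extract the leading numeric release, e.g. '0.8.5.post1' -> (0, 8, 5)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        return ()
    return tuple(int(part) for part in match.group(0).split("."))

def check_requirements(min_python=(3, 10), min_vllm=(0, 8, 3)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append(
            f"Python {sys.version_info.major}.{sys.version_info.minor} is older than "
            f"{'.'.join(map(str, min_python))}"
        )
    try:
        installed = metadata.version("vllm")
    except metadata.PackageNotFoundError:
        problems.append("vllm is not installed")
    else:
        if parse_version(installed) < min_vllm:
            problems.append(
                f"vllm {installed} is older than {'.'.join(map(str, min_vllm))}"
            )
    return problems

if __name__ == "__main__":
    for problem in check_requirements():
        print("WARNING:", problem)
```

Running the file prints one warning per unmet requirement and nothing when both checks pass.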

Highlighted Details

Achieves 2x faster training compared to Search-R1, demonstrated with Qwen3 models.
Supports Qwen3 models out of the box; the project reports strong tool-calling ability from these models without supervised fine-tuning (SFT).
Enables flexible reward definition via rules, model-judging, or tools.
Facilitates seamless tool integration through configuration files.
Future plans include a WebUI for data processing, configuration, and project management.
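To illustrate what a rule-based reward might look like, here is a minimal sketch. It is not RL-Factory's reward interface: the function name, its signature, the score values, and the `<answer>...</answer>` output convention are all assumptions made for this example.

```python
import re

def rule_based_reward(response: str, ground_truth: str) -> float:
    """Score one rollout with simple rules (hypothetical scheme).

    1.0 -> the answer inside <answer>...</answer> matches the ground truth
    0.1 -> the answer tag is present but the content is wrong
    0.0 -> no answer tag was produced
    """
    match = re.search(r"<answer>(.*?)</answer>", response, flags=re.DOTALL)
    if match is None:
        return 0.0
    predicted = match.group(1).strip().lower()
    return 1.0 if predicted == ground_truth.strip().lower() else 0.1
```

For example, `rule_based_reward("<answer> Paris </answer>", "paris")` returns 1.0. A model-judged or tool-based reward would keep a similar shape but replace the string comparison with a call to a judge model or a tool.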

Maintenance & Community

A fast release cycle is planned for new features.
Contributions are welcomed via GitHub issues or direct contact (chaijiajun@meituan.com, gjyin@outlook.com).
Builds on verl, Search-R1, and Qwen-Agent.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Currently, only Qwen models have been tested. The framework is under rapid development, with a WebUI and broader model support planned for future releases. Optional dependencies like faiss-gpu-cu12 may be needed to resolve specific training issues.

Health Check

Last Commit

1 month ago

Responsiveness

1 day

Star History

43 stars in the last 30 days