RL-Factory  by Simple-Efficient

RL post-training framework for agentic learning

created 2 months ago
1,315 stars

Top 31.1% on sourcepulse

Project Summary

RL-Factory is an open-source framework for efficient reinforcement learning (RL) post-training of agent models, particularly in tool-use scenarios. It targets researchers and developers who want to simplify training agents that interact with tools. It claims 2x faster training than Search-R1 and emphasizes ease of use by decoupling environment and reward function definitions from the training process.

How It Works

RL-Factory decouples the environment from the RL training process, so users can define tool configurations and reward functions with minimal code. It runs tool calls asynchronously and in parallel to improve training efficiency, and it supports multi-turn tool-calling, model-based rewards, Qwen3 models, and MCP tools natively. The architecture aims to let most users focus on reward logic and tool setup, while leaving advanced developers free to optimize training efficiency and model performance.
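
To illustrate why asynchronous, parallel tool-calling helps, here is a minimal sketch using only Python's standard asyncio module. It is not RL-Factory's implementation: the tool functions, the TOOLS registry, and run_batch are hypothetical names invented for this example, and the sleeps stand in for real tool latency.

```python
import asyncio

# Hypothetical stand-ins for real tools; names and latencies are illustrative only.
async def search_tool(query: str) -> str:
    await asyncio.sleep(0.1)  # simulate a slow network call
    return f"results for {query!r}"

async def calculator_tool(expression: str) -> str:
    await asyncio.sleep(0.1)  # simulate tool latency
    left, right = expression.split("+")
    return str(int(left) + int(right))

# Hypothetical registry mapping tool names to coroutines.
TOOLS = {"search": search_tool, "calculator": calculator_tool}

async def run_tool_calls(calls):
    """Run every (tool_name, argument) pair concurrently.

    asyncio.gather returns results in the same order as the inputs,
    so each result can be matched back to the call that produced it.
    """
    pending = [TOOLS[name](argument) for name, argument in calls]
    return await asyncio.gather(*pending)

def run_batch(calls):
    """Synchronous entry point for a batch of tool calls."""
    return asyncio.run(run_tool_calls(calls))

if __name__ == "__main__":
    print(run_batch([("search", "qwen3"), ("calculator", "2+3")]))
```

Because the calls wait concurrently, a batch takes about as long as its slowest tool rather than the sum of all tool latencies, which is the general reason parallel tool-calling can shorten rollout time.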

Quick Start & Requirements

  • Install: pip3 install -e . --no-deps (after installing other dependencies).
  • Prerequisites: CUDA >= 12.0 (12.4 recommended), Python >= 3.10 (3.10 recommended), vllm >= 0.8.3 (0.8.5 recommended for Qwen3).
  • Dependencies: Includes accelerate, bitsandbytes, deepspeed, flash-attn, peft, ray, torch, transformers, vllm, qwen-agent, llama_index, faiss-gpu-cu12 (optional), nvidia-cublas-cu12 (optional).
  • Setup: Install the dependencies listed above before running the editable install; the list is long, so expect setup to take some effort.
  • Docs: Tutorial, Installation, Framework Design.
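
The version prerequisites above can be checked before installation with a short script. This is a sketch under assumptions: parse_version, meets_minimum, and check_environment are hypothetical helpers written for this example, not part of RL-Factory. The CUDA check is omitted because it would require torch or a system call.

```python
import sys
from importlib import metadata

def parse_version(text: str) -> tuple:
    """Turn '0.8.5' or '0.8.5.post1' into a tuple of its leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'post1'
        parts.append(int(digits))
    return tuple(parts)

def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare versions numerically, so '0.10.0' counts as newer than '0.8.3'."""
    return parse_version(installed) >= parse_version(minimum)

def check_environment() -> dict:
    """Report whether Python and vllm satisfy the stated minimums."""
    report = {"python": sys.version_info[:2] >= (3, 10)}
    try:
        report["vllm"] = meets_minimum(metadata.version("vllm"), "0.8.3")
    except metadata.PackageNotFoundError:
        report["vllm"] = False  # vllm is not installed
    return report

if __name__ == "__main__":
    print(check_environment())
```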

Highlighted Details

  • Achieves 2x faster training compared to Search-R1, demonstrated with Qwen3 models.
  • Supports Qwen3 models out of the box; the project reports strong tool-calling performance from Qwen3 without a supervised fine-tuning (SFT) stage.
  • Enables flexible reward definition via rules, model-judging, or tools.
  • Facilitates seamless tool integration through configuration files.
  • Future plans include a WebUI for data processing, configuration, and project management.
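
As a rough illustration of combining rule-based and model-judged rewards, the sketch below scores a response with a simple rule and optionally blends in a judge score. Everything here is an assumption made for the example: the <answer> tag convention, the function names, and the weighting scheme are hypothetical and do not describe RL-Factory's actual reward interface.

```python
import re
from typing import Callable, Optional

def rule_reward(response: str, ground_truth: str) -> float:
    """Rule-based reward: 1.0 if the text inside <answer>...</answer> matches.

    The <answer> tag format is an assumption made for this sketch.
    """
    match = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    if match is None:
        return 0.0  # no parseable answer
    predicted = match.group(1).strip().lower()
    return 1.0 if predicted == ground_truth.strip().lower() else 0.0

def combined_reward(
    response: str,
    ground_truth: str,
    judge: Optional[Callable[[str, str], float]] = None,
    judge_weight: float = 0.5,
) -> float:
    """Blend the rule score with an optional judge score in [0, 1].

    `judge` is a hypothetical callable; in practice it might wrap a
    model call, but here it is any function returning a float.
    """
    rule = rule_reward(response, ground_truth)
    if judge is None:
        return rule
    return (1 - judge_weight) * rule + judge_weight * judge(response, ground_truth)
```

Keeping the reward a plain function of the response and the reference means it can be tested in isolation, independent of the training loop.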

Maintenance & Community

  • Fast release cycle planned for new features.
  • Contributions are welcomed via GitHub issues or direct contact (chaijiajun@meituan.com, gjyin@outlook.com).
  • Builds on verl, Search-R1, and Qwen-Agent.

Licensing & Compatibility

  • The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Currently, only Qwen models have been tested. The framework is under rapid development, with a WebUI and broader model support planned for future releases. Optional dependencies like faiss-gpu-cu12 may be needed to resolve specific training issues.

Health Check

  • Last commit: 3 days ago
  • Responsiveness: Inactive
  • Pull requests (30d): 7
  • Issues (30d): 4
  • Star history: 1,343 stars in the last 90 days
