Trinity-RFT by modelscope

Reinforcement fine-tuning for LLMs

Created 5 months ago
341 stars

Top 81.0% on SourcePulse

Project Summary

Trinity-RFT is a comprehensive framework for reinforcement fine-tuning (RFT) of large language models (LLMs), designed for flexibility, scalability, and ease of use. It caters to researchers and developers working with LLMs, offering a unified platform to explore advanced RFT paradigms and adapt models to diverse scenarios. The framework aims to streamline the RFT process, from data handling to algorithm implementation and distributed training.

How It Works

Trinity-RFT features a unified RFT core that supports various training modes, including synchronous/asynchronous, on-policy/off-policy, and online/offline learning. Rollout and training processes can operate independently and scale across different devices. A key design principle is its first-class handling of agent-environment interactions, robustly managing lagged feedback, latency, and failures, and supporting complex multi-step workflows. The data pipelines are optimized to treat rollout tasks and experiences as dynamic assets, allowing for active management like prioritization and augmentation throughout the RFT lifecycle.
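
To make the decoupling concrete, the sketch below shows rollout workers and a trainer running as independent loops joined by a prioritized experience buffer. This is illustrative only: every name in it (experience_buffer, rollout_worker, trainer) is hypothetical and is not the Trinity-RFT API.

```python
import queue
import random
import threading
import time

# Hypothetical illustration of decoupled rollout/training; these names are NOT
# from Trinity-RFT. Rollout workers and the trainer run independently and
# communicate only through a buffer, so they can scale across devices.
experience_buffer = queue.PriorityQueue()

def rollout_worker(worker_id: int, num_episodes: int) -> None:
    """Produce experiences, possibly off-policy and with lagged feedback."""
    for episode in range(num_episodes):
        time.sleep(random.uniform(0.0, 0.01))  # stand-in for variable env latency
        reward = random.random()               # stand-in for environment feedback
        # Treat experiences as prioritizable assets: higher reward => served first.
        experience_buffer.put((-reward, (worker_id, episode, reward)))

def trainer(total_steps: int) -> None:
    """Consume whatever experiences exist; does not block on any one worker."""
    for step in range(total_steps):
        _, experience = experience_buffer.get(timeout=5.0)
        # ... compute the RL loss on `experience` and update the policy here ...
        print(f"step {step}: trained on {experience}")

workers = [threading.Thread(target=rollout_worker, args=(i, 5)) for i in range(2)]
for w in workers:
    w.start()
trainer(total_steps=10)
for w in workers:
    w.join()
```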

Quick Start & Requirements

  • Installation: Installing from source (pip install -e .[dev] after cloning the repository) is recommended; installation via pip (pip install trinity-rft==0.2.1) or Docker is also supported. The commands are collected below, after this list.
  • Prerequisites: Python 3.10-3.12, CUDA 12.4-12.8, and at least 2 GPUs. Installing flash-attn is recommended, though it can take a long time to compile.
  • Setup: Detailed tutorials are available for various RFT modes, agentic scenarios, data functionalities, and RL algorithm development. Links to official documentation and tutorials are provided.
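
For convenience, here are the same installation options as shell commands (the clone URL is assumed from the project's GitHub organization, not quoted from the docs):

```bash
# From source (recommended):
git clone https://github.com/modelscope/Trinity-RFT.git
cd Trinity-RFT
pip install -e .[dev]

# Or from PyPI, pinned to the release mentioned above:
pip install trinity-rft==0.2.1
```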

Highlighted Details

  • Supports synchronous/asynchronous, on-policy/off-policy, and online/offline training.
  • Handles complex agent-environment interactions, including lagged feedback and failures.
  • Features optimized, dynamic data pipelines for active management of experiences.
  • Offers a modular, decoupled architecture with rich GUIs for low-code usage.
  • Implements RL algorithms such as GSPO, AsymRE, TOPR, CISPO, and RAFT (a sketch of one of these follows this list).
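
As a taste of one listed algorithm: GSPO replaces GRPO's per-token importance ratios with a single length-normalized, sequence-level ratio that is then clipped. The sketch below follows the published GSPO objective, not Trinity-RFT's actual implementation; tensor shapes and the toy inputs are assumptions.

```python
import torch

def gspo_objective(logp_new, logp_old, advantages, mask, eps=0.2):
    """Sequence-level clipped surrogate in the style of GSPO.

    logp_new, logp_old: per-token log-probs of sampled tokens, [batch, seq_len]
    advantages:         one group-relative advantage per sequence, [batch]
    mask:               1.0 for real tokens, 0.0 for padding, [batch, seq_len]
    """
    seq_lens = mask.sum(dim=-1)
    # Length-normalized sequence ratio: geometric mean of per-token ratios.
    log_ratio = ((logp_new - logp_old) * mask).sum(dim=-1) / seq_lens
    ratio = torch.exp(log_ratio)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # Maximize the clipped surrogate, i.e. minimize its negation.
    return -torch.min(unclipped, clipped).mean()

# Toy usage with made-up numbers (shapes only, not a real rollout):
batch, seq_len = 4, 8
logp_old = -torch.rand(batch, seq_len)          # fake per-token log-probs
logp_new = logp_old + 0.01 * torch.randn(batch, seq_len)
advantages = torch.randn(batch)
mask = torch.ones(batch, seq_len)
print(gspo_objective(logp_new, logp_old, advantages, mask))
```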

Maintenance & Community

The project is under active development, with recent releases (v0.2.1 in August 2025) introducing features like Agentic RL and Rollout-Training scheduling. Contributions are welcomed, with guidelines for code style checks and unit tests provided. The project acknowledges its reliance on numerous open-source projects.

Licensing & Compatibility

The project is licensed under Apache-2.0, which generally permits commercial use and modification.

Limitations & Caveats

The project is noted as being under active development, with ongoing improvements and experimental features like the web interface. Users should refer to the latest documentation for the most up-to-date information on features and stability.

Health Check

  • Last Commit: 23 hours ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 60
  • Issues (30d): 35

Star History

115 stars in the last 30 days

Explore Similar Projects

Starred by Eric Zhu (Coauthor of AutoGen; Research Scientist at Microsoft Research) and Will Brown (Research Lead at Prime Intellect).

agent-lightning by microsoft

Train any AI agent with rollouts and feedback

Top 6.0% on SourcePulse · 2k stars · Created 3 months ago · Updated 2 days ago