Reinforcement fine-tuning for LLMs
Trinity-RFT is a comprehensive framework for reinforcement fine-tuning (RFT) of large language models (LLMs), designed for flexibility, scalability, and ease of use. It caters to researchers and developers working with LLMs, offering a unified platform to explore advanced RFT paradigms and adapt models to diverse scenarios. The framework aims to streamline the RFT process, from data handling to algorithm implementation and distributed training.
How It Works
Trinity-RFT features a unified RFT core that supports various training modes, including synchronous/asynchronous, on-policy/off-policy, and online/offline learning. Rollout and training processes can operate independently and scale across different devices. A key design principle is its first-class handling of agent-environment interactions, robustly managing lagged feedback, latency, and failures, and supporting complex multi-step workflows. The data pipelines are optimized to treat rollout tasks and experiences as dynamic assets, allowing for active management like prioritization and augmentation throughout the RFT lifecycle.
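The decoupling of rollout from training, and the idea of treating experiences as assets that can be prioritized, can be illustrated with a small sketch. This is not Trinity-RFT's API: the names PriorityExperienceBuffer, rollout, and train_step are hypothetical, and the prioritization rule (absolute reward) is an assumption chosen only for illustration.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class Experience:
    # heapq is a min-heap, so the negated priority is stored to pop the highest first.
    sort_key: float
    seq: int
    prompt: str = field(compare=False)
    response: str = field(compare=False)
    reward: float = field(compare=False)


class PriorityExperienceBuffer:
    """Hypothetical buffer sitting between rollout workers and the trainer."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker: earlier inserts win on equal priority

    def put(self, prompt, response, reward, priority):
        exp = Experience(-priority, next(self._counter), prompt, response, reward)
        heapq.heappush(self._heap, exp)

    def sample(self, batch_size):
        # Return up to batch_size experiences, highest priority first.
        n = min(batch_size, len(self._heap))
        return [heapq.heappop(self._heap) for _ in range(n)]

    def __len__(self):
        return len(self._heap)


def rollout(buffer, tasks):
    """Stand-in for the rollout side: writes experiences without waiting on the trainer."""
    for prompt, reward in tasks:
        # Assumed rule for this sketch: larger |reward| means higher priority.
        buffer.put(prompt, f"response to {prompt}", reward, priority=abs(reward))


def train_step(buffer, batch_size=2):
    """Stand-in for the training side: consumes whatever the buffer currently holds."""
    batch = buffer.sample(batch_size)
    return [exp.prompt for exp in batch]
```

Because the two sides only share the buffer, either one could be moved to a separate process or device and scaled on its own; a real system would replace the in-memory heap with a shared or persistent store.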
Quick Start & Requirements
Install from source with pip install -e .[dev] after cloning the repository. Pip installation (pip install trinity-rft==0.2.1) and Docker installation are also supported.
Installing flash-attn is recommended; it may take significant time to compile.
Highlighted Details
Maintenance & Community
The project is under active development, with recent releases (v0.2.1 in August 2025) introducing features like Agentic RL and Rollout-Training scheduling. Contributions are welcomed, with guidelines for code style checks and unit tests provided. The project acknowledges its reliance on numerous open-source projects.
Licensing & Compatibility
The project is licensed under Apache-2.0, which generally permits commercial use and modification.
Limitations & Caveats
The project is under active development, and some features, such as the web interface, are still experimental. Refer to the latest documentation for current information on features and stability.