Training codebase for instruction-following language models
This repository provides a comprehensive codebase for instruction-tuning and post-training large language models, targeting researchers and developers aiming to replicate and advance open-source LLM capabilities. It offers unified tools for supervised fine-tuning (SFT), direct preference optimization (DPO), and reinforcement learning with verifiable rewards (RLVR), enabling the creation of instruction-following models.
How It Works
The project implements state-of-the-art LLM post-training techniques, including SFT, DPO, and RLVR, within a unified framework. It leverages Hugging Face's transformers library and adapts code from established RLHF and DPO implementations. The codebase supports distributed training and integrates with libraries like FlashAttention-2 for performance, facilitating efficient experimentation with various instruction datasets and model architectures.
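As an illustration of the DPO stage, here is a minimal sketch of the standard DPO loss in plain PyTorch. This is the textbook formulation, not necessarily this repository's exact implementation; the function name, signature, and the assumption that sequence-level log-probabilities have already been computed for the policy and a frozen reference model are choices made for the example.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO loss over a batch of preference pairs.

    All inputs are sequence-level log-probabilities (summed over tokens)
    under the trainable policy and the frozen reference model, respectively.
    """
    # Implicit reward: log-ratio of policy to reference on each response.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # Logistic loss on the scaled margin between chosen and rejected.
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()
```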
Quick Start & Requirements
Installation requires pip install -r requirements.txt, pip install -e ., and python -m nltk.downloader punkt. Additional dependencies include packaging and setuptools<70.0.0. Docker installation is also supported.
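Once installed, the sketch below shows one way to load a released checkpoint for inference with transformers. The model identifier allenai/tulu-2-dpo-7b, the chat-template call, and the commented FlashAttention-2 toggle are assumptions drawn from the released model cards, not steps quoted from this repository's documentation.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "allenai/tulu-2-dpo-7b"  # assumed checkpoint; see the repo's model list
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    # attn_implementation="flash_attention_2",  # optional, if flash-attn is installed
)

# Instruction-tuned checkpoints expect chat-formatted prompts.
messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```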
Highlighted Details
Maintenance & Community
The project is actively maintained by AllenAI, with recent updates in November 2024. It is associated with multiple research papers detailing its methodologies and results.
Licensing & Compatibility
The codebase is licensed under Apache 2.0. Released models have varying licenses: V1 models follow their base model licenses and a custom tulu_license.txt, while V2 models use the AI2 ImpACT license. Suitability for commercial use depends on the specific model's license.
Limitations & Caveats
The repository is a research codebase and does not guarantee backward compatibility. Evaluation scripts are noted as unmaintained, with a recommendation to use OLMES.