Training codebase for instruction-following language models
This repository provides a comprehensive codebase for instruction-tuning and post-training large language models, targeting researchers and developers aiming to replicate and advance open-source LLM capabilities. It offers unified tools for supervised fine-tuning (SFT), direct preference optimization (DPO), and reinforcement learning with verifiable rewards (RLVR), enabling the creation of instruction-following models.
How It Works
The project implements state-of-the-art LLM post-training techniques, including SFT, DPO, and RLVR, within a unified framework. It leverages Hugging Face's transformers library and adapts code from established RLHF and DPO implementations. The codebase supports distributed training and integrates with libraries like FlashAttention-2 for performance, facilitating efficient experimentation with various instruction datasets and model architectures.
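As an illustration of the DPO stage, here is a minimal sketch of the standard DPO loss in plain PyTorch. This is the textbook formulation, not necessarily this repository's exact implementation; the function name, signature, and the assumption that sequence-level log-probabilities have already been computed for the policy and a frozen reference model are choices made for the example.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO loss over a batch of preference pairs.

    All inputs are sequence-level log-probabilities (summed over tokens)
    under the trainable policy and the frozen reference model, respectively.
    """
    # Implicit reward: log-ratio of policy to reference on each response.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # Logistic loss on the scaled margin between chosen and rejected.
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()
```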
Quick Start & Requirements
Installation requires pip install -r requirements.txt, pip install -e ., and python -m nltk.downloader punkt. Additional dependencies include packaging and setuptools<70.0.0. Docker installation is also supported.
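Once installed, the sketch below shows one way to load a released checkpoint for inference with transformers. The model identifier allenai/tulu-2-dpo-7b, the chat-template call, and the commented FlashAttention-2 toggle are assumptions drawn from the released model cards, not steps quoted from this repository's documentation.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "allenai/tulu-2-dpo-7b"  # assumed checkpoint; see the repo's model list
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    # attn_implementation="flash_attention_2",  # optional, if flash-attn is installed
)

# Instruction-tuned checkpoints expect chat-formatted prompts.
messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```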
Highlighted Details
Maintenance & Community
The project is actively maintained by AllenAI, with recent updates in November 2024. It is associated with multiple research papers detailing its methodologies and results.
Licensing & Compatibility
The codebase is licensed under Apache 2.0. Released models have varying licenses: V1 models follow their base model licenses and a custom tulu_license.txt, while V2 models use the AI2 ImpACT license. Suitability for commercial use depends on the specific model's license.
Limitations & Caveats
The repository is a research codebase and does not guarantee backward compatibility. Evaluation scripts are noted as unmaintained, with a recommendation to use OLMES.