alignment-handbook by huggingface

Handbook for aligning language models with human/AI preferences

Created 2 years ago
5,413 stars

Top 9.3% on SourcePulse

View on GitHub
Project Summary

This repository provides a comprehensive toolkit and recipes for aligning large language models (LLMs) with human and AI preferences. It targets ML engineers and researchers seeking to replicate state-of-the-art chatbot alignment techniques like RLHF, DPO, and ORPO, offering robust, reproducible training pipelines.

How It Works

The handbook implements a multi-stage alignment pipeline: continued pretraining to adapt a base model to new domains or languages, supervised fine-tuning (SFT) for instruction following, and preference alignment using methods like Direct Preference Optimization (DPO); Odds Ratio Preference Optimization (ORPO) is also supported and collapses SFT and preference alignment into a single combined stage. It supports both full-model-weight training with DeepSpeed ZeRO-3 and parameter-efficient fine-tuning (PEFT) via LoRA/QLoRA. This modular approach lets users adapt LLMs to new domains, languages, or specific behavioral objectives.
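For concreteness, training runs in the handbook are launched with Hugging Face Accelerate against per-stage YAML recipes. A minimal sketch of the SFT and DPO stages, assuming the repository's recipe layout for Zephyr 7B (treat the paths as illustrative, not exact):

    # Full-weight SFT with DeepSpeed ZeRO-3 (recipe paths are illustrative)
    ACCELERATE_LOG_LEVEL=info accelerate launch \
      --config_file recipes/accelerate_configs/deepspeed_zero3.yaml \
      scripts/run_sft.py recipes/zephyr-7b-beta/sft/config_full.yaml

    # Preference alignment with DPO on top of the SFT checkpoint
    ACCELERATE_LOG_LEVEL=info accelerate launch \
      --config_file recipes/accelerate_configs/deepspeed_zero3.yaml \
      scripts/run_dpo.py recipes/zephyr-7b-beta/dpo/config_full.yaml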

Quick Start & Requirements

  • Install: Clone the repo, create a Conda environment (conda create -n handbook python=3.10 && conda activate handbook), install PyTorch v2.1.2 (hardware-dependent), then pip install . and pip install flash-attn --no-build-isolation. Log in via huggingface-cli login and install Git LFS; the full sequence is collected after this list.
  • Prerequisites: Python 3.10, PyTorch v2.1.2, Flash Attention 2.
  • Resources: Requires significant compute resources for training, especially for larger models.
  • Links: Zephyr 7B models, datasets, and demos; Technical Report
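The install steps above, collected into one runnable sequence (the torch install line is a placeholder; pick the build matching your CUDA/ROCm setup from pytorch.org):

    # Create and activate the environment
    conda create -n handbook python=3.10 && conda activate handbook
    # Install PyTorch v2.1.2 for your hardware (exact command depends on your CUDA/ROCm version)
    python -m pip install torch==2.1.2
    # Install the handbook and Flash Attention 2
    git clone https://github.com/huggingface/alignment-handbook.git
    cd alignment-handbook
    python -m pip install .
    python -m pip install flash-attn --no-build-isolation
    # Authenticate with the Hugging Face Hub and enable Git LFS
    huggingface-cli login
    git lfs install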

Highlighted Details

  • Reproducible recipes for models like Zephyr 7B, StarChat2, and SmolLM2-Instruct.
  • Implements advanced alignment techniques: DPO, ORPO, Constitutional AI, and KTO.
  • Includes a new dataset, "No Robots," with 10,000 human-annotated instructions.
  • Supports distributed training and parameter-efficient fine-tuning (LoRA/QLoRA); see the launch sketch after this list.
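As a hedged sketch of the PEFT path: the handbook ships QLoRA variants of its recipes that can run on a single GPU. The config paths and the --load_in_4bit flag below are assumed from the repository's recipe layout:

    # QLoRA fine-tuning on one GPU; recipe paths are illustrative
    ACCELERATE_LOG_LEVEL=info accelerate launch \
      --config_file recipes/accelerate_configs/multi_gpu.yaml --num_processes=1 \
      scripts/run_sft.py recipes/zephyr-7b-beta/sft/config_qlora.yaml --load_in_4bit=true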

Maintenance & Community

The project is actively maintained by Hugging Face, with contributions from prominent researchers. It has a growing ecosystem with releases of new models and recipes. Community engagement channels are available via Hugging Face's platforms.

Licensing & Compatibility

The project is licensed under the Apache-2.0 license, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The installation of PyTorch v2.1.2 is critical for reproducibility and requires careful attention to hardware compatibility. Flash Attention 2 installation may require adjusting MAX_JOBS for systems with limited RAM.
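For example, the Flash Attention project documents capping parallel build jobs on RAM-limited machines:

    # Limit parallel compile jobs so the flash-attn build fits in RAM
    MAX_JOBS=4 pip install flash-attn --no-build-isolation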

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 week
  • Pull requests (30d): 2
  • Issues (30d): 0

Star History

30 stars in the last 30 days

Explore Similar Projects

Starred by Sebastian Raschka (author of "Build a Large Language Model (From Scratch)"), Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), and 3 more.

direct-preference-optimization by eric-mitchell

Reference implementation for Direct Preference Optimization (DPO)

Top 0.1% on SourcePulse
3k stars
Created 2 years ago
Updated 1 year ago