alignment-handbook by huggingface

Handbook for aligning language models with human/AI preferences

Created 2 years ago
5,413 stars

Top 9.3% on SourcePulse

View on GitHub
Project Summary

This repository provides a comprehensive toolkit and recipes for aligning large language models (LLMs) with human and AI preferences. It targets ML engineers and researchers seeking to replicate state-of-the-art chatbot alignment techniques like RLHF, DPO, and ORPO, offering robust, reproducible training pipelines.

How It Works

The handbook implements a multi-stage alignment pipeline: continued pretraining to adapt a base model to new domains or languages, supervised fine-tuning (SFT) for instruction following, and preference alignment using methods like Direct Preference Optimization (DPO); Odds Ratio Preference Optimization (ORPO) is also supported and collapses SFT and preference alignment into a single combined stage. It supports both full-model-weight training with DeepSpeed ZeRO-3 and parameter-efficient fine-tuning (PEFT) via LoRA/QLoRA. This modular approach lets users adapt LLMs to new domains, languages, or specific behavioral objectives.
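For concreteness, training runs in the handbook are launched with Hugging Face Accelerate against per-stage YAML recipes. A minimal sketch of the SFT and DPO stages, assuming the repository's recipe layout for Zephyr 7B (treat the paths as illustrative, not exact):

    # Full-weight SFT with DeepSpeed ZeRO-3 (recipe paths are illustrative)
    ACCELERATE_LOG_LEVEL=info accelerate launch \
      --config_file recipes/accelerate_configs/deepspeed_zero3.yaml \
      scripts/run_sft.py recipes/zephyr-7b-beta/sft/config_full.yaml

    # Preference alignment with DPO on top of the SFT checkpoint
    ACCELERATE_LOG_LEVEL=info accelerate launch \
      --config_file recipes/accelerate_configs/deepspeed_zero3.yaml \
      scripts/run_dpo.py recipes/zephyr-7b-beta/dpo/config_full.yaml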

Quick Start & Requirements

  • Install: Clone the repo, create a Conda environment (conda create -n handbook python=3.10 && conda activate handbook), install PyTorch v2.1.2 (hardware-dependent), then pip install . and pip install flash-attn --no-build-isolation. Log in via huggingface-cli login and install Git LFS; the full sequence is collected after this list.
  • Prerequisites: Python 3.10, PyTorch v2.1.2, Flash Attention 2.
  • Resources: Requires significant compute resources for training, especially for larger models.
  • Links: Zephyr 7B models, datasets, and demos; Technical Report
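The install steps above, collected into one runnable sequence (the torch install line is a placeholder; pick the build matching your CUDA/ROCm setup from pytorch.org):

    # Create and activate the environment
    conda create -n handbook python=3.10 && conda activate handbook
    # Install PyTorch v2.1.2 for your hardware (exact command depends on your CUDA/ROCm version)
    python -m pip install torch==2.1.2
    # Install the handbook and Flash Attention 2
    git clone https://github.com/huggingface/alignment-handbook.git
    cd alignment-handbook
    python -m pip install .
    python -m pip install flash-attn --no-build-isolation
    # Authenticate with the Hugging Face Hub and enable Git LFS
    huggingface-cli login
    git lfs install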

Highlighted Details

  • Reproducible recipes for models like Zephyr 7B, StarChat2, and SmolLM2-Instruct.
  • Implements advanced alignment techniques: DPO, ORPO, Constitutional AI, and KTO.
  • Includes a new dataset, "No Robots," with 10,000 human-annotated instructions.
  • Supports distributed training and parameter-efficient fine-tuning (LoRA/QLoRA); see the launch sketch after this list.
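As a hedged sketch of the PEFT path: the handbook ships QLoRA variants of its recipes that can run on a single GPU. The config paths and the --load_in_4bit flag below are assumed from the repository's recipe layout:

    # QLoRA fine-tuning on one GPU; recipe paths are illustrative
    ACCELERATE_LOG_LEVEL=info accelerate launch \
      --config_file recipes/accelerate_configs/multi_gpu.yaml --num_processes=1 \
      scripts/run_sft.py recipes/zephyr-7b-beta/sft/config_qlora.yaml --load_in_4bit=true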

Maintenance & Community

The project is actively maintained by Hugging Face, with contributions from prominent researchers. It has a growing ecosystem with releases of new models and recipes. Community engagement channels are available via Hugging Face's platforms.

Licensing & Compatibility

The project is licensed under the Apache-2.0 license, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The installation of PyTorch v2.1.2 is critical for reproducibility and requires careful attention to hardware compatibility. Flash Attention 2 installation may require adjusting MAX_JOBS for systems with limited RAM.
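For example, the Flash Attention project documents capping parallel build jobs on RAM-limited machines:

    # Limit parallel compile jobs so the flash-attn build fits in RAM
    MAX_JOBS=4 pip install flash-attn --no-build-isolation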

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 week
  • Pull requests (30d): 2
  • Issues (30d): 0

Star History

30 stars in the last 30 days

Explore Similar Projects

Starred by Sebastian Raschka (author of "Build a Large Language Model (From Scratch)"), Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), and 3 more.

direct-preference-optimization by eric-mitchell

Reference implementation for Direct Preference Optimization (DPO)

Top 0.1% on SourcePulse
3k stars
Created 2 years ago
Updated 1 year ago