torchtune by pytorch

PyTorch library for LLM post-training and experimentation

created 1 year ago
5,368 stars

Top 9.5% on sourcepulse

View on GitHub
Project Summary

torchtune is a PyTorch-native library for post-training LLMs, offering hackable recipes for SFT, KD, RLHF, and QAT. It supports popular models such as Llama, Gemma, and Mistral, is driven by YAML configurations, and prioritizes memory efficiency and performance by building on PyTorch's latest APIs. The library is aimed at researchers and engineers who want to fine-tune and experiment with LLMs efficiently.

How It Works

torchtune employs a modular, recipe-driven approach, allowing users to configure training, evaluation, quantization, or inference via YAML files. It leverages PyTorch's advanced features like FSDP2 for distributed training, torchao for quantization, and torch.compile for performance gains. The library emphasizes memory efficiency through techniques like activation offloading, packed datasets, and fused optimizers, enabling larger models and batch sizes on limited hardware.
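A rough sketch of that workflow using the tune CLI is shown below; the recipe and config names are illustrative, and tune ls shows what your installed version actually ships:

    # List the recipes and YAML configs bundled with the install
    tune ls

    # Copy a bundled config locally to edit it (config name is an example)
    tune cp llama3_1/8B_lora_single_device my_custom_config.yaml

    # Run a recipe against that config; trailing key=value pairs override YAML fields
    tune run lora_finetune_single_device --config my_custom_config.yaml batch_size=8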

Quick Start & Requirements

  • Install: pip install torchtune (stable) or pip install --pre --upgrade torchtune --extra-index-url https://download.pytorch.org/whl/nightly/cpu (nightly).
  • Prerequisites: PyTorch (latest stable or nightly), torchvision, torchao. CUDA 12.x recommended for GPU acceleration. A Hugging Face Hub token is required to download gated model weights (e.g., Llama).
  • Get Started: First Finetune Tutorial, End-to-End Workflow Tutorial.
  • CLI: tune --help to list commands; a minimal download-and-finetune sketch follows this list.
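A minimal end-to-end sketch, assuming a gated Llama model; the model ID, output directory, and config name are examples:

    # Download weights from the Hugging Face Hub (token needed for gated models)
    tune download meta-llama/Meta-Llama-3.1-8B-Instruct \
      --output-dir /tmp/Meta-Llama-3.1-8B-Instruct --hf-token <HF_TOKEN>

    # LoRA finetune on a single GPU with a bundled config
    tune run lora_finetune_single_device --config llama3_1/8B_lora_single_device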

Highlighted Details

  • Supports a wide range of LLMs including Llama 4, Llama 3.3/3.2/3.1, Gemma 2, Mistral, Phi, and Qwen.
  • Offers comprehensive post-training methods: SFT, Knowledge Distillation, DPO, PPO, GRPO, and QAT.
  • Demonstrates significant memory and speed improvements via optimization flags (e.g., QLoRA on Llama 3.1 405B uses 44.8GB on 8x A100); see the example after this list.
  • Integrates with ecosystem tools like Hugging Face Hub, LM Eval Harness, Weights & Biases, and ExecuTorch.
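The memory/speed switches are typically exposed as config fields that can be overridden on the command line; the flag and config names below are illustrative and vary by recipe and release:

    # Illustrative memory/speed overrides on top of a bundled QLoRA config
    tune run lora_finetune_single_device --config llama3_1/8B_qlora_single_device \
      enable_activation_checkpointing=True \
      compile=True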

Maintenance & Community

  • Actively developed with recent updates adding support for Llama 4, Llama 3.3/3.2, Gemma 2, and multi-node training.
  • Community contributions are highlighted, including PPO, Qwen2, Gemma 2, and DPO implementations.
  • Integrates with Hugging Face, EleutherAI, and Weights & Biases.

Licensing & Compatibility

  • Released under the BSD 3-Clause license.
  • Compatible with commercial use, but users must comply with the license terms of any third-party models they download.

Limitations & Caveats

Knowledge Distillation does not support full-weight updates across multiple devices or nodes. PPO and GRPO have limited multi-device/multi-node support for full-weight updates. QAT is not supported on a single device.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 27
  • Issues (30d): 13
  • Star History: 250 stars in the last 90 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley), and 5 more.

Liger-Kernel by linkedin

0.6%
5k
Triton kernels for efficient LLM training
created 1 year ago
updated 1 day ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Zhuohan Li (Author of vLLM), and 6 more.

torchtitan by pytorch

0.9%
4k
PyTorch platform for generative AI model training research
created 1 year ago
updated 22 hours ago
Starred by Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), Michael Han (Cofounder of Unsloth), and 1 more.

ktransformers by kvcache-ai

0.4%
15k
Framework for LLM inference optimization experimentation
created 1 year ago
updated 2 days ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Stefan van der Walt (Core Contributor to the scientific Python ecosystem), and 8 more.

litgpt by Lightning-AI

0.2%
13k
LLM SDK for pretraining, finetuning, and deploying 20+ high-performance LLMs
created 2 years ago
updated 1 week ago