torchtune by pytorch

PyTorch library for LLM post-training and experimentation

created 1 year ago
5,368 stars

Top 9.5% on sourcepulse

View on GitHub
Project Summary

torchtune is a PyTorch-native library for post-training LLMs, offering hackable recipes for SFT, KD, RLHF, and QAT. It supports popular models such as Llama, Gemma, and Mistral, is driven by YAML configurations, and prioritizes memory efficiency and performance by building on PyTorch's latest APIs. The library is aimed at researchers and engineers who want to fine-tune and experiment with LLMs efficiently.

How It Works

torchtune employs a modular, recipe-driven approach, allowing users to configure training, evaluation, quantization, or inference via YAML files. It leverages PyTorch's advanced features like FSDP2 for distributed training, torchao for quantization, and torch.compile for performance gains. The library emphasizes memory efficiency through techniques like activation offloading, packed datasets, and fused optimizers, enabling larger models and batch sizes on limited hardware.
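A rough sketch of that workflow using the tune CLI is shown below; the recipe and config names are illustrative, and tune ls shows what your installed version actually ships:

    # List the recipes and YAML configs bundled with the install
    tune ls

    # Copy a bundled config locally to edit it (config name is an example)
    tune cp llama3_1/8B_lora_single_device my_custom_config.yaml

    # Run a recipe against that config; trailing key=value pairs override YAML fields
    tune run lora_finetune_single_device --config my_custom_config.yaml batch_size=8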

Quick Start & Requirements

  • Install: pip install torchtune (stable) or pip install --pre --upgrade torchtune --extra-index-url https://download.pytorch.org/whl/nightly/cpu (nightly).
  • Prerequisites: PyTorch (latest stable or nightly), torchvision, torchao. CUDA 12.x recommended for GPU acceleration. A Hugging Face Hub token is required to download gated model weights (e.g., Llama).
  • Get Started: First Finetune Tutorial, End-to-End Workflow Tutorial.
  • CLI: tune --help to list commands; a minimal download-and-finetune sketch follows this list.
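A minimal end-to-end sketch, assuming a gated Llama model; the model ID, output directory, and config name are examples:

    # Download weights from the Hugging Face Hub (token needed for gated models)
    tune download meta-llama/Meta-Llama-3.1-8B-Instruct \
      --output-dir /tmp/Meta-Llama-3.1-8B-Instruct --hf-token <HF_TOKEN>

    # LoRA finetune on a single GPU with a bundled config
    tune run lora_finetune_single_device --config llama3_1/8B_lora_single_device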

Highlighted Details

  • Supports a wide range of LLMs including Llama 4, Llama 3.3/3.2/3.1, Gemma 2, Mistral, Phi, and Qwen.
  • Offers comprehensive post-training methods: SFT, Knowledge Distillation, DPO, PPO, GRPO, and QAT.
  • Demonstrates significant memory and speed improvements via optimization flags (e.g., QLoRA on Llama 3.1 405B uses 44.8GB on 8x A100); see the example after this list.
  • Integrates with ecosystem tools like Hugging Face Hub, LM Eval Harness, Weights & Biases, and ExecuTorch.
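The memory/speed switches are typically exposed as config fields that can be overridden on the command line; the flag and config names below are illustrative and vary by recipe and release:

    # Illustrative memory/speed overrides on top of a bundled QLoRA config
    tune run lora_finetune_single_device --config llama3_1/8B_qlora_single_device \
      enable_activation_checkpointing=True \
      compile=True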

Maintenance & Community

  • Actively developed with recent updates adding support for Llama 4, Llama 3.3/3.2, Gemma 2, and multi-node training.
  • Community contributions are highlighted, including PPO, Qwen2, Gemma 2, and DPO implementations.
  • Integrates with Hugging Face, EleutherAI, and Weights & Biases.

Licensing & Compatibility

  • Released under the BSD 3-Clause license.
  • Compatible with commercial use, but users must comply with the license terms of any third-party models they download.

Limitations & Caveats

Knowledge Distillation does not support full-weight updates across multiple devices or nodes. PPO and GRPO have limited multi-device/multi-node support for full-weight updates. QAT is not supported on a single device.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 27
  • Issues (30d): 13
  • Star History: 250 stars in the last 90 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley), and 5 more.

Liger-Kernel by linkedin

0.6%
5k
Triton kernels for efficient LLM training
created 1 year ago
updated 1 day ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Zhuohan Li (Author of vLLM), and 6 more.

torchtitan by pytorch

0.9%
4k
PyTorch platform for generative AI model training research
created 1 year ago
updated 22 hours ago
Starred by Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), Michael Han (Cofounder of Unsloth), and 1 more.

ktransformers by kvcache-ai

0.4%
15k
Framework for LLM inference optimization experimentation
created 1 year ago
updated 2 days ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Stefan van der Walt (Core Contributor to the scientific Python ecosystem), and 8 more.

litgpt by Lightning-AI

0.2%
13k
LLM SDK for pretraining, finetuning, and deploying 20+ high-performance LLMs
created 2 years ago
updated 1 week ago