Liger-Kernel by linkedin

Triton kernels for efficient LLM training

Created 1 year ago

6,026 stars

Top 8.4% on SourcePulse

View on GitHub

14 Experts Love This Project

Andrej Karpathy

Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n

Chip Huyen

Author of "AI Engineering", "Designing Machine Learning Systems"

Pawel Garbacki

Cofounder of Fireworks AI

Jiayi Pan

Author of SWE-Gym; MTS at xAI

and 10 more!

Project Summary

Liger Kernel provides a suite of optimized Triton kernels designed to significantly enhance the efficiency of Large Language Model (LLM) training. Targeting researchers and engineers working with LLMs, it offers substantial improvements in training throughput and memory usage, enabling larger models and longer context lengths.

How It Works

Liger Kernel leverages Triton's capabilities for low-level GPU programming to fuse common LLM operations like RMSNorm, RoPE, SwiGLU, and various loss functions. This fusion, combined with techniques like in-place computation and chunking, reduces memory bandwidth requirements and computational overhead. The kernels are designed for exact computation, ensuring no loss of accuracy compared to standard implementations.

Quick Start & Requirements

Installation: pip install liger-kernel (stable) or pip install liger-kernel-nightly (nightly). Install from source via git clone and pip install -e ..
Prerequisites: CUDA (>= 2.1.2) with Triton (>= 2.3.0) for NVIDIA, or ROCm (>= 2.5.0) with Triton (>= 3.0.0) for AMD. transformers (>= 4.x) is required for patching APIs.
Setup: Minimal dependencies, primarily Torch and Triton.
Resources: Supports multi-GPU setups (FSDP, DeepSpeed, DDP).
Documentation: Getting Started, Examples, High-level APIs, Low-level APIs.

Highlighted Details

Up to 20% throughput increase and 60% memory reduction for LLM training layers.
Up to 80% memory savings for post-training alignment and distillation tasks (DPO, ORPO, CPO, etc.).
Full AMD ROCm support alongside NVIDIA CUDA.
One-line patching for Hugging Face models or direct composition of custom models.

Maintenance & Community

Actively developed by LinkedIn, with significant community contributions (50+ PRs, 10+ contributors). Supported by NVIDIA, AMD, and Intel for GPU resources. Integrations with Hugging Face, Lightning AI, Axolotl, and Llama-Factory. Discord channel available for discussion.

Licensing & Compatibility

The project is licensed under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

While generally stable, some kernels are marked as experimental. Compatibility with specific model architectures not explicitly listed in the high-level APIs may require manual integration using low-level APIs.

Health Check

Last Commit

3 days ago

Responsiveness

1 day

Pull Requests (30d)

Issues (30d)

Star History

105 stars in the last 30 days