Liger-Kernel by linkedin

Triton kernels for efficient LLM training

created 1 year ago · 5,442 stars · Top 9.4% on sourcepulse

View on GitHub
Project Summary

Liger Kernel provides a suite of optimized Triton kernels designed to significantly enhance the efficiency of Large Language Model (LLM) training. Targeting researchers and engineers working with LLMs, it offers substantial improvements in training throughput and memory usage, enabling larger models and longer context lengths.

How It Works

Liger Kernel leverages Triton's capabilities for low-level GPU programming to fuse common LLM operations like RMSNorm, RoPE, SwiGLU, and various loss functions. This fusion, combined with techniques like in-place computation and chunking, reduces memory bandwidth requirements and computational overhead. The kernels are designed for exact computation, ensuring no loss of accuracy compared to standard implementations.
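
As an illustration of the fusion idea (this is a minimal sketch, not the project's actual kernel), a fused SwiGLU forward pass in Triton keeps the intermediate activation in registers instead of round-tripping it through global memory:

import torch
import triton
import triton.language as tl

@triton.jit
def swiglu_fwd_kernel(a_ptr, b_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    # Load both inputs once and compute in float32 for accuracy.
    a = tl.load(a_ptr + offsets, mask=mask).to(tl.float32)
    b = tl.load(b_ptr + offsets, mask=mask).to(tl.float32)
    silu_a = a * tl.sigmoid(a)  # intermediate stays in registers, never hits HBM
    result = (silu_a * b).to(out_ptr.dtype.element_ty)
    tl.store(out_ptr + offsets, result, mask=mask)

def swiglu(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # a and b: contiguous CUDA tensors of the same shape.
    out = torch.empty_like(a)
    n_elements = a.numel()
    grid = (triton.cdiv(n_elements, 1024),)
    swiglu_fwd_kernel[grid](a, b, out, n_elements, BLOCK_SIZE=1024)
    return out

Liger's production kernels apply the same principle to RMSNorm, RoPE, and the fused linear + cross-entropy losses, with chunking and in-place computation layered on top.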

Quick Start & Requirements

  • Installation: pip install liger-kernel (stable) or pip install liger-kernel-nightly (nightly). To install from source, clone the repository and run pip install -e . (a usage sketch follows this list).
  • Prerequisites: torch (>= 2.1.2) with Triton (>= 2.3.0) on NVIDIA CUDA, or torch (>= 2.5.0) with Triton (>= 3.0.0) on AMD ROCm. transformers (>= 4.x) is required for the model-patching APIs.
  • Setup: Minimal dependencies, primarily Torch and Triton.
  • Resources: Supports multi-GPU setups (FSDP, DeepSpeed, DDP).
  • Documentation: Getting Started, Examples, High-level APIs, Low-level APIs.
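
A minimal getting-started sketch, assuming the high-level AutoLigerKernelForCausalLM wrapper documented by the project (verify the exact import and kwargs against the release you install; the model ID is only an example):

import torch
from liger_kernel.transformers import AutoLigerKernelForCausalLM

# Drop-in replacement for transformers' AutoModelForCausalLM: supported layers
# (RMSNorm, RoPE, SwiGLU, fused losses) are swapped for Liger's Triton kernels.
model = AutoLigerKernelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",   # example model ID only
    torch_dtype=torch.bfloat16,
)

The returned object behaves like a regular Hugging Face model, so it can be wrapped with FSDP, DeepSpeed, or DDP as usual.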

Highlighted Details

  • Up to 20% throughput increase and 60% memory reduction for LLM training layers.
  • Up to 80% memory savings for post-training alignment and distillation tasks (DPO, ORPO, CPO, etc.).
  • Full AMD ROCm support alongside NVIDIA CUDA.
  • One-line patching for Hugging Face models or direct composition of custom models.
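
The two integration styles are sketched below, assuming the exported names apply_liger_kernel_to_llama, LigerRMSNorm, and LigerCrossEntropyLoss from the project's documented API; confirm them against the installed version.

import torch.nn as nn
from liger_kernel.transformers import (
    apply_liger_kernel_to_llama,  # one-line patch for Hugging Face Llama models
    LigerRMSNorm,                 # drop-in nn.Module replacements
    LigerCrossEntropyLoss,
)

# Style 1: one-line patching. Call once before loading the model so its RMSNorm,
# RoPE, SwiGLU, and cross-entropy layers use the Triton kernels.
apply_liger_kernel_to_llama()

# Style 2: direct composition inside a custom model.
class TinyHead(nn.Module):
    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.norm = LigerRMSNorm(hidden_size, eps=1e-6)
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)
        self.loss_fn = LigerCrossEntropyLoss()

    def forward(self, hidden_states, labels):
        logits = self.lm_head(self.norm(hidden_states))
        return self.loss_fn(logits.view(-1, logits.size(-1)), labels.view(-1))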

Maintenance & Community

Actively developed by LinkedIn, with significant community contributions (50+ PRs, 10+ contributors). Supported by NVIDIA, AMD, and Intel for GPU resources. Integrations with Hugging Face, Lightning AI, Axolotl, and Llama-Factory. Discord channel available for discussion.

Licensing & Compatibility

The project is licensed under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

While generally stable, some kernels are marked as experimental. Compatibility with specific model architectures not explicitly listed in the high-level APIs may require manual integration using low-level APIs.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 45
  • Issues (30d): 22

Star History

517 stars in the last 90 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera) and Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

InternEvo by InternLM
Lightweight training framework for model pre-training
1.0% · 402 stars · created 1 year ago · updated 1 week ago
Starred by Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), Michael Han (Cofounder of Unsloth), and 1 more.

ktransformers by kvcache-ai
Framework for LLM inference optimization experimentation
0.4% · 15k stars · created 1 year ago · updated 2 days ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Omar Sanseviero (DevRel at Google DeepMind), and 5 more.

TensorRT-LLM by NVIDIA
LLM inference optimization SDK for NVIDIA GPUs
0.6% · 11k stars · created 1 year ago · updated 18 hours ago
Starred by George Hotz (Author of tinygrad; Founder of the tiny corp, comma.ai), Anton Bukov (Cofounder of 1inch Network), and 16 more.

tinygrad by tinygrad
Minimalist deep learning framework for education and exploration
0.1% · 30k stars · created 4 years ago · updated 18 hours ago