Operator library for LLM training/inference, implemented in Triton
FlagGems is a high-performance operator library for large language models, implemented in OpenAI Triton and designed to accelerate LLM training and inference. It targets researchers and engineers working with PyTorch, offering seamless integration with the ATen backend to boost performance without requiring changes to model code.
How It Works
FlagGems provides a suite of kernel functions written in Triton, a language comparable to CUDA in readability and performance. It integrates with PyTorch's ATen backend, allowing users to switch to Triton kernels with minimal code changes. The library features an automatic code generation system for pointwise and fused operators, supporting various computational needs. A LibEntry mechanism independently manages kernel caching, bypassing traditional autotuning runtimes for simplified cache keys and reduced overhead.
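For illustration, the sketch below shows the kind of pointwise Triton kernel this approach relies on: a generic elementwise add wrapped as a PyTorch-callable function. It is not code taken from FlagGems; the kernel name and block size are arbitrary.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the tensors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Thin PyTorch-facing wrapper that launches the Triton kernel.
    out = torch.empty_like(x)
    n_elements = out.numel()
    grid = (triton.cdiv(n_elements, 1024),)
    add_kernel[grid](x, y, out, n_elements, BLOCK_SIZE=1024)
    return out
```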
Quick Start & Requirements
Install with pip install flaggems (pure Python) or with C++ extensions for improved performance. The library supports float16, float32, and bfloat16 precision.
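A usage sketch, under the assumption that the package exposes a flag_gems.enable() switch and a use_gems() context manager for routing supported ATen operators to the Triton kernels; check the project documentation for the exact entry points.

```python
# Assumed entry points (flag_gems.enable / flag_gems.use_gems); verify against the docs.
import torch
import flag_gems

flag_gems.enable()  # route supported ATen ops to Triton kernels process-wide

x = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
y = torch.mm(x, x)  # dispatched to the Triton implementation if registered

with flag_gems.use_gems():  # alternatively, enable only within a scope
    z = torch.nn.functional.silu(x)
```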
Highlighted Details
Highlights include fused operators such as silu_and_mul and apply_rotary_position_embedding.
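As a point of reference, silu_and_mul computes the SwiGLU-style product of a SiLU-activated gate with a second input. The plain-PyTorch equivalent below is only a semantic sketch, and the argument layout is an assumption; the fused Triton kernel avoids materializing the intermediate activation.

```python
import torch
import torch.nn.functional as F

def silu_and_mul_reference(gate: torch.Tensor, up: torch.Tensor) -> torch.Tensor:
    # Assumed semantics of the fused operator: silu(gate) * up, as used in SwiGLU MLPs.
    return F.silu(gate) * up
```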
Maintenance & Community
Questions and feedback can be sent to flaggems@baai.ac.cn or raised by submitting an issue.
Licensing & Compatibility
Limitations & Caveats
The library primarily targets NVIDIA GPUs, with explicit support for the float16, float32, and bfloat16 data types. While it aims for broad operator coverage, specific operator availability should be confirmed against the OperatorList.