FlagOpen / FlagGems
Operator library for LLM training/inference, implemented in Triton
FlagGems is a high-performance operator library for large language models, implemented in OpenAI Triton and designed to accelerate LLM training and inference. It targets researchers and engineers working with PyTorch, offering seamless integration with the ATen backend to boost performance without requiring model code modifications.
How It Works
FlagGems provides a suite of kernel functions written in Triton, a language comparable to CUDA in readability and performance. It integrates with PyTorch's ATen backend, allowing users to switch to Triton kernels with minimal code changes. The library features an automatic code generation system for pointwise and fused operators, supporting various computational needs. A LibEntry mechanism independently manages kernel caching, bypassing traditional autotuning runtimes for simplified cache keys and reduced overhead.
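To give a sense of what such kernels look like, below is a minimal pointwise add kernel written in the standard Triton style. It is an illustrative sketch, not code from FlagGems; the kernel and wrapper names are hypothetical.

```python
# Minimal sketch of a pointwise Triton kernel (illustrative, not FlagGems code).
import torch
import triton
import triton.language as tl


@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one contiguous block of elements.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard against out-of-bounds accesses
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)


def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n_elements = out.numel()
    # Launch a 1D grid with enough blocks to cover all elements.
    grid = lambda meta: (triton.cdiv(n_elements, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n_elements, BLOCK_SIZE=1024)
    return out


if __name__ == "__main__":
    a = torch.randn(4096, device="cuda", dtype=torch.float16)
    b = torch.randn(4096, device="cuda", dtype=torch.float16)
    torch.testing.assert_close(add(a, b), a + b)
```

FlagGems' code generation system produces kernels of this shape automatically for pointwise and fused operators, so users do not need to write them by hand.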
Quick Start & Requirements
Install via pip install flaggems for the pure Python version, or build with C++ extensions for improved performance. Supports float16, float32, and bfloat16 precision.
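Once installed, the intended workflow is to let PyTorch dispatch supported ATen operators to the Triton kernels. The sketch below assumes the module is imported as flag_gems and that enable() and use_gems() are the entry points; verify the exact names against the FlagGems documentation.

```python
# Sketch of enabling FlagGems so PyTorch dispatches ATen ops to Triton kernels.
# The module name flag_gems and the enable()/use_gems() calls are assumptions
# based on the project's documented usage pattern; check the docs for the exact API.
import torch
import flag_gems

# Enable FlagGems globally so supported ATen operators use Triton kernels (assumed API).
flag_gems.enable()

x = torch.randn(1024, 1024, device="cuda", dtype=torch.float16)
y = torch.mm(x, x)  # dispatched to the Triton-backed implementation if supported

# Alternatively, restrict the substitution to a scope (also assumed API):
# with flag_gems.use_gems():
#     y = torch.mm(x, x)
```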
Highlighted Details
Includes fused operators such as silu_and_mul and apply_rotary_position_embedding.
Maintenance & Community
Maintainers can be reached at flaggems@baai.ac.cn or by submitting an issue.
Licensing & Compatibility
Limitations & Caveats
The library primarily targets NVIDIA GPUs, with explicit support for float16, float32, and bfloat16 data types. While it aims for broad operator coverage, specific operator availability should be confirmed against the OperatorList.