deepseek-ai
Optimized GPU kernels for LLM operations
Top 30.7% on SourcePulse
Summary
TileKernels is a library of highly optimized GPU kernels for Large Language Model (LLM) operations, written in TileLang. It targets engineers and researchers who want to maximize LLM training and inference performance with kernels that approach hardware limits for compute intensity and memory bandwidth. Its main benefit is state-of-the-art kernel performance combined with a framework that supports fast development iteration.
How It Works
The project uses TileLang, a domain-specific language embedded in Python, to express and automatically optimize high-performance GPU kernels. This makes it easier to migrate existing kernels and shortens development cycles. The kernels are designed to push compute intensity and memory bandwidth utilization toward hardware ceilings. The library includes specialized kernels for Mixture of Experts (MoE) routing, advanced quantization (FP8/FP4/E5M6), batched transposes, Engram gating, and Manifold HyperConnection (mHC) operations.
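To make the MoE routing operation concrete, here is a minimal pure-Python reference sketch of top-k routing for a single token. It is not TileKernels code and does not use the TileLang API. The function name `topk_route` is hypothetical, and renormalizing the selected probabilities is one common convention assumed here, not something the project description confirms.

```python
import math


def topk_route(logits, k):
    """Return (expert_indices, weights) for one token's router logits.

    Hypothetical reference helper, not part of TileKernels.
    """
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the number of experts")
    # Numerically stable softmax over all experts.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Pick the k most probable experts (ties broken by lower index).
    top = sorted(range(len(probs)), key=lambda i: (-probs[i], i))[:k]
    # Assumption: renormalize so the selected weights sum to 1.
    norm = sum(probs[i] for i in top)
    return top, [probs[i] / norm for i in top]


experts, weights = topk_route([0.1, 2.0, -1.0, 1.5], k=2)
print(experts, weights)
```

A fused GPU kernel would typically perform the softmax, selection, and renormalization for many tokens in one pass; the sketch only pins down the arithmetic that such a kernel has to reproduce.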
Quick Start & Requirements
pip install -e ".[dev]" (editable install with the development extras)
pip install tile-kernels
pytest for correctness and benchmarking.
Highlighted Details
torch.autograd.Function wrappers to compose low-level kernels into trainable PyTorch layers.
Maintenance & Community
The project lists authors in its citation but provides no specific details regarding active maintainers, community channels (e.g., Discord, Slack), sponsorships, or a public roadmap.
Licensing & Compatibility
Released under the permissive MIT License, allowing for broad compatibility with commercial use and integration into closed-source projects.
Limitations & Caveats
The project explicitly states that current kernels "do not represent best practices" and are undergoing active improvement in code quality and documentation. Adoption requires specific, high-end NVIDIA hardware (SM90/SM100 GPUs) and a recent CUDA toolkit version.
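Because of the hardware requirement, it can help to check the GPU before installing. The sketch below maps a CUDA compute capability to the SM names mentioned above. SM90 corresponds to compute capability 9.0 and SM100 to 10.0. The helper name `check_capability` is hypothetical, and treating exactly these two architectures as the supported set is an assumption based on this summary, not on the project's own checks.

```python
# Assumption: the supported set is exactly the two architectures named above.
SUPPORTED_SM = {(9, 0): "SM90", (10, 0): "SM100"}


def check_capability(major, minor):
    """Return the SM tag for a supported compute capability, else None."""
    return SUPPORTED_SM.get((major, minor))


print(check_capability(9, 0))
print(check_capability(8, 0))
```

With PyTorch installed, the `(major, minor)` pair for the current device can be read from `torch.cuda.get_device_capability()` and passed to the helper.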