tilelang by tile-ai

DSL for high-performance GPU/CPU kernel development (GEMM, attention, etc.)

Created 1 year ago
4,584 stars

Top 10.7% on SourcePulse

Project Summary

Tile Language (tile-lang) is a domain-specific language (DSL) built on TVM for developing high-performance GPU and CPU kernels. It targets AI researchers and engineers seeking to optimize operations like GEMM, FlashAttention, and MLA decoding without sacrificing productivity, offering Pythonic syntax for low-level control.

How It Works

TileLang leverages TVM's compiler infrastructure to translate Python-like DSL code into optimized low-level kernels. It allows explicit control over tiling, data layout, pipelining, and parallelization, enabling developers to fine-tune performance for specific hardware architectures. This approach aims to bridge the gap between high-level productivity and the intricate optimizations required for state-of-the-art AI workloads.
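To make this concrete, here is a tiled GEMM kernel sketched from the project's documented example. The constructs shown (T.Kernel, T.alloc_shared, T.alloc_fragment, T.Pipelined, T.copy, T.gemm) follow the published API, but exact signatures and defaults may differ across versions, so treat this as an illustrative sketch rather than canonical code.

    import tilelang
    import tilelang.language as T

    @tilelang.jit(out_idx=[-1])  # treat the last argument (C) as the returned output
    def matmul(M, N, K, block_M, block_N, block_K,
               dtype="float16", accum_dtype="float"):
        @T.prim_func
        def main(
            A: T.Tensor((M, K), dtype),
            B: T.Tensor((K, N), dtype),
            C: T.Tensor((M, N), dtype),
        ):
            # One thread block computes one block_M x block_N tile of C.
            with T.Kernel(T.ceildiv(N, block_N), T.ceildiv(M, block_M),
                          threads=128) as (bx, by):
                A_shared = T.alloc_shared((block_M, block_K), dtype)
                B_shared = T.alloc_shared((block_K, block_N), dtype)
                C_local = T.alloc_fragment((block_M, block_N), accum_dtype)
                T.clear(C_local)
                # Software-pipelined reduction over K tiles (explicit staging).
                for ko in T.Pipelined(T.ceildiv(K, block_K), num_stages=3):
                    T.copy(A[by * block_M, ko * block_K], A_shared)
                    T.copy(B[ko * block_K, bx * block_N], B_shared)
                    T.gemm(A_shared, B_shared, C_local)  # tile-level matmul-accumulate
                T.copy(C_local, C[by * block_M, bx * block_N])
        return main

The explicit shared-memory staging, fragment accumulator, and pipelined loop are exactly the knobs the paragraph above describes: each is visible and tunable rather than hidden behind the compiler.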

Quick Start & Requirements

  • Install: pip install tilelang or pip install git+https://github.com/tile-ai/tilelang
  • Prerequisites: Python 3.x; for source builds: GCC, cmake, python3-setuptools, libtinfo-dev, zlib1g-dev, build-essential, libedit-dev, libxml2-dev. CUDA 12.1+ for NVIDIA GPU targets.
  • Setup: Installation via pip is quick; building from source or using nightly builds requires more setup. A minimal smoke test follows this list.
  • Docs: https://tile-ai.github.io/tilelang/
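A minimal smoke test, assuming the matmul sketch above, PyTorch, and a CUDA-capable GPU (the tolerances are arbitrary choices to absorb float16 accumulation differences):

    import torch

    # Compile for fixed shapes; block sizes (128, 128, 32) are a tunable choice.
    kernel = matmul(1024, 1024, 1024, 128, 128, 32)
    a = torch.randn(1024, 1024, device="cuda", dtype=torch.float16)
    b = torch.randn(1024, 1024, device="cuda", dtype=torch.float16)
    c = kernel(a, b)  # C is allocated and returned because of out_idx=[-1]
    torch.testing.assert_close(c, a @ b, rtol=1e-2, atol=1e-2)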

Highlighted Details

  • Achieves performance parity with hand-optimized kernels for FlashMLA on AMD MI300X and MLA Decoding on H100.
  • Supports WebGPU codegen.
  • Includes debug tools such as T.print for printing values from device code, plus a memory layout plotter (see the sketch after this list).
  • Tested on NVIDIA (H100, A100, V100, RTX 4090/3090/A6000) and AMD (MI250, MI300X) GPUs.
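As a taste of the debugging workflow, the sketch below calls T.print from device code to dump a shared buffer. This is a hypothetical minimal example: T.print is documented, but the exact semantics for buffers versus scalars may vary by version.

    import tilelang
    import tilelang.language as T

    def debug_copy(N=16, dtype="float16"):
        @T.prim_func
        def main(A: T.Tensor((N,), dtype)):
            # Single block; stage the input into shared memory and print it.
            with T.Kernel(1, threads=128) as bx:
                buf = T.alloc_shared((N,), dtype)
                T.copy(A, buf)
                T.print(buf)  # emits the buffer contents from the device
        return main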

Maintenance & Community

  • Active development with recent updates including AMD MI300X support and MLA decoding.
  • Discord community available for discussion and support.
  • Used in projects like BitBLAS and AttentionEngine.

Licensing & Compatibility

  • License: Apache 2.0.
  • Compatible with commercial and closed-source applications.

Limitations & Caveats

  • Nightly builds may be less stable.
  • For T.gemm, the README notes a "dispatch to the cute/hip on Nvidia/AMD GPUs", meaning the actual GEMM execution relies on external libraries (CuTe/CUTLASS on NVIDIA, HIP on AMD); this may introduce additional dependencies or compatibility considerations.
Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 179
  • Issues (30d): 94

Star History

450 stars in the last 30 days

Starred by David Cournapeau (Author of scikit-learn), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 5 more.

Explore Similar Projects

lectures by gpu-mode

Lecture series for GPU-accelerated computing
Top 0.8% · 6k stars · Created 2 years ago · Updated 1 month ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Jeff Hammerbacher (Cofounder of Cloudera), and 9 more.

FlashMLA by deepseek-ai

Efficient CUDA kernels for MLA decoding
Top 0.1% · 12k stars · Created 10 months ago · Updated 3 weeks ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Jiayi Pan (Author of SWE-Gym; MTS at xAI), and 34 more.

flash-attention by Dao-AILab

Fast, memory-efficient attention implementation
Top 0.6% · 22k stars · Created 3 years ago · Updated 1 day ago