tilelang by tile-ai

DSL for high-performance GPU/CPU kernel development (GEMM, attention, etc.)

created 10 months ago; 1,484 stars; Top 28.3% on sourcepulse

GitHub: https://github.com/tile-ai/tilelang

Project Summary

Tile Language (tile-lang) is a domain-specific language (DSL) built on TVM for developing high-performance GPU and CPU kernels. It targets AI researchers and engineers seeking to optimize operations like GEMM, FlashAttention, and MLA decoding without sacrificing productivity, offering Pythonic syntax for low-level control.

How It Works

TileLang leverages TVM's compiler infrastructure to translate Python-like DSL code into optimized low-level kernels. It allows explicit control over tiling, data layout, pipelining, and parallelization, enabling developers to fine-tune performance for specific hardware architectures. This approach aims to bridge the gap between high-level productivity and the intricate optimizations required for state-of-the-art AI workloads.
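
For a sense of what this control looks like in practice, below is a minimal sketch of a tiled FP16 GEMM in the DSL, modeled on the project's published examples. Only T.gemm and T.print are mentioned elsewhere in this summary; the other names used here (T.prim_func, T.Kernel, T.alloc_shared, T.alloc_fragment, T.Pipelined, T.clear, T.copy, T.Buffer) are assumptions drawn from upstream documentation and may differ between tilelang versions.

    # Illustrative sketch of a tiled GEMM in tile-lang's Pythonic DSL.
    # Names other than T.gemm are assumptions based on upstream examples
    # and may differ across tilelang versions.
    import tilelang.language as T

    def matmul(M, N, K, block_M, block_N, block_K,
               dtype="float16", accum_dtype="float"):
        @T.prim_func
        def main(
            A: T.Buffer((M, K), dtype),
            B: T.Buffer((K, N), dtype),
            C: T.Buffer((M, N), dtype),
        ):
            # One thread block per (block_M x block_N) tile of the output C.
            with T.Kernel(T.ceildiv(N, block_N), T.ceildiv(M, block_M),
                          threads=128) as (bx, by):
                A_shared = T.alloc_shared((block_M, block_K), dtype)
                B_shared = T.alloc_shared((block_K, block_N), dtype)
                C_local = T.alloc_fragment((block_M, block_N), accum_dtype)
                T.clear(C_local)
                # Explicit software pipelining over the K reduction dimension.
                for ko in T.Pipelined(T.ceildiv(K, block_K), num_stages=3):
                    T.copy(A[by * block_M, ko * block_K], A_shared)
                    T.copy(B[ko * block_K, bx * block_N], B_shared)
                    T.gemm(A_shared, B_shared, C_local)  # tile-level GEMM primitive
                T.copy(C_local, C[by * block_M, bx * block_N])
        return main

Compiling and invoking such a function goes through tilelang's TVM-based lowering/JIT entry points; consult the documentation linked under Quick Start for the current API.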

Quick Start & Requirements

  • Install: pip install tilelang or pip install git+https://github.com/tile-ai/tilelang
  • Prerequisites: Python 3.x; CUDA 12.1+ for GPU targets. Building from source additionally requires GCC, cmake, python3-setuptools, libtinfo-dev, zlib1g-dev, build-essential, libedit-dev, and libxml2-dev.
  • Setup: Installation via pip is quick; building from source or using nightly builds requires more setup. A post-install sanity check is sketched after this list.
  • Docs: https://tile-ai.github.io/tilelang/
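
A minimal post-install sanity check, assuming a CUDA-capable environment (PyTorch is used here only to confirm that a GPU is visible; it is not needed for the import itself):

    # Post-install sanity check (assumptions: tilelang and torch are installed,
    # and CUDA drivers are set up for GPU targets).
    import torch
    import tilelang
    import tilelang.language as T  # DSL namespace used when writing kernels

    print("tilelang loaded from:", tilelang.__file__)
    print("CUDA device visible:", torch.cuda.is_available())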

Highlighted Details

  • Reports performance parity with hand-optimized kernels, e.g., FlashMLA on AMD MI300X and MLA decoding on NVIDIA H100.
  • Supports WebGPU codegen.
  • Includes debug tools such as T.print for in-kernel printing and a memory layout plotter (see the sketch after this list).
  • Tested on NVIDIA (H100, A100, V100, RTX 4090/3090/A6000) and AMD (MI250, MI300X) GPUs.
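
As a concrete example of the debug workflow, the hypothetical sketch below stages one tile of a matrix into shared memory and dumps it with T.print. Only T.print is named above; the surrounding scaffolding mirrors the GEMM sketch in "How It Works" and is likewise an assumption that may vary by version.

    # Hypothetical debugging sketch: print a staged tile from inside a kernel.
    import tilelang.language as T

    def inspect_tiles(M, N, block_M=64, block_N=64, dtype="float16"):
        @T.prim_func
        def main(A: T.Buffer((M, N), dtype)):
            with T.Kernel(T.ceildiv(N, block_N), T.ceildiv(M, block_M),
                          threads=128) as (bx, by):
                tile = T.alloc_shared((block_M, block_N), dtype)
                # Stage one (block_M x block_N) tile of A into shared memory.
                T.copy(A[by * block_M, bx * block_N], tile)
                # Dump the staged tile to stdout for inspection while debugging.
                T.print(tile)
        return main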

Maintenance & Community

  • Active development with recent updates including AMD MI300X support and MLA decoding.
  • Discord community available for discussion and support.
  • Used in projects like BitBLAS and AttentionEngine.

Licensing & Compatibility

  • License: Apache 2.0.
  • Compatible with commercial and closed-source applications.

Limitations & Caveats

  • Nightly builds may be less stable.
  • The README notes that T.gemm will "dispatch to the cute/hip on Nvidia/AMD GPUs," implying reliance on external kernel templates for the actual GEMM execution, which may introduce additional dependencies or compatibility considerations.
Health Check

  • Last commit: 2 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 68
  • Issues (30d): 16

Star History

  • 409 stars in the last 90 days

Explore Similar Projects

gpu.cpp by AnswerDotAI

  • C++ library for portable GPU computation using WebGPU
  • 0.2%; 4k stars; created 1 year ago; updated 2 weeks ago
  • Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Georgios Konstantopoulos (CTO, General Partner at Paradigm), and 2 more.

flash-attention by Dao-AILab

  • Fast, memory-efficient attention implementation
  • 0.7%; 19k stars; created 3 years ago; updated 14 hours ago
  • Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley), and 16 more.