tilelang-ascend by tile-ai

High-performance AI kernel development for Huawei Ascend NPUs

Created 8 months ago
307 stars

Top 87.3% on SourcePulse

Summary

TileLang-Ascend is a domain-specific language (DSL) built on TileLang and TVM that generates high-performance AI compute kernels for Huawei Ascend NPUs. Aimed at AI researchers and engineers, it abstracts low-level hardware complexity while leaving developers enough control over optimization to reach state-of-the-art performance on Ascend processors.

How It Works

TileLang-Ascend combines a Pythonic syntax with the TVM compiler infrastructure to translate high-level kernel descriptions into optimized code for Ascend NPUs. It supports two backends: Ascend C & PTO, and AscendNPU IR. The DSL is meant to make core AI operations such as GEMM and attention efficient to implement, so developers can focus on algorithmic work rather than hardware-specific programming.
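To make the idea of a tile-level kernel description more concrete, the sketch below shows a blocked GEMM in plain Python. It is not TileLang-Ascend code and uses none of its API; the function name tiled_matmul and the block_m, block_n, and block_k parameters are hypothetical names chosen for illustration. It only shows the loop structure (output tiles on the outside, a chunked reduction inside) that tile-oriented DSLs typically let a developer express.

```python
def tiled_matmul(a, b, block_m=2, block_n=2, block_k=2):
    """Multiply a (M x K) by b (K x N), one tile at a time.

    Illustrative only: matrices are lists of lists, and the block sizes
    are hypothetical parameters, not TileLang-Ascend settings.
    """
    m, k = len(a), len(a[0])
    n = len(b[0])
    assert len(b) == k, "inner dimensions must match"

    c = [[0.0] * n for _ in range(m)]

    # Each (i0, j0) pair selects one output tile of size block_m x block_n.
    for i0 in range(0, m, block_m):
        for j0 in range(0, n, block_n):
            # Walk the reduction dimension in chunks of block_k.
            for k0 in range(0, k, block_k):
                for i in range(i0, min(i0 + block_m, m)):
                    for j in range(j0, min(j0 + block_n, n)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + block_k, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(tiled_matmul(a, b))
```

On real hardware each tile would be staged through on-chip buffers and computed by dedicated matrix units; choosing tile sizes and data movement is the part a kernel DSL is meant to handle.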

Quick Start & Requirements

  • Installation: The recommended route is a prebuilt wheel: pip install tilelang-*.whl. Alternatively, build from source with ./build_wheel_ascend.sh or install_ascend.sh.
  • Prerequisites: An Ascend environment with CANN (>= 8.3.RC1) and torch-npu (>= 2.6.0.RC1) installed. The CANN environment variables must be set before use.
  • Hardware: Tested on Huawei Ascend A2 and A3 NPUs.
  • Documentation: Programming Guide, Video Course Series.
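The prerequisites above can be checked with a small script before installing. The following is a minimal sketch under stated assumptions: ASCEND_HOME_PATH is assumed to be one of the variables exported by the CANN environment setup script, torch_npu is assumed to be the import name of the torch-npu package, and missing_prerequisites is a hypothetical helper, not part of TileLang-Ascend. It checks only presence, not the minimum versions listed above.

```python
import importlib.util
import os

# Assumption: the CANN setup script exports this variable.
REQUIRED_ENV_VARS = ("ASCEND_HOME_PATH",)
# Assumption: torch-npu is imported as "torch_npu".
REQUIRED_MODULES = ("torch", "torch_npu")


def missing_prerequisites(env=None,
                          modules=REQUIRED_MODULES,
                          env_vars=REQUIRED_ENV_VARS):
    """Return a list of human-readable problems; empty means all found."""
    env = os.environ if env is None else env
    problems = []
    for name in env_vars:
        if not env.get(name):
            problems.append(f"environment variable {name} is not set")
    for mod in modules:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(mod) is None:
            problems.append(f"Python module {mod} is not importable")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print("missing:", problem)
```

An empty result only means the variable and modules were found; confirming the CANN and torch-npu versions still requires checking the installed releases against the minimums above.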

Highlighted Details

  • Supports a wide range of operators: GEMM, Batch GEMM, Elementwise Operations, Flash Attention, Sparse Flash Attention, Softmax, Normalization, Activation Functions, Reduce, Sort, Convolution, and Cross Entropy Loss.
  • Features advanced optimization techniques: automatic synchronization, buffer reuse, software pipelining (T.Pipelined), automatic vectorization (T.Parallel), explicit scope management, and automatic workspace allocation.
  • Includes examples for PyTorch integration (torch_tl_ascend) and graph-level optimization (ACLGraph).
  • Recent updates include DeepSeek V4 kernels and Flash Attention optimization guides.
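Software pipelining, one of the optimizations listed above, can be illustrated without any NPU code. The sketch below is plain Python and does not use T.Pipelined or any TileLang-Ascend API; pipelined_schedule and its parameters are hypothetical names. It prints the order in which loads and computes would be issued when the load for a later tile is started before the compute on the current tile, which is the general idea behind a pipelined loop.

```python
def pipelined_schedule(num_tiles, num_stages=2):
    """Return the operations issued in each loop step.

    Each inner list holds the operations issued in one step. With
    num_stages >= 2 the load and compute in a step touch different
    tiles, so they are independent and could overlap on hardware.
    With num_stages == 1 they touch the same tile and must run in order.
    """
    steps = []
    prefetch = num_stages - 1  # how many tiles the loads run ahead

    # Prologue: issue the first loads before any compute starts.
    for t in range(min(prefetch, num_tiles)):
        steps.append([f"load {t}"])

    # Steady state and epilogue: compute tile t, load tile t + prefetch.
    for t in range(num_tiles):
        ops = []
        nxt = t + prefetch
        if nxt < num_tiles:
            ops.append(f"load {nxt}")
        ops.append(f"compute {t}")
        steps.append(ops)
    return steps


if __name__ == "__main__":
    for step in pipelined_schedule(num_tiles=3, num_stages=2):
        print(step)
```

In this model, a deeper pipeline (larger num_stages) needs one buffer per in-flight tile, which is why pipelining and buffer reuse tend to be tuned together.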

Maintenance & Community

Open-sourced on September 29, 2025. Acknowledges support from Huawei and Peking University. No specific community channels or roadmap links are provided in the README.

Licensing & Compatibility

The README does not specify a software license, so compatibility with commercial use or closed-source linking cannot be determined without further clarification.

Limitations & Caveats

The project has been tested primarily on Ascend A2 and A3 NPUs. T.Pipelined does not allow nested loops. The lack of a stated license is a significant adoption blocker.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: Inactive
  • Pull requests (30d): 75
  • Issues (30d): 56
  • Star history: 14 stars in the last 30 days
