intel-xpu-backend-for-triton by intel

Compiler and language for efficient custom deep learning primitives

Created 3 years ago
253 stars

Top 99.3% on SourcePulse

Project Summary

This project provides an OpenAI Triton backend for Intel® GPUs, enabling the creation of highly efficient custom deep-learning primitives. It targets engineers and researchers seeking greater productivity and flexibility than CUDA for specialized computations on Intel hardware.

How It Works

Triton functions as a domain-specific language (DSL) and compiler, abstracting hardware details. It uses an intermediate language (IL) and an LLVM-based compiler to generate optimized code for GPUs and CPUs, focusing on "Tiled Neural Network Computations." This approach aims for performance competitive with CUDA, offering a more productive and flexible development experience.
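The tiled programming model described above can be illustrated without any GPU. The following is a minimal plain-Python sketch, not Triton code: the names BLOCK_SIZE, add_tile, and launch are hypothetical and chosen only for illustration. Each "program instance" handles one fixed-size tile of the input and masks out positions past the end of the data, which is the general idea a tile-based compiler maps onto parallel hardware.

```python
# Plain-Python emulation of a tiled (block-level) vector add.
# This is NOT Triton code; BLOCK_SIZE, add_tile, and launch are
# hypothetical names used only for illustration.

BLOCK_SIZE = 4  # number of elements handled by one "program instance"


def add_tile(x, y, out, pid, n):
    """Process one tile: the work a single program instance would do."""
    start = pid * BLOCK_SIZE
    for offset in range(start, start + BLOCK_SIZE):
        if offset < n:  # mask: skip positions past the end of the data
            out[offset] = x[offset] + y[offset]


def launch(x, y):
    """Run one program instance per tile, covering the whole input."""
    n = len(x)
    out = [0] * n
    num_tiles = (n + BLOCK_SIZE - 1) // BLOCK_SIZE  # ceiling division
    for pid in range(num_tiles):  # on a GPU these would run in parallel
        add_tile(x, y, out, pid, n)
    return out


result = launch(list(range(10)), [10] * 10)
print(result)
```

With 10 elements and a tile size of 4, three tiles are launched and the last one is only partially filled, which is why the bounds check inside add_tile is needed.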

Quick Start & Requirements

Install the latest stable release via pip install triton; binary wheels support CPython 3.10 through 3.14. Installing from source requires cloning the repository, installing the build dependencies, and running an editable install (pip install -e .). Building against a custom LLVM is supported but needs careful version management. Running the tests requires GPU hardware. Development containers are available for a consistent environment.
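Before installing, it can help to confirm that the local interpreter falls inside the supported wheel range and to see whether a triton package is already present. The sketch below uses only the Python standard library; the helper names python_version_supported and triton_installed are hypothetical, and the version bounds are taken from the range quoted above.

```python
import importlib.util
import sys

# Supported range taken from the text above (CPython 3.10 through 3.14).
MIN_VERSION = (3, 10)
MAX_VERSION = (3, 14)


def python_version_supported(version=None):
    """Return True if (major, minor) falls within the wheel range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


def triton_installed():
    """Return True if a top-level `triton` package can be found."""
    return importlib.util.find_spec("triton") is not None


print("Python version in wheel range:", python_version_supported())
print("triton importable:", triton_installed())
```

This only checks that a package named triton can be located; it does not verify which backend the installed build targets.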

Highlighted Details

  • Key features include a backend rewrite using MLIR, support for complex kernels like flash attention, and extensive debugging via environment variables (e.g., MLIR_ENABLE_DUMP, TRITON_REPRODUCER_PATH).
  • Generates compile_commands.json for IDE code completion.
  • Offers kernel override mechanisms for deep introspection and debugging.
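The debugging variables named in the list above are ordinary environment variables, so one way to use them is to build a modified environment and pass it to the process that runs a kernel. The following is a minimal sketch under stated assumptions: the helper debug_env and the reproducer file path are hypothetical, and the value "1" for MLIR_ENABLE_DUMP and a file path for TRITON_REPRODUCER_PATH are assumed conventions rather than details confirmed by this summary.

```python
import os


def debug_env(reproducer_path, base_env=None):
    """Return a copy of the environment with Triton debug variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["MLIR_ENABLE_DUMP"] = "1"  # assumed value that turns IR dumps on
    env["TRITON_REPRODUCER_PATH"] = reproducer_path  # assumed to be a file path
    return env


# Hypothetical reproducer location, shown with an empty base environment.
env = debug_env("/tmp/triton_repro.mlir", base_env={})
for name in sorted(env):
    print(f"{name}={env[name]}")
```

The returned dictionary can be handed to subprocess.run(..., env=env) when launching a script that compiles a kernel, which keeps the debug settings out of the parent shell.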

Maintenance & Community

Community contributions are encouraged for bug fixes and features. Detailed contributor guidelines are available. Development environments are standardized via Dev Containers, easing onboarding.

Licensing & Compatibility

The license type is not detailed in the provided README. Triton supports Linux, NVIDIA GPUs (Compute Capability 8.0+), and AMD GPUs (ROCm 6.2+), with ongoing CPU development. The Intel XPU backend specifically extends compatibility to Intel GPUs.

Limitations & Caveats

Triton's reliance on LLVM's unstable API necessitates specific LLVM versions for builds. Full test execution requires a GPU. Specific limitations for the Intel XPU backend are not detailed in this core Triton README.
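Because a build needs a specific LLVM version, a custom LLVM checkout can be compared against the pinned commit before building. The sketch below assumes the pin is stored as a single commit hash in a one-line text file (upstream Triton is understood to keep such a pin in cmake/llvm-hash.txt, but treat the exact path as an assumption). The helper names and the hash values are placeholders for illustration only.

```python
import tempfile
from pathlib import Path


def read_pinned_hash(path):
    """Read the pinned LLVM commit hash from a one-line text file."""
    return Path(path).read_text().strip()


def matches_pin(pinned, local_commit):
    """Compare a local LLVM commit against the pin, allowing short hashes."""
    pinned = pinned.strip().lower()
    local_commit = local_commit.strip().lower()
    if not pinned or not local_commit:
        return False
    return pinned.startswith(local_commit) or local_commit.startswith(pinned)


# Demo with a temporary file standing in for the real pin file.
with tempfile.TemporaryDirectory() as tmp:
    pin_file = Path(tmp) / "llvm-hash.txt"
    pin_file.write_text("0123abcd4567\n")  # placeholder hash, not a real commit
    pinned = read_pinned_hash(pin_file)
    print("pin:", pinned)
    print("local build matches:", matches_pin(pinned, "0123abcd"))
```

In practice the local commit would come from the custom LLVM checkout, for example the output of git rev-parse HEAD run in that directory.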

Health Check

Last Commit: 10 hours ago
Responsiveness: Inactive
Pull Requests (30d): 154
Issues (30d): 96
Star History: 8 stars in the last 30 days
