mirage  by mirage-project

Tool for fast GPU kernel generation via superoptimization

Created 1 year ago
1,930 stars

Top 22.6% on SourcePulse

Project Summary

Mirage is a tool that uses superoptimization to automatically generate highly optimized GPU kernels for PyTorch programs. It targets researchers and engineers who want to accelerate deep learning workloads without writing kernels by hand. Users describe computations in Python, and Mirage transforms them into fast, custom GPU kernels that can outperform expert-written ones.

How It Works

Mirage employs a multi-level superoptimization approach. It first translates a PyTorch program into an intermediate representation (IR) and then searches a vast space of functionally equivalent GPU kernels. By exploring various optimization strategies and low-level code generation techniques, it discovers kernels that deliver significant speedups, for example by fusing operations such as RMSNorm and Linear layers in Transformer models.
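To make "functionally equivalent" concrete, the sketch below shows one algebraic identity that makes an RMSNorm + Linear fusion possible: the division by the RMS can be deferred until after the matrix product, so both reductions share a single pass over the input. This is plain Python for illustration only, not Mirage's IR or generated code; the function names, the `eps` default, and the list-of-lists layout are assumptions of this sketch.

```python
import math
import random


def rmsnorm_then_linear(x, gain, weight, eps=1e-6):
    """Unfused reference: RMSNorm over x, then a vector-matrix product."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    normed = [xi * gi / rms for xi, gi in zip(x, gain)]
    n_out = len(weight[0])
    return [sum(n * row[j] for n, row in zip(normed, weight)) for j in range(n_out)]


def fused_rmsnorm_linear(x, gain, weight, eps=1e-6):
    """Fused form: one pass over x accumulates both the sum of squares
    and the matrix product; the division by the RMS happens once at the end."""
    n_out = len(weight[0])
    acc = [0.0] * n_out
    sum_sq = 0.0
    for xi, gi, row in zip(x, gain, weight):
        sum_sq += xi * xi
        scaled = xi * gi
        for j in range(n_out):
            acc[j] += scaled * row[j]
    rms = math.sqrt(sum_sq / len(x) + eps)
    return [a / rms for a in acc]


def random_case(n_in, n_out, rng):
    """Build a random input vector, gain vector, and n_in x n_out weight matrix."""
    x = [rng.uniform(-1, 1) for _ in range(n_in)]
    gain = [rng.uniform(-1, 1) for _ in range(n_in)]
    weight = [[rng.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_in)]
    return x, gain, weight


# Compare both forms on one random input; they should agree up to rounding.
x, gain, weight = random_case(8, 4, random.Random(0))
reference = rmsnorm_then_linear(x, gain, weight)
fused = fused_rmsnorm_linear(x, gain, weight)
print(all(math.isclose(r, f, rel_tol=1e-9, abs_tol=1e-12)
          for r, f in zip(reference, fused)))
```

On a GPU, the benefit of the second form would come from avoiding a round trip through memory for the normalized intermediate; the Python version only demonstrates that the two orderings compute the same values.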

Quick Start & Requirements

  • Installation: Run pip install mirage-project, or install a pre-built wheel (e.g., pip install https://github.com/mirage-project/mirage/releases/download/v0.2.2/mirage_project-0.2.2+cu122-cp310-cp310-linux_x86_64.whl). To install from source, run git clone --recursive https://www.github.com/mirage-project/mirage, then pip install -e . -v from the cloned directory.
  • Prerequisites: CUDA (the version a pre-built wheel targets is encoded in its filename, e.g., cu122 for 12.2), Python 3.10+.
  • Resources: Requires GPU for kernel generation and execution.
  • Documentation: Tutorials covering example programs are available.
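Because the pre-built wheel's filename carries its CUDA and Python targets, a quick check before installing can save a failed install. The sketch below splits a wheel filename into its standard fields and compares the Python tag against the running interpreter. The helper names are hypothetical, and reading a local version label such as cu122 as "CUDA 12.2" is a naming convention assumed here, not something the packaging format guarantees.

```python
import sys


def parse_wheel_filename(filename):
    """Split 'name-version-python-abi-platform.whl' into its fields.

    Assumes no optional build tag is present (five dash-separated parts).
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError("unexpected number of fields in: " + filename)
    name, full_version, python_tag, abi_tag, platform_tag = parts
    version, _, local = full_version.partition("+")

    # Assumed convention: 'cu' + major digits + one minor digit, e.g. cu122.
    cuda = None
    if local.startswith("cu") and local[2:].isdigit() and len(local) > 3:
        digits = local[2:]
        cuda = digits[:-1] + "." + digits[-1]

    return {
        "name": name,
        "version": version,
        "cuda": cuda,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


def matches_running_python(info):
    """True if the wheel's CPython tag equals the current interpreter's."""
    current = "cp{}{}".format(sys.version_info.major, sys.version_info.minor)
    return info["python_tag"] == current


info = parse_wheel_filename(
    "mirage_project-0.2.2+cu122-cp310-cp310-linux_x86_64.whl"
)
print(info)
print("matches this interpreter:", matches_running_python(info))
```

If the Python tag does not match, installing from source as described above is the fallback.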

Highlighted Details

  • Achieves 1.5–1.7x speedup for fused RMSNorm and Linear operations compared to separate PyTorch operators.
  • Automatically searches for and discovers optimized kernel candidates.
  • Integrates generated kernels into PyTorch programs with minimal code changes.
  • Supports arbitrary PyTorch programs for kernel generation.
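The "search and discover" idea behind superoptimization can be illustrated at toy scale. The sketch below enumerates small arithmetic expressions, keeps those that agree with a reference function on random inputs, and returns the cheapest one. It operates on integer expressions rather than GPU kernels, every name in it is hypothetical, and random testing here gives only probabilistic evidence of equivalence; it is not a description of Mirage's actual search or verification procedure.

```python
import itertools
import random

VARS = ("a", "b", "c")


def reference(a, b, c):
    """Program to optimize: two multiplications and one addition."""
    return a * b + a * c


def enumerate_exprs(depth):
    """Yield expressions: a variable name or a tuple (op, left, right)."""
    if depth == 0:
        yield from VARS
        return
    smaller = list(enumerate_exprs(depth - 1))
    yield from smaller
    for op in ("+", "*"):
        for left, right in itertools.product(smaller, repeat=2):
            yield (op, left, right)


def evaluate(expr, env):
    """Evaluate an expression tree with exact integer arithmetic."""
    if isinstance(expr, str):
        return env[expr]
    op, left, right = expr
    lhs, rhs = evaluate(left, env), evaluate(right, env)
    return lhs + rhs if op == "+" else lhs * rhs


def cost(expr):
    """Cost model for this toy: the number of operations in the tree."""
    if isinstance(expr, str):
        return 0
    return 1 + cost(expr[1]) + cost(expr[2])


def show(expr):
    """Render an expression tree as a parenthesized string."""
    if isinstance(expr, str):
        return expr
    return "({} {} {})".format(show(expr[1]), expr[0], show(expr[2]))


def search(ref, depth=2, trials=16, seed=0):
    """Return the cheapest candidate that matches ref on all random tests."""
    rng = random.Random(seed)
    tests = [{v: rng.randint(-50, 50) for v in VARS} for _ in range(trials)]
    expected = [ref(**t) for t in tests]
    best = None
    for cand in enumerate_exprs(depth):
        if all(evaluate(cand, t) == e for t, e in zip(tests, expected)):
            if best is None or cost(cand) < cost(best):
                best = cand
    return best


best = search(reference)
print(show(best), "cost:", cost(best))
```

The search recovers a factored form that needs one fewer operation than the reference. A real superoptimizer faces a vastly larger space and needs pruning and a stronger equivalence check, but the enumerate, verify, and rank loop has the same shape.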

Maintenance & Community

  • The project welcomes contributions and bug reports via GitHub issues.
  • A paper detailing Mirage's techniques is available on arXiv.

Licensing & Compatibility

  • Mirage uses the Apache License 2.0.
  • This license is permissive and generally compatible with commercial and closed-source use.

Limitations & Caveats

The project is associated with an OSDI 2025 publication, suggesting it may still be under active research and development. Supported CUDA and Python versions may depend on which pre-built wheel is used or on the requirements of a source build.

Health Check

  • Last commit: 20 hours ago
  • Responsiveness: 1 day
  • Pull requests (30d): 31
  • Issues (30d): 20
  • Star history: 72 stars in the last 30 days
