KernelAgent by meta-pytorch

Autonomous GPU kernel generation and optimization via AI agents

Created 11 months ago

445 stars

Top 66.8% on SourcePulse

2 Experts Love This Project

Wing Lian

Founder of Axolotl AI

Yineng Zhang

Inference Lead at SGLang; Research Scientist at Together AI

Project Summary

KernelAgent autonomously converts PyTorch programs into verified Triton kernels and then optimizes them. It targets engineers, researchers, and power users who want better GPU performance without hand-writing kernels. Its LLM-driven, multi-stage pipeline is meant to deliver performance improvements while reducing manual kernel development effort.

How It Works

KernelAgent runs a multi-stage pipeline:
- Static analysis chooses between a lightweight path and the full LLM pipeline.
- LLM-assisted refactoring isolates fusable subgraphs.
- Triton kernels are generated in parallel, each under strict runtime verification.
- The synthesized kernels are composed end to end to rebuild the original forward pass.
- A hardware-guided optimization pipeline iteratively improves performance.
The LLM-driven synthesis and verification process aims for both correctness and efficiency.
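One way to picture the parallel generation and verification stage: several candidate implementations are checked concurrently, and the first one that matches a trusted reference is kept. The following is a minimal sketch under that assumption, using plain Python callables in place of Triton kernels. The names `verify`, `first_verified`, and the toy candidates are hypothetical and are not KernelAgent's API.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, List, Optional, Sequence

# Hypothetical stand-in: a "kernel" is any callable from inputs to outputs.
Kernel = Callable[[Sequence[float]], List[float]]


def verify(candidate: Kernel, reference: Kernel,
           inputs: Sequence[float], tol: float = 1e-6) -> bool:
    """Run the candidate and compare it element-wise with the reference."""
    try:
        got = candidate(inputs)
    except Exception:
        return False  # a crashing candidate is rejected, not fatal
    want = reference(inputs)
    return len(got) == len(want) and all(
        abs(g - w) <= tol for g, w in zip(got, want)
    )


def first_verified(candidates: Sequence[Kernel], reference: Kernel,
                   inputs: Sequence[float],
                   max_workers: int = 4) -> Optional[Kernel]:
    """Verify candidates concurrently; return one that passes, else None."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(verify, c, reference, inputs): c for c in candidates
        }
        for done in as_completed(futures):
            if done.result():
                return futures[done]  # whichever passing check finishes first
    return None


# Toy candidates standing in for generated kernels (illustrative only).
def reference(xs):
    return [2.0 * x for x in xs]

def wrong(xs):
    return [x + 2.0 for x in xs]

def crashing(xs):
    raise RuntimeError("simulated launch failure")

def good(xs):
    return [x + x for x in xs]


winner = first_verified([wrong, crashing, good], reference, [1.0, 3.0, 5.0])
```

Here only `good` matches the reference on the test inputs, so it is the candidate returned. If no candidate passes, the function returns `None` and the caller decides whether to retry or stop.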

Quick Start & Requirements

Installation: pip install -e .
Prerequisites:
- Python 3.8 – 3.12
- Linux or macOS
- GPU: NVIDIA GPU (CUDA) or Intel GPU (XPU with oneAPI)
- Triton: pip install triton or nightly build
- PyTorch: Installation instructions available at https://pytorch.org/get-started/locally/
- LLM Provider: OpenAI, Anthropic, or a self-hosted relay (API keys required).
- KernelBench (optional, for examples): git clone https://github.com/ScalingIntelligence/KernelBench.git
Documentation: Blog post available at https://pytorch.org/blog/kernelfalcon-autonomous-gpu-kernel-generation-via-deep-agents/
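
The prerequisites above can be checked before installing. The sketch below uses only the standard library and is a convenience script, not part of KernelAgent. The environment variable names `OPENAI_API_KEY` and `ANTHROPIC_API_KEY` are the ones the providers' own SDKs conventionally read; whether KernelAgent reads the same variables is an assumption.

```python
import importlib.util
import os
import sys
from typing import Iterable, List, Mapping, Tuple


def python_supported(version: Tuple[int, int] = sys.version_info[:2]) -> bool:
    """True if the (major, minor) version is within the listed 3.8 to 3.12 range."""
    return (3, 8) <= tuple(version) <= (3, 12)


def missing_packages(names: Iterable[str] = ("torch", "triton")) -> List[str]:
    """Return the packages that cannot be found on the import path."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def has_llm_key(
    env: Mapping[str, str] = os.environ,
    keys: Iterable[str] = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY"),  # assumed names
) -> bool:
    """True if at least one provider key is set to a non-empty value."""
    return any(env.get(k) for k in keys)


if __name__ == "__main__":
    print("Python version supported:", python_supported())
    print("Missing packages:", missing_packages() or "none")
    print("LLM API key found:", has_llm_key())
```

The script does not probe for a GPU; that check depends on the installed PyTorch build and is left out to keep the example dependency-free.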

Highlighted Details

Autonomous generation and optimization of Triton kernels from PyTorch code using LLMs.
Strict runtime verification checks that synthesized kernels are numerically correct before they are accepted.
Hardware-guided optimization pipeline leverages GPU profiling (NCU), roofline analysis, and LLM-driven bottleneck diagnosis for iterative improvements.
Supports both NVIDIA CUDA and Intel XPU platforms.
Generates detailed artifacts for reproducibility and inspection.
Offers multiple Gradio-based UIs for interactive use.
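
The roofline analysis mentioned above rests on a simple bound: attainable throughput is the smaller of peak compute and arithmetic intensity times peak memory bandwidth. The sketch below shows that textbook model only. It is not KernelAgent's profiler integration, and the device numbers are made up for illustration rather than taken from any real GPU.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    peak_flops: float      # peak compute, FLOP/s
    peak_bandwidth: float  # peak memory bandwidth, bytes/s


def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs performed per byte moved to or from memory."""
    return flops / bytes_moved


def attainable_flops(device: Device, intensity: float) -> float:
    """Roofline bound: min(peak compute, intensity * bandwidth)."""
    return min(device.peak_flops, intensity * device.peak_bandwidth)


def bottleneck(device: Device, intensity: float) -> str:
    """Classify a kernel by comparing its intensity with the ridge point."""
    ridge = device.peak_flops / device.peak_bandwidth
    return "memory-bound" if intensity < ridge else "compute-bound"


# Illustrative device: 100 TFLOP/s compute, 2 TB/s bandwidth (made-up figures).
device = Device(peak_flops=100e12, peak_bandwidth=2e12)

# Element-wise fp32 add over n elements: 1 FLOP and 12 bytes
# (two 4-byte reads, one 4-byte write) per element.
n = 1_000_000
add_intensity = arithmetic_intensity(flops=n, bytes_moved=12 * n)

add_bound = attainable_flops(device, add_intensity)
add_kind = bottleneck(device, add_intensity)
```

With these figures the ridge point is 50 FLOP/byte, so the element-wise add (about 0.083 FLOP/byte) is classified as memory-bound. A diagnosis like this suggests that reducing memory traffic, for example through fusion, matters more than raw compute for that kernel.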

Maintenance & Community

The project is hosted by pytorch-labs. The primary community interaction point mentioned is the GitHub Issues page: https://github.com/pytorch-labs/KernelAgent/issues. No specific details on active contributors, sponsorships, or community channels like Discord/Slack are provided in the README.

Licensing & Compatibility

KernelAgent is released under the Apache License 2.0. This license is permissive and generally compatible with commercial use and linking within closed-source projects.

Limitations & Caveats

The system relies heavily on external LLM providers, requiring API keys and potentially incurring costs. Setup involves managing multiple dependencies including PyTorch, Triton, and LLM configurations. Intel XPU support necessitates compatible hardware and specific drivers. The strict verification process may halt if generated kernels fail correctness checks, potentially requiring manual intervention.
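
To make the last caveat concrete, a correctness check of this kind typically compares kernel output with a reference within a tolerance and reports where they diverge. The sketch below uses the tolerance form that `torch.allclose` documents, |actual - expected| <= atol + rtol * |expected|, on plain Python lists. The function name `mismatches` and the default tolerances are assumptions for illustration, not KernelAgent's actual check.

```python
from typing import List, Sequence, Tuple


def mismatches(
    actual: Sequence[float],
    expected: Sequence[float],
    rtol: float = 1e-5,
    atol: float = 1e-8,
) -> List[Tuple[int, float, float]]:
    """Return (index, actual, expected) for every element outside tolerance."""
    if len(actual) != len(expected):
        raise ValueError("actual and expected must have the same length")
    return [
        (i, a, e)
        for i, (a, e) in enumerate(zip(actual, expected))
        # `not (... <= ...)` also flags NaN, since any comparison with NaN is False
        if not abs(a - e) <= atol + rtol * abs(e)
    ]


# A tiny rounding difference passes; a real error is reported with its index.
ok_report = mismatches([1.0 + 1e-7, 2.0], [1.0, 2.0])
bad_report = mismatches([1.0, 2.1], [1.0, 2.0])
```

An empty report means the candidate is accepted. A non-empty one gives the indices and values a person would need when stepping in manually.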

Health Check

Last Commit

6 days ago

Responsiveness

Inactive

Star History

37 stars in the last 30 days