CUDA-Agent by BytedTsinghua-SIA

Agentic RL for high-performance CUDA kernel generation

Created 5 months ago

1,117 stars

Top 33.5% on SourcePulse

2 Experts Love This Project

Luis Capelo

Cofounder of Lightning AI

Wing Lian

Founder of Axolotl AI

Project Summary

CUDA-Agent tackles high-performance CUDA kernel generation with large-scale agentic reinforcement learning. It targets researchers and engineers optimizing GPU computations, and its README reports state-of-the-art results that significantly outperform existing LLMs and compilation baselines on complex kernel generation tasks.

How It Works

The project uses an RL-trained model to generate CUDA kernels and reports leading results on the KernelBench benchmark. Its core component is an agentic workspace (agent_workdir) that runs a full development loop: generate a kernel, compile it, verify correctness, profile performance, and iterate on the feedback. This structured, iterative loop lets the agent target optimizations beyond what standard compilation provides.
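The loop described above can be sketched in a few lines of Python. This is only an illustrative outline, not the project's implementation: the function name optimize_kernel, the Attempt record, and the four callables (generate, compile_kernel, verify, profile) are hypothetical stand-ins for whatever the model and the agent_workdir scripts actually do.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Attempt:
    """One pass through the loop (hypothetical record, not from the repo)."""
    source: str
    compiled: bool
    correct: bool
    runtime_ms: Optional[float]


def optimize_kernel(
    generate: Callable[[str], str],         # feedback text -> kernel source
    compile_kernel: Callable[[str], bool],  # True if the source builds
    verify: Callable[[str], bool],          # True if outputs match a reference
    profile: Callable[[str], float],        # measured runtime in milliseconds
    max_iters: int = 5,
) -> Tuple[Optional[Attempt], List[Attempt]]:
    """Run generate -> compile -> verify -> profile, feeding results back."""
    feedback = ""
    best: Optional[Attempt] = None
    history: List[Attempt] = []
    for _ in range(max_iters):
        source = generate(feedback)
        if not compile_kernel(source):
            history.append(Attempt(source, False, False, None))
            feedback = "compile error"
            continue
        if not verify(source):
            history.append(Attempt(source, True, False, None))
            feedback = "wrong output"
            continue
        runtime = profile(source)
        attempt = Attempt(source, True, True, runtime)
        history.append(attempt)
        # Keep the fastest attempt that both compiled and verified.
        if best is None or runtime < best.runtime_ms:
            best = attempt
        feedback = f"correct, {runtime:.3f} ms"
    return best, history
```

In this sketch, only attempts that compile and pass verification are ever profiled, so a fast but incorrect kernel can never be selected as the best result.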

Quick Start & Requirements

The project provides an agent_workdir with scripts for compilation (utils/compile.sh), correctness verification (utils/verification.py), and performance profiling (utils/profiling.py). It also releases CUDA-Agent-Ops-6K, a 6,000-sample training dataset, which the README links to. Setup likely requires a CUDA-enabled environment and Python 3. The README does not state specific hardware requirements (e.g., GPU model, VRAM) or detailed installation steps.

Highlighted Details

Reports state-of-the-art performance on KernelBench, surpassing advanced LLMs such as Claude Opus-4.6 and Gemini 3 Pro.
Consistently outperforms the torch.compile baseline, especially on challenging kernel generation tasks.
Released the CUDA-Agent-Ops-6K training dataset, including its construction pipeline and filtering criteria.
Provides an agent environment with workflow constraints (SKILL.md) and a full development loop implementation.
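The README summary does not say how the comparison against torch.compile is computed. One common way to express such a comparison is a speedup ratio over repeated timings, sketched below. The function name speedup and the use of the median are assumptions made for illustration, and the timing values in the usage lines are made up.

```python
from statistics import median
from typing import Sequence


def speedup(baseline_ms: Sequence[float], candidate_ms: Sequence[float]) -> float:
    """Return median(baseline) / median(candidate); values above 1 mean faster.

    Hypothetical helper: the median is used so that a single slow run
    (for example a warm-up iteration) does not skew the ratio.
    """
    if not baseline_ms or not candidate_ms:
        raise ValueError("both timing lists must be non-empty")
    base = median(baseline_ms)
    cand = median(candidate_ms)
    if cand <= 0:
        raise ValueError("candidate timings must be positive")
    return base / cand


# Made-up timings in milliseconds, for illustration only.
baseline_runs = [2.0, 2.2, 1.8]    # e.g. a compiler baseline
candidate_runs = [1.0, 1.1, 0.9]   # e.g. a generated kernel
ratio = speedup(baseline_runs, candidate_runs)
```

A ratio like this is only meaningful when the candidate has already passed a correctness check on the same inputs.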

Maintenance & Community

The provided README does not contain information regarding maintainers, community channels (e.g., Discord, Slack), or a public roadmap.

Licensing & Compatibility

The README does not specify the project's license. Without one, suitability for commercial use or integration into closed-source projects cannot be assessed.

Limitations & Caveats

Features such as agent trace results and a web demo are noted as forthcoming ("Please stay tuned"). The README focuses primarily on the generation capabilities and benchmark results, with limited detail on the underlying RL training infrastructure or comprehensive setup requirements. The absence of a stated license is a critical caveat.

Health Check

Last Commit

2 weeks ago

Responsiveness

Inactive

Star History

35 stars in the last 30 days