BytedTsinghua-SIA
Agentic RL for high-performance CUDA kernel generation
Top 38.8% on SourcePulse
Summary
CUDA-Agent tackles high-performance CUDA kernel generation with large-scale agentic reinforcement learning. It targets researchers and engineers optimizing GPU computations, and the README reports state-of-the-art results that outperform existing LLMs and compilation baselines on complex kernel generation tasks.
How It Works
This project utilizes an RL-trained model to generate CUDA kernels, achieving superior results on the KernelBench benchmark. Its core innovation lies in an agentic workspace (agent_workdir) that orchestrates a full development loop: generating kernels, compiling them, verifying correctness, profiling performance, and iterating based on feedback. This structured, iterative approach allows for targeted optimization beyond standard compilation methods.
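The loop described above can be sketched in a few lines of Python. This is a minimal illustration and not the project's implementation: `optimize_kernel`, `Attempt`, and the four callables (`generate`, `compile_fn`, `verify`, `profile`) are hypothetical names, and the stub functions at the bottom stand in for a real model, compiler, checker, and profiler.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Attempt:
    source: str                  # candidate kernel source
    compiled: bool
    correct: bool
    runtime_ms: Optional[float]  # None when the attempt never reached profiling


def optimize_kernel(
    generate: Callable[[str], str],
    compile_fn: Callable[[str], Tuple[bool, str]],
    verify: Callable[[str], bool],
    profile: Callable[[str], float],
    max_iters: int = 5,
) -> Tuple[Optional[Attempt], List[Attempt]]:
    """Hypothetical generate -> compile -> verify -> profile -> iterate loop."""
    feedback = ""
    best: Optional[Attempt] = None
    history: List[Attempt] = []
    for _ in range(max_iters):
        source = generate(feedback)
        compiled, log = compile_fn(source)
        if not compiled:
            history.append(Attempt(source, False, False, None))
            feedback = f"compile error: {log}"
            continue
        if not verify(source):
            history.append(Attempt(source, True, False, None))
            feedback = "output mismatch against the reference"
            continue
        runtime_ms = profile(source)
        attempt = Attempt(source, True, True, runtime_ms)
        history.append(attempt)
        # Keep the fastest attempt that compiled and passed verification.
        if best is None or runtime_ms < best.runtime_ms:
            best = attempt
        feedback = f"correct at {runtime_ms:.3f} ms; look for further speedups"
    return best, history


# Toy stand-ins (assumptions for illustration only).
_candidates = iter(["broken", "wrong", "slow", "fast"])


def generate(feedback: str) -> str:
    return next(_candidates)


def compile_fn(source: str) -> Tuple[bool, str]:
    return (source != "broken", "syntax error" if source == "broken" else "")


def verify(source: str) -> bool:
    return source != "wrong"


def profile(source: str) -> float:
    return {"slow": 2.0, "fast": 0.5}[source]


best, history = optimize_kernel(generate, compile_fn, verify, profile, max_iters=4)
```

Feeding each stage's outcome back into the next generation step is what separates this kind of loop from one-shot generation: a compile failure, a correctness failure, and a slow but correct kernel each produce a different hint.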
Quick Start & Requirements
The project provides an agent_workdir with scripts for compilation (utils/compile.sh), correctness verification (utils/verification.py), and performance profiling (utils/profiling.py). A 6,000-sample training dataset, CUDA-Agent-Ops-6K, is also released, and the README links to it. Setup likely requires a CUDA-enabled environment and Python 3. The README does not state specific hardware requirements (e.g., GPU model, VRAM) or give detailed installation steps.
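To illustrate what correctness verification and performance profiling usually involve, here is a standard-library sketch. It does not reproduce utils/verification.py or utils/profiling.py, whose contents are not described in this summary. `outputs_match`, `median_runtime_ms`, `reference_op`, and `candidate_op` are hypothetical names, and a real GPU setup would compare tensors and time kernels with device-side timers rather than `time.perf_counter`.

```python
import statistics
import time
from typing import Callable, Sequence


def outputs_match(
    candidate: Sequence[float],
    reference: Sequence[float],
    rtol: float = 1e-4,
    atol: float = 1e-5,
) -> bool:
    """Elementwise tolerance check: |c - r| <= atol + rtol * |r|."""
    if len(candidate) != len(reference):
        return False
    return all(
        abs(c - r) <= atol + rtol * abs(r) for c, r in zip(candidate, reference)
    )


def median_runtime_ms(fn: Callable[[], object], warmup: int = 3, repeats: int = 10) -> float:
    """Median wall-clock time of fn in milliseconds, after warm-up runs."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)


# Toy reference and candidate implementations of the same operation.
def reference_op(xs: Sequence[float]) -> list:
    return [x * 2.0 + 1.0 for x in xs]


def candidate_op(xs: Sequence[float]) -> list:
    return [x + x + 1.0 for x in xs]


inputs = [0.5 * i for i in range(1000)]
is_correct = outputs_match(candidate_op(inputs), reference_op(inputs))
reference_ms = median_runtime_ms(lambda: reference_op(inputs))
candidate_ms = median_runtime_ms(lambda: candidate_op(inputs))
```

Checking correctness before timing matters because a fast kernel that returns wrong values is not a speedup. Taking the median over several runs after warm-up reduces the influence of one-off outliers.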
Highlighted Details
[…] torch.compile baseline, especially on challenging kernel generation tasks.
[…] CUDA-Agent-Ops-6K training dataset, including its construction pipeline and filtering criteria.
[…] (SKILL.md) and a full development loop implementation.
Maintenance & Community
The provided README does not contain information regarding maintainers, community channels (e.g., Discord, Slack), or a public roadmap.
Licensing & Compatibility
The README does not specify the project's license. This lack of information presents a significant barrier to assessing compatibility for commercial use or integration into closed-source projects.
Limitations & Caveats
Features such as agent trace results and a web demo are noted as forthcoming ("Please stay tuned"). The README focuses primarily on the generation capabilities and benchmark results, with limited detail on the underlying RL training infrastructure or comprehensive setup requirements. The absence of a stated license is a critical caveat.