RightNow-AI/autokernel: Autonomous GPU kernel optimization for PyTorch
Summary
RightNow-AI/autokernel provides an autonomous agent for optimizing GPU kernels in PyTorch models. It targets engineers and researchers who want maximum hardware performance, and it automatically identifies, optimizes, and verifies bottleneck kernels using Triton or CUDA C++. The core benefit is faster, production-ready GPU kernels without manual optimization.
How It Works
The system employs an agent-driven, iterative refinement loop. It profiles PyTorch models to pinpoint bottlenecks, extracts them into standalone Triton or CUDA C++ kernels, and autonomously optimizes each via an edit-benchmark-keep/revert cycle on kernel.py. Orchestration uses Amdahl's Law to prioritize optimizations yielding the greatest end-to-end speedup. All performance gains are validated against a rigorous 5-stage correctness harness (bench.py) before acceptance, ensuring functional integrity.
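The prioritization and the keep/revert cycle described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's actual code: the function names (`amdahl_speedup`, `rank_kernels`, `keep_or_revert`), the assumed per-kernel speedup of 2x, and the `benchmark` and `is_correct` callbacks are all hypothetical stand-ins for what profile.py, bench.py, and the agent would supply.

```python
from typing import Callable, Iterable


def amdahl_speedup(fraction: float, kernel_speedup: float) -> float:
    """End-to-end speedup (Amdahl's Law) when a kernel that takes
    `fraction` of total runtime becomes `kernel_speedup` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / kernel_speedup)


def rank_kernels(time_fractions: dict[str, float],
                 assumed_speedup: float = 2.0) -> list[str]:
    """Order kernels by the end-to-end gain expected from optimizing each.
    `assumed_speedup` is a hypothetical uniform per-kernel improvement."""
    return sorted(
        time_fractions,
        key=lambda name: amdahl_speedup(time_fractions[name], assumed_speedup),
        reverse=True,
    )


def keep_or_revert(current: str,
                   current_ms: float,
                   candidates: Iterable[str],
                   benchmark: Callable[[str], float],
                   is_correct: Callable[[str], bool]) -> tuple[str, float]:
    """Try each candidate edit; keep it only if it passes the correctness
    check and is faster than the best version so far, otherwise revert."""
    best, best_ms = current, current_ms
    for candidate in candidates:
        if not is_correct(candidate):
            continue  # revert: candidate failed the correctness check
        ms = benchmark(candidate)
        if ms < best_ms:
            best, best_ms = candidate, ms  # keep: verified and faster
        # otherwise revert: correct but not faster
    return best, best_ms


if __name__ == "__main__":
    # Hypothetical profile: share of total runtime per kernel.
    profile = {"matmul": 0.6, "softmax": 0.1, "layernorm": 0.3}
    print(rank_kernels(profile))

    # Hypothetical timings (ms) standing in for real benchmark runs.
    timings = {"v0": 10.0, "v1": 8.0, "v2": 5.0, "v3": 9.0}
    print(keep_or_revert("v0", timings["v0"], ["v1", "v2", "v3"],
                         benchmark=timings.__getitem__,
                         is_correct=lambda src: src != "v2"))
```

In this sketch the correctness check runs before the benchmark, so a fast but incorrect candidate (here `v2`) is never accepted, which mirrors the rule that gains only count once they pass verification.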
Quick Start & Requirements
Requires uv (installable via curl), Python 3.10+, and an NVIDIA GPU (tested on H100/A100/RTX 4090). Key setup: uv run prepare.py. Initiate profiling with uv run profile.py, extract top kernels via uv run extract.py, and benchmark/verify with uv run bench.py. Agent interaction involves providing program.md instructions to an external coding agent (e.g., Claude, Codex).
Highlighted Details
Optimization edits are confined to kernel.py, simplifying review and rollback.
Results are written to results.tsv for easy parsing.
Maintenance & Community
Inspired by Andrej Karpathy's autoresearch methodology. KernelBench integration builds upon work from Stanford's Scaling Intelligence Lab. No specific community channels or prominent maintainer/sponsor details are provided.
Licensing & Compatibility
Released under the MIT license, which is highly permissive and suitable for commercial use and integration into closed-source projects.
Limitations & Caveats
Operation is strictly limited to NVIDIA GPUs. Autonomous agent functionality requires integration with external coding agents to interpret program.md. Specific performance targets across all supported kernel types are not detailed in the README.