AKO4ALL by TongmingLAIC

Agentic kernel optimization for any hardware

Created 4 months ago

324 stars

Top 83.7% on SourcePulse

Project Summary

AKO4ALL automates GPU kernel optimization across diverse hardware, languages, and kernel types. Aimed at engineers and researchers, it speeds up development by iteratively profiling, editing, and benchmarking a kernel, with the goal of expert-level performance that often surpasses established optimized libraries.

How It Works

The system runs an iterative agentic loop. The user drops a kernel into a working directory and invokes AKO4ALL through a coding agent. AKO4ALL then bootstraps a workspace, analyzes the kernel and its inputs, and refines the code through repeated rounds of profiling, benchmarking, and logging. It can switch languages mid-run (e.g., Triton to CUDA) and search the web for strategies when progress stalls. The loop continues until performance gains plateau.
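The stop-on-plateau loop described above can be sketched in a few lines of Python. This is a minimal illustration, not AKO4ALL's actual code: the names `optimize`, `propose`, `benchmark`, `min_gain`, and `patience` are all hypothetical, and the real system delegates the editing step to a coding agent rather than a callback.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopResult:
    best_source: str
    best_time_ms: float
    history: List[float] = field(default_factory=list)


def optimize(
    source: str,
    propose: Callable[[str, int], str],   # hypothetical: edits the kernel source
    benchmark: Callable[[str], float],    # hypothetical: returns runtime in ms
    min_gain: float = 0.02,               # assumed: relative gain that counts as progress
    patience: int = 3,                    # assumed: stalled rounds tolerated before stopping
    max_iters: int = 50,
) -> LoopResult:
    """Keep the fastest candidate; stop once gains plateau."""
    best_source = source
    best_time = benchmark(source)
    history = [best_time]
    stalled = 0
    for _ in range(max_iters):
        # The stall count is passed along so a proposer could change
        # strategy (e.g. switch language) when progress stalls.
        candidate = propose(best_source, stalled)
        elapsed = benchmark(candidate)
        history.append(elapsed)
        if elapsed < best_time * (1.0 - min_gain):
            best_source, best_time, stalled = candidate, elapsed, 0
        else:
            stalled += 1
            if stalled >= patience:
                break
    return LoopResult(best_source, best_time, history)


# Toy stand-ins so the sketch runs without a GPU: "vN" is version N of a kernel.
def toy_propose(src: str, stalled: int) -> str:
    return f"v{int(src[1:]) + 1}"


def toy_benchmark(src: str) -> float:
    return max(5.0, 10.0 - 2.0 * int(src[1:]))


if __name__ == "__main__":
    print(optimize("v0", toy_propose, toy_benchmark))
```

The toy proposer and benchmark only exist to make the control flow visible; in practice the benchmark step would time the real kernel and the proposer would be the agent's edit.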

Quick Start & Requirements

Install by cloning the repo into a coding agent's skills directory (e.g., ~/.claude/skills/ako4all) or by creating a symlink to it there. Requirements: a coding agent (e.g., Claude Code), an NVIDIA GPU with CUDA, PyTorch (for the built-in evaluator), Python >= 3.10, and NVIDIA Nsight Compute (version-matched). An optimization run typically completes in under an hour.
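Before a first run, it can help to confirm that the listed requirements are present. The script below is a rough preflight sketch, not part of AKO4ALL: `check_requirements` is a hypothetical helper, and it assumes the Nsight Compute CLI is on PATH as `ncu` and that `nvidia-smi` is available with the NVIDIA driver. It does not verify that the Nsight Compute version matches, or that PyTorch can actually see the GPU.

```python
import shutil
import sys
from importlib.util import find_spec
from typing import Dict, Tuple


def check_requirements(min_python: Tuple[int, int] = (3, 10)) -> Dict[str, bool]:
    """Report which prerequisites appear to be present on this machine."""
    return {
        # Interpreter version against the stated minimum
        "python": sys.version_info >= min_python,
        # PyTorch is only checked for importability, not for CUDA support
        "torch": find_spec("torch") is not None,
        # Nsight Compute command-line profiler
        "ncu": shutil.which("ncu") is not None,
        # Presence of the NVIDIA driver tools, as a proxy for a usable GPU
        "nvidia-smi": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name:12s} {'ok' if ok else 'MISSING'}")
```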

Explore Similar Projects

aitune by ai-dynamo

Hy3-preview by Tencent-Hunyuan

Crane by lucasjinreal

flashinfer-bench by flashinfer-ai

AutoSOTA by tsinghua-fib-lab

kernel-design-agents by mit-han-lab

Step-3.5-Flash by stepfun-ai

FlagPerf by flagos-ai

xpu-perf by bytedance

ai-performance-engineering by cfregly

aiter by ROCm

openvino by openvinotoolkit