TongmingLAIC
Agentic kernel optimization for any hardware
Summary
AKO4ALL automates GPU kernel optimization across diverse hardware, languages, and kernel types. It targets engineers and researchers, accelerating development by iteratively profiling, editing, and benchmarking to achieve expert-level performance, often surpassing established optimized libraries.
How It Works
The system runs an iterative agentic loop. The user drops a kernel into a working directory and invokes AKO4ALL through a coding agent. AKO4ALL then bootstraps a workspace, analyzes the kernel and its inputs, and refines the code through repeated rounds of profiling, editing, and benchmarking, logging each iteration. When progress stalls, it can switch languages (e.g., Triton to CUDA) or search the web for new strategies. The loop continues until performance gains plateau.
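The loop described above can be pictured with a small, self-contained sketch. This is not AKO4ALL's code: `optimize`, `toy_propose`, `toy_benchmark`, `min_gain`, and `patience` are hypothetical names chosen for illustration, and the "kernel" here is just a string holding a tile size. It only shows the shape of a propose, benchmark, log, stop-on-plateau loop.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Candidate:
    source: str        # kernel source (here: a toy "tile=N" string)
    runtime_ms: float  # measured runtime for that source


def optimize(
    initial_source: str,
    propose: Callable[[str, int], str],
    benchmark: Callable[[str], float],
    min_gain: float = 0.01,  # hypothetical: relative speedup needed to accept
    patience: int = 3,       # hypothetical: stalled rounds before stopping
    max_iters: int = 50,
) -> Tuple[Candidate, List[Candidate]]:
    """Keep the fastest candidate; stop once gains plateau."""
    best = Candidate(initial_source, benchmark(initial_source))
    log = [best]  # every attempt is logged, accepted or not
    stalled = 0
    for _ in range(max_iters):
        # A real agent could use `stalled` to change strategy,
        # e.g. switch languages or look up new ideas.
        new_source = propose(best.source, stalled)
        cand = Candidate(new_source, benchmark(new_source))
        log.append(cand)
        gain = (best.runtime_ms - cand.runtime_ms) / best.runtime_ms
        if gain > min_gain:
            best, stalled = cand, 0
        else:
            stalled += 1
            if stalled >= patience:
                break
    return best, log


def toy_benchmark(source: str) -> float:
    # Synthetic cost model: fastest at tile=64.
    tile = int(source.split("=")[1])
    return 1.0 + abs(tile - 64) / 16.0


def toy_propose(source: str, stalled: int) -> str:
    # Toy edit: double the tile size (ignores `stalled`).
    tile = int(source.split("=")[1])
    return f"tile={tile * 2}"


if __name__ == "__main__":
    best, log = optimize("tile=8", toy_propose, toy_benchmark)
    print(best.source, best.runtime_ms, len(log))
```

In this toy run the loop accepts each doubling up to `tile=64`, then rejects `tile=128` three times in a row and stops, which mirrors the "continue until gains plateau" behavior in miniature.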
Quick Start & Requirements
Install by cloning the repo into a coding agent's skills directory (e.g., ~/.claude/skills/ako4all), or by symlinking an existing clone to that location. Requirements: a coding agent (e.g., Claude Code), an NVIDIA GPU with CUDA, PyTorch (for the built-in evaluator), Python >= 3.10, and NVIDIA Nsight Compute (version-matched). Optimization typically completes in under an hour.
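Before a first run, it may help to confirm the listed requirements are present. The following is a minimal sketch, not part of AKO4ALL: `check_requirements` is a hypothetical helper. It assumes the Nsight Compute CLI is available as `ncu` and the NVIDIA driver tools as `nvidia-smi`. It does not verify that the Nsight Compute version matches the installed CUDA toolkit, and it does not check for a coding agent.

```python
import importlib.util
import shutil
import sys


def check_requirements() -> dict:
    """Return a name -> bool map of basic prerequisite checks."""
    return {
        # Interpreter version required by the project summary.
        "python>=3.10": sys.version_info >= (3, 10),
        # PyTorch is needed for the built-in evaluator; only checks importability.
        "torch importable": importlib.util.find_spec("torch") is not None,
        # Assumption: the Nsight Compute CLI is installed as `ncu`.
        "ncu on PATH": shutil.which("ncu") is not None,
        # Assumption: an NVIDIA driver install provides `nvidia-smi`.
        "nvidia-smi on PATH": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{'OK     ' if ok else 'MISSING'}  {name}")
```

A `MISSING` line only signals that the tool was not found on this machine's `PATH` or import path; it says nothing about whether an installed version is compatible.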
Highlighted Details