gpu-perf-engineering-resources by wafer-ai

GPU performance engineering curriculum for AI infrastructure

Created 1 month ago
401 stars

Top 72.3% on SourcePulse

Project Summary

This repository offers a comprehensive, tiered curriculum for engineers focused on GPU performance engineering for high-performance AI systems. It guides learners from fundamental GPU programming to cutting-edge techniques used in frontier AI labs, enabling effective optimization of AI infrastructure.

How It Works

The curriculum is structured into sequential tiers, covering GPU architecture, low-level programming (PTX, SASS), optimization for core operations (matmul, attention), and modern AI inference systems. It emphasizes foundational knowledge, practical insights from practitioner blogs, and official documentation, balancing fundamental concepts with advanced techniques.

Quick Start & Requirements

This is a learning curriculum, not a software project. It outlines a recommended reading order. Applying the learned concepts requires access to GPUs (NVIDIA, AMD), CUDA/ROCm toolkits, and potentially specific hardware architectures for advanced topics.

Highlighted Details

  • In-depth coverage of AI acceleration: FlashAttention (v1-v3), PagedAttention, KV cache optimization.
  • Exploration of kernel DSLs and template libraries: OpenAI's Triton, NVIDIA's CUTLASS, and Modular's Mojo.
  • Profiling and optimization using NVIDIA tools (Nsight Compute) and the Roofline model.
  • Resources for alternative hardware: AMD GPUs (ROCm) and Google TPUs.
  • Production inference systems: continuous batching, speculative decoding, LLM-generated kernels.
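As a taste of the profiling material, the Roofline model mentioned above fits in a few lines: a kernel's attainable throughput is the minimum of the hardware's peak compute rate and its memory bandwidth multiplied by the kernel's arithmetic intensity (FLOPs per byte of DRAM traffic). The sketch below uses illustrative A100-like numbers as assumptions; they are not taken from the repository.

```python
# Minimal Roofline model sketch (hardware numbers are illustrative assumptions).
def roofline(peak_flops: float, mem_bw: float, intensity: float) -> float:
    """Attainable FLOP/s for a kernel with the given arithmetic intensity
    (FLOPs per byte of memory traffic)."""
    return min(peak_flops, mem_bw * intensity)

# Assumed A100-like figures: ~19.5 TFLOP/s FP32 peak, ~1555 GB/s HBM bandwidth.
PEAK = 19.5e12   # FLOP/s
BW = 1555e9      # bytes/s

# FP32 vector add: 1 FLOP per 12 bytes moved (read x, read y, write z)
# -> low intensity, so the kernel sits on the memory-bandwidth roof.
vec_add = roofline(PEAK, BW, 1 / 12)

# Large FP32 matmul: intensity grows with problem size; at ~100 FLOPs/byte
# the kernel reaches the compute roof.
matmul = roofline(PEAK, BW, 100.0)

print(f"vector add: {vec_add / 1e12:.2f} TFLOP/s")  # far below peak
print(f"matmul:     {matmul / 1e12:.2f} TFLOP/s")   # hits the compute roof
```

Plotting attainable FLOP/s against intensity on log-log axes gives the familiar roofline chart; tools like Nsight Compute report where a measured kernel lands relative to those two roofs.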

Maintenance & Community

Contributions prioritize primary sources and practitioner insights. The project fosters a large community via its active Discord server (23k+ members) and curated learning materials.

Licensing & Compatibility

The MIT license is permissive, allowing broad adoption and integration of learned principles in commercial and closed-source contexts.

Limitations & Caveats

As a curriculum, it offers no direct code execution or hands-on labs; it provides knowledge pointers, and users must set up their own environments. While it covers AMD GPUs and TPUs, its primary focus and depth of detail are on NVIDIA hardware and CUDA.

Health Check
Last Commit

1 month ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
108 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems") and Ying Sheng (coauthor of SGLang).

fastllm by ztxz16

Top 0.1% on SourcePulse
4k stars
High-performance C++ LLM inference library
Created 2 years ago
Updated 17 hours ago
Starred by David Cournapeau (author of scikit-learn), Stas Bekman (author of "Machine Learning Engineering Open Book"; research engineer at Snowflake), and 5 more.

lectures by gpu-mode

Top 0.5% on SourcePulse
6k stars
Lecture series for GPU-accelerated computing
Created 2 years ago
Updated 3 weeks ago