awesome-gpu-engineering  by goabiaryan

Master GPU engineering for AI systems

Created 8 months ago
299 stars

Top 88.9% on SourcePulse

Project Summary

Summary

This repository curates resources on GPU engineering for engineers and researchers who work on AI systems. It lays out a structured learning path from foundational concepts to large-scale distributed systems, with the aim of helping readers build GPU expertise faster.

How It Works

The project is a categorized list of learning materials. It organizes links to foundational books, programming frameworks (CUDA, ROCm, OpenCL), optimization tools (Nsight, Triton), architecture details, multi-GPU systems (NCCL, DeepSpeed), tutorials, research papers, and AI/ML-specific GPU techniques. The categories give readers a guided overview of the GPU engineering landscape.

Quick Start & Requirements

As a curated list, the repository has nothing to install or run. It directs users to external resources such as books, documentation, and tutorials. Prerequisites are not stated but are implied by the topics covered; working through the linked material may require GPU hardware, a software development kit (CUDA, ROCm), and a deep learning framework. Links to official documentation and courses are provided within the list.

Highlighted Details

  • Covers core GPU programming paradigms with resources on CUDA, ROCm, OpenCL, SYCL, Vulkan Compute, and Metal.
  • Features essential optimization tools and techniques, including NVIDIA Nsight, TensorRT, OpenAI Triton, and the Roofline Model.
  • Details multi-GPU and distributed systems engineering with libraries like NCCL, DeepSpeed, Megatron-LM, and frameworks such as vLLM and Hugging Face Accelerate.
  • Includes academic resources like Stanford CS149 and CMU 15-418/618 courses, alongside research papers on GPU architecture and AI workload scheduling.
  • Highlights AI/ML-specific GPU acceleration methods such as FlashAttention, PyTorch CUDA Extensions, and JAX+XLA.
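
The Roofline Model named in the list above bounds a kernel's attainable throughput by the lower of the device's peak compute rate and its memory bandwidth multiplied by the kernel's arithmetic intensity (FLOPs per byte moved). The sketch below is illustrative only: the device figures are hypothetical and do not describe any specific GPU, and the function names are not taken from the repository or any library.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float, intensity: float) -> float:
    """Roofline bound: the lower of the compute roof and the memory roof.

    intensity is arithmetic intensity in FLOPs per byte of memory traffic.
    """
    return min(peak_gflops, bandwidth_gbs * intensity)


def ridge_point(peak_gflops: float, bandwidth_gbs: float) -> float:
    """Arithmetic intensity at which a kernel stops being memory-bound."""
    return peak_gflops / bandwidth_gbs


# Hypothetical device figures, chosen only to keep the arithmetic simple.
PEAK_GFLOPS = 10_000.0   # assumed peak compute rate, GFLOP/s
BANDWIDTH_GBS = 1_000.0  # assumed memory bandwidth, GB/s

# SAXPY (y = a*x + y) in float32: 2 FLOPs per element,
# 12 bytes moved per element (read x, read y, write y).
saxpy_intensity = 2 / 12

if __name__ == "__main__":
    bound = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GBS, saxpy_intensity)
    ridge = ridge_point(PEAK_GFLOPS, BANDWIDTH_GBS)
    kind = "memory-bound" if saxpy_intensity < ridge else "compute-bound"
    print(f"SAXPY bound: {bound:.1f} GFLOP/s ({kind}); ridge at {ridge:.1f} FLOP/byte")
```

Under these assumed figures, a kernel whose intensity sits below the ridge point is limited by memory bandwidth, which is why low-intensity operations such as SAXPY gain little from more compute.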

Maintenance & Community

Contributions are welcome via pull requests that follow the contribution guidelines. For community discussion, the list points to the "GPU MODE Discord." It is inspired by other "awesome" repositories in related fields such as HPC and computer architecture.

Licensing & Compatibility

The repository is licensed under CC BY 4.0, which permits sharing and adaptation, including for commercial purposes, provided proper attribution is given.

Limitations & Caveats

This is a curated list of external resources, not a runnable software project. Because GPU technology evolves rapidly, some linked resources or tools may become outdated. Some materials, such as books, may require purchase. Some course entries carry term labels such as "Fall 2025," so the linked content may be tied to a specific offering that is upcoming or already past.

Health Check

Last Commit: 1 week ago
Responsiveness: Inactive
Pull Requests (30d): 2
Issues (30d): 0
Star History: 21 stars in the last 30 days
