Workshop materials for GPU optimization
This repository provides materials for a workshop on GPU optimization for machine learning. It targets engineers and researchers who want to improve the performance of their deep learning models, particularly Large Language Models (LLMs), by applying advanced GPU programming techniques and tools. Attendees gain a deeper understanding of GPU architecture, optimization strategies, and practical implementation with CUDA and Triton.
How It Works
The workshop covers fundamental GPU concepts, including memory vs. compute bottlenecks and programming models like CUDA (thread-based) and Triton (block-based). It delves into high-performance LLM serving with techniques like token concatenation and optimized batching using NVIDIA's TensorRT-LLM. A significant portion is dedicated to Triton, an intermediate language and compiler designed for expressive, block-based GPU programming, contrasting it with CUDA's SIMT model. Finally, it explores scaling data processing on GPUs using libraries like cuDF and RAPIDS.
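The CUDA-vs-Triton distinction above can be sketched in pure Python. This is an illustrative simulation only, not real CUDA or Triton code: the function names are ours, and an actual Triton kernel would use `tl.program_id`, `tl.arange`, and masked `tl.load`/`tl.store` calls. The point is the unit of work: in CUDA's SIMT model each thread handles one element, while in Triton each program instance handles a whole block, with a mask guarding the ragged tail.

```python
def cuda_style_add(x, y):
    # SIMT-style: one "thread" per element; thread index i maps to element i.
    out = [0.0] * len(x)
    for i in range(len(x)):  # each loop iteration models one thread
        out[i] = x[i] + y[i]
    return out

def triton_style_add(x, y, BLOCK_SIZE=4):
    # Block-style: one "program" per block of BLOCK_SIZE elements,
    # mirroring tl.arange offsets plus a bounds mask in a real kernel.
    n = len(x)
    out = [0.0] * n
    num_programs = (n + BLOCK_SIZE - 1) // BLOCK_SIZE  # grid size, rounded up
    for pid in range(num_programs):  # each loop iteration models one program
        offsets = [pid * BLOCK_SIZE + k for k in range(BLOCK_SIZE)]
        mask = [off < n for off in offsets]  # disable out-of-bounds lanes
        for off, ok in zip(offsets, mask):
            if ok:
                out[off] = x[off] + y[off]
    return out

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0, 20.0, 30.0, 40.0, 50.0]
assert cuda_style_add(x, y) == triton_style_add(x, y) == [11.0, 22.0, 33.0, 44.0, 55.0]
```

Both produce the same result; the difference is which unit the programmer reasons about, which is why Triton kernels tend to read like blocked NumPy code rather than per-thread index arithmetic.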
Quick Start & Requirements
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The workshop assumes a baseline understanding of GPU concepts, and pre-reading is recommended for full comprehension. While the materials remain available, the interactive discussion and Q&A were tied to the live event.
Last activity: 1 year ago; the repository is marked inactive.