gpu-optimization-workshop by mlops-discord

Workshop materials for GPU optimization

created 1 year ago
328 stars

Top 84.4% on sourcepulse

Project Summary

This repository provides materials for a workshop focused on GPU optimization for machine learning. It targets engineers and researchers seeking to improve the performance of their deep learning models, particularly Large Language Models (LLMs), by leveraging advanced GPU programming techniques and tools. The benefit is a deeper understanding of GPU architecture, optimization strategies, and practical implementation using CUDA and Triton.

How It Works

The workshop covers fundamental GPU concepts, including memory- vs. compute-bound workloads and the two main programming models: CUDA (thread-based) and Triton (block-based). It delves into high-performance LLM serving, covering techniques such as token concatenation and optimized batching with NVIDIA's TensorRT-LLM. A significant portion is dedicated to Triton, an intermediate language and compiler designed for expressive, block-based GPU programming, which the workshop contrasts with CUDA's SIMT model. Finally, it explores scaling data processing on GPUs with libraries such as cuDF and RAPIDS.
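
To make the thread-based vs. block-based contrast concrete, here is a minimal Triton vector-add kernel, a sketch in the spirit of what the workshop covers rather than code taken from its materials: each Triton program handles one tile of elements, and the compiler manages the per-thread indexing that CUDA would require explicitly.

    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        # Each "program" processes one BLOCK_SIZE-wide tile; no per-thread indexing.
        pid = tl.program_id(axis=0)
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements            # guard the final, partial tile
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        tl.store(out_ptr + offsets, x + y, mask=mask)

    def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        out = torch.empty_like(x)
        n = out.numel()
        grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)  # 1-D launch grid
        add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
        return out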

Quick Start & Requirements

  • Materials Access: No installation required; materials are slides and notes.
  • Prerequisites: Familiarity with basic GPU concepts is recommended. Key terms include "memory bound vs. compute bound" and "thread-based vs. block-based" programming (see the worked sketch after this list).
  • Resources: Links to relevant lectures, articles, and community resources are provided for pre-workshop preparation.
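
For readers new to the "memory bound vs. compute bound" terminology, the back-of-the-envelope check below illustrates the usual reasoning: compare a kernel's arithmetic intensity (FLOPs per byte of memory traffic) against the hardware's compute-to-bandwidth ratio. The matrix size and hardware figures are illustrative assumptions (roughly A100-class), not numbers from the workshop.

    # Decide whether a kernel is memory bound or compute bound.
    flops = 2 * 4096 * 4096 * 4096        # FLOPs for a 4096^3 matmul (2*M*N*K)
    bytes_moved = 3 * 4096 * 4096 * 2     # read A, read B, write C in fp16

    arithmetic_intensity = flops / bytes_moved       # FLOPs per byte (~1365 here)

    peak_flops = 312e12                   # assumed fp16 tensor-core peak, FLOP/s
    peak_bandwidth = 2.0e12               # assumed HBM bandwidth, bytes/s
    machine_balance = peak_flops / peak_bandwidth    # FLOPs per byte (~156 here)

    if arithmetic_intensity < machine_balance:
        print("memory bound: limited by data movement")
    else:
        print("compute bound: limited by math throughput")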

Highlighted Details

  • Covers both traditional CUDA and the newer block-based Triton programming model.
  • Features talks from experts at Meta, NVIDIA, OpenAI, and Voltron Data.
  • Includes optimization strategies for LLM serving and large-scale data processing on GPUs (see the cuDF sketch after this list).
  • Provides links to foundational reading materials and community resources for deeper learning.
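
As a taste of the GPU data-processing topic mentioned above, the snippet below is a minimal sketch using cuDF's pandas-like API from RAPIDS; the column names and values are invented for illustration, and it assumes a CUDA-capable GPU with cuDF installed.

    import cudf

    # Illustrative data; in practice this would come from e.g. Parquet or CSV files.
    df = cudf.DataFrame(
        {
            "device": ["a100", "h100", "a100", "h100"],
            "latency_ms": [12.5, 7.8, 13.1, 8.2],
        }
    )

    # The pandas-like API executes on the GPU: the groupby/aggregation runs in CUDA.
    summary = df.groupby("device").agg({"latency_ms": "mean"})
    print(summary)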

Maintenance & Community

  • Hosted by @chiphuyen's Discord community.
  • Speakers are active contributors in the ML/GPU optimization space (PyTorch core, Triton team, TensorRT-LLM).
  • Mentions CUDA MODE Discord as a recommended resource for GPU optimization lectures.

Licensing & Compatibility

  • The repository itself contains slides and notes, likely under a permissive license allowing sharing.
  • Specific software tools discussed (Triton, TensorRT-LLM, cuDF) have their own licenses, which may have implications for commercial use.

Limitations & Caveats

The workshop assumes a baseline technical understanding of GPU concepts, with pre-reading recommended for full comprehension. While the materials are available, the interactive discussion and Q&A were tied to the live event.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 4 stars in the last 90 days

Explore Similar Projects

Starred by David Cournapeau (Author of scikit-learn), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 4 more.

lectures by gpu-mode

  • Lecture series for GPU-accelerated computing
  • 5k stars · 0.4%
  • Created 1 year ago · updated 1 month ago

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Omar Sanseviero (DevRel at Google DeepMind), and 5 more.

TensorRT-LLM by NVIDIA

  • LLM inference optimization SDK for NVIDIA GPUs
  • 11k stars · 0.6%
  • Created 1 year ago · updated 19 hours ago