Workshop materials for GPU optimization
This repository provides materials for a workshop on GPU optimization for machine learning. It targets engineers and researchers who want to improve the performance of their deep learning models, particularly Large Language Models (LLMs), by applying advanced GPU programming techniques and tools. Attendees gain a deeper understanding of GPU architecture, optimization strategies, and practical implementation with CUDA and Triton.
How It Works
The workshop covers fundamental GPU concepts, including memory vs. compute bottlenecks and programming models like CUDA (thread-based) and Triton (block-based). It delves into high-performance LLM serving with techniques like token concatenation and optimized batching using NVIDIA's TensorRT-LLM. A significant portion is dedicated to Triton, an intermediate language and compiler designed for expressive, block-based GPU programming, contrasting it with CUDA's SIMT model. Finally, it explores scaling data processing on GPUs using libraries like cuDF and RAPIDS.
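The CUDA-vs-Triton distinction above can be sketched in pure Python. This is an illustrative simulation only, not real CUDA or Triton code: the function names are ours, and an actual Triton kernel would use `tl.program_id`, `tl.arange`, and masked `tl.load`/`tl.store` calls. The point is the unit of work: in CUDA's SIMT model each thread handles one element, while in Triton each program instance handles a whole block, with a mask guarding the ragged tail.

```python
def cuda_style_add(x, y):
    # SIMT-style: one "thread" per element; thread index i maps to element i.
    out = [0.0] * len(x)
    for i in range(len(x)):  # each loop iteration models one thread
        out[i] = x[i] + y[i]
    return out

def triton_style_add(x, y, BLOCK_SIZE=4):
    # Block-style: one "program" per block of BLOCK_SIZE elements,
    # mirroring tl.arange offsets plus a bounds mask in a real kernel.
    n = len(x)
    out = [0.0] * n
    num_programs = (n + BLOCK_SIZE - 1) // BLOCK_SIZE  # grid size, rounded up
    for pid in range(num_programs):  # each loop iteration models one program
        offsets = [pid * BLOCK_SIZE + k for k in range(BLOCK_SIZE)]
        mask = [off < n for off in offsets]  # disable out-of-bounds lanes
        for off, ok in zip(offsets, mask):
            if ok:
                out[off] = x[off] + y[off]
    return out

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0, 20.0, 30.0, 40.0, 50.0]
assert cuda_style_add(x, y) == triton_style_add(x, y) == [11.0, 22.0, 33.0, 44.0, 55.0]
```

Both produce the same result; the difference is which unit the programmer reasons about, which is why Triton kernels tend to read like blocked NumPy code rather than per-thread index arithmetic.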
Quick Start & Requirements
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The workshop assumes a baseline understanding of GPU concepts, and pre-reading is recommended for full comprehension. While the materials remain available, the interactive discussion and Q&A were tied to the live event.
Last activity: 1 year ago; the repository is marked inactive.