HighPerfLLMs2024 by rwitten

A JAX course on building high-performance LLMs

Created 1 year ago · 518 stars · Top 61.5% on sourcepulse

Project Summary

This repository provides a comprehensive curriculum for building high-performance Large Language Models (LLMs) from scratch using JAX. It targets engineers and researchers aiming to understand and optimize LLM training and inference, covering topics from single-chip roofline analysis to distributed sharding and fused attention. The goal is to equip participants with the skills to design HPC systems that approach physical performance limits.
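
To make the roofline framing concrete: a kernel is compute-bound only when its arithmetic intensity (FLOPs per byte moved) exceeds the chip's ridge point, i.e. peak FLOP/s divided by memory bandwidth. The sketch below illustrates this for a matmul; the hardware figures are placeholder numbers, not ones taken from the course.

```python
# Back-of-the-envelope roofline check. A kernel is compute-bound when its
# arithmetic intensity (FLOPs per byte) exceeds the chip's ridge point
# (peak FLOP/s divided by memory bandwidth). Hardware numbers below are
# placeholders, not figures from the course.
def matmul_intensity(m, k, n, bytes_per_elem=2):  # bf16 = 2 bytes
    flops = 2 * m * k * n                                   # multiply-adds
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)  # read A, B; write C
    return flops / bytes_moved

PEAK_FLOPS = 200e12   # placeholder: ~200 TFLOP/s bf16
PEAK_BW = 800e9       # placeholder: ~800 GB/s HBM bandwidth
ridge = PEAK_FLOPS / PEAK_BW  # = 250 FLOPs/byte

for mkn in [(8, 4096, 4096), (4096, 4096, 4096)]:
    ai = matmul_intensity(*mkn)
    verdict = "compute-bound" if ai > ridge else "memory-bound"
    print(mkn, f"intensity ~{ai:.0f} FLOPs/byte -> {verdict}")
```

The small-batch case lands far below the ridge point and is therefore bandwidth-bound, the typical regime for inference decoding; the square matmul sits well above it.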

How It Works

The project guides users through implementing LLMs in JAX, focusing on performance optimization techniques. It delves into roofline analysis for single-chip performance, distributed computing via sharding, and the intricacies of attention mechanisms like fused and FlashAttention. The curriculum emphasizes understanding the underlying mechanics of LLM training and inference to achieve near-peak hardware utilization.
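
As a minimal taste of the sharding material, here is a sketch against the public jax.sharding API (not code from the course; the axis name "model" and the 1-D mesh are arbitrary choices for illustration):

```python
# Minimal jax.sharding sketch: column-shard a weight matrix across all local
# devices and let jit partition the matmul. Assumes the sharded dimension is
# divisible by the device count.
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

mesh = Mesh(np.array(jax.devices()), ("model",))  # 1-D mesh over all devices

x = jnp.ones((8, 1024))
w = jnp.ones((1024, 1024))
# Place the weight with its columns split across the "model" axis.
w = jax.device_put(w, NamedSharding(mesh, P(None, "model")))

@jax.jit
def forward(x, w):
    return x @ w  # jit propagates the sharding and inserts any collectives

y = forward(x, w)
jax.debug.visualize_array_sharding(y)  # output inherits the column sharding
```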

Quick Start & Requirements

  • Install/Run: No specific installation commands are provided; the content is delivered via recorded sessions, slides, and exercises.
  • Prerequisites: Familiarity with JAX is recommended. Access to computational resources (TPUs/GPUs) is implied for practical exercises.
  • Resources: Links to YouTube recordings, slides, and take-home exercises are available for each session.

Highlighted Details

  • Covers end-to-end LLM implementation in JAX for both training and inference.
  • Detailed analysis of single-chip and multi-chip performance, including roofline modeling.
  • Deep dives into attention mechanisms, including fused attention schedules, softmax, and FlashAttention (see the online-softmax sketch after this list).
  • Introduction to Pallas for low-level kernel optimization (a minimal kernel sketch also follows below).
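
The FlashAttention bullet refers to the online-softmax recurrence. Below is a minimal pure-JAX sketch of that idea for a single query (illustrative only, not the course's implementation): the key/value sequence is streamed in chunks while a running max and normalizer are maintained, so the full score vector is never materialized at once.

```python
# Online-softmax attention for one query vector, streamed over k/v chunks.
import jax
import jax.numpy as jnp

def online_softmax_attention(q, k, v, chunk=128):
    """q: [d], k/v: [n, d]. Processes k/v `chunk` rows at a time."""
    d = q.shape[-1]
    m = -jnp.inf          # running max of the logits seen so far
    l = 0.0               # running softmax normalizer
    acc = jnp.zeros(d)    # running weighted sum of values
    for start in range(0, k.shape[0], chunk):
        kc, vc = k[start:start + chunk], v[start:start + chunk]
        s = kc @ q / jnp.sqrt(d)          # logits for this chunk
        m_new = jnp.maximum(m, s.max())
        correction = jnp.exp(m - m_new)   # rescale old stats to the new max
        p = jnp.exp(s - m_new)
        l = l * correction + p.sum()
        acc = acc * correction + p @ vc
        m = m_new
    return acc / l
```

Up to floating-point error this matches jax.nn.softmax(k @ q / jnp.sqrt(d)) @ v; FlashAttention applies the same recurrence per tile inside a fused kernel so the full n×n score matrix never hits HBM.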
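
For the Pallas bullet, a minimal kernel in the standard jax.experimental.pallas style (the canonical element-wise example, not course code):

```python
# An element-wise add written as an explicit Pallas kernel over memory refs.
import jax
import jax.numpy as jnp
from jax.experimental import pallas as pl

def add_kernel(x_ref, y_ref, o_ref):
    # Refs point at blocks in fast on-chip memory; [...] reads/writes a block.
    o_ref[...] = x_ref[...] + y_ref[...]

@jax.jit
def add(x, y):
    return pl.pallas_call(
        add_kernel,
        out_shape=jax.ShapeDtypeStruct(x.shape, x.dtype),
    )(x, y)

x = jnp.arange(8.0)
print(add(x, x))  # [ 0.  2.  4.  6.  8. 10. 12. 14.]
```

Real kernels add a grid and block specs to tile large arrays, but the structure is the same: a function over memory references that pallas_call maps onto the hardware.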

Maintenance & Community

  • The project is led by Rafi Witten, a tech lead on Google's Cloud TPU/GPU Multipod team, known for his work on MaxText and for pioneering "Accurate Quantized Training."
  • A Discord server is available for community interaction and support: https://discord.gg/2AWcVatVAw.

Licensing & Compatibility

  • The repository's license is not explicitly stated in the README.

Limitations & Caveats

  • The content is presented as a series of recorded sessions and exercises, not a directly executable codebase.
  • The project appears to be a past course offering, with the last session dated May 29, 2024.
Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 76 stars in the last 90 days

Explore Similar Projects

Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), Nathan Lambert (AI Researcher at AI2), and 4 more.

large_language_model_training_playbook by huggingface

Tips for training large language models
478 stars · created 2 years ago · updated 2 years ago
Starred by Omar Sanseviero (DevRel at Google DeepMind) and Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

llm_training_handbook by huggingface

Handbook for large language model training methodologies
506 stars · created 2 years ago · updated 1 year ago