lightning-thunder by Lightning-AI

PyTorch compiler for model optimization via source-to-source transformation

Created 1 year ago
1,411 stars

Top 28.8% on SourcePulse

View on GitHub
Project Summary

Thunder is a source-to-source compiler for PyTorch, designed to optimize model performance, memory usage, and parallelism. It targets both end-users seeking out-of-the-box speed-ups and performance experts needing an extensible framework for custom optimizations like kernel fusion, quantization, and distributed strategies.
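
For orientation, here is a minimal sketch of the typical entry point, thunder.jit, applied to an ordinary PyTorch module (the module and layer sizes below are arbitrary, chosen for illustration):

    import torch
    import torch.nn as nn
    import thunder

    # An ordinary PyTorch module; the layer sizes are arbitrary.
    model = nn.Sequential(nn.Linear(2048, 4096), nn.ReLU(), nn.Linear(4096, 64))

    # thunder.jit wraps the module; the result is called like the original.
    thunder_model = thunder.jit(model)

    x = torch.randn(64, 2048)
    y = thunder_model(x)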

How It Works

Thunder operates in three stages: it first acquires the PyTorch model by interpreting Python bytecode into a straight-line Python program. Next, it transforms this computation trace to incorporate optimizations such as distributed strategies or precision changes. Finally, it routes parts of the trace for execution using various backends, including NVFuser, torch.compile, specialized libraries (cuDNN SDPA, TransformerEngine), and custom Triton/CUDA kernels. This approach allows for composable transformations and easy swapping of optimizations.
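
To make the three stages concrete, a hedged sketch of inspecting the computation trace; thunder.last_traces is the trace-inspection helper in recent releases, though the exact API may differ by version:

    import torch
    import thunder

    def fn(x):
        return torch.nn.functional.gelu(x) + x

    jfn = thunder.jit(fn)
    jfn(torch.randn(16))  # the first call acquires and transforms the trace

    # Inspect the chain of traces produced for the last call, from the
    # acquired program to the final trace handed to the executors.
    traces = thunder.last_traces(jfn)
    print(traces[-1])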

Quick Start & Requirements

  • Install via pip: pip install torch==2.6.0 torchvision==0.21 nvfuser-cu124-torch26 followed by pip install lightning-thunder.
  • For Blackwell support, CUDA 12.8 is required, along with nightly PyTorch and nvfuser builds.
  • Optional executors like cuDNN SDPA and TransformerEngine (for Float8) can be installed separately; executor selection is sketched after this list.
  • See Examples for usage with LitGPT and Hugging Face models.
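
For the optional executors above, selection happens at compile time. A sketch, assuming the executors keyword of thunder.jit accepts executor names; whether strings or executor objects are expected may vary by version:

    import torch
    import thunder

    model = torch.nn.Linear(1024, 1024)

    # Executor names here ("cudnn", "nvfuser") are illustrative; which ones
    # are available depends on the optional packages installed above.
    thunder_model = thunder.jit(model, executors=["cudnn", "nvfuser", "torch"])

    y = thunder_model(torch.randn(8, 1024))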

Highlighted Details

  • Claims up to 40% speed-up for PyTorch models.
  • Supports quantization, FP4/FP6/FP8 precision, kernel fusion, and various distributed training strategies (DDP, FSDP, TP).
  • Ready for NVIDIA Blackwell hardware and supports CUDA Graphs.
  • Integrates custom Triton kernels and offers plugins for easy optimization swapping; a plugin sketch follows this list.
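
A sketch of the plugin mechanism, assuming the plugins keyword and the "reduce-overhead" (CUDA Graphs) plugin shown in the project's README; verify both names against your installed version:

    import torch
    import thunder

    model = torch.nn.Linear(2048, 2048).cuda()

    # Assumed per the project's README examples: the "reduce-overhead" plugin
    # wraps execution in CUDA Graphs; other plugins swap in quantization or
    # distributed strategies without changes to the model code.
    thunder_model = thunder.jit(model, plugins="reduce-overhead")

    x = torch.randn(32, 2048, device="cuda")
    y = thunder_model(x)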

Maintenance & Community

  • Developed in collaboration with the community, with significant contributions from NVIDIA.
  • Active development indicated by CI badges.
  • Community support available via Discord.

Licensing & Compatibility

  • Licensed under Apache 2.0.
  • Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

  • Requires specific PyTorch and CUDA versions for optimal functionality, with advanced features like Blackwell support needing nightly builds.
  • Performance gains are hardware- and model-dependent; the project itself cautions that specific optimizations "may or may not make a big difference".
Health Check

  • Last commit: 11 hours ago
  • Responsiveness: 1 day
  • Pull requests (30d): 104
  • Issues (30d): 20
  • Star history: 22 stars in the last 30 days

Explore Similar Projects

Starred by Yaowei Zheng (Author of LLaMA-Factory), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 1 more.

VeOmni by ByteDance-Seed

3.4% · 1k stars
Framework for scaling multimodal model training across accelerators
Created 5 months ago
Updated 3 weeks ago
Starred by Luca Soldaini (Research Scientist at Ai2), Edward Sun (Research Scientist at Meta Superintelligence Lab), and 4 more.

parallelformers by tunib-ai

0% · 790 stars
Toolkit for easy model parallelization
Created 4 years ago
Updated 2 years ago
Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), Lewis Tunstall (Research Engineer at Hugging Face), and 15 more.

torchtune by pytorch

0.2% · 5k stars
PyTorch library for LLM post-training and experimentation
Created 1 year ago
Updated 1 day ago
Starred by Nat Friedman (Former CEO of GitHub), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 15 more.

FasterTransformer by NVIDIA

0.1% · 6k stars
Optimized transformer library for inference
Created 4 years ago
Updated 1 year ago