luminal by luminal-ai

Deep learning library using composable compilers for high performance

Created 2 years ago
2,676 stars

Top 17.5% on SourcePulse

Project Summary

Luminal is a Rust-based deep learning library designed for high-performance inference and training through a composable, ahead-of-time compilation approach. It targets developers seeking maximum efficiency on diverse hardware, from consumer CPUs and Apple Silicon to NVIDIA GPUs, by compiling computation graphs into optimized, native code.

How It Works

Luminal takes a compile-time-first approach: models are expressed as static computation graphs built from 11 primitive operations. Because the entire network is visible as a single unit, its compilers (e.g., CPUCompiler, MetalCompiler, CUDACompiler) can perform aggressive optimizations such as kernel fusion and shape-specific code generation. In contrast to eager execution, this pushes complexity into the compiler, enabling hardware-specific tuning without maintaining divergent code for each backend.
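
A minimal sketch of that workflow, adapted from the README's getting-started example (exact tensor-creation and compiler APIs are assumptions here and may differ between luminal versions):

    use luminal::prelude::*;

    fn main() {
        // Build a static computation graph; nothing executes yet.
        let mut cx = Graph::new();
        let a = cx.tensor((3, 1)).set([[1.0], [2.0], [3.0]]);
        let b = cx.tensor((1, 4)).set([[1.0, 2.0, 3.0, 4.0]]);

        // Record a matmul and mark its output to be kept after execution.
        let mut c = a.matmul(b).retrieve();

        // Compile the whole graph as one unit, then run it.
        cx.compile(GenericCompiler::default(), &mut c);
        cx.execute();

        // The output tensor now holds real data.
        println!("Result: {:?}", c);
    }

Swapping GenericCompiler for a backend compiler such as MetalCompiler or CUDACompiler retargets the same graph to GPU hardware without touching the model code.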

Quick Start & Requirements

  • Install/Run: for the Llama 3 example, cd ./examples/llama, run ./setup/setup.sh, then cargo run --release --features <cuda|metal|cpu> (backend selection is sketched after this list).
  • Prerequisites: Rust toolchain, CUDA Toolkit (for NVIDIA), Metal (for macOS).
  • Resources: the Llama 3 8B example runs locally on M-series MacBooks at 15-25 tokens/sec.
  • Docs: https://github.com/jafioti/luminal/blob/main/README.md#getting-started
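
The <cuda|metal|cpu> feature flag determines which backend compiler is applied to the graph. A hedged sketch of that selection (the crate paths luminal_cpu, luminal_metal, and luminal_cuda and the half-precision type parameter are assumptions; exact names may differ by version):

    use luminal::prelude::*;

    // Compile the graph for whichever backend the build enables.
    // Compiler names follow the summary above; crate paths are assumed.
    fn compile_for_backend(cx: &mut Graph, out: &mut GraphTensor) {
        #[cfg(feature = "cpu")]
        cx.compile(luminal_cpu::CPUCompiler::default(), out);
        #[cfg(feature = "metal")]
        cx.compile(luminal_metal::MetalCompiler::<half::f16>::default(), out);
        #[cfg(feature = "cuda")]
        cx.compile(luminal_cuda::CUDACompiler::<half::f16>::default(), out);
    }

Because each backend is just another compiler pass over the same 11-op graph, adding a target does not fork the model definition.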

Highlighted Details

  • Achieves 15-25 tokens/sec for Q8 Llama 3 8B on M-series MacBooks.
  • Compiles natively to CUDA and Metal rather than through intermediate abstraction layers.
  • Offers full training support with graph-based autograd.
  • Implements examples for Llama 3, Phi 3, Whisper, and YOLO v8.

Maintenance & Community

  • Active development with a focus on compiler advancements and performance targets.
  • Roadmap includes optimizing CUDA/Metal kernels, distributed training, and matching PyTorch 2.0 performance.

Licensing & Compatibility

  • Dual-licensed under Apache License 2.0 or MIT, permitting commercial use and closed-source linking.

Limitations & Caveats

  • Still under active development with stated goals to match PyTorch API coverage and performance benchmarks.
  • Some optimizations and features, like distributed training, are on the roadmap rather than fully implemented.
Health Check

  • Last Commit: 15 hours ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 19
  • Issues (30d): 6

Star History

  • 42 stars in the last 30 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Vincent Weisser (Cofounder of Prime Intellect), and 17 more.

ThunderKittens by HazyResearch

Top 0.5% on SourcePulse · 3k stars
CUDA kernel framework for fast deep learning primitives
Created 1 year ago · Updated 16 hours ago
Starred by Luis Capelo (Cofounder of Lightning AI), Alex Yu (Research Scientist at OpenAI; Cofounder of Luma AI), and 7 more.

TransformerEngine by NVIDIA

Top 0.9% on SourcePulse · 3k stars
Library for Transformer model acceleration on NVIDIA GPUs
Created 3 years ago · Updated 1 day ago
Starred by Chip Huyen (Author of "AI Engineering" and "Designing Machine Learning Systems"), Eric Zhang (Founding Engineer at Modal), and 9 more.

DeepGEMM by deepseek-ai

Top 0.4% on SourcePulse · 6k stars
CUDA library for efficient FP8 GEMM kernels with fine-grained scaling
Created 11 months ago · Updated 5 days ago
Starred by François Chollet (Author of Keras; Cofounder of Ndea, ARC Prize), Chaoyu Yang (Founder of Bento), and 13 more.

neon by NervanaSystems

Top 0% on SourcePulse · 4k stars
Deep learning framework (discontinued)
Created 11 years ago · Updated 5 years ago