luminal by luminal-ai

Deep learning library using composable compilers for high performance

Created 2 years ago
2,510 stars

Top 18.6% on SourcePulse

View on GitHub
Project Summary

Luminal is a Rust-based deep learning library designed for high-performance inference and training through a composable, ahead-of-time compilation approach. It targets developers seeking maximum efficiency on diverse hardware, from consumer CPUs and Apple Silicon to NVIDIA GPUs, by compiling computation graphs into optimized, native code.

How It Works

Luminal employs a compile-time-first philosophy, building static computation graphs from 11 primitive operations. Because the entire network is visible to the compiler as a single unit, its backends (e.g., CPUCompiler, MetalCompiler, CUDACompiler) can perform aggressive optimizations such as kernel fusion and shape-specific code generation. In contrast to eager execution, complexity is pushed into the compiler, yielding hardware-specific tuning without hand-maintained per-backend code paths.

Quick Start & Requirements

  • Install/Run (Llama 3 example): cd ./examples/llama, run ./setup/setup.sh, then cargo run --release --features <cuda|metal|cpu>.
  • Prerequisites: Rust toolchain, CUDA Toolkit (for NVIDIA), Metal (for macOS).
  • Resources: the Llama 3 8B example runs locally on M-series MacBooks at 15-25 tokens/sec.
  • Docs: https://github.com/jafioti/luminal/blob/main/README.md#getting-started

Highlighted Details

  • Achieves 15-25 tokens/sec for Q8 Llama 3 8B on M-series MacBooks.
  • Supports native compilation to CUDA and Metal, avoiding intermediate abstraction layers.
  • Offers full training support with graph-based autograd.
  • Implements examples for Llama 3, Phi 3, Whisper, and YOLO v8.

Maintenance & Community

  • Active development with a focus on compiler advancements and performance targets.
  • Roadmap includes optimizing CUDA/Metal kernels, distributed training, and matching PyTorch 2.0 performance.

Licensing & Compatibility

  • Licensed under Apache License 2.0 or MIT license, permitting commercial use and closed-source linking.

Limitations & Caveats

  • Still under active development; matching PyTorch's API coverage and performance benchmarks remains a stated goal rather than an achieved one.
  • Some optimizations and features, like distributed training, are on the roadmap rather than fully implemented.
Health Check

  • Last Commit: 1 day ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 16
  • Issues (30d): 7
  • Star History: 423 stars in the last 30 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Georgios Konstantopoulos (CTO, General Partner at Paradigm), and 15 more.

ThunderKittens by HazyResearch

0.6% · 3k stars
CUDA kernel framework for fast deep learning primitives
Created 1 year ago · Updated 2 days ago
Starred by Luis Capelo (Cofounder of Lightning AI), Alex Yu (Research Scientist at OpenAI; former Cofounder of Luma AI), and 7 more.

TransformerEngine by NVIDIA

0.4% · 3k stars
Library for Transformer model acceleration on NVIDIA GPUs
Created 3 years ago · Updated 19 hours ago
Starred by François Chollet (Author of Keras; Cofounder of Ndea, ARC Prize), Chaoyu Yang (Founder of Bento), and 13 more.

neon by NervanaSystems

0% · 4k stars
Deep learning framework (discontinued)
Created 11 years ago · Updated 4 years ago