luminal by jafioti

Deep learning library using composable compilers for high performance

created 2 years ago
2,057 stars

Top 22.1% on sourcepulse

Project Summary

Luminal is a Rust-based deep learning library designed for high-performance inference and training through a composable, ahead-of-time compilation approach. It targets developers seeking maximum efficiency on diverse hardware, from consumer CPUs and Apple Silicon to NVIDIA GPUs, by compiling computation graphs into optimized, native code.

How It Works

Luminal employs a compile-time-first philosophy, building static computation graphs from 11 primitive operations. This lets its compilers (e.g., CPUCompiler, MetalCompiler, CUDACompiler) treat the entire network as a single unit and apply aggressive optimizations such as kernel fusion and shape-specific code generation. In contrast to eager execution, complexity is pushed into the compiler, enabling superior performance and hardware-specific tuning without maintaining divergent hand-written code for each backend.
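The compile-time-first idea can be sketched in plain Rust (illustrative only — this is not luminal's actual API or op set): because a static chain of primitive elementwise ops is known before any data flows, a "compiler" pass can fuse the whole chain into a single traversal of the buffer, the same kind of kernel fusion luminal's backends perform.

```rust
// Toy illustration of ahead-of-time kernel fusion over a static op graph.
// (Hypothetical names; not luminal's real types.)

#[derive(Clone, Copy, Debug)]
enum PrimOp {
    Add(f32), // add a constant
    Mul(f32), // multiply by a constant
    Relu,     // max(x, 0)
}

// "Compilation": collapse the op chain into one fused kernel, so the
// buffer is traversed once instead of once per op (kernel fusion).
fn compile(graph: &[PrimOp]) -> impl Fn(&mut [f32]) + '_ {
    move |data: &mut [f32]| {
        for x in data.iter_mut() {
            for op in graph {
                *x = match op {
                    PrimOp::Add(c) => *x + c,
                    PrimOp::Mul(c) => *x * c,
                    PrimOp::Relu => x.max(0.0),
                };
            }
        }
    }
}

fn main() {
    // Static graph, known ahead of time: y = relu(2x - 1)
    let graph = [PrimOp::Mul(2.0), PrimOp::Add(-1.0), PrimOp::Relu];
    let kernel = compile(&graph);

    let mut data = [-1.0_f32, 0.0, 1.0, 2.0];
    kernel(&mut data);
    println!("{:?}", data); // [0.0, 0.0, 1.0, 3.0]
}
```

A real backend would additionally specialize the fused kernel for the known tensor shapes and emit native CPU, Metal, or CUDA code rather than interpreting the op list at run time.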

Quick Start & Requirements

  • Install/Run (Llama 3 example): cd ./examples/llama, run ./setup/setup.sh, then cargo run --release --features <cuda|metal|cpu>.
  • Prerequisites: Rust toolchain, CUDA Toolkit (for NVIDIA), Metal (for macOS).
  • Resources: the Llama 3 8B example runs locally on M-series MacBooks at 15-25 tokens/sec.
  • Docs: https://github.com/jafioti/luminal/blob/main/README.md#getting-started

Highlighted Details

  • Achieves 15-25 tokens/sec for Q8 Llama 3 8B on M-series MacBooks.
  • Generates native CUDA and Metal code directly, without intermediate abstraction layers.
  • Offers full training support with graph-based autograd.
  • Implements examples for Llama 3, Phi 3, Whisper, and YOLO v8.
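The graph-based autograd noted above can be illustrated with a minimal reverse-mode sketch in plain Rust (a toy under assumed semantics, not luminal's implementation): ops are recorded into a static graph during the forward pass, then a backward walk over the same graph accumulates gradients.

```rust
// Minimal reverse-mode autograd over a recorded op graph.
// (Hypothetical types for illustration; not luminal's real API.)

#[derive(Clone, Copy)]
enum Node {
    Input,
    Add(usize, usize),
    Mul(usize, usize),
}

struct Graph {
    nodes: Vec<Node>,
    vals: Vec<f32>,
}

impl Graph {
    fn new() -> Self {
        Graph { nodes: vec![], vals: vec![] }
    }
    fn input(&mut self, v: f32) -> usize {
        self.nodes.push(Node::Input);
        self.vals.push(v);
        self.nodes.len() - 1
    }
    fn add(&mut self, a: usize, b: usize) -> usize {
        self.nodes.push(Node::Add(a, b));
        self.vals.push(self.vals[a] + self.vals[b]);
        self.nodes.len() - 1
    }
    fn mul(&mut self, a: usize, b: usize) -> usize {
        self.nodes.push(Node::Mul(a, b));
        self.vals.push(self.vals[a] * self.vals[b]);
        self.nodes.len() - 1
    }
    // Reverse pass: seed d(out)/d(out) = 1, then propagate gradients
    // backwards through every recorded node.
    fn backward(&self, out: usize) -> Vec<f32> {
        let mut grads = vec![0.0; self.nodes.len()];
        grads[out] = 1.0;
        for i in (0..=out).rev() {
            match self.nodes[i] {
                Node::Input => {}
                Node::Add(a, b) => {
                    grads[a] += grads[i];
                    grads[b] += grads[i];
                }
                Node::Mul(a, b) => {
                    grads[a] += grads[i] * self.vals[b];
                    grads[b] += grads[i] * self.vals[a];
                }
            }
        }
        grads
    }
}

fn main() {
    // y = a * b + a, with a = 3, b = 4  =>  dy/da = b + 1 = 5, dy/db = a = 3
    let mut g = Graph::new();
    let a = g.input(3.0);
    let b = g.input(4.0);
    let ab = g.mul(a, b);
    let y = g.add(ab, a);
    let grads = g.backward(y);
    println!("dy/da = {}, dy/db = {}", grads[a], grads[b]);
}
```

Because the graph is static, the backward pass itself is just more graph nodes, so the same compilers can fuse and optimize training as well as inference.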

Maintenance & Community

  • Active development with a focus on compiler advancements and performance targets.
  • Roadmap includes optimizing CUDA/Metal kernels, distributed training, and matching PyTorch 2.0 performance.

Licensing & Compatibility

  • Dual-licensed under Apache License 2.0 or MIT, permitting commercial use and closed-source linking.

Limitations & Caveats

  • Still under active development with stated goals to match PyTorch API coverage and performance benchmarks.
  • Some optimizations and features, like distributed training, are on the roadmap rather than fully implemented.

Health Check

  • Last commit: 19 hours ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 13
  • Issues (30d): 7
  • Star history: 546 stars in the last 90 days

Explore Similar Projects

Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake) and Zhiqiang Xie (Author of SGLang).

veScale by volcengine

  0.1% · 839 stars
  PyTorch-native framework for LLM training
  created 1 year ago · updated 3 weeks ago
  Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Georgios Konstantopoulos (CTO, General Partner at Paradigm), and 7 more.

ThunderKittens by HazyResearch

  0.6% · 3k stars
  CUDA kernel framework for fast deep learning primitives
  created 1 year ago · updated 3 days ago
  Starred by Bojan Tunguz (AI Scientist; Formerly at NVIDIA), Mckay Wrigley (Founder of Takeoff AI), and 8 more.

ggml by ggml-org

  0.3% · 13k stars
  Tensor library for machine learning
  created 2 years ago · updated 3 days ago
  Starred by Peter Norvig (Author of Artificial Intelligence: A Modern Approach; Research Director at Google), Didier Lopes (Founder of OpenBB), and 15 more.

llm.c by karpathy

  0.2% · 27k stars
  LLM training in pure C/CUDA, no PyTorch needed
  created 1 year ago · updated 1 month ago
  Starred by George Hotz (Author of tinygrad; Founder of the tiny corp, comma.ai), Anton Bukov (Cofounder of 1inch Network), and 16 more.

tinygrad by tinygrad

  0.1% · 30k stars
  Minimalist deep learning framework for education and exploration
  created 4 years ago · updated 20 hours ago
  Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Nat Friedman (Former CEO of GitHub), and 32 more.

llama.cpp by ggml-org

  0.4% · 84k stars
  C/C++ library for local LLM inference
  created 2 years ago · updated 15 hours ago