cake by evilsocket

Rust framework for distributed LLM/SD inference across diverse hardware

Created 1 year ago
2,882 stars

Top 16.5% on SourcePulse

View on GitHub
Project Summary

This project provides a distributed inference framework for large language models (LLMs) and Stable Diffusion, targeting consumer hardware across mobile, desktop, and server platforms. It aims to democratize AI by enabling users to leverage heterogeneous clusters of devices, including iOS, Android, macOS, Linux, and Windows, to run models that exceed single-device memory capacities.

How It Works

Cake shards transformer blocks across multiple worker devices, batching contiguous blocks on the same worker to minimize transfer latency. By distributing the computational and memory load across a network of devices, the cluster can run models that no single device could hold. The framework is built in Rust on the Candle library and supports CPU, CUDA, and Metal acceleration.
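The partitioning idea can be sketched as follows. This is a minimal illustration, not Cake's actual scheduler: it simply assigns each worker a contiguous range of transformer blocks, which is what keeps cross-device hops (and thus latency) low.

```rust
use std::ops::Range;

// Hypothetical sketch (not Cake's real code): split `num_layers`
// transformer blocks into contiguous shards, one per worker, so each
// worker runs a batched span of adjacent layers.
fn shard_layers(num_layers: usize, workers: &[&str]) -> Vec<(String, Range<usize>)> {
    let n = workers.len();
    let base = num_layers / n;
    let rem = num_layers % n;
    let mut shards = Vec::with_capacity(n);
    let mut start = 0;
    for (i, w) in workers.iter().enumerate() {
        // Earlier workers absorb the remainder so every layer is covered.
        let len = base + if i < rem { 1 } else { 0 };
        shards.push((w.to_string(), start..start + len));
        start += len;
    }
    shards
}

fn main() {
    // 32 transformer blocks (a Llama3-8B-sized model) across three
    // hypothetical heterogeneous devices.
    for (name, range) in shard_layers(32, &["macbook", "iphone", "linux-box"]) {
        println!("{name}: blocks {}..{}", range.start, range.end);
    }
}
```

A real scheduler would also weight shard sizes by each device's memory and compute, but the contiguity constraint is the core of the latency argument above.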

Quick Start & Requirements

  • Installation: Build from source using cargo build --release. Features can be enabled with --features metal or --features cuda. iOS bindings are generated via make ios.
  • Prerequisites: Rust toolchain. CUDA >= 12.2 is required for CUDA acceleration.
  • Running:
    • Worker: cake-cli --model <path> --mode worker --name <name> --topology <file> --address <addr>
    • Master (OpenAI-compatible API): cake-cli --model <path> --api <addr> --topology <file>
  • Resources: Model weights may need to be split per worker using the cake-split-model utility.
  • Docs: README

Highlighted Details

  • Supports distributed inference for LLMs (e.g., Llama3) and image models (e.g., Stable Diffusion).
  • Enables sharding of models across heterogeneous hardware, including mobile devices.
  • Provides an OpenAI-compatible REST API for easy integration.
  • Includes a utility (cake-split-model) for optimizing model data per worker.
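Because the master speaks an OpenAI-compatible REST API, a standard chat-completions request body should apply. A minimal client-side sketch, where the exact route and port are assumptions (check the README for the actual endpoint):

```rust
// Hypothetical sketch: build an OpenAI-style chat-completions payload
// for Cake's master node. Only the payload shape is shown; the route
// printed below is an assumption, not taken from the project docs.
fn chat_completions_body(model: &str, user_content: &str) -> String {
    // Minimal OpenAI-style payload: a single user message.
    format!(
        r#"{{"model":"{model}","messages":[{{"role":"user","content":"{user_content}"}}]}}"#
    )
}

fn main() {
    let body = chat_completions_body("llama3", "Hello from a Cake cluster");
    // Assumed endpoint shape; substitute the address passed via --api.
    println!("POST http://<addr>/v1/chat/completions");
    println!("{body}");
}
```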

Maintenance & Community

  • Described by its author as experimental and under rapid development, though the activity metrics below show the repository has been quiet recently.
  • Community support available via Discord (link in README).

Licensing & Compatibility

  • License: GPL 3.0.
  • Compatibility: GPL 3.0 is a strong copyleft license, potentially restricting integration with closed-source applications.

Limitations & Caveats

The project is explicitly experimental and undergoing rapid development, with potential for bugs and breaking changes. CUDA acceleration on Android is listed as untested.

Health Check
Last Commit

11 months ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
8 stars in the last 30 days

Explore Similar Projects

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), Johannes Hagemann (Cofounder of Prime Intellect), and 3 more.

minions by HazyResearch

1.3%
1k
Communication protocol for cost-efficient LLM collaboration
Created 7 months ago
Updated 18 hours ago
Starred by Carol Willing (Core Contributor to CPython, Jupyter), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 9 more.

dynamo by ai-dynamo

1.0%
5k
Inference framework for distributed generative AI model serving
Created 6 months ago
Updated 15 hours ago