Rust framework for distributed LLM/SD inference across diverse hardware
This project provides a distributed inference framework for large language models (LLMs) and Stable Diffusion, targeting consumer hardware across mobile, desktop, and server platforms. It aims to democratize AI by enabling users to leverage heterogeneous clusters of devices, including iOS, Android, macOS, Linux, and Windows, to run models that exceed single-device memory capacities.
How It Works
Cake shards a model's transformer blocks across multiple worker devices, batching contiguous blocks on the same worker to minimize data-transfer latency. Distributing the computational and memory load across a network of devices makes it possible to run models too large for any single machine. The framework is written in Rust on top of the Candle library and supports CPU, CUDA, and Metal acceleration.
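The sketch below illustrates the batching rationale; the `Shard` type, worker names, and block counts are illustrative assumptions, not Cake's actual code.

```rust
// A minimal sketch of the batching idea: each worker owns a contiguous
// range of transformer blocks, so activations cross the network once per
// shard instead of once per block.

struct Shard {
    worker: &'static str,
    blocks: std::ops::Range<usize>, // half-open range of block indices
}

fn main() {
    // Hypothetical cluster: 32 blocks split into three contiguous shards.
    let shards = vec![
        Shard { worker: "macbook-m1", blocks: 0..14 },
        Shard { worker: "linux-cuda-box", blocks: 14..26 },
        Shard { worker: "android-phone", blocks: 26..32 },
    ];

    for s in &shards {
        // All blocks in a shard run back-to-back on the same device.
        println!("{} runs blocks {:?} locally", s.worker, s.blocks);
    }
    // Activations are shipped over the network only at shard boundaries:
    // 3 transfers per token here, versus 32 if blocks were scattered.
    println!("network transfers per token: {}", shards.len());
}
```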
Quick Start & Requirements
Build with:

```sh
cargo build --release
```

Features can be enabled with `--features metal` or `--features cuda`; iOS bindings are generated via `make ios`.

Run a worker node:

```sh
cake-cli --model <path> --mode worker --name <name> --topology <file> --address <addr>
```

Run the API node:

```sh
cake-cli --model <path> --api <addr> --topology <file>
```
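Both commands take a `--topology <file>` argument. As a rough sketch of the information such a topology has to tie together (the `WorkerEntry` fields, addresses, and layer names below are assumptions for illustration, not Cake's actual file format):

```rust
// Each named worker gets a network address and the set of transformer
// layers it should host.

use std::collections::BTreeMap;

#[derive(Debug)]
struct WorkerEntry {
    address: String,     // where `cake-cli --mode worker` listens
    layers: Vec<String>, // which transformer layers this worker hosts
}

fn main() {
    let mut topology: BTreeMap<String, WorkerEntry> = BTreeMap::new();
    topology.insert(
        "worker0".to_string(),
        WorkerEntry {
            address: "192.168.1.10:10128".to_string(),
            layers: (0..16).map(|i| format!("model.layers.{i}")).collect(),
        },
    );
    topology.insert(
        "worker1".to_string(),
        WorkerEntry {
            address: "192.168.1.11:10128".to_string(),
            layers: (16..32).map(|i| format!("model.layers.{i}")).collect(),
        },
    );

    // The API node reads the same topology, so it knows which worker to
    // contact for each block during a forward pass.
    for (name, entry) in &topology {
        println!("{name} @ {} hosts {} layers", entry.address, entry.layers.len());
    }
}
```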
Model data can optionally be optimized per worker with the `cake-split-model` utility (see Highlighted Details).
Highlighted Details
A model-splitting utility (`cake-split-model`) is provided for optimizing model data per worker.
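A hedged sketch of the idea behind such splitting (not `cake-split-model`'s actual logic, and the tensor names are hypothetical): filter a checkpoint's tensor names down to the layers a given worker hosts.

```rust
// Keep only the tensors whose names fall under this worker's assigned
// layers, so its on-disk copy of the model shrinks to what it serves.

fn tensors_for_worker<'a>(all: &'a [String], prefixes: &[String]) -> Vec<&'a str> {
    all.iter()
        .filter(|name| prefixes.iter().any(|p| name.starts_with(p.as_str())))
        .map(String::as_str)
        .collect()
}

fn main() {
    // Hypothetical tensor names in a Llama-style checkpoint layout.
    let all: Vec<String> = vec![
        "model.layers.0.self_attn.q_proj.weight".to_string(),
        "model.layers.1.mlp.down_proj.weight".to_string(),
        "model.layers.17.self_attn.k_proj.weight".to_string(),
    ];
    // This worker hosts blocks 0 and 1 only.
    let mine = vec!["model.layers.0.".to_string(), "model.layers.1.".to_string()];

    for t in tensors_for_worker(&all, &mine) {
        println!("keep {t}"); // everything else can be dropped from this node
    }
}
```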
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The project is explicitly experimental and undergoing rapid development, with potential for bugs and breaking changes. CUDA acceleration on Android is listed as untested.