exo by exo-explore

AI cluster for running models on diverse devices

created 1 year ago
29,133 stars

Top 1.3% on sourcepulse

Project Summary

exo enables users to create distributed AI inference clusters using everyday devices, including smartphones, Macs, and Raspberry Pis. It targets individuals and businesses looking to run large language models locally, offering a unified, peer-to-peer approach to distributed computing without a master-worker architecture. The primary benefit is leveraging existing hardware for powerful AI inference, with a ChatGPT-compatible API for easy integration.

How It Works

exo employs dynamic model partitioning, intelligently splitting AI models across available devices based on network topology and individual device resources. This allows for the execution of models larger than any single device could handle. The system uses automatic device discovery and a peer-to-peer (P2P) networking model, ensuring any connected device can contribute to the cluster. The default partitioning strategy is ring memory weighted partitioning, where each device processes layers proportional to its memory capacity.
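The memory-weighted idea can be sketched in a few lines: give each device a contiguous slice of the model's layers sized in proportion to its memory. This is a simplified illustration, not exo's actual partitioning code; the function name and device labels are hypothetical.

```python
def partition_layers(num_layers: int, device_memory: dict) -> dict:
    """Assign each device a contiguous layer range proportional to its memory.

    Simplified sketch of ring memory weighted partitioning: devices are
    visited in order, and the last device absorbs any rounding remainder.
    """
    total_mem = sum(device_memory.values())
    partitions = {}
    start = 0
    items = list(device_memory.items())
    for i, (device, mem) in enumerate(items):
        if i == len(items) - 1:
            end = num_layers  # last device takes whatever layers remain
        else:
            end = start + round(num_layers * mem / total_mem)
        partitions[device] = (start, end)
        start = end
    return partitions
```

For a 32-layer model split across a 16 GB Mac and two 8 GB devices, the Mac would host half the layers and each smaller device a quarter, mirroring the proportional scheme described above.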

Quick Start & Requirements

  • Installation: Install from source: git clone https://github.com/exo-explore/exo.git && cd exo && pip install -e ., or run source install.sh.
  • Prerequisites: Python >= 3.12.0. For Linux with an NVIDIA GPU: the NVIDIA driver, CUDA toolkit, and cuDNN library.
  • Hardware: Sufficient total memory across all devices to fit the model (e.g., 16GB for Llama 3.1 8B fp16). Supports heterogeneous devices (GPU, CPU).
  • Docs: Example Usage on Multiple macOS Devices
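The 16 GB figure for Llama 3.1 8B follows from simple arithmetic: fp16 stores each parameter in 2 bytes, so 8 billion parameters need about 16 GB for the weights alone. A minimal estimator (a rough lower bound, ignoring KV cache and activation overhead):

```python
def fp16_weight_gb(num_params_billion: float) -> float:
    """Approximate fp16 weight memory: 2 bytes per parameter.

    This counts weights only; real deployments need extra headroom for the
    KV cache, activations, and runtime overhead.
    """
    return num_params_billion * 2.0
```

By this estimate, an 8B model needs roughly 16 GB of total cluster memory, and a 70B model roughly 140 GB, which is why combining the memory of several everyday devices matters.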

Highlighted Details

  • Supports multiple model families: LLaMA (via MLX and tinygrad), Mistral, LLaVA, Qwen, and DeepSeek.
  • ChatGPT-compatible API for seamless application integration.
  • Peer-to-peer (P2P) device connectivity, avoiding master-worker bottlenecks.
  • Dynamic model partitioning optimizes resource utilization across heterogeneous hardware.
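Because the API is ChatGPT-compatible, any OpenAI-style client can talk to a running cluster. A stdlib-only sketch, assuming the standard /v1/chat/completions route; the port (52415) and model name are assumptions here, so check your exo node's startup output for the actual address:

```python
import json
from urllib import request

def chat_payload(model: str, prompt: str) -> dict:
    # Standard OpenAI-style chat completion request body.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

def query_cluster(payload: dict,
                  url: str = "http://localhost:52415/v1/chat/completions") -> dict:
    # The host/port above is an assumption; point this at your exo node's
    # advertised API address.
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

Existing applications written against the OpenAI chat API can typically be pointed at the cluster just by changing the base URL.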

Maintenance & Community

  • Maintained by exo labs.
  • Community channels: Discord, Telegram, X.
  • Actively hiring and seeking business partnerships.

Licensing & Compatibility

  • License: GPL-3.0.
  • Compatibility: GPL-3.0 is a strong copyleft license, requiring derivative works to also be open-sourced under GPL-3.0. Commercial use or linking with closed-source applications may require careful consideration or a separate license.

Limitations & Caveats

The project is experimental software, and bugs are expected. The iOS implementation currently lags the main codebase and requires a manual access request. A PyTorch backend and radio/Bluetooth discovery modules are listed as still under development.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 3
  • Issues (30d): 4
  • Star History: 1,359 stars in the last 90 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 2 more.

gpustack by gpustack

  • GPU cluster manager for AI model deployment
  • Top 1.6%, 3k stars
  • created 1 year ago, updated 2 days ago