Mesh-LLM (michaelneale): Scalable distributed LLM inference across machines
Summary
Mesh-LLM enables distributed inference for Large Language Models (LLMs) by pooling spare GPU capacity across machines. It handles models too large for a single node via automatic pipeline or expert parallelism, and supports agent communication through a decentralized "gossip" protocol. The project targets engineers and researchers who want to scale LLM deployments and make efficient use of distributed compute.
How It Works
Built on a fork of llama.cpp, Mesh-LLM distributes models automatically. Dense models use pipeline parallelism, splitting layers across nodes; Mixture-of-Experts (MoE) models use expert sharding to minimize cross-node traffic. The system prioritizes low-latency connections and uses HTTP streaming for inference to mitigate network overhead. Agents communicate via a decentralized blackboard, forming a "gossip" layer for collaborative workflows.
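As a rough sketch of the pipeline-parallel idea (illustrative only; Mesh-LLM's actual scheduler is not documented here and likely weights nodes by VRAM and link latency), splitting a dense model amounts to assigning contiguous layer ranges to nodes:

```python
def partition_layers(num_layers: int, num_nodes: int) -> list[range]:
    """Split a dense model's layers into contiguous per-node ranges.

    Illustrative heuristic, not Mesh-LLM's real algorithm: layers are
    split as evenly as possible, and activations would then stream
    node-to-node in pipeline order.
    """
    base, extra = divmod(num_layers, num_nodes)
    ranges, start = [], 0
    for node in range(num_nodes):
        size = base + (1 if node < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges

# A 32-layer model across 3 nodes: each node runs one contiguous slice.
print(partition_layers(32, 3))  # → [range(0, 11), range(11, 22), range(22, 32)]
```

Expert sharding for MoE models differs in that whole experts, rather than layer slices, are placed per node, which is why it produces less cross-node traffic per token.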
Quick Start & Requirements
Installation is via a bash script (curl ... | bash) supporting macOS, Linux, and Windows (via source/zip). Building from source (git clone ... && just build) is an alternative. Prerequisites include just, cmake, Rust, Node.js (v24+), and a GPU toolkit (CUDA, ROCm, Vulkan, or Metal); CPU-only operation is also supported. Detailed build instructions are in CONTRIBUTING.md.
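If the local endpoint noted below (http://localhost:9337/v1) follows the OpenAI-compatible convention its /v1 path suggests (an assumption; the model name and payload schema here are illustrative, so check the project docs for the exact schema), a streaming chat request could be constructed like this:

```python
import json
import urllib.request

# Assumed OpenAI-style chat payload; "default" as a model name is a
# placeholder, not confirmed by the Mesh-LLM README.
payload = {
    "model": "default",
    "messages": [{"role": "user", "content": "Hello, mesh!"}],
    "stream": True,  # HTTP streaming, per the project's latency notes
}

def build_request(base_url: str = "http://localhost:9337/v1") -> urllib.request.Request:
    """Construct (but do not send) a chat-completions request."""
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request()
print(req.full_url)  # http://localhost:9337/v1/chat/completions
```

Sending the request with urllib.request.urlopen(req) and reading the response line by line would consume the stream, assuming a running local node.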
Highlighted Details
Exposes a local API endpoint (http://localhost:9337/v1).
Maintenance & Community
Community discussion occurs on the #mesh-llm channel on the Goose Discord. Development workflows are detailed in CONTRIBUTING.md.
Licensing & Compatibility
The README omits license information, preventing assessment of commercial use or closed-source linking compatibility.
Limitations & Caveats
Pipeline parallelism significantly reduces inference throughput (e.g., 68 tok/s solo vs. 12-13 tok/s on a 3-node split). Cross-network latency impacts time-to-first-token. Advanced features like mesh-wide rebalancing are planned for "Stage Two." The missing license is a critical adoption blocker.
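The reported figures imply roughly a 5x throughput penalty for the 3-node split, which is a quick sanity check worth making before distributing a model that would otherwise fit on one node:

```python
# Throughput figures from the text: ~68 tok/s on a single node
# versus 12-13 tok/s across a 3-node pipeline split.
solo_tps = 68.0
split_tps = (12.0 + 13.0) / 2  # midpoint of the reported range

slowdown = solo_tps / split_tps
print(f"~{slowdown:.1f}x slower")  # → ~5.4x slower
```

The takeaway is that pipeline parallelism here buys capacity (fitting larger models), not speed; cross-node hops dominate per-token latency.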