petals by bigscience-workshop

Run LLMs at home, BitTorrent-style

Created 4 years ago
10,152 stars

Top 5.2% on SourcePulse

Project Summary

Petals enables users to run and fine-tune large language models (LLMs) like Llama 3.1 (405B) and Mixtral (8x22B) on consumer hardware by distributing model layers across a peer-to-peer network. This approach significantly speeds up inference and fine-tuning compared to traditional offloading methods, making powerful LLMs accessible for desktop users and researchers without high-end infrastructure.

How It Works

Petals uses a BitTorrent-style approach to distribute LLM layers across a decentralized network of participants. When a user runs a model, their device loads and executes some of the layers, then passes the intermediate results to participants who host the subsequent layers. This collaborative execution makes it possible to run inference on, and fine-tune, models far larger than a single machine could hold, at the cost of network communication between participants.
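The layer-by-layer handoff described above can be illustrated with a toy simulation. This is a minimal sketch, not Petals code: the `Peer` class, the `build_swarm` and `run_pipeline` helpers, and the arithmetic "layers" are all hypothetical stand-ins, and real peers would exchange tensors over the network rather than call each other in-process.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Layer = Callable[[List[float]], List[float]]


def make_layer(index: int) -> Layer:
    """Toy stand-in for a transformer block: adds (index + 1) to every value."""
    return lambda hidden: [x + (index + 1) for x in hidden]


@dataclass
class Peer:
    """Hypothetical participant hosting a contiguous range of layers."""
    name: str
    start: int  # first hosted layer (inclusive)
    end: int    # one past the last hosted layer (exclusive)
    layers: List[Layer]

    def forward(self, hidden: List[float]) -> List[float]:
        # Run the hidden state through every layer this peer hosts, in order.
        for layer in self.layers:
            hidden = layer(hidden)
        return hidden


def build_swarm(num_layers: int, peer_names: List[str]) -> List[Peer]:
    """Split layers into contiguous chunks, one chunk per peer."""
    per_peer = -(-num_layers // len(peer_names))  # ceiling division
    peers = []
    for k, name in enumerate(peer_names):
        start = k * per_peer
        end = min(start + per_peer, num_layers)
        if start >= end:
            break  # more peers than layers: remaining peers host nothing
        hosted = [make_layer(i) for i in range(start, end)]
        peers.append(Peer(name, start, end, hosted))
    return peers


def run_pipeline(peers: List[Peer], hidden: List[float]) -> Tuple[List[float], List[str]]:
    """Pass the hidden state from peer to peer and record the route taken."""
    route = []
    for peer in peers:
        hidden = peer.forward(hidden)
        route.append(peer.name)
    return hidden, route


peers = build_swarm(num_layers=8, peer_names=["peer-a", "peer-b", "peer-c"])
output, route = run_pipeline(peers, [0.0, 1.0])
print(route)   # ['peer-a', 'peer-b', 'peer-c']
print(output)  # [36.0, 37.0]
```

No single peer holds all eight layers, yet the final output matches what one machine running every layer in sequence would produce.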

Quick Start & Requirements

  • Install: pip install git+https://github.com/bigscience-workshop/petals
  • Prerequisites: Python 3.x, PyTorch with CUDA 11.7+ for NVIDIA GPUs (AMD support available via separate instructions). macOS users require Homebrew. WSL is recommended for Windows.
  • Setup: Basic setup is quick, but running larger models may require significant RAM and VRAM.
  • Links: Colab Demo, Wiki, Discord
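
After installation, a first generation call might look like the sketch below. It uses the `AutoDistributedModelForCausalLM` class from `petals` together with a 🤗 Transformers tokenizer. The wrapper function `generate_with_petals` is a hypothetical helper introduced here for illustration, and the default model name is an assumption: it only works if a swarm is currently serving that model and your Hugging Face account has access to its weights.

```python
def generate_with_petals(
    prompt: str,
    model_name: str = "meta-llama/Meta-Llama-3.1-405B-Instruct",  # assumed to be served by a swarm
    max_new_tokens: int = 5,
) -> str:
    """Generate a short continuation of `prompt` through a Petals swarm."""
    # Imported lazily so this file can be loaded without Petals installed.
    from transformers import AutoTokenizer
    from petals import AutoDistributedModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # Loads only a small local part of the model; the transformer blocks
    # are executed by other participants in the swarm.
    model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

    inputs = tokenizer(prompt, return_tensors="pt")["input_ids"]
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0])
```

Calling `generate_with_petals("A cat sat")` needs network access and an active swarm, so the result and its latency will vary with who is serving the model at that moment.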

Highlighted Details

  • Supports inference and fine-tuning for models up to 405B parameters.
  • Single-batch inference reaches up to 6 tokens/sec for Llama 2 (70B) and up to 4 tokens/sec for Falcon (180B).
  • Offers flexibility with PyTorch and 🤗 Transformers integration for custom model paths and hidden state access.
  • Enables private swarms for sensitive data processing.

Maintenance & Community

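Petals is a community-driven project originating from the BigScience research workshop, with a Discord community for support. The Health Check data on this page (last commit 1 year ago, responsiveness listed as inactive) suggests that development has slowed.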

Licensing & Compatibility

The project is licensed under the Apache 2.0 license, permitting commercial use and integration with closed-source applications.

Limitations & Caveats

Performance depends on network connectivity and on how many participants are actively serving model layers. Security measures are in place, but intermediate results pass through other participants' machines, so users handling highly sensitive data should account for the distributed nature of the system. A private swarm mitigates this.
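
To see why connectivity matters, consider a back-of-envelope model in which each generated token must cross every hop in the chain of peers. This is a rough sketch under stated assumptions, not a measurement of Petals: the function name, the simple additive latency model, and all numbers below are hypothetical.

```python
def tokens_per_second(num_hops: int, compute_s_per_hop: float, network_s_per_hop: float) -> float:
    """Estimate single-stream throughput when every token visits each hop once.

    Assumes hops are traversed sequentially and contribute equal, fixed delays.
    """
    if num_hops <= 0:
        raise ValueError("num_hops must be positive")
    latency_per_token = num_hops * (compute_s_per_hop + network_s_per_hop)
    return 1.0 / latency_per_token


# Hypothetical swarm: 4 hops, 20 ms of compute and 30 ms of network delay per hop.
fast_network = tokens_per_second(4, 0.020, 0.030)

# Same swarm, but each hop now costs 130 ms of network delay.
slow_network = tokens_per_second(4, 0.020, 0.130)

print(round(fast_network, 2), round(slow_network, 2))
```

In this model, adding network delay per hop or increasing the number of hops lowers throughput even when every peer's compute speed stays the same.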

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 73 stars in the last 30 days
