petals by bigscience-workshop

Run LLMs at home, BitTorrent-style

created 3 years ago
9,733 stars

Top 5.2% on sourcepulse

View on GitHub
Project Summary

Petals enables users to run and fine-tune large language models (LLMs) like Llama 3.1 (405B) and Mixtral (8x22B) on consumer hardware by distributing model layers across a peer-to-peer network. This approach significantly speeds up inference and fine-tuning compared to traditional offloading methods, making powerful LLMs accessible for desktop users and researchers without high-end infrastructure.
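
The client API mirrors 🤗 Transformers. A minimal sketch following the usage documented in the project README (the model name is one of the checkpoints the README mentions; availability depends on which models the public swarm is currently hosting):

    from transformers import AutoTokenizer
    from petals import AutoDistributedModelForCausalLM

    # Any swarm-hosted checkpoint works here; Llama 3.1 (405B) is the README's headline example.
    model_name = "meta-llama/Meta-Llama-3.1-405B-Instruct"

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # Embeddings and the output head run locally; transformer blocks are served by remote peers.
    model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

    inputs = tokenizer("A cat sat", return_tensors="pt")["input_ids"]
    outputs = model.generate(inputs, max_new_tokens=5)
    print(tokenizer.decode(outputs[0]))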

How It Works

Petals uses a BitTorrent-like protocol to spread a model's transformer layers across a decentralized network of participants. Each server downloads and runs a contiguous slice of layers; when a user runs the model, the client executes the small local parts (such as the embeddings and output head) and streams intermediate activations from one server to the next until the forward pass is complete. Because only compact activations cross the network rather than layer weights, this collaborative execution supports inference and fine-tuning of models far larger than a single machine could hold, while remaining much faster than offloading.
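
Contributing capacity works the other way around: a participant serves a slice of the model's layers, and clients route activations through it. The command below follows the project README; the model name is illustrative, and gated checkpoints additionally require a Hugging Face access token:

    python -m petals.cli.run_server petals-team/StableBeluga2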

Quick Start & Requirements

  • Install: pip install git+https://github.com/bigscience-workshop/petals (a quick sanity check follows this list)
  • Prerequisites: Python 3.x and PyTorch with CUDA 11.7+ for NVIDIA GPUs (AMD support is available via separate instructions); macOS users need Homebrew; WSL is recommended on Windows.
  • Setup: Basic setup is quick, but running larger models may require significant RAM and VRAM.
  • Links: Colab Demo, Wiki, Discord
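
A minimal post-install sanity check; this assumes the petals package exposes a __version__ attribute, as most Python packages do:

    # Verify the install and confirm PyTorch sees a CUDA device before running or serving models.
    import torch
    import petals

    print(petals.__version__)           # assumption: petals exports __version__
    print(torch.cuda.is_available())    # should be True with a CUDA 11.7+ build of PyTorch and an NVIDIA GPU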

Highlighted Details

  • Supports inference and fine-tuning for models up to 405B parameters (a fine-tuning sketch follows this list).
  • Achieves up to 6 tokens/sec for Llama 2 (70B) and 4 tokens/sec for Falcon (180B).
  • Offers flexibility with PyTorch and 🤗 Transformers integration for custom model paths and hidden state access.
  • Enables private swarms for sensitive data processing.
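
Fine-tuning follows a standard PyTorch loop: trainable parameters added on the client (for example, prompt-tuning prefixes) are updated locally while the remote transformer blocks stay frozen. A minimal sketch; the tuning_mode and pre_seq_len keyword arguments follow the project's prompt-tuning examples and should be treated as assumptions rather than a verified signature, and the model name is illustrative:

    import torch
    from transformers import AutoTokenizer
    from petals import AutoDistributedModelForCausalLM

    model_name = "petals-team/StableBeluga2"  # illustrative; any swarm-hosted model
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    # Assumed kwargs, following the project's prompt-tuning examples:
    model = AutoDistributedModelForCausalLM.from_pretrained(
        model_name, tuning_mode="ptune", pre_seq_len=16
    )
    optimizer = torch.optim.AdamW(
        (p for p in model.parameters() if p.requires_grad), lr=3e-4
    )

    # One training step: only the local trainable parameters receive gradients.
    batch = tokenizer("Q: What is Petals?\nA: A distributed runtime for LLMs.", return_tensors="pt")
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()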

Maintenance & Community

Petals is a community-driven project originating from the BigScience research workshop. It has active development and a supportive Discord community.

Licensing & Compatibility

The project is licensed under the Apache 2.0 license, permitting commercial use and integration with closed-source applications.

Limitations & Caveats

Performance depends on network connectivity and on how many active participants are serving the model's layers. Because computation runs on peers outside the user's control, the public swarm should not be used for highly sensitive data; hosting a private swarm mitigates this concern.

Health Check

  • Last commit: 10 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 1
  • Issues (30d): 0
  • Star History: 153 stars in the last 90 days
