minions by HazyResearch

Communication protocol for cost-efficient LLM collaboration

created 5 months ago
1,093 stars

Top 35.5% on sourcepulse

Project Summary

This repository provides a demonstration of the Minions protocol, enabling collaboration between on-device and cloud Large Language Models (LLMs) to reduce cloud costs with minimal quality degradation. It's designed for researchers and developers looking to optimize LLM inference by offloading complex tasks to powerful cloud models while handling simpler queries or context processing locally.

How It Works

Minions implements a tiered LLM architecture: a small on-device model reads the (often long) context, while a larger cloud-based model handles the harder reasoning and generation. Because only short messages, rather than the full context, are sent to the cloud, the protocol reduces cloud token costs with minimal quality degradation. It supports both a single local worker (Minion) and many parallel local workers (Minions), allowing for flexible orchestration; a sketch of the single-worker variant follows.
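
A minimal sketch of a single-worker Minion session (the client and class names below follow the repository's examples; treat the exact signatures as assumptions that may differ across versions):

```python
from minions.clients.ollama import OllamaClient
from minions.clients.openai import OpenAIClient
from minions.minion import Minion

# Small on-device model reads the long context; the cloud model supervises.
local_client = OllamaClient(model_name="llama3.2")
remote_client = OpenAIClient(model_name="gpt-4o")

# Minion orchestrates the message exchange between the two models.
minion = Minion(local_client, remote_client)

context = "... full document, kept on-device ..."
output = minion(
    task="Summarize the key findings in this document.",
    context=[context],
    max_rounds=2,  # cap on local/remote communication rounds
)
```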

Quick Start & Requirements

  • Install: pip install -e . (or pip install -e ".[mlx]" for MLX support).
  • Local Server: Requires Ollama (for non-NVIDIA GPUs) or Tokasaurus (for NVIDIA GPUs).
  • Cloud API Key: An OpenAI, TogetherAI, DeepSeek, OpenRouter, or Perplexity API key is needed for the remote model.
  • Python: Tested on Python 3.10-3.11.
  • Optional: MLX (Apple Silicon), llama-cpp-python (GPU/CPU acceleration).
  • Demo: Run streamlit run app.py after setting API keys and starting the local model server (the full sequence is sketched below).
  • Docs: Paper, Blog post
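
Put together, a first run might look like the following sequence (the model name and environment variable are illustrative; use whichever provider key and local server your setup requires):

```
pip install -e .                    # from the repository root
export OPENAI_API_KEY=<your-key>    # or the key for your chosen provider
ollama pull llama3.2                # fetch a local model (Ollama route)
streamlit run app.py                # launch the demo
```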

Highlighted Details

  • Supports multiple local and remote LLM providers including Ollama, OpenAI, TogetherAI, and Azure OpenAI.
  • Includes an inference estimator utility to benchmark LLM performance on local hardware.
  • Offers a demo application and CLI for easy testing of the Minions protocol.
  • Enables structured output generation via Pydantic models for local clients (see the sketch below).
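
As an example of the structured-output feature, a local client might be configured like this (the StructuredLocalOutput schema and the structured_output_schema parameter follow the repository's Minions example, but treat the exact names as assumptions):

```python
from pydantic import BaseModel
from minions.clients.ollama import OllamaClient

# Schema the local model's answers must conform to.
class StructuredLocalOutput(BaseModel):
    explanation: str
    citation: str | None
    answer: str | None

local_client = OllamaClient(
    model_name="llama3.2",
    temperature=0.0,
    structured_output_schema=StructuredLocalOutput,  # assumed parameter name
)
```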

Maintenance & Community

  • Maintained by Avanika Narayan, Dan Biderman, and Sabri Eyuboglu from Stanford.
  • Links to paper and blog post are provided for further information.

Licensing & Compatibility

  • The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Python 3.13 is not supported. The licensing is not explicitly stated, which may impact commercial adoption. The setup requires installing and configuring separate local LLM servers (Ollama or Tokasaurus).

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull requests (30d): 6
  • Issues (30d): 0
  • Star history: 376 stars in the last 90 days
