Communication protocol for cost-efficient LLM collaboration
This repository provides a demonstration of the Minions protocol, enabling collaboration between on-device and cloud Large Language Models (LLMs) to reduce cloud costs with minimal quality degradation. It's designed for researchers and developers looking to optimize LLM inference by offloading complex tasks to powerful cloud models while handling simpler queries or context processing locally.
How It Works
Minions facilitates a tiered LLM architecture: a smaller local model processes the initial context or query, while a larger cloud-based model handles the more complex reasoning or generation. By routing work based on task complexity and context length, the protocol minimizes the amount of data sent to the cloud, reducing both latency and cost. It supports both single-model (Minion) and multi-model (Minions) interactions, allowing for flexible orchestration.
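As a rough illustration of the routing idea (not this repo's actual API), a tiered setup might answer small jobs on-device and have the local model distill long contexts before escalating. Everything below, including the token threshold and prompts, is a hypothetical sketch:

from typing import Callable

# Sketch of tiered routing between a local and a cloud LLM.
# The threshold and prompts are illustrative only; the real Minions
# protocol runs a structured multi-round exchange between the models.

LLM = Callable[[str], str]  # prompt in, completion out

MAX_LOCAL_TOKENS = 4_000  # rough context budget the local model handles well

def estimate_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token for English text.
    return len(text) // 4

def route(task: str, context: str, local: LLM, cloud: LLM) -> str:
    """Stay on-device when the context is small; otherwise have the
    local model distill the context so the cloud model sees less data."""
    if estimate_tokens(context) <= MAX_LOCAL_TOKENS:
        # Cheap path: the whole job stays on-device.
        return local(f"{task}\n\nContext:\n{context}")
    # Expensive path: the cloud model only ever sees a local summary.
    summary = local(f"Extract the parts of this context relevant to: {task}\n\n{context}")
    return cloud(f"{task}\n\nRelevant excerpts:\n{summary}")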
Quick Start & Requirements
Install the package:

pip install -e .

or, for MLX support:

pip install -e ".[mlx]"

After setting your API keys and starting a local model server, launch the demo:

streamlit run app.py
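Beyond the Streamlit demo, the protocol can also be driven from Python. The sketch below assumes client classes and a Minion entry point along the lines of the repository's examples; the import paths and argument names (OllamaClient, OpenAIClient, Minion, max_rounds) are assumptions to verify against the repo:

# Sketch of programmatic use; import paths and argument names are
# assumptions modeled on the demo, so check the repo for the exact API.
from minions.clients.ollama import OllamaClient
from minions.clients.openai import OpenAIClient
from minions.minion import Minion

local_client = OllamaClient(model_name="llama3.2")   # served by a local Ollama instance
remote_client = OpenAIClient(model_name="gpt-4o")    # needs OPENAI_API_KEY in the environment

minion = Minion(local_client, remote_client)

output = minion(
    task="Summarize the key findings.",
    context=[open("report.txt").read()],  # the long context stays with the local model
    max_rounds=2,                         # bound the local/cloud exchange
)
print(output)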
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Python 3.13 is not supported. No license is explicitly stated, which may impede commercial adoption. Setup requires installing and configuring a separate local LLM server (Ollama or Tokasaurus).