Communication protocol for cost-efficient LLM collaboration
This repository provides a demonstration of the Minions protocol, enabling collaboration between on-device and cloud Large Language Models (LLMs) to reduce cloud costs with minimal quality degradation. It's designed for researchers and developers looking to optimize LLM inference by offloading complex tasks to powerful cloud models while handling simpler queries or context processing locally.
How It Works
Minions facilitates a tiered LLM architecture: a smaller local model processes the initial context or query, while a larger cloud-based model handles the more complex reasoning or generation. By routing work based on task complexity and context length, the protocol minimizes the amount of data sent to the cloud, reducing both latency and cost. It supports both single-model (Minion) and multi-model (Minions) interactions, allowing for flexible orchestration.
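As a rough illustration of the routing idea (not this repo's actual API), a tiered setup might answer small jobs on-device and have the local model distill long contexts before escalating. Everything below, including the token threshold and prompts, is a hypothetical sketch:

from typing import Callable

# Sketch of tiered routing between a local and a cloud LLM.
# The threshold and prompts are illustrative only; the real Minions
# protocol runs a structured multi-round exchange between the models.

LLM = Callable[[str], str]  # prompt in, completion out

MAX_LOCAL_TOKENS = 4_000  # rough context budget the local model handles well

def estimate_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token for English text.
    return len(text) // 4

def route(task: str, context: str, local: LLM, cloud: LLM) -> str:
    """Stay on-device when the context is small; otherwise have the
    local model distill the context so the cloud model sees less data."""
    if estimate_tokens(context) <= MAX_LOCAL_TOKENS:
        # Cheap path: the whole job stays on-device.
        return local(f"{task}\n\nContext:\n{context}")
    # Expensive path: the cloud model only ever sees a local summary.
    summary = local(f"Extract the parts of this context relevant to: {task}\n\n{context}")
    return cloud(f"{task}\n\nRelevant excerpts:\n{summary}")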
Quick Start & Requirements
Install the package:

pip install -e .

or, for MLX support:

pip install -e ".[mlx]"

After setting your API keys and starting a local model server, launch the demo:

streamlit run app.py
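Beyond the Streamlit demo, the protocol can also be driven from Python. The sketch below assumes client classes and a Minion entry point along the lines of the repository's examples; the import paths and argument names (OllamaClient, OpenAIClient, Minion, max_rounds) are assumptions to verify against the repo:

# Sketch of programmatic use; import paths and argument names are
# assumptions modeled on the demo, so check the repo for the exact API.
from minions.clients.ollama import OllamaClient
from minions.clients.openai import OpenAIClient
from minions.minion import Minion

local_client = OllamaClient(model_name="llama3.2")   # served by a local Ollama instance
remote_client = OpenAIClient(model_name="gpt-4o")    # needs OPENAI_API_KEY in the environment

minion = Minion(local_client, remote_client)

output = minion(
    task="Summarize the key findings.",
    context=[open("report.txt").read()],  # the long context stays with the local model
    max_rounds=2,                         # bound the local/cloud exchange
)
print(output)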
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Python 3.13 is not supported. No license is explicitly stated, which may impede commercial adoption. Setup requires installing and configuring a separate local LLM server (Ollama or Tokasaurus).