RouteLLM by lm-sys

Framework for LLM routing and cost reduction (research paper)

Created 1 year ago
4,276 stars

Top 11.5% on SourcePulse

View on GitHub
Project Summary

RouteLLM is a framework for serving and evaluating LLM routers, designed to reduce LLM operating costs without sacrificing response quality. It targets developers and researchers who want to optimize LLM deployments by routing queries between models based on predicted query difficulty and a configurable cost threshold. The headline benefit is cost savings of up to 85% while maintaining quality comparable to top-tier models.

How It Works

RouteLLM employs a two-model routing strategy: a strong, expensive model and a cheaper, less capable one. A "router" component scores each incoming prompt, estimating how likely the strong model is actually needed, and compares that score against a configurable cost threshold to pick the model; the threshold trades cost savings against response quality. The framework supports several router types, including matrix factorization, similarity-weighted Elo ranking, and a BERT classifier, all trained on preference data.
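
The decision boils down to a score-versus-threshold comparison. The sketch below is purely illustrative (the class and function names are hypothetical, not RouteLLM internals): the router estimates how likely the strong model is needed for a prompt, and the threshold sets how high that estimate must be before the expensive model is used.

```python
# Illustrative sketch of threshold-based routing between two models.
# Names here are hypothetical; this is not RouteLLM's internal API.

def route(prompt: str, router, threshold: float) -> str:
    """Pick a model for `prompt` given a router score and a cost threshold.

    `router.win_rate(prompt)` is assumed to return the estimated probability
    that the strong model's answer would be preferred over the weak model's.
    """
    score = router.win_rate(prompt)
    # A higher threshold sends fewer prompts to the expensive strong model.
    return "strong-model" if score >= threshold else "weak-model"


class ToyRouter:
    """Toy stand-in for a trained router (matrix factorization, BERT, ...)."""

    def win_rate(self, prompt: str) -> float:
        # A real router is trained on preference data; this toy version just
        # guesses that longer, reasoning-heavy prompts need the strong model.
        hard_markers = ("prove", "derive", "optimize", "debug")
        score = 0.5 + 0.1 * sum(m in prompt.lower() for m in hard_markers)
        return min(score + len(prompt) / 4000, 1.0)


if __name__ == "__main__":
    router = ToyRouter()
    for p in ("What is 2 + 2?", "Prove that the sum of two even numbers is even."):
        print(p, "->", route(p, router, threshold=0.6))
```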

Quick Start & Requirements

  • Installation: pip install "routellm[serve,eval]" or from source.
  • Prerequisites: an OpenAI API key (used for embeddings), plus credentials or endpoints for your chosen LLM providers (e.g., Anyscale, Ollama). Python 3.8+ recommended.
  • Resources: requires API access to the configured LLMs. Setup is generally quick: install the package and set the relevant API keys as environment variables (see the setup sketch after this list).
  • Links: Blog, Paper
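
A minimal setup sketch based on the usage pattern shown in the project README; the provider, model identifiers, and threshold value below are placeholders, so check the repository for the exact interface:

```python
import os
from routellm.controller import Controller

# API keys are read from environment variables (values are placeholders).
os.environ["OPENAI_API_KEY"] = "sk-..."         # used for embeddings / OpenAI models
os.environ["ANYSCALE_API_KEY"] = "esecret_..."  # only if routing to an Anyscale-hosted model

# The Controller mirrors the OpenAI client: choose a router ("mf" = matrix
# factorization) plus a strong and a weak model.
client = Controller(
    routers=["mf"],
    strong_model="gpt-4-1106-preview",
    weak_model="anyscale/mistralai/Mixtral-8x7B-Instruct-v0.1",
)

# The model string encodes the router and the cost threshold; the threshold
# below is illustrative and should be calibrated for your own workload.
response = client.chat.completions.create(
    model="router-mf-0.11593",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```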

Highlighted Details

  • Drop-in replacement for the OpenAI client, or an OpenAI-compatible server (see the server sketch after this list).
  • Pre-trained routers achieve up to 85% cost reduction while maintaining 95% of GPT-4 quality on benchmarks such as MT-Bench.
  • Supports routing to a wide range of models via LiteLLM integration.
  • Includes an evaluation framework for comparing router performance across benchmarks (MMLU, GSM8K, MT-Bench).
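
Because the server speaks the OpenAI API, existing OpenAI clients can be pointed at it by overriding the base URL. A hedged sketch (the launch command, port, and router/threshold string are assumptions; adjust them to your deployment):

```python
# Assumes a RouteLLM OpenAI-compatible server is already running locally,
# e.g. launched with something along the lines of:
#   python -m routellm.openai_server --routers mf \
#       --strong-model gpt-4-1106-preview \
#       --weak-model anyscale/mistralai/Mixtral-8x7B-Instruct-v0.1
# The port below is an assumption; use whatever your server reports on startup.

import openai

client = openai.OpenAI(
    base_url="http://localhost:6060/v1",  # RouteLLM server instead of api.openai.com
    api_key="no-key-required",            # the local server handles provider auth itself
)

response = client.chat.completions.create(
    model="router-mf-0.11593",  # router name plus cost threshold, as in the Controller example
    messages=[{"role": "user", "content": "Summarize what an LLM router does in one sentence."}],
)
print(response.choices[0].message.content)
```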

Maintenance & Community

The project is associated with LMSYS Org, known for its work on LLM evaluation and benchmarks. Contributions are welcomed via issues and pull requests.

Licensing & Compatibility

The repository is licensed under the Apache-2.0 license, permitting commercial use and integration with closed-source applications.

Limitations & Caveats

While the pre-trained routers generalize well, optimal performance may require retraining or threshold calibration on queries representative of your own workload. The framework currently routes between exactly two models (one strong, one weak) per deployment.
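
Calibration amounts to choosing the score cutoff that sends a target fraction of your own traffic to the strong model. The sketch below is a conceptual quantile-based version (the function name is mine; RouteLLM ships its own calibration tooling):

```python
import numpy as np

def calibrate_threshold(scores: np.ndarray, strong_model_fraction: float) -> float:
    """Pick a cutoff so that roughly `strong_model_fraction` of queries
    (those with the highest router scores) go to the strong model.

    `scores` are router win-rate estimates on a representative sample of
    your own queries. Purely illustrative, not RouteLLM's calibration code.
    """
    return float(np.quantile(scores, 1.0 - strong_model_fraction))

# Example: send roughly 20% of traffic to the strong model.
sample_scores = np.random.default_rng(0).beta(2, 5, size=10_000)  # stand-in for real router scores
threshold = calibrate_threshold(sample_scores, strong_model_fraction=0.2)
print(f"calibrated threshold: {threshold:.3f}")
```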

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 68 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems").

MobiLlama by mbzuai-oryx

0%
660
Small language model for edge devices
Created 1 year ago
Updated 4 months ago
Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), Johannes Hagemann (Cofounder of Prime Intellect), and 3 more.

minions by HazyResearch

1.3%
1k
Communication protocol for cost-efficient LLM collaboration
Created 7 months ago
Updated 16 hours ago
Starred by David Cournapeau (Author of scikit-learn), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 11 more.

llm-numbers by ray-project

0%
4k
LLM developer's reference for key numbers
Created 2 years ago
Updated 1 year ago
Starred by Luis Capelo (Cofounder of Lightning AI), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 4 more.

ktransformers by kvcache-ai

0.3%
15k
Framework for LLM inference optimization experimentation
Created 1 year ago
Updated 2 days ago