RouteLLM  by lm-sys

Framework for LLM routing and cost reduction (research paper)

created 1 year ago
4,149 stars

Top 12.0% on sourcepulse

GitHubView on GitHub
Project Summary

RouteLLM is a framework for serving and evaluating LLM routers, designed to reduce LLM operational costs without sacrificing response quality. It targets developers and researchers seeking to optimize LLM deployments by intelligently routing queries to different models based on complexity and cost thresholds. The primary benefit is significant cost savings (up to 85%) while maintaining high performance comparable to top-tier models.

How It Works

RouteLLM employs a two-model routing strategy: a powerful, expensive model and a cheaper, less capable model. It uses a "router" component that analyzes incoming prompts and determines the optimal model for the task based on a configurable cost threshold. This threshold balances cost savings against response quality. The framework supports various routing algorithms, including matrix factorization, Elo ranking, and BERT classifiers, trained on preference data.

Quick Start & Requirements

  • Installation: pip install "routellm[serve,eval]" or from source.
  • Prerequisites: OpenAI API key (for embeddings), and API keys for chosen LLM providers (e.g., Anyscale, Ollama). Python 3.8+ recommended.
  • Resources: Requires API access to specified LLMs. Setup is generally quick, involving setting environment variables for API keys.
  • Links: Blog, Paper

Highlighted Details

  • Drop-in replacement for OpenAI client or an OpenAI-compatible server.
  • Pre-trained routers achieve up to 85% cost reduction with 95% GPT-4 quality on benchmarks like MT Bench.
  • Supports routing to a wide range of models via LiteLLM integration.
  • Includes an evaluation framework for comparing router performance across benchmarks (MMLU, GSM8K, MT-Bench).

Maintenance & Community

The project is associated with LMSYS Org, known for its work on LLM evaluation and benchmarks. Contributions are welcomed via issues and pull requests.

Licensing & Compatibility

The repository is licensed under the Apache-2.0 license, permitting commercial use and integration with closed-source applications.

Limitations & Caveats

While pre-trained routers generalize well, optimal performance may require retraining or threshold calibration on specific query datasets. The framework currently focuses on routing between two models.

Health Check
Last commit

11 months ago

Responsiveness

1 day

Pull Requests (30d)
0
Issues (30d)
0
Star History
284 stars in the last 90 days

Explore Similar Projects

Feedback? Help us improve.