RouteLLM by lm-sys

Framework for LLM routing and cost reduction (research paper)

Created 1 year ago
4,276 stars

Top 11.5% on SourcePulse

View on GitHub
Project Summary

RouteLLM is a framework for serving and evaluating LLM routers, designed to reduce LLM operating costs without sacrificing response quality. It targets developers and researchers who want to optimize LLM deployments by routing queries between models based on predicted query difficulty and a configurable cost threshold. The headline benefit is cost savings of up to 85% while maintaining quality comparable to top-tier models.

How It Works

RouteLLM employs a two-model routing strategy: a strong, expensive model and a cheaper, less capable one. A "router" component scores each incoming prompt, estimating how likely the strong model is actually needed, and compares that score against a configurable cost threshold to pick the model; the threshold trades cost savings against response quality. The framework supports several router types, including matrix factorization, similarity-weighted Elo ranking, and a BERT classifier, all trained on preference data.
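
The decision boils down to a score-versus-threshold comparison. The sketch below is purely illustrative (the class and function names are hypothetical, not RouteLLM internals): the router estimates how likely the strong model is needed for a prompt, and the threshold sets how high that estimate must be before the expensive model is used.

```python
# Illustrative sketch of threshold-based routing between two models.
# Names here are hypothetical; this is not RouteLLM's internal API.

def route(prompt: str, router, threshold: float) -> str:
    """Pick a model for `prompt` given a router score and a cost threshold.

    `router.win_rate(prompt)` is assumed to return the estimated probability
    that the strong model's answer would be preferred over the weak model's.
    """
    score = router.win_rate(prompt)
    # A higher threshold sends fewer prompts to the expensive strong model.
    return "strong-model" if score >= threshold else "weak-model"


class ToyRouter:
    """Toy stand-in for a trained router (matrix factorization, BERT, ...)."""

    def win_rate(self, prompt: str) -> float:
        # A real router is trained on preference data; this toy version just
        # guesses that longer, reasoning-heavy prompts need the strong model.
        hard_markers = ("prove", "derive", "optimize", "debug")
        score = 0.5 + 0.1 * sum(m in prompt.lower() for m in hard_markers)
        return min(score + len(prompt) / 4000, 1.0)


if __name__ == "__main__":
    router = ToyRouter()
    for p in ("What is 2 + 2?", "Prove that the sum of two even numbers is even."):
        print(p, "->", route(p, router, threshold=0.6))
```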

Quick Start & Requirements

  • Installation: pip install "routellm[serve,eval]" or from source.
  • Prerequisites: an OpenAI API key (used for embeddings), plus credentials or endpoints for your chosen LLM providers (e.g., Anyscale, Ollama). Python 3.8+ recommended.
  • Resources: requires API access to the configured LLMs. Setup is generally quick: install the package and set the relevant API keys as environment variables (see the setup sketch after this list).
  • Links: Blog, Paper
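
A minimal setup sketch based on the usage pattern shown in the project README; the provider, model identifiers, and threshold value below are placeholders, so check the repository for the exact interface:

```python
import os
from routellm.controller import Controller

# API keys are read from environment variables (values are placeholders).
os.environ["OPENAI_API_KEY"] = "sk-..."         # used for embeddings / OpenAI models
os.environ["ANYSCALE_API_KEY"] = "esecret_..."  # only if routing to an Anyscale-hosted model

# The Controller mirrors the OpenAI client: choose a router ("mf" = matrix
# factorization) plus a strong and a weak model.
client = Controller(
    routers=["mf"],
    strong_model="gpt-4-1106-preview",
    weak_model="anyscale/mistralai/Mixtral-8x7B-Instruct-v0.1",
)

# The model string encodes the router and the cost threshold; the threshold
# below is illustrative and should be calibrated for your own workload.
response = client.chat.completions.create(
    model="router-mf-0.11593",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```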

Highlighted Details

  • Drop-in replacement for the OpenAI client, or an OpenAI-compatible server (see the server sketch after this list).
  • Pre-trained routers achieve up to 85% cost reduction while maintaining 95% of GPT-4 quality on benchmarks such as MT-Bench.
  • Supports routing to a wide range of models via LiteLLM integration.
  • Includes an evaluation framework for comparing router performance across benchmarks (MMLU, GSM8K, MT-Bench).
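
Because the server speaks the OpenAI API, existing OpenAI clients can be pointed at it by overriding the base URL. A hedged sketch (the launch command, port, and router/threshold string are assumptions; adjust them to your deployment):

```python
# Assumes a RouteLLM OpenAI-compatible server is already running locally,
# e.g. launched with something along the lines of:
#   python -m routellm.openai_server --routers mf \
#       --strong-model gpt-4-1106-preview \
#       --weak-model anyscale/mistralai/Mixtral-8x7B-Instruct-v0.1
# The port below is an assumption; use whatever your server reports on startup.

import openai

client = openai.OpenAI(
    base_url="http://localhost:6060/v1",  # RouteLLM server instead of api.openai.com
    api_key="no-key-required",            # the local server handles provider auth itself
)

response = client.chat.completions.create(
    model="router-mf-0.11593",  # router name plus cost threshold, as in the Controller example
    messages=[{"role": "user", "content": "Summarize what an LLM router does in one sentence."}],
)
print(response.choices[0].message.content)
```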

Maintenance & Community

The project is associated with LMSYS Org, known for its work on LLM evaluation and benchmarks. Contributions are welcomed via issues and pull requests.

Licensing & Compatibility

The repository is licensed under the Apache-2.0 license, permitting commercial use and integration with closed-source applications.

Limitations & Caveats

While the pre-trained routers generalize well, optimal performance may require retraining or threshold calibration on queries representative of your own workload. The framework currently routes between exactly two models (one strong, one weak) per deployment.
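
Calibration amounts to choosing the score cutoff that sends a target fraction of your own traffic to the strong model. The sketch below is a conceptual quantile-based version (the function name is mine; RouteLLM ships its own calibration tooling):

```python
import numpy as np

def calibrate_threshold(scores: np.ndarray, strong_model_fraction: float) -> float:
    """Pick a cutoff so that roughly `strong_model_fraction` of queries
    (those with the highest router scores) go to the strong model.

    `scores` are router win-rate estimates on a representative sample of
    your own queries. Purely illustrative, not RouteLLM's calibration code.
    """
    return float(np.quantile(scores, 1.0 - strong_model_fraction))

# Example: send roughly 20% of traffic to the strong model.
sample_scores = np.random.default_rng(0).beta(2, 5, size=10_000)  # stand-in for real router scores
threshold = calibrate_threshold(sample_scores, strong_model_fraction=0.2)
print(f"calibrated threshold: {threshold:.3f}")
```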

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 68 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems").

MobiLlama by mbzuai-oryx

0%
660
Small language model for edge devices
Created 1 year ago
Updated 4 months ago
Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), Johannes Hagemann (Cofounder of Prime Intellect), and 3 more.

minions by HazyResearch

1.3%
1k
Communication protocol for cost-efficient LLM collaboration
Created 7 months ago
Updated 16 hours ago
Starred by David Cournapeau (Author of scikit-learn), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 11 more.

llm-numbers by ray-project

0%
4k
LLM developer's reference for key numbers
Created 2 years ago
Updated 1 year ago
Starred by Luis Capelo (Cofounder of Lightning AI), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 4 more.

ktransformers by kvcache-ai

0.3%
15k
Framework for LLM inference optimization experimentation
Created 1 year ago
Updated 2 days ago