llm-router by anyscale

LLM router framework for optimized responses

Created 1 year ago
251 stars

Top 99.8% on SourcePulse

Project Summary

This project introduces a framework for training LLM routers that dynamically route queries to either high-quality closed-source LLMs or cost-effective open-source LLMs. It targets developers building LLM-powered applications who need to balance response quality with operational costs. The primary benefit is significant cost reduction (up to 70%) on benchmarks like MT Bench, while maintaining high response quality.

How It Works

The core approach involves training a causal LLM classifier, specifically fine-tuning Llama3-8B, to predict the quality of a potential response from a cost-effective model (Mixtral-8x7B) relative to a high-quality reference (GPT-4). Queries are routed to Mixtral-8x7B if the predicted quality score is high (>=4), and to GPT-4 otherwise. Data labeling is performed using an LLM-as-a-judge methodology, where GPT-4 evaluates Mixtral's responses against its own, assigning a 1-5 star rating. This method allows for scalable, high-quality synthetic data generation.
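The routing rule described above can be sketched in a few lines. Note that `predict_quality` here is a toy stub standing in for the fine-tuned Llama3-8B classifier, and the model names and threshold constant are illustrative; only the decision rule (route to Mixtral-8x7B when the predicted score is >= 4, otherwise GPT-4) comes from the project description.

```python
# Sketch of the routing decision. The quality score in the real
# framework comes from a fine-tuned Llama3-8B causal classifier;
# here it is a stub heuristic for illustration only.

STRONG_MODEL = "gpt-4"        # high-quality, higher-cost route
WEAK_MODEL = "mixtral-8x7b"   # cost-effective route
THRESHOLD = 4                 # use the weak model when score >= 4

def predict_quality(query: str) -> int:
    """Stub for the classifier's 1-5 quality prediction.

    The actual router predicts how well Mixtral-8x7B would answer
    `query` relative to a GPT-4 reference response.
    """
    # Toy heuristic: treat short queries as easy for the weak model.
    return 5 if len(query.split()) < 20 else 2

def route(query: str) -> str:
    """Return the model name that should handle `query`."""
    score = predict_quality(query)
    return WEAK_MODEL if score >= THRESHOLD else STRONG_MODEL

print(route("What is the capital of France?"))  # short query -> weak model
```

In practice the threshold trades cost against quality: raising it sends more traffic to GPT-4, lowering it saves more cost at some quality risk.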

Quick Start & Requirements

  • Installation: pip install -r requirements.txt
  • Prerequisites: ANYSCALE_API_KEY, OPENAI_API_KEY, LLAMA2_HF_TOKEN (for evaluation). GPU resources are recommended for training (e.g., 8xA10 GPUs).
  • Estimated Setup: Approximately 120 minutes, including training time on a multi-GPU node.
  • Links: Evaluation framework available at https://github.com/lm-sys/RouteLLM/.

Highlighted Details

  • Achieves substantial cost reductions: up to 70% on MT Bench, 30% on MMLU, and 40% on GSM8K, while matching baseline performance.
  • Employs LLM-as-a-judge for generating response quality labels, using GPT-4 as both the reference and evaluator.
  • Fine-tunes Llama3-8B as the causal LLM classifier, demonstrating superior routing performance.
  • Evaluated on standard benchmarks including MT Bench, MMLU, and GSM8K, showing improved cost-performance trade-offs over random routing.
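The LLM-as-a-judge labeling step can be pictured as prompting GPT-4 to grade a Mixtral answer against its own reference answer on a 1-5 star scale. The prompt wording and parsing helper below are assumptions for illustration, not the project's actual prompt:

```python
# Illustrative sketch of LLM-as-a-judge labeling: GPT-4 rates a
# candidate (Mixtral) answer against its own reference answer, 1-5.
# Prompt text and parsing are hypothetical, not the project's own.
import re

def build_judge_prompt(query: str, reference: str, candidate: str) -> str:
    """Assemble a grading prompt for the judge model."""
    return (
        "You are grading an assistant's answer against a reference.\n"
        f"Question: {query}\n"
        f"Reference answer: {reference}\n"
        f"Candidate answer: {candidate}\n"
        "Rate the candidate from 1 to 5 stars. Reply as 'Rating: N'."
    )

def parse_rating(judge_reply: str) -> int:
    """Extract the integer star rating from the judge's reply."""
    match = re.search(r"Rating:\s*([1-5])", judge_reply)
    if not match:
        raise ValueError(f"no rating found in: {judge_reply!r}")
    return int(match.group(1))

print(parse_rating("Rating: 4"))  # -> 4
```

The parsed ratings then become the training labels for the router classifier, which is what makes the synthetic-data pipeline scalable.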

Maintenance & Community

This project is developed in collaboration with the Berkeley LMSys group. The primary community and evaluation resources are found in the lm-sys/RouteLLM GitHub repository. No dedicated community channels (e.g., Discord, Slack) are mentioned.

Licensing & Compatibility

The license for the anyscale/llm-router code itself is not explicitly stated in the README. Usage is dependent on API access and terms of service for Anyscale, OpenAI (GPT-4), and Hugging Face (Llama3-8B). Commercial use may be restricted by the licensing of these underlying models and services.

Limitations & Caveats

The framework requires API keys for Anyscale and OpenAI, and access to Llama3-8B for evaluation. The tutorial focuses on a specific routing strategy (causal LLM classifier) and a defined set of models (GPT-4, Mixtral-8x7B, Llama3-8B), which may not be universally applicable. The license for the router code is unspecified.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 30 days

Explore Similar Projects

Starred by Nir Gazit (Cofounder of Traceloop), Jared Palmer (SVP at GitHub; Founder of Turborepo; Author of Formik, TSDX), and 3 more.

haven by redotvideo

0.3%
348
LLM fine-tuning and evaluation platform
Created 2 years ago
Updated 2 years ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and Jeff Hammerbacher (Cofounder of Cloudera).

LLMRouter by ulab-uiuc

2.2%
2k
Optimize LLM inference with intelligent routing
Created 6 months ago
Updated 3 weeks ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Elie Bursztein (Cybersecurity Lead at Google DeepMind), and 10 more.

RouteLLM by lm-sys

0.5%
5k
Framework for LLM routing and cost reduction (research paper)
Created 1 year ago
Updated 1 year ago