llm-router by anyscale

LLM router framework for optimized responses

Created 1 year ago
251 stars

Top 99.8% on SourcePulse

Project Summary

This project introduces a framework for training LLM routers that dynamically route queries to either high-quality closed-source LLMs or cost-effective open-source LLMs. It targets developers building LLM-powered applications who need to balance response quality with operational costs. The primary benefit is significant cost reduction (up to 70%) on benchmarks like MT Bench, while maintaining high response quality.

How It Works

The core approach involves training a causal LLM classifier, specifically fine-tuning Llama3-8B, to predict the quality of a potential response from a cost-effective model (Mixtral-8x7B) relative to a high-quality reference (GPT-4). Queries are routed to Mixtral-8x7B if the predicted quality score is high (>=4), and to GPT-4 otherwise. Data labeling is performed using an LLM-as-a-judge methodology, where GPT-4 evaluates Mixtral's responses against its own, assigning a 1-5 star rating. This method allows for scalable, high-quality synthetic data generation.
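The routing rule described above can be sketched in a few lines. Note that `predict_quality` here is a toy stub standing in for the fine-tuned Llama3-8B classifier, and the model names and threshold constant are illustrative; only the decision rule (route to Mixtral-8x7B when the predicted score is >= 4, otherwise GPT-4) comes from the project description.

```python
# Sketch of the routing decision. The quality score in the real
# framework comes from a fine-tuned Llama3-8B causal classifier;
# here it is a stub heuristic for illustration only.

STRONG_MODEL = "gpt-4"        # high-quality, higher-cost route
WEAK_MODEL = "mixtral-8x7b"   # cost-effective route
THRESHOLD = 4                 # use the weak model when score >= 4

def predict_quality(query: str) -> int:
    """Stub for the classifier's 1-5 quality prediction.

    The actual router predicts how well Mixtral-8x7B would answer
    `query` relative to a GPT-4 reference response.
    """
    # Toy heuristic: treat short queries as easy for the weak model.
    return 5 if len(query.split()) < 20 else 2

def route(query: str) -> str:
    """Return the model name that should handle `query`."""
    score = predict_quality(query)
    return WEAK_MODEL if score >= THRESHOLD else STRONG_MODEL

print(route("What is the capital of France?"))  # short query -> weak model
```

In practice the threshold trades cost against quality: raising it sends more traffic to GPT-4, lowering it saves more cost at some quality risk.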

Quick Start & Requirements

  • Installation: pip install -r requirements.txt
  • Prerequisites: ANYSCALE_API_KEY, OPENAI_API_KEY, LLAMA2_HF_TOKEN (for evaluation). GPU resources are recommended for training (e.g., 8xA10 GPUs).
  • Estimated Setup: Approximately 120 minutes, including training time on a multi-GPU node.
  • Links: Evaluation framework available at https://github.com/lm-sys/RouteLLM/.

Highlighted Details

  • Achieves substantial cost reductions: up to 70% on MT Bench, 30% on MMLU, and 40% on GSM8K, while matching baseline performance.
  • Employs LLM-as-a-judge for generating response quality labels, using GPT-4 as both the reference and evaluator.
  • Fine-tunes Llama3-8B as the causal LLM classifier, demonstrating superior routing performance.
  • Evaluated on standard benchmarks including MT Bench, MMLU, and GSM8K, showing improved cost-performance trade-offs over random routing.
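The LLM-as-a-judge labeling step can be pictured as prompting GPT-4 to grade a Mixtral answer against its own reference answer on a 1-5 star scale. The prompt wording and parsing helper below are assumptions for illustration, not the project's actual prompt:

```python
# Illustrative sketch of LLM-as-a-judge labeling: GPT-4 rates a
# candidate (Mixtral) answer against its own reference answer, 1-5.
# Prompt text and parsing are hypothetical, not the project's own.
import re

def build_judge_prompt(query: str, reference: str, candidate: str) -> str:
    """Assemble a grading prompt for the judge model."""
    return (
        "You are grading an assistant's answer against a reference.\n"
        f"Question: {query}\n"
        f"Reference answer: {reference}\n"
        f"Candidate answer: {candidate}\n"
        "Rate the candidate from 1 to 5 stars. Reply as 'Rating: N'."
    )

def parse_rating(judge_reply: str) -> int:
    """Extract the integer star rating from the judge's reply."""
    match = re.search(r"Rating:\s*([1-5])", judge_reply)
    if not match:
        raise ValueError(f"no rating found in: {judge_reply!r}")
    return int(match.group(1))

print(parse_rating("Rating: 4"))  # -> 4
```

The parsed ratings then become the training labels for the router classifier, which is what makes the synthetic-data pipeline scalable.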

Maintenance & Community

This project is developed in collaboration with the Berkeley LMSys group. The primary community and evaluation resources are found in the lm-sys/RouteLLM GitHub repository. No dedicated community channels (e.g., Discord, Slack) are mentioned.

Licensing & Compatibility

The license for the anyscale/llm-router code itself is not explicitly stated in the README. Usage is dependent on API access and terms of service for Anyscale, OpenAI (GPT-4), and Hugging Face (Llama3-8B). Commercial use may be restricted by the licensing of these underlying models and services.

Limitations & Caveats

The framework requires API keys for Anyscale and OpenAI, and access to Llama3-8B for evaluation. The tutorial focuses on a specific routing strategy (causal LLM classifier) and a defined set of models (GPT-4, Mixtral-8x7B, Llama3-8B), which may not be universally applicable. The license for the router code is unspecified.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 30 days

Explore Similar Projects

Starred by Nir Gazit (Cofounder of Traceloop), Jared Palmer (SVP at GitHub; Founder of Turborepo; Author of Formik, TSDX), and 3 more.

haven by redotvideo

0.3%
348
LLM fine-tuning and evaluation platform
Created 2 years ago
Updated 2 years ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and Jeff Hammerbacher (Cofounder of Cloudera).

LLMRouter by ulab-uiuc

2.2%
2k
Optimize LLM inference with intelligent routing
Created 6 months ago
Updated 3 weeks ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Elie Bursztein (Cybersecurity Lead at Google DeepMind), and 10 more.

RouteLLM by lm-sys

0.5%
5k
Framework for LLM routing and cost reduction (research paper)
Created 1 year ago
Updated 1 year ago