LLMRouter by ulab-uiuc

Optimize LLM inference with intelligent routing

Created 4 months ago

1,410 stars

Top 28.4% on SourcePulse

2 Experts Love This Project

Chip Huyen

Author of "AI Engineering", "Designing Machine Learning Systems"

Jeff Hammerbacher

Cofounder of Cloudera

Project Summary

LLMRouter is an open-source library that routes each query to the most suitable Large Language Model (LLM), balancing inference cost against performance. It targets researchers and developers who manage deployments spanning multiple LLMs. The library provides a unified command-line interface (CLI) and a data generation pipeline, which together cover training, deploying, and managing a range of LLM routing strategies.

How It Works

LLMRouter selects an LLM for each query based on task complexity, cost constraints, and performance requirements. It supports more than 16 routing models in four categories: single-round, multi-round, agentic, and personalized. The underlying techniques include K-Nearest Neighbors (KNN), Support Vector Machines (SVM), Multi-Layer Perceptrons (MLP), Matrix Factorization, Elo Rating, graph-based methods, and BERT-based routing, so a router can be matched to the use case.
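
To make the KNN idea concrete, here is a minimal sketch of nearest-neighbour routing in plain Python. It is not LLMRouter's implementation: the function names, the (embedding, model, score) record layout, and the cost penalty are all assumptions chosen for illustration. The router looks up the k most similar past queries and picks the model with the best average score among them, optionally penalised by cost.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_route(query_vec, history, k=3, cost_weight=0.0, costs=None):
    """Pick a model name for query_vec (hypothetical helper).

    history is a list of (embedding, model_name, score) records, one per
    past query/model evaluation. costs maps model_name to a relative cost.
    """
    # Keep the k records whose embeddings are closest to the query.
    neighbours = sorted(
        history, key=lambda rec: cosine(query_vec, rec[0]), reverse=True
    )[:k]

    # Group the neighbours' scores by the model that produced them.
    scores = defaultdict(list)
    for _, model, score in neighbours:
        scores[model].append(score)

    costs = costs or {}

    def utility(model):
        mean_score = sum(scores[model]) / len(scores[model])
        return mean_score - cost_weight * costs.get(model, 0.0)

    # Iterate over sorted names so ties resolve deterministically.
    return max(sorted(scores), key=utility)
```

With cost_weight set to 0 the choice depends on quality alone; raising it shifts borderline queries toward cheaper models. The model names and scores a caller passes in are placeholders, not benchmark results.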

Quick Start & Requirements

Install from PyPI (pip install llmrouter-lib) or from source for an editable install. Source installation requires Python 3.10; optional features such as router-r1 may also need specific versions of PyTorch (e.g., 2.4.0) and vLLM (e.g., 0.6.3). Inference, chat, and data generation all require the API_KEYS environment variable to be set with valid LLM API keys. Configuration is managed through YAML files, which can specify API endpoints per model or at the router level.
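
The two configuration points above can be sketched in Python. This is an illustration only, not LLMRouter code: the summary does not describe the format of API_KEYS or the YAML schema, so the helper names, the api_endpoint key, and the rule that a per-model endpoint takes precedence over a router-level one are all assumptions.

```python
import os


def require_api_keys(env=None):
    """Return the raw API_KEYS value, or fail early if it is missing.

    The value is returned unparsed because its format is not
    described in the source.
    """
    env = os.environ if env is None else env
    value = env.get("API_KEYS", "").strip()
    if not value:
        raise RuntimeError(
            "API_KEYS is not set; inference, chat, and data generation need it"
        )
    return value


def resolve_endpoint(model_cfg, router_cfg):
    """Choose an endpoint from already-parsed config dictionaries.

    Assumed precedence: a per-model endpoint overrides the router-level
    one. The key name "api_endpoint" is hypothetical.
    """
    endpoint = model_cfg.get("api_endpoint") or router_cfg.get("api_endpoint")
    if endpoint is None:
        raise KeyError("no api_endpoint at model or router level")
    return endpoint
```

Checking the variable before starting a long training or data generation run avoids discovering a missing key partway through.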

Highlighted Details

Supports 16+ diverse routing models across four categories: single-round, multi-round, agentic, and personalized.
Features a unified CLI for training, inference, and interactive chat via a Gradio-based UI.
Includes a complete data generation pipeline utilizing 11 benchmark datasets for creating training data with embeddings and performance metrics.
Offers an extensible plugin system for easily adding custom router implementations and task definitions.
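
One common way to build an extensible plugin system like the one mentioned in the last item is a name-to-class registry. The sketch below shows that general pattern under stated assumptions; it is not LLMRouter's actual plugin API, and ROUTERS, register_router, build_router, and LengthThresholdRouter are hypothetical names.

```python
# Hypothetical registry: maps a router name to the class that implements it.
ROUTERS = {}


def register_router(name):
    """Class decorator that adds a router class to the registry."""
    def decorator(cls):
        if name in ROUTERS:
            raise ValueError(f"router {name!r} is already registered")
        ROUTERS[name] = cls
        return cls
    return decorator


@register_router("length_threshold")
class LengthThresholdRouter:
    """Toy custom router: short queries go to a cheaper model."""

    def __init__(self, threshold=50):
        self.threshold = threshold

    def route(self, query):
        # Model names are placeholders, not real deployments.
        return "small-model" if len(query) < self.threshold else "large-model"


def build_router(name, **kwargs):
    """Instantiate a registered router by name."""
    if name not in ROUTERS:
        raise KeyError(f"unknown router {name!r}")
    return ROUTERS[name](**kwargs)
```

With a registry of this kind, a CLI or config file can refer to a custom router by name, for example build_router("length_threshold", threshold=10), without the core code importing it explicitly.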

Maintenance & Community

The project is presented as a "living, extensible research framework" actively welcoming community contributions, including new routing strategies and training paradigms. While specific community channels like Discord or Slack are not detailed, the repository encourages pull requests for integration. Several research papers are acknowledged as inspirations for the router implementations.

Licensing & Compatibility

The provided README does not explicitly state the software license. This omission requires further investigation before adoption, particularly concerning commercial use or integration into closed-source projects.

Limitations & Caveats

Future development areas identified in the TODO list include improving personalized routers, integrating multimodal routing capabilities, and adding continual/online learning for routers. The necessity of configuring API keys for most functionalities is a practical consideration for deployment.

Health Check

Last Commit

2 days ago

Responsiveness

Inactive

Star History

243 stars in the last 30 days