semantic-router by vllm-project

Intelligent LLM routing for efficient inference

Created 3 weeks ago

1,201 stars

Top 32.5% on SourcePulse

Project Summary

This project provides an intelligent Mixture-of-Models (MoM) router that improves LLM inference efficiency and accuracy by directing each request to the most suitable model in a pool, based on a semantic understanding of the request. It targets developers and researchers optimizing LLM deployments, offering improved accuracy, reduced latency, and added security through features such as PII detection and prompt guarding.

How It Works

The router employs BERT classification to semantically understand the intent, complexity, and task requirements of incoming requests. This allows it to intelligently select the most appropriate LLM from a configured set, akin to how Mixture-of-Experts (MoE) operates within a single model but applied at the model selection level. This approach leverages specialized models for specific tasks, leading to better overall inference accuracy and efficiency. It also supports automatic tool selection based on prompt relevance and includes similarity caching for prompt representations to reduce token usage and latency.
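
To make the routing idea concrete, the sketch below classifies a prompt into a category with a BERT-style classifier and maps that category to a backend model. The classifier, category labels, and model pool are illustrative stand-ins, not the project's actual configuration.

    # Illustrative sketch of Mixture-of-Models routing: classify the prompt's
    # category with a BERT-style classifier, then pick a backend model from a
    # pool. Classifier, labels, and pool are stand-ins, not the project's code.
    from transformers import pipeline

    # Stand-in for the router's fine-tuned BERT intent classifier.
    classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

    # Hypothetical mapping from request category to preferred backend model.
    MODEL_POOL = {
        "math": "qwen2.5-math-7b-instruct",
        "code generation": "deepseek-coder-6.7b-instruct",
        "general chat": "llama-3.1-8b-instruct",
    }

    def route(prompt: str) -> str:
        """Return the backend model best suited to the prompt's category."""
        result = classifier(prompt, candidate_labels=list(MODEL_POOL))
        best_category = result["labels"][0]  # labels come back sorted by score
        return MODEL_POOL[best_category]

    print(route("Integrate x**2 from 0 to 1"))  # expected to pick the math model

In the actual router this decision is made per request before forwarding, so each query reaches a model specialized for its task.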

Quick Start & Requirements

  • Installation: Instructions are available in the Complete Documentation; an illustrative client call is sketched after this list.
  • Prerequisites: The README lists Golang (with Rust FFI via Candle) and Python as implementation languages; specific version requirements are not detailed.
  • Resources: No specific hardware or resource estimates are provided.
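
Since the README defers setup details to the documentation, the following is only a hedged sketch of how a client might call the router, assuming it exposes an OpenAI-compatible endpoint. The URL, port, API key handling, and the "auto" model alias are assumptions, not documented values.

    # Hypothetical client call through the router. The base_url, api_key, and
    # "auto" model alias below are illustrative assumptions.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8801/v1", api_key="not-needed")

    response = client.chat.completions.create(
        model="auto",  # let the router choose the backend model semantically
        messages=[{"role": "user", "content": "Explain consistent hashing briefly."}],
    )
    print(response.choices[0].message.content)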

Highlighted Details

  • Implements an Intelligent Mixture-of-Models (MoM) routing strategy.
  • Features PII detection and prompt guard capabilities for enhanced security.
  • Includes similarity caching to improve inference latency and reduce token usage (sketched after this list).
  • Offers both Golang and Python implementations, with benchmarking planned.
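
The similarity cache can be sketched roughly as follows: embed each prompt, and if a new prompt's embedding is close enough to a cached one, reuse the stored response instead of calling a backend model. The embedding model, threshold, and linear scan are illustrative choices, not the project's implementation.

    # Rough sketch of semantic similarity caching. Embedding model, similarity
    # threshold, and the linear scan are illustrative, not the project's code.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    encoder = SentenceTransformer("all-MiniLM-L6-v2")

    class SimilarityCache:
        def __init__(self, threshold: float = 0.9):
            self.threshold = threshold
            self.entries: list[tuple[np.ndarray, str]] = []  # (embedding, response)

        def get(self, prompt: str) -> str | None:
            query = encoder.encode(prompt, normalize_embeddings=True)
            for emb, response in self.entries:
                # Embeddings are unit-normalized, so the dot product is cosine similarity.
                if float(np.dot(query, emb)) >= self.threshold:
                    return response  # cache hit: skip the backend LLM call
            return None

        def put(self, prompt: str, response: str) -> None:
            emb = encoder.encode(prompt, normalize_embeddings=True)
            self.entries.append((emb, response))

    cache = SimilarityCache()
    cache.put("What is the capital of France?", "Paris.")
    print(cache.get("Tell me the capital city of France"))  # likely a cache hit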

Maintenance & Community

  • The project is associated with the vLLM project.
  • Community support is available via the #semantic-router channel in the vLLM Slack.
  • A citation is provided for academic use.

Licensing & Compatibility

  • The project is licensed under the Apache 2.0 license.
  • The Apache 2.0 license is generally permissive for commercial use and closed-source linking.

Limitations & Caveats

The README indicates that benchmarking between the Golang and Python implementations is planned, so comparative performance characteristics are not yet established. Specific version requirements for prerequisites and resource estimates for setup and operation are also not provided.

Health Check

  • Last Commit: 21 hours ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 93
  • Issues (30d): 72

Star History

1,239 stars in the last 23 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Shyamal Anadkat (Research Scientist at OpenAI), and 12 more.

harmony by openai

Renderer for OpenAI's harmony response format

Top 0.5%, 4k stars, created 1 month ago, updated 1 month ago