Intelligent LLM routing for efficient inference
This project provides an intelligent Mixture-of-Models (MoM) router that improves LLM inference efficiency and accuracy by directing each request to the most suitable model in a pool, based on semantic understanding of the request. It targets developers and researchers seeking to optimize LLM deployments, offering improved accuracy, reduced latency, and enhanced security through features such as PII detection and prompt guarding.
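To make the model-pool idea concrete, here is a minimal configuration sketch in Python. The category names, model names, endpoints, and security flags are illustrative assumptions, not the project's actual schema, which the upstream repository defines in its own configuration format.

```python
# Hypothetical model pool for a Mixture-of-Models router: one specialist
# model per request category. All names and endpoints are invented.
MODEL_POOL = {
    "code-generation": {"model": "coder-model",   "endpoint": "http://localhost:8001"},
    "math-reasoning":  {"model": "reasoner-model", "endpoint": "http://localhost:8002"},
    "general-chat":    {"model": "chat-model",     "endpoint": "http://localhost:8003"},
}

# Hypothetical security toggles mirroring the features described above.
SECURITY = {
    "pii_detection": True,  # detect (and block or redact) personal data in requests
    "prompt_guard":  True,  # reject likely jailbreak or injection attempts
}
```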
How It Works
The router employs BERT classification to semantically understand the intent, complexity, and task requirements of incoming requests. This allows it to intelligently select the most appropriate LLM from a configured set, akin to how Mixture-of-Experts (MoE) operates within a single model but applied at the model selection level. This approach leverages specialized models for specific tasks, leading to better overall inference accuracy and efficiency. It also supports automatic tool selection based on prompt relevance and includes similarity caching for prompt representations to reduce token usage and latency.
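A minimal sketch of that selection loop, reusing the hypothetical MODEL_POOL from the configuration sketch above: check a similarity cache for near-duplicate prompts first, and only on a miss classify the prompt and route it to the matching category's model. The embed stub stands in for the project's fine-tuned BERT classifier, and the threshold and cache layout are illustrative assumptions.

```python
import hashlib
import numpy as np

def embed(text: str) -> np.ndarray:
    """Deterministic stand-in for a BERT encoder (demo only, not semantic)."""
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")
    v = np.random.default_rng(seed).standard_normal(384)
    return v / np.linalg.norm(v)

# One prototype embedding per category in the MODEL_POOL sketch above.
CATEGORY_PROTOTYPES = {cat: embed(cat) for cat in MODEL_POOL}

# Similarity cache entries: (prompt embedding, previously chosen model).
SIMILARITY_CACHE: list[tuple[np.ndarray, str]] = []
CACHE_THRESHOLD = 0.95  # illustrative cutoff for treating prompts as duplicates

def route(prompt: str) -> str:
    q = embed(prompt)
    # 1. Cache hit: a near-duplicate prompt reuses the earlier routing
    #    decision and skips classification, saving tokens and latency.
    for cached_vec, cached_model in SIMILARITY_CACHE:
        if float(q @ cached_vec) >= CACHE_THRESHOLD:
            return cached_model
    # 2. Cache miss: classify intent by nearest category prototype and
    #    select that category's specialist model from the pool.
    category = max(CATEGORY_PROTOTYPES, key=lambda c: float(q @ CATEGORY_PROTOTYPES[c]))
    model = MODEL_POOL[category]["model"]
    SIMILARITY_CACHE.append((q, model))
    return model

print(route("Write a binary search in Go"))  # first call: classified, then cached
print(route("Write a binary search in Go"))  # second call: served from the cache
```

The cache-first ordering is what lets near-duplicate prompts bypass classification entirely, which appears to be the token- and latency-saving behavior the similarity cache is meant to provide.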
Quick Start & Requirements
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The README indicates that benchmarking of the Golang and Python implementations against each other is planned, so performance characteristics are not yet finalized. Specific version requirements for prerequisites and detailed resource estimates for setup and operation are not provided.