llm-router  by NVIDIA-AI-Blueprints

LLM and VLM request router for optimal model selection

Created 1 year ago
275 stars

Top 94.0% on SourcePulse

GitHubView on GitHub
Project Summary

Summary

This experimental blueprint provides an automated system for routing LLM/VLM requests to the best model, balancing accuracy, speed, and cost. It analyzes text and multimodal prompts to identify optimal models, targeting AI Engineers, Developers, MLOps, and Research Teams to enhance AI agent system efficiency.

How It Works

Built on NVIDIA NeMo Agent Toolkit (FastAPI), it offers two routing strategies: intent-based classification (Qwen 1.7B) and auto-routing (CLIP embeddings + neural network). This approach dynamically selects models based on prompt content, including images, automating complex trade-offs for optimized inference performance and resource utilization.

Quick Start & Requirements

Installation uses Docker Compose. Prerequisites include Linux (Ubuntu 22.04+ recommended) or macOS, Docker, and Docker Compose. For local development, Python 3.12+ and the uv package manager are needed. Essential API keys are NVIDIA Build API and Azure OpenAI/OpenAI. GPU is required for the CLIP server and training the auto-router; inference can run on CPU. Clone the repository from https://github.com/NVIDIA-AI-Blueprints/llm-router to begin.

Highlighted Details

  • Multimodal Support: Routes requests using both text and images.
  • Dual Routing Strategies: Offers intent-based classification or neural network auto-routing.
  • OpenAI API Compliant: Exposes an OpenAI-compatible chat completions endpoint for model recommendations.
  • Flexible Configuration: Supports pre-defined intents or custom neural network training.

Maintenance & Community

The README lacks specific details on maintenance, contributors, sponsorships, or community channels. As an experimental branch, formal roadmap or active community engagement information is not provided.

Licensing & Compatibility

The blueprint's license is not explicitly stated. It relies on third-party open-source components, requiring users to review their respective licenses. This ambiguity may impact commercial use or integration into closed-source projects.

Limitations & Caveats

This is an experimental v2 branch, incompatible with v1. It only returns model names, not proxying requests, requiring the caller to handle inference. Production deployment demands significant end-user responsibility for security, API key management, logging, monitoring, and component updates.

Health Check
Last Commit

2 weeks ago

Responsiveness

Inactive

Pull Requests (30d)
4
Issues (30d)
0
Star History
19 stars in the last 30 days

Explore Similar Projects

Feedback? Help us improve.