llm-router by NVIDIA-AI-Blueprints

LLM and VLM request router for optimal model selection

Created 1 year ago

319 stars

Top 84.7% on SourcePulse

Project Summary

Summary

This experimental blueprint provides an automated system for routing LLM/VLM requests to the best model, balancing accuracy, speed, and cost. It analyzes text and multimodal prompts to identify optimal models, targeting AI Engineers, Developers, MLOps, and Research Teams to enhance AI agent system efficiency.

How It Works

Built on NVIDIA NeMo Agent Toolkit (FastAPI), it offers two routing strategies: intent-based classification (Qwen 1.7B) and auto-routing (CLIP embeddings + neural network). This approach dynamically selects models based on prompt content, including images, automating complex trade-offs for optimized inference performance and resource utilization.

Quick Start & Requirements

Installation uses Docker Compose. Prerequisites include Linux (Ubuntu 22.04+ recommended) or macOS, Docker, and Docker Compose. For local development, Python 3.12+ and the uv package manager are needed. Essential API keys are NVIDIA Build API and Azure OpenAI/OpenAI. GPU is required for the CLIP server and training the auto-router; inference can run on CPU. Clone the repository from https://github.com/NVIDIA-AI-Blueprints/llm-router to begin.

Highlighted Details

Multimodal Support: Routes requests using both text and images.
Dual Routing Strategies: Offers intent-based classification or neural network auto-routing.
OpenAI API Compliant: Exposes an OpenAI-compatible chat completions endpoint for model recommendations.
Flexible Configuration: Supports pre-defined intents or custom neural network training.

Maintenance & Community

The README lacks specific details on maintenance, contributors, sponsorships, or community channels. As an experimental branch, formal roadmap or active community engagement information is not provided.

Licensing & Compatibility

The blueprint's license is not explicitly stated. It relies on third-party open-source components, requiring users to review their respective licenses. This ambiguity may impact commercial use or integration into closed-source projects.

Limitations & Caveats

This is an experimental v2 branch, incompatible with v1. It only returns model names, not proxying requests, requiring the caller to handle inference. Production deployment demands significant end-user responsibility for security, API key management, logging, monitoring, and component updates.

llm-router by NVIDIA-AI-Blueprints

Explore Similar Projects

RLVR-World by thuml

IntelliNode by intelligentnode

free-router by bytonylee

NadirClaw by NadirRouter

deep-reinforcement-learning-gym by lilianweng

done-hub by deanxv

AISuperDomain by win4r

semantic-router by aurelio-labs

FlagAI by FlagAI-Open

OpenJarvis by open-jarvis

gallery by google-ai-edge

transformers by huggingface