NVIDIA-AI-Blueprints: LLM and VLM request router for optimal model selection
Top 94.0% on SourcePulse
Summary
This experimental blueprint automatically routes LLM and VLM requests to the most suitable model, balancing accuracy, speed, and cost. It analyzes text and multimodal prompts to choose that model, and it is aimed at AI engineers, developers, MLOps teams, and research teams who want more efficient AI agent systems.
How It Works
Built on the NVIDIA NeMo Agent Toolkit (FastAPI), it offers two routing strategies: intent-based classification with a Qwen 1.7B model, and auto-routing, which feeds CLIP embeddings into a neural network. Both strategies select a model from the content of the prompt, including any images, so the trade-offs between accuracy, speed, and cost are made automatically rather than by hand.
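The auto-routing idea can be illustrated with a minimal sketch. It assumes a prompt embedding is already available (the blueprint uses CLIP for this) and replaces the trained neural network with a single logistic unit per model. The `Candidate` class, the model names, the weights, and the cost and latency figures are all hypothetical and chosen only for illustration; they are not taken from the repository.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical routable model with made-up cost and latency figures."""
    name: str
    weights: list[float]  # stand-in for a trained quality predictor
    bias: float
    cost: float           # relative cost per request (arbitrary units)
    latency: float        # relative latency (arbitrary units)


def predicted_quality(embedding: list[float], cand: Candidate) -> float:
    """Logistic score in (0, 1): how well this model is expected to answer."""
    z = sum(w * x for w, x in zip(cand.weights, embedding)) + cand.bias
    return 1.0 / (1.0 + math.exp(-z))


def route(embedding: list[float], candidates: list[Candidate],
          cost_weight: float = 0.1, latency_weight: float = 0.1) -> str:
    """Return the name of the model with the best quality/cost/latency trade-off."""
    def utility(c: Candidate) -> float:
        return (predicted_quality(embedding, c)
                - cost_weight * c.cost
                - latency_weight * c.latency)
    return max(candidates, key=utility).name


# Illustrative candidates: the large model only pays off when the first
# embedding dimension (imagined here as "task difficulty") is high.
CANDIDATES = [
    Candidate("small-model", weights=[0.0, 0.0], bias=0.0, cost=0.1, latency=0.1),
    Candidate("large-model", weights=[4.0, 0.0], bias=0.0, cost=1.0, latency=1.0),
]
```

With these made-up numbers, `route([1.0, 0.0], CANDIDATES)` picks the large model because its predicted quality gain outweighs its penalties, while `route([0.0, 1.0], CANDIDATES)` falls back to the small one. Raising `cost_weight` shifts the balance toward cheaper models.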
Quick Start & Requirements
Installation uses Docker Compose. Prerequisites are Linux (Ubuntu 22.04+ recommended) or macOS, Docker, and Docker Compose. Local development additionally needs Python 3.12+ and the uv package manager. Two API keys are required: an NVIDIA Build API key and an Azure OpenAI or OpenAI key. A GPU is required for the CLIP server and for training the auto-router; inference can run on CPU. To begin, clone the repository from https://github.com/NVIDIA-AI-Blueprints/llm-router.
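A small preflight script can confirm the prerequisites listed above before attempting an install. The sketch below is an assumption-laden helper, not part of the blueprint: the environment variable names `NVIDIA_API_KEY` and `OPENAI_API_KEY` are hypothetical placeholders, so check the repository for the names it actually expects. It also does not verify the Docker Compose plugin, which would require invoking Docker itself.

```python
import os
import shutil
import sys

# Hypothetical variable names; the repository may use different ones.
ASSUMED_KEY_VARS = ("NVIDIA_API_KEY", "OPENAI_API_KEY")


def missing_prerequisites(which=shutil.which, env=os.environ,
                          version=sys.version_info):
    """Return a list of prerequisites that appear to be missing.

    The lookups are injectable so the check can be exercised without
    touching the real machine state.
    """
    missing = []
    if which("docker") is None:
        missing.append("docker")
    if which("uv") is None:
        missing.append("uv (local development only)")
    if tuple(version[:2]) < (3, 12):
        missing.append("python>=3.12 (local development only)")
    # Report each assumed API key variable that is unset or empty.
    missing.extend(f"env:{name}" for name in ASSUMED_KEY_VARS if not env.get(name))
    return missing
```

Calling `missing_prerequisites()` with no arguments inspects the current machine; an empty list means every check passed.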
Highlighted Details
Maintenance & Community
The README gives no specific details on maintenance, contributors, sponsorships, or community channels. Because this is an experimental branch, it also provides no formal roadmap or information about community engagement.
Licensing & Compatibility
The blueprint's license is not explicitly stated. It relies on third-party open-source components, so users must review each component's license themselves. This ambiguity may affect commercial use or integration into closed-source projects.
Limitations & Caveats
This is an experimental v2 branch and is incompatible with v1. The router returns only the name of the selected model; it does not proxy requests, so the caller must run inference itself. In production, the end user is responsible for security, API key management, logging, monitoring, and component updates.
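Because the router only names a model, the calling code has to dispatch the request itself. The following sketch shows one way a caller might do that. The `pick_model` callable stands in for a call to the router, and the `backends` registry, the model names, and the fallback behavior are hypothetical; none of them reflect the blueprint's actual API.

```python
from typing import Callable

# A backend takes a prompt and returns the model's reply.
Backend = Callable[[str], str]


def answer(prompt: str,
           pick_model: Callable[[str], str],
           backends: dict[str, Backend],
           fallback: str) -> tuple[str, str]:
    """Ask the router for a model name, then run inference on the caller side.

    Returns (model_name_used, reply). If the router names a model the caller
    has no backend for, the fallback model is used instead.
    """
    model = pick_model(prompt)  # the router only returns a name
    if model not in backends:
        model = fallback
    return model, backends[model](prompt)


# Stubs for illustration only: a fake router and two fake backends.
def fake_router(prompt: str) -> str:
    return "large-model" if len(prompt) > 20 else "small-model"


BACKENDS: dict[str, Backend] = {
    "small-model": lambda p: f"[small] {p}",
    "large-model": lambda p: f"[large] {p}",
}
```

Keeping dispatch on the caller side means authentication, retries, and logging for each backend also remain the caller's responsibility, which matches the production caveats above.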