Inference engine for deploying open-source models
Top 36.7% on sourcepulse
This project provides an inference engine for deploying open-source AI models on OpenRouter.ai. It targets developers and researchers who want to host and serve models efficiently, aiming to make deployment faster and cheaper, with incentives for performance improvements.
How It Works
The OpenRouter Runner is a monolithic inference engine built on the Modal platform. It leverages Modal's serverless infrastructure to deploy and manage various open-source models. The core design involves defining model-specific containers (e.g., using vLLM) and registering them within the runner's configuration. This approach allows for scalable and on-demand inference, abstracting away much of the underlying infrastructure management.
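The registration pattern described above can be sketched in miniature. This is a hypothetical illustration of the design, not the runner's actual code: the class name, model identifier, and `serve` helper are all invented for the example, and the real runner would execute generation inside a Modal GPU container via vLLM rather than in-process.

```python
# Hypothetical sketch of the container-registration pattern.
# Names here are illustrative, not taken from the OpenRouter Runner source.

class VllmContainer:
    """Stand-in for a model-specific container wrapping a vLLM engine."""

    def __init__(self, model_name: str, gpu: str = "A100"):
        self.model_name = model_name
        self.gpu = gpu

    def generate(self, prompt: str) -> str:
        # In the real runner this would invoke the vLLM engine inside a
        # Modal container; here it simply echoes for illustration.
        return f"[{self.model_name}] completion for: {prompt}"


# The runner's configuration maps model identifiers to their containers;
# supporting a new model means registering another entry here.
REGISTRY: dict[str, VllmContainer] = {
    "mistralai/Mistral-7B-Instruct": VllmContainer(
        "mistralai/Mistral-7B-Instruct"
    ),
}


def serve(model: str, prompt: str) -> str:
    """Route a request to the registered container for the given model."""
    return REGISTRY[model].generate(prompt)
```

The value of this shape is that the serving path stays generic: adding a model with unusual requirements only means writing its container class and adding a registry entry, without touching the routing logic.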
Quick Start & Requirements
poetry install
poetry shell
modal token new
modal environment create dev
modal config set-environment dev
modal secret create ...
modal run runner::download
modal deploy runner
Highlighted Details
Maintenance & Community
The repository was last updated 6 months ago and is currently marked inactive.
Licensing & Compatibility
The project does not explicitly state its license, which may impact commercial adoption.
Limitations & Caveats
Creating new containers for models with unique requirements can be complex.