Inference server for open-source LLMs, offering an OpenAI-compatible API
Modelz LLM provides an OpenAI-compatible API for serving various open-source large language models, including LLaMA, Vicuna, and ChatGLM. It targets developers and researchers who want to easily deploy and interact with these models in local or cloud environments using familiar tools like the OpenAI Python SDK or LangChain.
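Because the API mirrors OpenAI's, the stock OpenAI Python SDK can point at a local Modelz LLM instance. A minimal sketch, assuming the server is reachable at http://localhost:8000 with routes mounted at the root (adjust base_url to match your deployment):

```python
from openai import OpenAI

# Assumed local address; the API key is a placeholder, presumed unchecked
# by a self-hosted server.
client = OpenAI(base_url="http://localhost:8000", api_key="any")

resp = client.chat.completions.create(
    model="bigscience/bloomz-560m",  # whichever model the server was started with
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)
```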
How It Works
Modelz LLM acts as an inference server, abstracting away the complexity of loading and running different LLMs. It leverages the Mosec inference engine and FastChat for prompt generation, exposing a single, consistent API across diverse models, so users can switch models without changing their application code.
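As an illustration of that consistency, higher-level tooling built on the OpenAI API should work unchanged. A hedged sketch using LangChain's OpenAI chat wrapper, where the base URL and model name are assumptions for a local deployment:

```python
from langchain_openai import ChatOpenAI

# Only the server-side `modelz-llm -m ...` flag changes when swapping models;
# this client code stays the same.
llm = ChatOpenAI(
    base_url="http://localhost:8000",  # assumed local server address
    api_key="any",                     # placeholder
    model="bigscience/bloomz-560m",    # match the model being served
)
print(llm.invoke("What does an inference server do?").content)
```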
Quick Start & Requirements
pip install modelz-llm
or, to install from source with GPU extras: pip install "modelz-llm[gpu] @ git+https://github.com/tensorchord/modelz-llm.git"
modelz-llm -m bigscience/bloomz-560m --device cpu
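Once the server is up, a quick smoke test over plain HTTP can confirm it responds; a rough sketch assuming the default port 8000 and a root-mounted /completions route:

```python
import requests

resp = requests.post(
    "http://localhost:8000/completions",  # adjust host/port/path to your setup
    json={
        "model": "bigscience/bloomz-560m",
        "prompt": "Hello, world",
        "max_tokens": 16,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```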
Highlighted Details
Exposes the OpenAI-compatible /completions, /chat/completions, and /embeddings endpoints.
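For instance, the /embeddings endpoint can be exercised through the same SDK; a minimal sketch, with the usual caveat that the base URL and model name are assumptions:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000", api_key="any")

emb = client.embeddings.create(
    model="bigscience/bloomz-560m",
    input="Modelz LLM serves an OpenAI-compatible API.",
)
print(len(emb.data[0].embedding))  # dimensionality of the returned vector
```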
Maintenance & Community
Licensing & Compatibility
The specific license is not detailed, which may impact commercial use.
Limitations & Caveats
The README lists recommended GPUs for specific models, implying that performance, or even functionality, may be constrained on less powerful hardware.