modelz-llm by tensorchord

Inference server for open-source LLMs, offering an OpenAI-compatible API

created 2 years ago
275 stars

Top 94.9% on sourcepulse

Project Summary

Modelz LLM provides an OpenAI-compatible API for serving various open-source large language models, including LLaMA, Vicuna, and ChatGLM. It targets developers and researchers who want to easily deploy and interact with these models in local or cloud environments using familiar tools like the OpenAI Python SDK or LangChain.

How It Works

Modelz LLM acts as an inference server, abstracting the complexities of loading and running different LLMs. It leverages the Mosec inference engine and FastChat for prompt generation, offering a unified interface to diverse models. This approach allows users to switch between models seamlessly without altering their application code, benefiting from a consistent API.
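
As a sketch of what that consistent interface looks like, the snippet below points the OpenAI Python SDK (v1+ client) at a locally running Modelz LLM server. The base URL, port, and model name are assumptions to adapt to your own deployment, not values fixed by the project.

    # Sketch: calling a local Modelz LLM server through the OpenAI Python SDK.
    # The endpoint URL, port, and model name are assumptions, not fixed values.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:8000",  # assumed local Modelz LLM endpoint
        api_key="any",                     # placeholder; a local server typically ignores it
    )

    reply = client.chat.completions.create(
        model="bigscience/bloomz-560m",    # whichever model the server was started with
        messages=[{"role": "user", "content": "Explain what an inference server does."}],
    )
    print(reply.choices[0].message.content)

Switching to another model would only change the model string (and the flag passed to modelz-llm), not the calling code.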

Quick Start & Requirements

  • Install: pip install modelz-llm or pip install git+https://github.com/tensorchord/modelz-llm.git[gpu]
  • Run: modelz-llm -m bigscience/bloomz-560m --device cpu (a request sketch follows this list)
  • Prerequisites: GPU recommended for most models (e.g., Nvidia L4, A100, T4). CPU is supported for smaller models.
  • Docs: https://github.com/tensorchord/modelz-llm
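
For a quick check that the server started by the Run step is responding, here is a minimal raw-HTTP sketch; the port (8000) and model name are assumptions that should match your modelz-llm invocation.

    # Sketch: raw HTTP request to the OpenAI-style /completions endpoint.
    # Port 8000 and the model name are assumptions; match your modelz-llm invocation.
    import requests

    resp = requests.post(
        "http://localhost:8000/completions",
        json={
            "model": "bigscience/bloomz-560m",  # model passed via -m
            "prompt": "Modelz LLM is",
            "max_tokens": 32,
        },
        timeout=60,
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["text"])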

Highlighted Details

  • OpenAI compatible API for /completions, /chat/completions, and /embeddings.
  • Supports models like FastChat T5, Vicuna, LLaMA, and ChatGLM.
  • Cloud-native deployment options via Docker images for Kubernetes.
  • Integrates directly with LangChain and the OpenAI Python SDK (see the LangChain sketch after this list).
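
A hedged LangChain sketch, assuming the langchain-openai package and the same assumed local endpoint as above:

    # Sketch: using Modelz LLM via LangChain's OpenAI-compatible chat wrapper.
    # The package (langchain-openai), endpoint, and model name are assumptions.
    from langchain_openai import ChatOpenAI

    llm = ChatOpenAI(
        base_url="http://localhost:8000",  # assumed local Modelz LLM endpoint
        api_key="any",                     # placeholder key
        model="bigscience/bloomz-560m",    # model the server is serving
    )
    print(llm.invoke("Give one use case for an OpenAI-compatible API.").content)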

Maintenance & Community

  • Developed by TensorChord.
  • Acknowledgements to FastChat and Mosec.

Licensing & Compatibility

  • License: Not explicitly stated in the README.
  • Compatibility: Designed for use with the OpenAI SDK and LangChain, suggesting broad compatibility with Python-based LLM applications.

Limitations & Caveats

The specific license is not detailed, which may impact commercial use. The README lists recommended GPUs for specific models, implying that performance or even functionality might be constrained on less powerful hardware.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 90 days
