modelz-llm by tensorchord

Inference server for open-source LLMs, offering an OpenAI-compatible API

created 2 years ago
275 stars

Top 94.9% on sourcepulse

Project Summary

Modelz LLM provides an OpenAI-compatible API for serving various open-source large language models, including LLaMA, Vicuna, and ChatGLM. It targets developers and researchers who want to easily deploy and interact with these models in local or cloud environments using familiar tools like the OpenAI Python SDK or LangChain.

How It Works

Modelz LLM acts as an inference server, abstracting the complexities of loading and running different LLMs. It leverages the Mosec inference engine and FastChat for prompt generation, offering a unified interface to diverse models. This approach allows users to switch between models seamlessly without altering their application code, benefiting from a consistent API.
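
As a sketch of what that consistent interface looks like, the snippet below points the OpenAI Python SDK (v1+ client) at a locally running Modelz LLM server. The base URL, port, and model name are assumptions to adapt to your own deployment, not values fixed by the project.

    # Sketch: calling a local Modelz LLM server through the OpenAI Python SDK.
    # The endpoint URL, port, and model name are assumptions, not fixed values.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:8000",  # assumed local Modelz LLM endpoint
        api_key="any",                     # placeholder; a local server typically ignores it
    )

    reply = client.chat.completions.create(
        model="bigscience/bloomz-560m",    # whichever model the server was started with
        messages=[{"role": "user", "content": "Explain what an inference server does."}],
    )
    print(reply.choices[0].message.content)

Switching to another model would only change the model string (and the flag passed to modelz-llm), not the calling code.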

Quick Start & Requirements

  • Install: pip install modelz-llm or pip install git+https://github.com/tensorchord/modelz-llm.git[gpu]
  • Run: modelz-llm -m bigscience/bloomz-560m --device cpu (a request sketch follows this list)
  • Prerequisites: GPU recommended for most models (e.g., Nvidia L4, A100, T4). CPU is supported for smaller models.
  • Docs: https://github.com/tensorchord/modelz-llm
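
For a quick check that the server started by the Run step is responding, here is a minimal raw-HTTP sketch; the port (8000) and model name are assumptions that should match your modelz-llm invocation.

    # Sketch: raw HTTP request to the OpenAI-style /completions endpoint.
    # Port 8000 and the model name are assumptions; match your modelz-llm invocation.
    import requests

    resp = requests.post(
        "http://localhost:8000/completions",
        json={
            "model": "bigscience/bloomz-560m",  # model passed via -m
            "prompt": "Modelz LLM is",
            "max_tokens": 32,
        },
        timeout=60,
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["text"])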

Highlighted Details

  • OpenAI compatible API for /completions, /chat/completions, and /embeddings.
  • Supports models like FastChat T5, Vicuna, LLaMA, and ChatGLM.
  • Cloud-native deployment options via Docker images for Kubernetes.
  • Integrates directly with LangChain and the OpenAI Python SDK (see the LangChain sketch after this list).
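
A hedged LangChain sketch, assuming the langchain-openai package and the same assumed local endpoint as above:

    # Sketch: using Modelz LLM via LangChain's OpenAI-compatible chat wrapper.
    # The package (langchain-openai), endpoint, and model name are assumptions.
    from langchain_openai import ChatOpenAI

    llm = ChatOpenAI(
        base_url="http://localhost:8000",  # assumed local Modelz LLM endpoint
        api_key="any",                     # placeholder key
        model="bigscience/bloomz-560m",    # model the server is serving
    )
    print(llm.invoke("Give one use case for an OpenAI-compatible API.").content)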

Maintenance & Community

  • Developed by TensorChord.
  • Acknowledgements to FastChat and Mosec.

Licensing & Compatibility

  • License: Not explicitly stated in the README.
  • Compatibility: Designed for use with the OpenAI SDK and LangChain, suggesting broad compatibility with Python-based LLM applications.

Limitations & Caveats

The specific license is not detailed, which may impact commercial use. The README lists recommended GPUs for specific models, implying that performance or even functionality might be constrained on less powerful hardware.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 90 days
