OpenLLM by bentoml

SDK for running open-source LLMs as OpenAI-compatible APIs

Created 2 years ago
11,785 stars

Top 4.3% on SourcePulse

Project Summary

OpenLLM simplifies the self-hosting of open-source Large Language Models (LLMs) by providing an OpenAI-compatible API endpoint. It targets developers and researchers who need to deploy LLMs efficiently, offering a built-in chat UI and seamless integration with cloud deployment tools like Docker and Kubernetes, ultimately enabling enterprise-grade LLM applications.

How It Works

OpenLLM leverages state-of-the-art inference backends, including vLLM, to achieve high-throughput and low-latency LLM serving. It abstracts away the complexities of model loading, quantization, and API endpoint creation, allowing users to serve various LLMs with a single command. The project utilizes BentoML for packaging and deploying models, ensuring a consistent and production-ready workflow.
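
Because the server speaks the OpenAI protocol, existing OpenAI client code can be pointed at it by changing only the base URL. A minimal sketch, assuming a server is already running on localhost:3000 (a common BentoML default; the port, placeholder API key, and model id are illustrative assumptions, not values mandated by OpenLLM):

    from openai import OpenAI

    # Standard OpenAI client, redirected to the local OpenLLM server.
    # base_url, api_key value, and model id are illustrative assumptions.
    client = OpenAI(base_url="http://localhost:3000/v1", api_key="na")

    response = client.chat.completions.create(
        model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # whichever model the server loaded
        messages=[{"role": "user", "content": "Summarize vLLM in one sentence."}],
    )
    print(response.choices[0].message.content)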

Quick Start & Requirements

  • Install via pip: pip install openllm
  • Requires a Hugging Face token (HF_TOKEN) for gated models.
  • Supports various LLMs with different GPU memory requirements (e.g., Llama 3.1 8B requires 24 GB).
  • Official docs: https://github.com/bentoml/OpenLLM
  • Interactive exploration: openllm hello (a client sketch follows this list)
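
Once a model is served (for example via openllm serve), the model id the server exposes can be read from the standard OpenAI models route instead of being guessed. A small sketch, under the same port and API-key assumptions as above:

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:3000/v1", api_key="na")

    # The /v1/models route lists the model id(s) this server actually serves.
    for model in client.models.list().data:
        print(model.id)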

Highlighted Details

  • Serves LLMs as OpenAI-compatible APIs.
  • Supports a wide range of open-source LLMs, including Llama, Mistral, and Qwen.
  • Offers a built-in chat UI and CLI chat functionality (a streaming sketch follows this list).
  • Facilitates deployment to BentoCloud for managed infrastructure.
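
Streaming is what lets a chat UI render tokens as they are generated, and the same behavior is reachable through the standard OpenAI streaming interface. A hedged sketch (server address and model id are again assumptions):

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:3000/v1", api_key="na")

    # stream=True yields incremental chunks instead of one final message.
    stream = client.chat.completions.create(
        model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # illustrative
        messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
    print()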

Maintenance & Community

  • Actively maintained by the BentoML team.
  • Slack community available for support and discussion: https://l.bentoml.com/join-slack
  • Contributions are welcomed via GitHub issues and pull requests.

Licensing & Compatibility

  • License: Apache-2.0.
  • Compatible with commercial use and closed-source linking.

Limitations & Caveats

OpenLLM does not ship with model weights; users must have access to the model files themselves, which for gated models requires a Hugging Face token (HF_TOKEN). Custom models must be packaged as Bentos using BentoML before they can be served.
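
For orientation only, this is the general shape of a class-based BentoML service; the class name, method, and echo logic below are hypothetical placeholders rather than OpenLLM's actual packaging layout (assumes BentoML 1.2+):

    import bentoml

    # Hypothetical minimal service; a real OpenLLM-style Bento would load
    # model weights and expose OpenAI-compatible routes instead of echoing.
    @bentoml.service(resources={"cpu": "2"})
    class EchoLLM:
        @bentoml.api
        def generate(self, prompt: str) -> str:
            # Placeholder for actual model inference.
            return f"echo: {prompt}"

Served with the bentoml CLI, such a class becomes an HTTP service that can then be containerized for Docker or Kubernetes deployment.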

Health Check

  • Last commit: 3 days ago
  • Responsiveness: 1 day
  • Pull requests (30d): 7
  • Issues (30d): 0
  • Star history: 105 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Vincent Weisser (cofounder of Prime Intellect), and 7 more.

  • dalai by cocktailpeanut (13k stars): Local LLM inference via CLI tool and Node.js API. Created 2 years ago; updated 1 year ago.