OpenLLM by bentoml

SDK for running open-source LLMs as OpenAI-compatible APIs

created 2 years ago
11,627 stars

Top 4.4% on sourcepulse

Project Summary

OpenLLM simplifies the self-hosting of open-source Large Language Models (LLMs) by providing an OpenAI-compatible API endpoint. It targets developers and researchers who need to deploy LLMs efficiently, offering a built-in chat UI and seamless integration with cloud deployment tools like Docker and Kubernetes, ultimately enabling enterprise-grade LLM applications.

How It Works

OpenLLM leverages state-of-the-art inference backends, including vLLM, to achieve high-throughput, low-latency LLM serving. It abstracts away the complexities of model loading, quantization, and API endpoint creation, allowing users to serve a variety of LLMs with a single command. The project uses BentoML for packaging and deploying models, ensuring a consistent, production-ready workflow.
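Because the served endpoint speaks the OpenAI chat-completions protocol, any OpenAI-compatible client can target it. A minimal stdlib sketch of the request such a client would POST to `/v1/chat/completions` (the model tag and port are assumptions for illustration; the request is constructed but not sent):

```python
import json
import urllib.request

# Request body in the standard OpenAI chat-completions format.
# The model tag "llama3.1:8b" is illustrative.
payload = {
    "model": "llama3.1:8b",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain self-hosting in one sentence."},
    ],
    "max_tokens": 128,
    "stream": False,
}

# Build (but do not send) the HTTP request; the port is an assumption.
req = urllib.request.Request(
    "http://localhost:3000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(req.get_method())  # → POST
```

The same payload works unchanged with the official `openai` Python SDK by pointing its `base_url` at the local server, which is what "OpenAI-compatible" buys you: no client-side code changes when switching from a hosted API to a self-hosted model.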

Quick Start & Requirements

  • Install via pip: pip install openllm
  • Requires a Hugging Face token (HF_TOKEN) for gated models.
  • GPU memory requirements vary by model (e.g., Llama 3.1 8B requires 24 GB).
  • Official docs: https://github.com/bentoml/OpenLLM
  • Interactive exploration: openllm hello
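The steps above, as a shell sketch (the model tag is illustrative; `openllm hello` lists the models actually available in your installed version):

```shell
# Install the CLI and SDK
pip install openllm

# Gated models (e.g. Llama) require a Hugging Face token
export HF_TOKEN=<your_token>

# Browse and try available models interactively
openllm hello

# Serve a model as an OpenAI-compatible API (model tag is illustrative)
openllm serve llama3.1:8b
```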

Highlighted Details

  • Serves LLMs as OpenAI-compatible APIs.
  • Supports a wide range of open-source LLMs, including Llama, Mistral, and Qwen.
  • Offers a built-in chat UI and CLI chat functionality.
  • Facilitates deployment to BentoCloud for managed infrastructure.
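Since the API is OpenAI-compatible, responses follow the standard chat-completions shape regardless of which open-source model is being served. A minimal parsing sketch (the sample payload below is fabricated for illustration, not captured from a real server):

```python
# A chat-completions response in the standard OpenAI shape; the values
# here are fabricated for illustration.
response = {
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 9, "completion_tokens": 2, "total_tokens": 11},
}

# The generated text lives in the first choice's message.
reply = response["choices"][0]["message"]["content"]
print(reply)  # → Hello!
```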

Maintenance & Community

  • Actively maintained by the BentoML team.
  • Slack community available for support and discussion: https://l.bentoml.com/join-slack
  • Contributions are welcomed via GitHub issues and pull requests.

Licensing & Compatibility

  • License: Apache-2.0.
  • Compatible with commercial use and closed-source linking.

Limitations & Caveats

OpenLLM does not bundle model weights; users must ensure they can download the model files, which for gated models requires a Hugging Face token (HF_TOKEN). Integrating a custom model requires packaging it as a Bento using BentoML.

Health Check

  • Last commit: 3 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 1
  • Issues (30d): 0
  • Star history: 463 stars in the last 90 days
