OpenLLM by bentoml

SDK for running open-source LLMs as OpenAI-compatible APIs

Created 2 years ago
11,785 stars

Top 4.3% on SourcePulse

Project Summary

OpenLLM simplifies the self-hosting of open-source Large Language Models (LLMs) by providing an OpenAI-compatible API endpoint. It targets developers and researchers who need to deploy LLMs efficiently, offering a built-in chat UI and seamless integration with cloud deployment tools like Docker and Kubernetes, ultimately enabling enterprise-grade LLM applications.

How It Works

OpenLLM leverages state-of-the-art inference backends, including vLLM, to achieve high-throughput and low-latency LLM serving. It abstracts away the complexities of model loading, quantization, and API endpoint creation, allowing users to serve various LLMs with a single command. The project utilizes BentoML for packaging and deploying models, ensuring a consistent and production-ready workflow.
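
Because the server speaks the OpenAI protocol, existing OpenAI client code can be pointed at it by changing only the base URL. A minimal sketch, assuming a server is already running on localhost:3000 (a common BentoML default; the port, placeholder API key, and model id are illustrative assumptions, not values mandated by OpenLLM):

    from openai import OpenAI

    # Standard OpenAI client, redirected to the local OpenLLM server.
    # base_url, api_key value, and model id are illustrative assumptions.
    client = OpenAI(base_url="http://localhost:3000/v1", api_key="na")

    response = client.chat.completions.create(
        model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # whichever model the server loaded
        messages=[{"role": "user", "content": "Summarize vLLM in one sentence."}],
    )
    print(response.choices[0].message.content)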

Quick Start & Requirements

  • Install via pip: pip install openllm
  • Requires a Hugging Face token (HF_TOKEN) for gated models.
  • Supports various LLMs with different GPU memory requirements (e.g., Llama 3.1 8B requires 24 GB).
  • Official docs: https://github.com/bentoml/OpenLLM
  • Interactive exploration: openllm hello (a client sketch follows this list)
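
Once a model is served (for example via openllm serve), the model id the server exposes can be read from the standard OpenAI models route instead of being guessed. A small sketch, under the same port and API-key assumptions as above:

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:3000/v1", api_key="na")

    # The /v1/models route lists the model id(s) this server actually serves.
    for model in client.models.list().data:
        print(model.id)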

Highlighted Details

  • Serves LLMs as OpenAI-compatible APIs.
  • Supports a wide range of open-source LLMs, including Llama, Mistral, and Qwen.
  • Offers a built-in chat UI and CLI chat functionality (a streaming sketch follows this list).
  • Facilitates deployment to BentoCloud for managed infrastructure.
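
Streaming is what lets a chat UI render tokens as they are generated, and the same behavior is reachable through the standard OpenAI streaming interface. A hedged sketch (server address and model id are again assumptions):

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:3000/v1", api_key="na")

    # stream=True yields incremental chunks instead of one final message.
    stream = client.chat.completions.create(
        model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # illustrative
        messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
    print()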

Maintenance & Community

  • Actively maintained by the BentoML team.
  • Slack community available for support and discussion: https://l.bentoml.com/join-slack
  • Contributions are welcomed via GitHub issues and pull requests.

Licensing & Compatibility

  • License: Apache-2.0.
  • Compatible with commercial use and closed-source linking.

Limitations & Caveats

OpenLLM does not ship with model weights; users must have access to the model files themselves, which for gated models requires a Hugging Face token (HF_TOKEN). Custom models must be packaged as Bentos using BentoML before they can be served.
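
For orientation only, this is the general shape of a class-based BentoML service; the class name, method, and echo logic below are hypothetical placeholders rather than OpenLLM's actual packaging layout (assumes BentoML 1.2+):

    import bentoml

    # Hypothetical minimal service; a real OpenLLM-style Bento would load
    # model weights and expose OpenAI-compatible routes instead of echoing.
    @bentoml.service(resources={"cpu": "2"})
    class EchoLLM:
        @bentoml.api
        def generate(self, prompt: str) -> str:
            # Placeholder for actual model inference.
            return f"echo: {prompt}"

Served with the bentoml CLI, such a class becomes an HTTP service that can then be containerized for Docker or Kubernetes deployment.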

Health Check

  • Last commit: 3 days ago
  • Responsiveness: 1 day
  • Pull requests (30d): 7
  • Issues (30d): 0
  • Star history: 105 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Vincent Weisser (cofounder of Prime Intellect), and 7 more.

  • dalai by cocktailpeanut (13k stars): Local LLM inference via CLI tool and Node.js API. Created 2 years ago; updated 1 year ago.