basaran  by hyperonym

Open-source API server for text completion

created 2 years ago
1,299 stars

Top 31.4% on sourcepulse

Project Summary

Basaran is an open-source, OpenAI-compatible API server for Hugging Face Transformers text generation models. It lets developers replace proprietary LLM services with self-hosted open-source models without changing application code. It targets developers and researchers who want to use recent open-source LLMs through a familiar API with streaming support.

How It Works

Basaran acts as middleware, translating OpenAI API requests into Hugging Face Transformers model calls. It supports multiple decoding strategies, handles both decoder-only and encoder-decoder architectures, and includes a detokenizer for turning generated tokens back into text. Its main advantages are OpenAI API compatibility, which lets existing tools and client libraries work against it, and support for multi-GPU deployment with optional quantization.
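
As a rough illustration of the compatibility claim, the sketch below builds an OpenAI-style completion request for a locally running server using only the Python standard library. The /v1/completions path follows the OpenAI API convention; that path, the helper names (build_completion_request, complete), and the sample parameters are assumptions for illustration and are not taken from this summary.

```python
import json
import urllib.request

# Assumed base URL: the Quick Start command maps the container to port 80.
BASE_URL = "http://127.0.0.1"


def build_completion_request(prompt, max_tokens=16, base_url=BASE_URL):
    """Build (but do not send) an OpenAI-style completion request."""
    payload = {
        # Placeholder value: OpenAI clients require this field.
        "model": "placeholder",
        "prompt": prompt,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",  # path assumed from the OpenAI API
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the decoded JSON response."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling complete("once upon a time,") would need a running server; build_completion_request can be inspected without one.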

Quick Start & Requirements

  • Install/Run: docker run -p 80:80 -e MODEL=user/repo hyperonym/basaran:X.Y.Z (replace X.Y.Z with the latest version).
  • Prerequisites: Docker; for GPU acceleration, the NVIDIA driver and NVIDIA Container Runtime. Installing with pip instead requires Python 3.8+ and PyTorch 1.13+.
  • Setup: The Docker route needs no setup beyond the prerequisites; the pip route requires preparing a Python environment first.
  • Links: Playground: http://127.0.0.1/ (a local address, reachable once the server is running with the port mapping above).
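
For scripted launches, the documented command can be assembled programmatically. This is only a sketch: the docker_run_args helper and its parameters are hypothetical, and the optional --gpus all flag is Docker's general GPU option, not part of the command listed above.

```python
import shlex


def docker_run_args(model, version, host_port=80, use_gpu=False):
    """Return the argument list for the documented `docker run` command."""
    args = ["docker", "run", "-p", f"{host_port}:80", "-e", f"MODEL={model}"]
    if use_gpu:
        # Assumption: Docker's standard GPU flag, not shown in the summary.
        args += ["--gpus", "all"]
    args.append(f"hyperonym/basaran:{version}")
    return args


# Placeholders are kept as written in the summary; substitute real values.
print(shlex.join(docker_run_args("user/repo", "X.Y.Z")))
```

The resulting list can be passed to subprocess.run unchanged, which avoids shell quoting issues.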

Highlighted Details

  • OpenAI API and client library compatibility.
  • Supports streaming generation with various decoding strategies.
  • Handles decoder-only and encoder-decoder models.
  • Offers multi-GPU support with optional quantization.
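
To make the streaming item concrete, here is a minimal sketch of decoding a streamed response on the client side. It assumes the stream follows the OpenAI server-sent events convention (data: lines carrying JSON, ending with data: [DONE]) and that each chunk holds its text under choices[0].text; the iter_stream_text name is hypothetical.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from an iterable of SSE lines (str)."""
    for raw in lines:
        line = raw.strip()
        # Skip blank separator lines and anything that is not a data field.
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            yield choice.get("text", "")
```

With a real HTTP response, each line would need decoding from bytes to str before being passed in.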

Maintenance & Community

The project is open-source, with contributions welcomed via issues. Further details on contributing are available in CONTRIBUTING.md.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: The permissive license allows commercial use and integration with closed-source applications.

Limitations & Caveats

Basaran does not currently support the model parameter in completion requests. OpenAI clients still require the field, so any string can be passed. The chat API is described as difficult to unify because each model expects its own chat history format; the recommendation is to pre-format the conversation into a prompt and send it to the completion API.
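
One way to follow that recommendation is to flatten the chat history into a single prompt before calling the completion API. The sketch below uses a made-up "Role: content" template; each model expects its own format, so the template, the role labels, and the format_chat_prompt name are illustrative assumptions only.

```python
def format_chat_prompt(messages, assistant_label="Assistant"):
    """Flatten [{'role': ..., 'content': ...}] into one completion prompt."""
    # Hypothetical role labels; substitute the format your model expects.
    labels = {"system": "System", "user": "User", "assistant": assistant_label}
    lines = []
    for message in messages:
        label = labels.get(message["role"], message["role"].capitalize())
        lines.append(f"{label}: {message['content'].strip()}")
    # End with an open assistant turn so the model continues from there.
    lines.append(f"{assistant_label}:")
    return "\n".join(lines)
```

The returned string is sent as the prompt of an ordinary completion request.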

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 3 stars in the last 90 days
