basaran by hyperonym

Open-source API server for text completion

Created 2 years ago
1,297 stars

Top 30.7% on SourcePulse

Project Summary

Basaran is an open-source, OpenAI-compatible API server for Hugging Face Transformers text generation models. It lets developers replace proprietary LLM services with self-hosted open-source models without modifying client code. It targets developers and researchers who want to use recent open-source LLMs in their applications through a familiar API with streaming support.

How It Works

Basaran acts as middleware that translates OpenAI API requests into Hugging Face Transformers model calls. It supports several decoding strategies, handles both decoder-only and encoder-decoder architectures, and includes a dedicated detokenizer. Its main advantages are OpenAI API compatibility, which lets existing tools and client libraries work against it, and support for multi-GPU deployment with optional quantization.
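The translation step can be pictured as a parameter mapping. The sketch below is an illustration only, not Basaran's actual code: the helper name `to_generate_kwargs` is hypothetical, the defaults are assumed to mirror the usual OpenAI completion defaults, and the output keys are keyword arguments accepted by the Transformers `generate` method.

```python
from typing import Any, Dict


def to_generate_kwargs(request: Dict[str, Any]) -> Dict[str, Any]:
    """Map OpenAI-style completion fields to Transformers generate() kwargs.

    Hypothetical helper for illustration; Basaran's real mapping may differ.
    """
    temperature = request.get("temperature", 1.0)
    kwargs: Dict[str, Any] = {
        # OpenAI counts only newly generated tokens, which matches max_new_tokens.
        "max_new_tokens": request.get("max_tokens", 16),
        "num_return_sequences": request.get("n", 1),
        # A temperature of 0 is treated here as greedy decoding (an assumption).
        "do_sample": temperature > 0,
    }
    if kwargs["do_sample"]:
        kwargs["temperature"] = temperature
        kwargs["top_p"] = request.get("top_p", 1.0)
    return kwargs
```

The resulting dictionary could then be passed as `model.generate(**inputs, **kwargs)` in a Transformers script.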

Quick Start & Requirements

  • Install/Run: docker run -p 80:80 -e MODEL=user/repo hyperonym/basaran:X.Y.Z (replace user/repo with the model to serve and X.Y.Z with the latest version).
  • Prerequisites: Docker for the container image, plus the NVIDIA driver and NVIDIA Container Runtime for GPU acceleration. Python 3.8+ and PyTorch 1.13+ for pip installation.
  • Setup: The Docker image starts with a single command. Pip installation requires setting up a Python environment first.
  • Links: Playground: http://127.0.0.1/
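Once the container is running, a completion request can be sent with nothing but the Python standard library. This is a minimal sketch: the `/v1/completions` path is assumed from the OpenAI API convention that Basaran follows, and the helper names `build_completion_request` and `fetch_completion` are hypothetical.

```python
import json
import urllib.request


def build_completion_request(
    base_url: str, prompt: str, max_tokens: int = 32
) -> urllib.request.Request:
    """Build a POST request for an OpenAI-style completions endpoint."""
    payload = {"prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",  # assumed endpoint path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_completion(request: urllib.request.Request) -> str:
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server listening on port 80 as in the docker command above, `fetch_completion(build_completion_request("http://127.0.0.1", "once upon a time,"))` would return the generated continuation.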

Highlighted Details

  • OpenAI API and client library compatibility.
  • Supports streaming generation with various decoding strategies.
  • Handles decoder-only and encoder-decoder models.
  • Offers multi-GPU support with optional quantization.
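Streaming responses in the OpenAI style arrive as server-sent events, one JSON chunk per `data:` line. The sketch below shows how a client might reassemble the text. It assumes the stream uses the OpenAI completion chunk shape (`choices[0].text`) and ends with a `data: [DONE]` sentinel; the function name `iter_stream_text` is hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style server-sent event lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(data)
        yield chunk["choices"][0]["text"]
```

Joining the yielded fragments with `"".join(...)` gives the full completion, while printing them as they arrive gives the incremental effect.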

Maintenance & Community

The project is open-source, with contributions welcomed via issues. Further details on contributing are available in CONTRIBUTING.md.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: The permissive MIT license allows commercial use and integration with closed-source applications.

Limitations & Caveats

Basaran currently does not support the model parameter in completion requests; OpenAI clients require it, but any string will work. It also does not offer a unified chat API, because chat history formats vary from model to model. The project recommends pre-formatting the chat history into a prompt and sending it to the completion API instead.
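Pre-formatting a chat history can be as simple as flattening the messages into one prompt string. The sketch below uses a generic "Role: content" template, which is an assumption for illustration; each model expects its own format, so the template should be swapped for the one the served model was trained on. The helper names and the "placeholder" model string are hypothetical.

```python
from typing import Any, Dict, List


def format_chat_prompt(messages: List[Dict[str, str]]) -> str:
    """Flatten chat messages into one prompt using a generic template."""
    lines = [f"{m['role'].capitalize()}: {m['content']}" for m in messages]
    lines.append("Assistant:")  # cue the model to answer as the assistant
    return "\n".join(lines)


def build_completion_payload(messages: List[Dict[str, str]]) -> Dict[str, Any]:
    """Build a completion payload from a chat history."""
    return {
        "model": "placeholder",  # required by OpenAI clients; any string works
        "prompt": format_chat_prompt(messages),
    }
```

For a single user message "Hi", the prompt becomes the two lines "User: Hi" and "Assistant:".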

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 2 stars in the last 30 days
