gpt_server by shell-nlp

Open-source framework for production AI model serving

Created 2 years ago

253 stars

Top 99.4% on SourcePulse

Project Summary

Summary

gpt_server is an open-source framework for production-grade deployment of diverse AI models, including LLMs, Embedding, Reranker, ASR, TTS, and image generation/editing. It offers a unified OpenAI-compatible API, simplifying integration and enabling efficient serving across multiple high-performance inference backends. This provides a robust, flexible, and scalable solution for deploying various AI capabilities.

How It Works

gpt_server abstracts complex model serving behind a familiar OpenAI API. It supports multiple inference backends like vLLM, SGLang, and LMDeploy, allowing users to select optimal engines. This multi-backend approach, coupled with dynamic batching for embeddings/rerankers, optimizes throughput and latency. The framework automatically schedules requests to appropriate models and backends, simplifying the deployment of diverse AI services under a single endpoint.

Quick Start & Requirements

Installation is managed via uv (recommended) or conda. After environment setup, copy and modify config_example.yaml for model configurations. Services launch via CLI (uv run gpt_server/serving/main.py or sh gpt_server/script/start.sh) or Docker. Docker images are available on Docker Hub. A Streamlit UI exists but is noted as unstable and deprecated. Official quick-start guides and configuration docs are linked.

Highlighted Details

Unified OpenAI API endpoint, compatible with existing OpenAI-integrated projects.
Supports LLMs, VLMs, Embedding, Reranker, ASR, TTS (voice cloning), Text Moderation, and Stable Diffusion (image generation/editing).
Extensive inference backend support: vLLM, SGLang, LMDeploy, Hugging Face Transformers, with performance favoring LMDeploy TurboMind, SGLang, and vLLM.
Optimized Embedding/Reranker performance via Infinity backend, faster than ONNX/TensorRT with dynamic batching.

Maintenance & Community

The project actively tracks model additions and backend support, indicating ongoing development. While specific community links (Discord, Slack) are not provided, users are encouraged to report issues and contribute.

Licensing & Compatibility

A license shield is present, but the URL is missing, preventing definitive license identification. Users must verify terms for commercial use or closed-source integration.

Limitations & Caveats

The visual UI (server_ui.py) is explicitly marked as unstable, buggy, and deprecated; users should rely on the API or CLI. Support for certain models/backends may be experimental and require further testing.

Health Check

Last Commit

2 months ago

Responsiveness

1 day

Pull Requests (30d)

Issues (30d)

Star History

0 stars in the last 30 days