OpenAI-compatible API for ChatGLM models and embeddings
This project provides an OpenAI-compatible API endpoint for the ChatGLM-6B/ChatGLM2-6B large language models and Chinese embedding models. It targets developers and researchers who want to integrate these powerful Chinese NLP models into applications that already use the OpenAI API structure, offering a seamless transition and local deployment option.
How It Works
The project wraps ChatGLM models and Chinese embedding models in a FastAPI web server, exposing endpoints that mimic the OpenAI API. Models are downloaded from Hugging Face, and users can select quantized variants (e.g., `chatglm-6b-int4`) for reduced VRAM usage. The architecture supports multi-GPU inference and integrates with tunneling services such as ngrok or Cloudflare Tunnel for external accessibility.
Quick Start & Requirements
Copy `config.toml.example` to `config.toml` and install dependencies via `pip install -r requirements.txt`. Launch the server with `python main.py`, passing arguments such as `--llm_model` (e.g., `chatglm2-6b-int4`) and `--tunnel` (e.g., `cloudflared`). Multi-GPU support is available via `CUDA_VISIBLE_DEVICES` and the `--gpus` argument; a combined launch sketch follows.
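Putting those steps together, a minimal session might look like the following. The flag names come from the README summary above; the `--gpus` value semantics and the specific combinations shown are assumptions.

```bash
# Prepare configuration and install dependencies.
cp config.toml.example config.toml
pip install -r requirements.txt

# Start the server with a quantized model and expose it via Cloudflare Tunnel.
python main.py --llm_model chatglm2-6b-int4 --tunnel cloudflared

# Multi-GPU launch: restrict visible devices and pass --gpus.
# (Interpreting --gpus as a GPU count is an assumption.)
CUDA_VISIBLE_DEVICES=0,1 python main.py --llm_model chatglm2-6b --gpus 2
```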
Highlighted Details
- Quantization support (`int4`, `int8`) for lower VRAM consumption.
Maintenance & Community
The project is actively maintained by ninehills. Community interaction channels are not explicitly mentioned in the README.
Licensing & Compatibility
The repository's license is not specified in the README. Compatibility for commercial use or closed-source linking would depend on the underlying ChatGLM model licenses and any specific terms set by the project maintainers.
Limitations & Caveats
The project requires a stable internet connection to download models from Hugging Face. The ngrok tunnel is limited for non-paying users, and multi-GPU support for LLM inference appears to be experimental.