ai00_server by Ai00-X

RWKV inference API server with OpenAI-compatible interface

created 2 years ago
573 stars

Top 57.1% on sourcepulse

Project Summary

AI00 RWKV Server provides a compact, all-in-one inference API for RWKV language models, targeting users who need an efficient and easy-to-deploy LLM solution. It offers OpenAI-compatible API endpoints, enabling chatbots, text generation, and Q&A applications without requiring heavy dependencies like PyTorch or CUDA.
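Because the endpoints follow the OpenAI convention, existing client code can simply point at the local server. A minimal sketch in Python, assuming the default port from the Quick Start; the route in `BASE_URL` and the payload builder are illustrative assumptions — consult the project's API docs for the exact paths and fields:

```python
import json

# Assumed base URL for a local ai00_server instance (default port from the
# Quick Start). The exact API route is an assumption, not the documented path.
BASE_URL = "http://localhost:65530/api/oai/chat/completions"

def build_chat_request(prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat-completion payload."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 1.0,
        "stream": False,
    }

if __name__ == "__main__":
    payload = build_chat_request("What is RWKV?")
    print(json.dumps(payload, indent=2))
```

With the server running, the payload can be POSTed with any HTTP client (e.g. `requests.post(BASE_URL, json=payload)`), just as with the OpenAI API.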

How It Works

The server leverages the web-rwkv inference engine and utilizes Vulkan for parallel and concurrent batched inference. This approach allows it to run on any GPU supporting Vulkan, including AMD and integrated graphics, democratizing GPU acceleration beyond NVIDIA hardware. Its Rust implementation contributes to its compact size and performance.

Quick Start & Requirements

  • Install: Download pre-built executables from the Releases page.
  • Prerequisites: Rust (for building from source); Python with torch and safetensors (for model conversion). Models must be in Safetensors format (.st extension).
  • Setup: Download model, place in assets/models/, modify assets/configs/Config.toml, and run ./ai00_rwkv_server. Access WebUI at http://localhost:65530.
  • Docs: API Docs
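The setup steps above can be pictured with a hypothetical Config.toml fragment. The field names here are illustrative, not the server's actual schema — use the shipped assets/configs/Config.toml as the source of truth:

```toml
# Illustrative sketch only — mirror the structure of the shipped Config.toml.
[model]
path = "assets/models/your-rwkv-model.st"  # example filename; must be .st

[listen]
port = 65530  # WebUI/API port mentioned in the Quick Start
```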

Highlighted Details

  • Vulkan-accelerated inference on non-NVIDIA GPUs.
  • OpenAI API compatibility for seamless integration.
  • Supports RWKV models V5, V6, and V7.
  • Unique BNF sampling for structured output generation.
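BNF sampling constrains decoding so that generated text must match a grammar. A minimal sketch of the idea (grammar syntax illustrative — the project's docs define the exact dialect it accepts):

```
<answer> ::= "yes" | "no"
<reply>  ::= "Answer: " <answer>
```

During sampling, tokens that cannot extend a valid derivation of `<reply>` are masked out, so the model can only emit one of the allowed strings.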

Maintenance & Community

The project is actively maintained with a growing community. Users can join via Discord or a QQ group (30920262).

Licensing & Compatibility

Licensed under MIT/Apache-2.0, allowing for commercial use and integration with closed-source projects.

Limitations & Caveats

Currently only Safetensors models (.st) are supported; .pth checkpoints must be converted first. Hot loading/switching of LoRA models is a planned feature.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 21 stars in the last 90 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Nat Friedman (former CEO of GitHub), and 32 more.

llama.cpp by ggml-org

C/C++ library for local LLM inference
Top 0.4% · 84k stars
created 2 years ago · updated 20 hours ago