RWKV inference API server with OpenAI-compatible interface
AI00 RWKV Server provides a compact, all-in-one inference API for RWKV language models, targeting users who need an efficient and easy-to-deploy LLM solution. It offers OpenAI-compatible API endpoints, enabling chatbots, text generation, and Q&A applications without requiring heavy dependencies like PyTorch or CUDA.
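Because the endpoints follow the OpenAI schema, any OpenAI-style HTTP client can talk to the server. A minimal sketch in Python using requests; the /v1/chat/completions route, the model name, and the response shape are assumptions based on the OpenAI convention rather than confirmed details of this server:

```python
import requests

# Route follows the standard OpenAI convention; the exact path on
# ai00_rwkv_server may differ (assumption) -- check the WebUI/docs.
url = "http://localhost:65530/v1/chat/completions"

payload = {
    "model": "rwkv",  # placeholder model name (assumption)
    "messages": [
        {"role": "user", "content": "Summarize what RWKV is in one sentence."}
    ],
    "max_tokens": 128,
    "temperature": 0.7,
}

resp = requests.post(url, json=payload, timeout=120)
resp.raise_for_status()
# OpenAI-style responses nest the text under choices[0].message.content.
print(resp.json()["choices"][0]["message"]["content"])
```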
How It Works
The server leverages the web-rwkv inference engine and uses Vulkan for parallel, concurrent batched inference. This allows it to run on any GPU that supports Vulkan, including AMD and integrated graphics, extending GPU acceleration beyond NVIDIA hardware. Its Rust implementation keeps the binary compact and fast.
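Since the engine batches concurrent requests, firing several requests at once is how a client exercises that parallelism. A hedged sketch using the same assumed endpoint as above:

```python
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:65530/v1/chat/completions"  # assumed OpenAI-style route

def ask(prompt: str) -> str:
    payload = {
        "model": "rwkv",  # placeholder model name (assumption)
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }
    r = requests.post(URL, json=payload, timeout=120)
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]

prompts = ["What is Vulkan?", "What is RWKV?", "Name a Rust web framework."]

# Issuing the requests concurrently lets the server batch them together.
with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
    for answer in pool.map(ask, prompts):
        print(answer)
```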
Quick Start & Requirements
Model conversion requires torch and safetensors; the server itself loads models in the Safetensors format with the .st extension. Place models in assets/models/, modify assets/configs/Config.toml, and run ./ai00_rwkv_server. Access the WebUI at http://localhost:65530.
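The repository ships its own conversion tooling; as a generic sketch of what the .pth-to-.st step involves with torch and safetensors (file names are placeholders, and the real script may also cast or rename weights):

```python
import torch
from safetensors.torch import save_file

# Load the PyTorch checkpoint on CPU; RWKV .pth files are plain state dicts.
state = torch.load("rwkv-model.pth", map_location="cpu")

# safetensors rejects tensors that share storage, so clone into
# contiguous, independent buffers before serializing.
state = {k: v.clone().contiguous() for k, v in state.items()}

save_file(state, "assets/models/rwkv-model.st")
```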
Highlighted Details
Maintenance & Community
The project is actively maintained with a growing community. Users can join via Discord or a QQ group (30920262).
Licensing & Compatibility
Licensed under MIT/Apache-2.0, allowing for commercial use and integration with closed-source projects.
Limitations & Caveats
Currently supports Safetensors models (.st); .pth models require conversion. Hot loading/switching of LoRA models is a planned feature.