Web-RWKV: a WebGPU inference engine for the RWKV language model
This project provides a pure WebGPU/Rust inference engine for the RWKV language model, targeting developers and researchers who need efficient, cross-platform LLM execution without Python or CUDA dependencies. It enables running RWKV models on a wide range of hardware, including integrated GPUs, and offers features like batched inference and quantization for performance.
How It Works
Web-RWKV leverages WebGPU for GPU acceleration, allowing it to run on Nvidia, AMD, and Intel GPUs, as well as in browsers via WASM. Its core design focuses on efficient inference through features like batched processing and quantization (INT8, Float4). The engine provides essential components like a tokenizer, model loading, state management, and GPU-accelerated forward passes, with hooks for advanced customization.
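As a rough illustration of how those pieces fit together, the sketch below threads a recurrent RWKV state through successive forward passes. The Model trait, State struct, and generate function are hypothetical stand-ins to show the shape of stateful inference, not web-rwkv's actual API.

```rust
// Illustrative sketch of a stateful RWKV inference loop. All types and
// functions here are hypothetical stand-ins, not web-rwkv identifiers.

/// Recurrent state carried between forward passes. RWKV is an RNN, so each
/// call updates this state instead of re-reading the whole context window.
struct State {
    data: Vec<f32>,
}

trait Model {
    /// Run one forward pass over `tokens`, mutating `state` in place and
    /// returning logits over the vocabulary for the last token.
    fn forward(&self, tokens: &[u16], state: &mut State) -> Vec<f32>;
}

fn generate<M: Model>(model: &M, prompt: &[u16], steps: usize) -> Vec<u16> {
    let mut state = State { data: vec![0.0; 1024] }; // size is model-dependent
    let mut output = Vec::new();

    // Prefill: push the whole prompt through once to build up the state.
    let mut logits = model.forward(prompt, &mut state);

    for _ in 0..steps {
        // Greedy pick for illustration; real use plugs in a sampler here.
        let next = logits
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.total_cmp(b.1))
            .map(|(i, _)| i as u16)
            .unwrap();
        output.push(next);
        // Decode one token at a time, carrying the state forward.
        logits = model.forward(&[next], &mut state);
    }
    output
}

/// Toy model so the sketch runs end to end.
struct DummyModel;

impl Model for DummyModel {
    fn forward(&self, tokens: &[u16], state: &mut State) -> Vec<f32> {
        // Toy update: fold tokens into the state, emit flat logits.
        let len = state.data.len();
        for &t in tokens {
            state.data[t as usize % len] += 1.0;
        }
        vec![0.0; 16]
    }
}

fn main() {
    let tokens = generate(&DummyModel, &[1, 2, 3], 4);
    println!("{tokens:?}");
}
```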
Quick Start & Requirements
Requires a Rust toolchain and a WebGPU-capable GPU. Build the project, then run the bundled chat example:
cargo build --release
cargo run --release --example chat
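To use the engine as a library rather than through the bundled examples, the crate can be added as a dependency, assuming the published crate name web-rwkv; the version below is a placeholder, so check crates.io for the current release.

```toml
[dependencies]
# Placeholder version; consult crates.io for the current release.
web-rwkv = "0.8"
```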
Highlighted Details
Pure WebGPU/Rust stack with no Python or CUDA dependencies; runs on Nvidia, AMD, and Intel GPUs (including integrated graphics) and in browsers via WASM; supports batched inference, INT8/Float4 quantization, and hooks for customizing the forward pass.
Maintenance & Community
The project is maintained by cryscan. Further community links or roadmaps are not explicitly detailed in the README.
Licensing & Compatibility
The project's licensing is not explicitly stated in the README. The logo is inspired by a design licensed for non-commercial use.
Limitations & Caveats
This is an inference engine only; it does not provide sampling methods or API servers, though companion projects are mentioned. Debugging Rust on Windows may require specific toolchain configurations for optimal LLDB support.
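Because sampling is left to the caller, something like the following top-p (nucleus) sampler has to be supplied on top of the logits a forward pass returns. This is a generic, dependency-free sketch, not code from web-rwkv or its companion projects.

```rust
// Minimal top-p (nucleus) sampling over a logits vector; a generic sketch,
// independent of any particular RWKV runtime.

fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.into_iter().map(|e| e / sum).collect()
}

/// Pick a token id from `logits`, keeping only the smallest set of tokens
/// whose cumulative probability reaches `top_p`, then drawing from that set.
/// `rng01` is a uniform sample in [0, 1) supplied by the caller so the
/// sketch stays dependency-free.
fn sample_top_p(logits: &[f32], top_p: f32, rng01: f32) -> usize {
    let probs = softmax(logits);
    let mut indexed: Vec<(usize, f32)> = probs.into_iter().enumerate().collect();
    indexed.sort_by(|a, b| b.1.total_cmp(&a.1)); // descending by probability

    // Truncate to the nucleus.
    let mut cumulative = 0.0;
    let mut cut = indexed.len();
    for (i, &(_, p)) in indexed.iter().enumerate() {
        cumulative += p;
        if cumulative >= top_p {
            cut = i + 1;
            break;
        }
    }
    indexed.truncate(cut);

    // Renormalize and draw.
    let total: f32 = indexed.iter().map(|&(_, p)| p).sum();
    let mut threshold = rng01 * total;
    for &(id, p) in &indexed {
        threshold -= p;
        if threshold <= 0.0 {
            return id;
        }
    }
    indexed.last().map(|&(id, _)| id).unwrap_or(0)
}

fn main() {
    let logits = vec![2.0, 1.0, 0.5, -1.0];
    // Deterministic demo draw; real use passes a fresh random number per step.
    println!("picked token {}", sample_top_p(&logits, 0.9, 0.42));
}
```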