Web-RWKV: a WebGPU inference engine for the RWKV language model
This project provides a pure WebGPU/Rust inference engine for the RWKV language model, targeting developers and researchers who need efficient, cross-platform LLM execution without Python or CUDA dependencies. It enables running RWKV models on a wide range of hardware, including integrated GPUs, and offers features like batched inference and quantization for performance.
How It Works
Web-RWKV leverages WebGPU for GPU acceleration, allowing it to run on Nvidia, AMD, and Intel GPUs, as well as in browsers via WASM. Its core design focuses on efficient inference through features like batched processing and quantization (INT8, Float4). The engine provides essential components like a tokenizer, model loading, state management, and GPU-accelerated forward passes, with hooks for advanced customization.
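As a rough illustration of how those pieces fit together, the sketch below threads a recurrent RWKV state through successive forward passes. The Model trait, State struct, and generate function are hypothetical stand-ins to show the shape of stateful inference, not web-rwkv's actual API.

```rust
// Illustrative sketch of a stateful RWKV inference loop. All types and
// functions here are hypothetical stand-ins, not web-rwkv identifiers.

/// Recurrent state carried between forward passes. RWKV is an RNN, so each
/// call updates this state instead of re-reading the whole context window.
struct State {
    data: Vec<f32>,
}

trait Model {
    /// Run one forward pass over `tokens`, mutating `state` in place and
    /// returning logits over the vocabulary for the last token.
    fn forward(&self, tokens: &[u16], state: &mut State) -> Vec<f32>;
}

fn generate<M: Model>(model: &M, prompt: &[u16], steps: usize) -> Vec<u16> {
    let mut state = State { data: vec![0.0; 1024] }; // size is model-dependent
    let mut output = Vec::new();

    // Prefill: push the whole prompt through once to build up the state.
    let mut logits = model.forward(prompt, &mut state);

    for _ in 0..steps {
        // Greedy pick for illustration; real use plugs in a sampler here.
        let next = logits
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.total_cmp(b.1))
            .map(|(i, _)| i as u16)
            .unwrap();
        output.push(next);
        // Decode one token at a time, carrying the state forward.
        logits = model.forward(&[next], &mut state);
    }
    output
}

/// Toy model so the sketch runs end to end.
struct DummyModel;

impl Model for DummyModel {
    fn forward(&self, tokens: &[u16], state: &mut State) -> Vec<f32> {
        // Toy update: fold tokens into the state, emit flat logits.
        let len = state.data.len();
        for &t in tokens {
            state.data[t as usize % len] += 1.0;
        }
        vec![0.0; 16]
    }
}

fn main() {
    let tokens = generate(&DummyModel, &[1, 2, 3], 4);
    println!("{tokens:?}");
}
```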
Quick Start & Requirements
Requires a Rust toolchain and a WebGPU-capable GPU. Build the project, then run the bundled chat example:
cargo build --release
cargo run --release --example chat
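To use the engine as a library rather than through the bundled examples, the crate can be added as a dependency, assuming the published crate name web-rwkv; the version below is a placeholder, so check crates.io for the current release.

```toml
[dependencies]
# Placeholder version; consult crates.io for the current release.
web-rwkv = "0.8"
```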
Highlighted Details
Pure WebGPU/Rust stack with no Python or CUDA dependencies; runs on Nvidia, AMD, and Intel GPUs (including integrated graphics) and in browsers via WASM; supports batched inference, INT8/Float4 quantization, and hooks for customizing the forward pass.
Maintenance & Community
The project is maintained by cryscan. Further community links or roadmaps are not explicitly detailed in the README.
Licensing & Compatibility
The project's licensing is not explicitly stated in the README. The logo is inspired by a design licensed for non-commercial use.
Limitations & Caveats
This is an inference engine only; it does not provide sampling methods or API servers, though companion projects are mentioned. Debugging Rust on Windows may require specific toolchain configurations for optimal LLDB support.
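Because sampling is left to the caller, something like the following top-p (nucleus) sampler has to be supplied on top of the logits a forward pass returns. This is a generic, dependency-free sketch, not code from web-rwkv or its companion projects.

```rust
// Minimal top-p (nucleus) sampling over a logits vector; a generic sketch,
// independent of any particular RWKV runtime.

fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.into_iter().map(|e| e / sum).collect()
}

/// Pick a token id from `logits`, keeping only the smallest set of tokens
/// whose cumulative probability reaches `top_p`, then drawing from that set.
/// `rng01` is a uniform sample in [0, 1) supplied by the caller so the
/// sketch stays dependency-free.
fn sample_top_p(logits: &[f32], top_p: f32, rng01: f32) -> usize {
    let probs = softmax(logits);
    let mut indexed: Vec<(usize, f32)> = probs.into_iter().enumerate().collect();
    indexed.sort_by(|a, b| b.1.total_cmp(&a.1)); // descending by probability

    // Truncate to the nucleus.
    let mut cumulative = 0.0;
    let mut cut = indexed.len();
    for (i, &(_, p)) in indexed.iter().enumerate() {
        cumulative += p;
        if cumulative >= top_p {
            cut = i + 1;
            break;
        }
    }
    indexed.truncate(cut);

    // Renormalize and draw.
    let total: f32 = indexed.iter().map(|&(_, p)| p).sum();
    let mut threshold = rng01 * total;
    for &(id, p) in &indexed {
        threshold -= p;
        if threshold <= 0.0 {
            return id;
        }
    }
    indexed.last().map(|&(id, _)| id).unwrap_or(0)
}

fn main() {
    let logits = vec![2.0, 1.0, 0.5, -1.0];
    // Deterministic demo draw; real use passes a fresh random number per step.
    println!("picked token {}", sample_top_p(&logits, 0.9, 0.42));
}
```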