web-rwkv by cryscan

WebGPU inference engine for the RWKV language model

created 2 years ago
314 stars

Top 87.1% on sourcepulse

Project Summary

This project provides an inference engine for the RWKV language model, written in pure Rust on top of WebGPU. It targets developers and researchers who need efficient, cross-platform LLM execution without Python or CUDA dependencies. It runs RWKV models on a wide range of hardware, including integrated GPUs, and supports batched inference and quantization to raise throughput and cut memory use.

How It Works

Web-RWKV uses WebGPU for GPU acceleration, so it runs on Nvidia, AMD, and Intel GPUs, and in browsers when compiled to WASM. The design centers on efficient inference through batched processing and weight quantization (INT8, Float4). The engine supplies the core components: a tokenizer, model loading, state management, and GPU-accelerated forward passes, with hooks for advanced customization.
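To make the INT8 quantization idea concrete, here is a minimal CPU-side sketch of block-wise symmetric (absmax) quantization using only the Rust standard library. It is an illustration under stated assumptions, not Web-RWKV's actual GPU kernels or API: the names QuantBlock, quantize_block, dequantize_block, and quantize are hypothetical, and the block layout and rounding scheme the project really uses may differ.

```rust
/// One quantized block (hypothetical type, not part of web-rwkv).
/// `scale` maps the stored i8 values back to f32.
#[derive(Debug, Clone, PartialEq)]
pub struct QuantBlock {
    pub scale: f32,
    pub values: Vec<i8>,
}

/// Quantize one block with symmetric absmax scaling:
/// scale = max|x| / 127, q = round(x / scale).
pub fn quantize_block(xs: &[f32]) -> QuantBlock {
    let absmax = xs.iter().fold(0.0f32, |m, x| m.max(x.abs()));
    // An all-zero block would give scale 0; use 1.0 so division stays finite.
    let scale = if absmax == 0.0 { 1.0 } else { absmax / 127.0 };
    let values = xs
        .iter()
        .map(|x| (x / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    QuantBlock { scale, values }
}

/// Recover approximate f32 values from a quantized block.
pub fn dequantize_block(block: &QuantBlock) -> Vec<f32> {
    block.values.iter().map(|&q| q as f32 * block.scale).collect()
}

/// Split a tensor into fixed-size blocks and quantize each independently,
/// so one outlier only degrades precision inside its own block.
pub fn quantize(xs: &[f32], block_size: usize) -> Vec<QuantBlock> {
    assert!(block_size > 0, "block_size must be non-zero");
    xs.chunks(block_size).map(quantize_block).collect()
}
```

Each weight then takes one byte plus a shared per-block scale, and the round-trip error per element is bounded by half the block's scale. Smaller blocks trade a little extra storage for tighter error bounds.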

Quick Start & Requirements

  • Install/Run: build with cargo build --release, then start the chat example with cargo run --release --example chat
  • Prerequisites: a Rust toolchain and downloaded RWKV models in safetensors format.
  • Setup: a model conversion script is provided.
  • Docs: examples are available within the repository.

Highlighted Details

  • No CUDA or Python dependencies.
  • Supports Nvidia, AMD, Intel GPUs (including integrated).
  • WASM support for browser execution.
  • Batched inference and INT8/Float4 quantization.
  • Advanced customization via inference hooks.

Maintenance & Community

The project is maintained by cryscan. The README does not list further community links or a roadmap.

Licensing & Compatibility

The project's licensing is not explicitly stated in the README. The logo is inspired by a design licensed for non-commercial use.

Limitations & Caveats

This is an inference engine only: it does not provide sampling methods or an API server, though the README mentions companion projects. Debugging Rust on Windows may require a specific toolchain configuration for good LLDB support.
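Because sampling is left to the caller, an application has to turn the logits from a forward pass into a token index itself. The sketch below shows one common choice, temperature scaling followed by nucleus (top-p) sampling, using only the Rust standard library. The function names and signatures are hypothetical and not taken from web-rwkv, and the uniform random number u is passed in by the caller so that no random-number crate is assumed.

```rust
/// Numerically stable softmax with temperature (hypothetical helper).
pub fn softmax(logits: &[f32], temperature: f32) -> Vec<f32> {
    assert!(temperature > 0.0, "temperature must be positive");
    // Subtract the maximum logit so exp() cannot overflow.
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits
        .iter()
        .map(|&l| ((l - max) / temperature).exp())
        .collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Nucleus (top-p) sampling over raw logits.
/// `u` is a uniform random number in [0, 1) supplied by the caller.
/// Returns the index of the chosen token.
pub fn sample_top_p(logits: &[f32], temperature: f32, top_p: f32, u: f32) -> usize {
    assert!(!logits.is_empty(), "logits must not be empty");
    let probs = softmax(logits, temperature);

    // Token indices sorted by descending probability.
    let mut order: Vec<usize> = (0..probs.len()).collect();
    order.sort_by(|&a, &b| probs[b].total_cmp(&probs[a]));

    // Keep the smallest prefix whose cumulative probability reaches top_p.
    let mut kept = 0;
    let mut mass = 0.0f32;
    for &i in &order {
        mass += probs[i];
        kept += 1;
        if mass >= top_p {
            break;
        }
    }

    // Draw from the kept prefix, renormalized by its total mass.
    let target = u * mass;
    let mut acc = 0.0f32;
    for &i in &order[..kept] {
        acc += probs[i];
        if target < acc {
            return i;
        }
    }
    // Fallback for rounding at the upper edge: last kept token.
    order[kept - 1]
}
```

A small top_p approaches greedy decoding, since only the most likely token survives the cutoff, while top_p = 1.0 samples from the full distribution. In a real application u would come from a random-number generator of the caller's choosing.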

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: 1 week
  • Pull requests (30d): 1
  • Issues (30d): 0
  • Star history: 16 stars in the last 90 days
