RNN for LLMs: Transformer-level performance, parallelizable training
Top 3.7% on sourcepulse
RWKV-LM is an open-source project offering a novel RNN architecture that achieves Transformer-level Large Language Model (LLM) performance with significant efficiency gains. It targets researchers and developers looking for fast training, linear inference time, and constant memory usage, making it suitable for LLM and multimodal applications.
How It Works
RWKV combines the parallelizable training of Transformers with the efficient inference of RNNs. It is entirely attention-free: a time-decay mechanism updates a fixed-size recurrent state at each token, so the output for each step depends only on the current input and the previous state. This avoids the quadratic cost of attention, giving linear time and constant memory during inference even with effectively "infinite" context lengths. RWKV-7 introduces meta-in-context learning via in-context gradient descent.
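To make the constant-memory recurrence concrete, here is a small, hypothetical sketch of a per-channel time-decay scan. The function name time_decay_scan and the parameters w and u are illustrative placeholders, not RWKV-LM's actual parameter layout; the real WKV kernel also tracks a normalizing denominator and runs as a fused CUDA op.

```python
# Illustrative sketch only -- not the real RWKV WKV kernel.
import torch

def time_decay_scan(x, w, u):
    """x: (T, C) token features; w: (C,) log-decay params; u: (C,) bonus
    weight for the current token. Names are hypothetical placeholders."""
    T, C = x.shape
    decay = torch.exp(-torch.exp(w))   # per-channel decay factor in (0, 1)
    state = torch.zeros(C)             # fixed-size state -> constant memory
    outputs = []
    for t in range(T):                 # single pass over tokens -> linear time
        y = state + u * x[t]           # current token gets an extra "bonus" weight
        outputs.append(y)
        state = decay * state + x[t]   # fold the token into the decayed state
    return torch.stack(outputs)

if __name__ == "__main__":
    y = time_decay_scan(torch.randn(8, 4), torch.zeros(4), torch.ones(4))
    print(y.shape)  # torch.Size([8, 4])
```

The point of the sketch is the shape of the computation: the running state never grows with sequence length, which is what yields linear-time, constant-memory inference.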
Quick Start & Requirements
Install the inference package with pip install rwkv. The training environment pins deepspeed==0.7.0, pytorch-lightning==1.9.5, and torch==1.13.1+cu117.
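For reference, a minimal inference flow with the rwkv pip package looks roughly like the sketch below. The checkpoint path is a placeholder (model weights are downloaded separately), and the calls reflect the package's commonly documented API, which may vary between versions.

```python
# Sketch of typical rwkv pip-package usage; the model path is a placeholder.
from rwkv.model import RWKV
from rwkv.utils import PIPELINE, PIPELINE_ARGS

# 'strategy' selects device and precision, e.g. 'cuda fp16' or 'cpu fp32'.
model = RWKV(model='/path/to/RWKV-model.pth', strategy='cpu fp32')

# 'rwkv_vocab_v20230424' is the tokenizer used by RWKV "World" models.
pipeline = PIPELINE(model, 'rwkv_vocab_v20230424')

args = PIPELINE_ARGS(temperature=1.0, top_p=0.7)
print(pipeline.generate('The Eiffel Tower is located in', token_count=64, args=args))
```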
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats