RWKV-LM by BlinkDL

An RNN for LLMs with Transformer-level performance and parallelizable training

Created 4 years ago · 13,862 stars · Top 3.7% on sourcepulse

Project Summary

RWKV-LM is an open-source project offering a novel RNN architecture that achieves Transformer-level large language model (LLM) performance with significant efficiency gains. It targets researchers and developers who want fast training, linear-time inference, and constant memory use, and it suits both LLM and multimodal applications.

How It Works

RWKV combines the parallelizable training of Transformers with the efficient inference of RNNs. It is entirely attention-free, relying instead on a per-channel time-decay mechanism in which each token updates a fixed-size recurrent state derived from the previous one. This design avoids the quadratic complexity of attention, enabling linear-time, constant-space inference even at "infinite" context lengths. RWKV-7 additionally introduces meta-in-context learning via in-context gradient descent.
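
As a rough illustration of the time-decay idea, here is a minimal, unoptimized sketch of an RWKV-4-style WKV recurrence in PyTorch. The function name and simplified numerics are our own: the repo's CUDA kernels add a running-max trick for numerical stability, and RWKV-5/6/7 generalize the state to a matrix.

    import torch

    def wkv_recurrence(k, v, time_decay, u):
        # k, v: (T, C) per-token keys/values; time_decay, u: (C,) learned params.
        T, C = k.shape
        a = torch.zeros(C)                         # running weighted sum of values
        b = torch.zeros(C)                         # running sum of weights (normalizer)
        decay = torch.exp(-torch.exp(time_decay))  # per-channel decay factor in (0, 1)
        out = torch.empty(T, C)
        for t in range(T):
            e = torch.exp(k[t])
            bonus = torch.exp(u) * e               # current token gets an extra "bonus" weight
            out[t] = (a + bonus * v[t]) / (b + bonus)
            a = decay * a + e * v[t]               # fold token t into the fixed-size state
            b = decay * b + e
        return out

The state is just two channel-sized vectors updated once per token, which is why inference runs in linear time with constant memory and no KV cache.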

Quick Start & Requirements

  • Install: pip install rwkv (see the inference sketch after this list).
  • Prerequisites: Python 3.10+, PyTorch 2.5+, CUDA 12.5+; training additionally needs DeepSpeed, wandb, and Ninja. Older RWKV versions recommend pinned versions (e.g., deepspeed==0.7.0, pytorch-lightning==1.9.5, torch==1.13.1+cu117).
  • Demos: Hugging Face Spaces are available for RWKV-7 0.1B and RWKV-6 7B; a WebGPU demo is also provided.
  • Training: Requires significant data (e.g., MiniPile, Pile) and compute resources.
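
For quick inference, the project README documents a PIPELINE helper in the pip package. A minimal sketch, assuming a downloaded checkpoint (the path below is a placeholder; grab any RWKV .pth from the project's Hugging Face repos):

    import os
    os.environ['RWKV_JIT_ON'] = '1'   # TorchScript JIT, as in the README demo
    os.environ['RWKV_CUDA_ON'] = '0'  # set to '1' to compile the optional CUDA kernel

    from rwkv.model import RWKV
    from rwkv.utils import PIPELINE

    # Placeholder path: point this at a real RWKV checkpoint file.
    model = RWKV(model='/path/to/RWKV-x070-World-0.1B.pth', strategy='cpu fp32')
    pipeline = PIPELINE(model, "rwkv_vocab_v20230424")  # built-in World-model tokenizer

    print(pipeline.generate("The Eiffel Tower is located in", token_count=32))

Use strategy='cuda fp16' for GPU inference; the strategy string controls device and precision and can split layers across devices.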

Highlighted Details

  • Achieves Transformer-level LLM performance with RNN efficiency.
  • Linear time and constant space complexity during inference, with no KV cache (see the sketch after this list).
  • Fast training and "infinite" context length capabilities.
  • RWKV-7 supports meta-in-context learning via in-context gradient descent.
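
To make the no-KV-cache point concrete, here is a sketch of stateful token-by-token inference with the pip package, reusing the model and pipeline objects from the Quick Start sketch above:

    # The recurrent state has a fixed size set by the architecture, so memory
    # stays flat no matter how long the prefix grows; a Transformer's KV cache
    # would grow with every token instead.
    state = None
    for token in pipeline.encode("a very long document ..."):
        logits, state = model.forward([token], state)
    # `state` now summarizes the entire prefix; decoding continues from it
    # at the same constant per-token cost.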

Maintenance & Community

  • A Linux Foundation AI project.
  • Active community with 9k+ members on Discord.
  • Regular updates and new versions (RWKV-7 "Goose" is the latest).
  • Numerous community projects leveraging RWKV.

Licensing & Compatibility

  • Licensed under the permissive MIT license.
  • Compatible with commercial use and closed-source linking.

Limitations & Caveats

  • Training from scratch requires substantial computational resources and expertise.
  • RWKV-7 is reported to train stably, but older versions (e.g., RWKV-6) had reported loss spikes during training, with suggested workarounds.
  • Some advanced features and optimizations are still works in progress (WIP).

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 1
  • Issues (30d): 3
  • Star history: 347 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (cofounder of Cloudera), and 10 more.

  • open-r1 by huggingface: SDK for reproducing DeepSeek-R1. 25k stars (top 0.2%); created 6 months ago, updated 3 days ago.