RWKV-LM by BlinkDL

RNN for LLM, transformer-level performance, parallelizable training

Created 4 years ago
14,095 stars

Top 3.5% on SourcePulse

View on GitHub
Project Summary

RWKV-LM is an open-source project offering a novel RNN architecture that achieves Transformer-level Large Language Model (LLM) performance with significant efficiency gains. It targets researchers and developers looking for fast training, linear inference time, and constant memory usage, making it suitable for LLM and multimodal applications.

How It Works

RWKV combines the parallelizable training of Transformers with the efficient inference of RNNs. It is entirely attention-free: instead of attending over all previous tokens, it maintains a fixed-size recurrent state with a learned time-decay mechanism, updating each token's state from the previous one. This design avoids the quadratic complexity of attention, enabling linear time and constant space complexity during inference, even with "infinite" context lengths. RWKV-7 introduces meta-in-context learning via in-context gradient descent.
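As a toy illustration of the linear-time, constant-state claim, here is a minimal, numerically naive sketch of the RWKV-4-style WKV recurrence for a single channel. The function and variable names are illustrative; the project's real kernels add exponent-stability tricks and fused CUDA implementations.

```python
import numpy as np

def wkv_recurrence(k, v, w, u):
    """Toy single-channel WKV recurrence (RWKV-4 style), numerically naive.

    k, v : per-token key/value scalars, np.ndarray of shape (T,)
    w    : learned positive decay; the state shrinks by e^{-w} each step
    u    : learned bonus applied only to the current token
    Runs in O(T) time with O(1) state -- no cache over past tokens is kept.
    """
    num, den = 0.0, 0.0              # the entire "memory": two running sums
    out = np.empty_like(v, dtype=float)
    for t in range(len(k)):
        # the current token gets an extra e^{u} bonus before mixing
        out[t] = (num + np.exp(u + k[t]) * v[t]) / (den + np.exp(u + k[t]))
        # decay the state, then fold the current token into it
        num = np.exp(-w) * num + np.exp(k[t]) * v[t]
        den = np.exp(-w) * den + np.exp(k[t])
    return out
```

Only `num` and `den` persist between steps, so memory use is independent of context length; the same computation also admits a parallel form over the time dimension, which is what makes training parallelizable.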

Quick Start & Requirements

  • Install: pip install rwkv (a usage sketch follows this list)
  • Prerequisites (training): Python 3.10+, PyTorch 2.5+, CUDA 12.5+, DeepSpeed, wandb, Ninja. Older RWKV versions recommend pinned versions (e.g., deepspeed==0.7.0, pytorch-lightning==1.9.5, torch==1.13.1+cu117).
  • Demos: Hugging Face Spaces for RWKV-7 0.1B and RWKV-6 7B are available; a WebGPU demo is also provided.
  • Training: Requires significant data (e.g., MiniPile, Pile) and compute resources.
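A minimal inference sketch with the rwkv pip package, assuming a downloaded checkpoint; the model path is a placeholder, and the strategy string and sampling settings are example values, not project defaults:

```python
from rwkv.model import RWKV
from rwkv.utils import PIPELINE, PIPELINE_ARGS

# Point `model` at a downloaded RWKV .pth checkpoint (e.g., from Hugging Face).
# The strategy string selects device and precision, e.g. 'cpu fp32' or 'cuda fp16'.
model = RWKV(model='/path/to/RWKV-model.pth', strategy='cuda fp16')
pipeline = PIPELINE(model, "rwkv_vocab_v20230424")  # bundled World tokenizer

args = PIPELINE_ARGS(temperature=1.0, top_p=0.7)
print(pipeline.generate("The RWKV architecture is", token_count=100, args=args))
```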

Highlighted Details

  • Achieves Transformer-level LLM performance with RNN efficiency.
  • Linear time and constant space complexity during inference, with no KV cache (a decoding sketch follows this list).
  • Fast training and "infinite" context length capabilities.
  • RWKV-7 supports meta-in-context learning via in-context gradient descent.
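To make the no-KV-cache point concrete, the sketch below (reusing the placeholder `model` and `pipeline` objects from the Quick Start sketch, under the same assumptions) decodes token by token: the forward pass exchanges a fixed-size recurrent state rather than an ever-growing cache.

```python
# `state` is a fixed-size bundle of tensors, not a per-token cache: its size
# does not grow with the number of tokens consumed.
state = None
for token in pipeline.encode("RWKV reads this prompt one token at a time."):
    logits, state = model.forward([token], state)

# Sampling the next token needs only `logits` and the constant-size `state`,
# no matter how long the prefix was.
next_token = pipeline.sample_logits(logits, temperature=1.0, top_p=0.7)
print(pipeline.decode([next_token]))
```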

Maintenance & Community

  • A Linux Foundation AI project.
  • Active community with 9k+ members on Discord.
  • Regular updates and new versions (RWKV-7 "Goose" is the latest).
  • Numerous community projects leveraging RWKV.

Licensing & Compatibility

  • Licensed under the permissive MIT license.
  • Compatible with commercial use and closed-source linking.

Limitations & Caveats

  • Training from scratch requires substantial computational resources and expertise.
  • While RWKV-7 is noted as stable, older versions such as RWKV-6 had reported loss spikes during training, with suggested workarounds.
  • Some advanced features or optimizations might still be in development (WIP).
Health Check

  • Last Commit: 1 day ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 1
  • Issues (30d): 2

Star History

  • 114 stars in the last 30 days

Explore Similar Projects

Starred by Nat Friedman (Former CEO of GitHub), Alex Yu (Research Scientist at OpenAI; Cofounder of Luma AI), and 7 more.

ChatRWKV by BlinkDL

0.0%
10k
Open-source chatbot powered by the RWKV RNN language model
Created 2 years ago
Updated 1 month ago
Starred by Tobi Lutke (Cofounder of Shopify), Li Jiang (Coauthor of AutoGen; Engineer at Microsoft), and 27 more.

ColossalAI by hpcaitech

0.0%
41k
AI system for large-scale parallel training
Created 4 years ago
Updated 3 weeks ago