RNN for LLMs: Transformer-level performance, parallelizable training
Top 3.7% on sourcepulse
RWKV-LM is an open-source project offering a novel RNN architecture that achieves Transformer-level Large Language Model (LLM) performance with significant efficiency gains. It targets researchers and developers looking for fast training, linear inference time, and constant memory usage, making it suitable for LLM and multimodal applications.
How It Works
RWKV combines the parallelizable training of Transformers with the efficient inference of RNNs. It is entirely attention-free: a time-decay mechanism updates a fixed-size recurrent state at each token, so the output for each step depends only on the current input and the previous state. This avoids the quadratic cost of attention, giving linear time and constant memory during inference even with effectively "infinite" context lengths. RWKV-7 introduces meta-in-context learning via in-context gradient descent.
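To make the constant-memory recurrence concrete, here is a small, hypothetical sketch of a per-channel time-decay scan. The function name time_decay_scan and the parameters w and u are illustrative placeholders, not RWKV-LM's actual parameter layout; the real WKV kernel also tracks a normalizing denominator and runs as a fused CUDA op.

```python
# Illustrative sketch only -- not the real RWKV WKV kernel.
import torch

def time_decay_scan(x, w, u):
    """x: (T, C) token features; w: (C,) log-decay params; u: (C,) bonus
    weight for the current token. Names are hypothetical placeholders."""
    T, C = x.shape
    decay = torch.exp(-torch.exp(w))   # per-channel decay factor in (0, 1)
    state = torch.zeros(C)             # fixed-size state -> constant memory
    outputs = []
    for t in range(T):                 # single pass over tokens -> linear time
        y = state + u * x[t]           # current token gets an extra "bonus" weight
        outputs.append(y)
        state = decay * state + x[t]   # fold the token into the decayed state
    return torch.stack(outputs)

if __name__ == "__main__":
    y = time_decay_scan(torch.randn(8, 4), torch.zeros(4), torch.ones(4))
    print(y.shape)  # torch.Size([8, 4])
```

The point of the sketch is the shape of the computation: the running state never grows with sequence length, which is what yields linear-time, constant-memory inference.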
Quick Start & Requirements
Install the inference package with pip install rwkv. The training environment pins deepspeed==0.7.0, pytorch-lightning==1.9.5, and torch==1.13.1+cu117.
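For reference, a minimal inference flow with the rwkv pip package looks roughly like the sketch below. The checkpoint path is a placeholder (model weights are downloaded separately), and the calls reflect the package's commonly documented API, which may vary between versions.

```python
# Sketch of typical rwkv pip-package usage; the model path is a placeholder.
from rwkv.model import RWKV
from rwkv.utils import PIPELINE, PIPELINE_ARGS

# 'strategy' selects device and precision, e.g. 'cuda fp16' or 'cpu fp32'.
model = RWKV(model='/path/to/RWKV-model.pth', strategy='cpu fp32')

# 'rwkv_vocab_v20230424' is the tokenizer used by RWKV "World" models.
pipeline = PIPELINE(model, 'rwkv_vocab_v20230424')

args = PIPELINE_ARGS(temperature=1.0, top_p=0.7)
print(pipeline.generate('The Eiffel Tower is located in', token_count=64, args=args))
```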
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats