Gemma 2B with 10M context length using Infini-attention
Top 39.6% on sourcepulse
This repository provides a Gemma 2B language model fine-tuned to reach a 10-million-token context length using an Infini-attention mechanism. It targets researchers and developers who need to process extremely long sequences on limited hardware, offering significant memory savings over standard attention.
How It Works
The core innovation is Infini-attention, which addresses the quadratic growth in memory and compute that standard multi-head attention incurs as sequences get longer. The input is processed in fixed-size local segments: attention within each segment stays cheap, while a recurrent compressive memory carries information across segments. This keeps memory growth linear (O(N)) while still exposing the full 10M-token context to the model. The segment-level recurrence draws inspiration from Transformer-XL.
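To make this concrete, here is a minimal single-head sketch of Infini-attention-style segment recurrence: causal attention within each block, plus a compressive memory that is read before and updated after every segment. It is an illustrative toy under assumed shapes and a fixed blending gate, not the repository's implementation; names such as `infini_attention` and `segment_len` are placeholders.

```python
# Minimal sketch of Infini-attention-style segment recurrence (single head,
# illustrative shapes only -- not this repository's actual code).
import torch
import torch.nn.functional as F

def elu_plus_one(x):
    # Non-negative feature map used for the linear-attention memory terms.
    return F.elu(x) + 1.0

def infini_attention(x, w_q, w_k, w_v, segment_len=2048, gate=0.5):
    """x: (seq_len, d_model). The sequence is processed in fixed-size segments,
    so attention cost stays bounded per segment instead of growing with seq_len."""
    d_key = w_q.shape[1]
    memory = torch.zeros(d_key, w_v.shape[1])   # compressive memory M
    norm = torch.zeros(d_key)                   # normalization term z
    outputs = []

    for seg in x.split(segment_len, dim=0):
        q, k, v = seg @ w_q, seg @ w_k, seg @ w_v

        # 1) Local causal attention -- quadratic only in segment_len, not seq_len.
        scores = (q @ k.T) / d_key ** 0.5
        causal = torch.triu(torch.full_like(scores, float("-inf")), diagonal=1)
        local_out = F.softmax(scores + causal, dim=-1) @ v

        # 2) Retrieve long-range context from the recurrent compressive memory.
        sigma_q = elu_plus_one(q)
        mem_out = (sigma_q @ memory) / (sigma_q @ norm + 1e-6).unsqueeze(-1)

        # 3) Update the memory with this segment's keys/values (the recurrence).
        sigma_k = elu_plus_one(k)
        memory = memory + sigma_k.T @ v
        norm = norm + sigma_k.sum(dim=0)

        # 4) Blend local and memory-based context (a learned gate in the
        #    Infini-attention paper; a fixed scalar here for brevity).
        outputs.append(gate * mem_out + (1.0 - gate) * local_out)

    return torch.cat(outputs, dim=0)

# Example: 8K tokens of random features with tiny dims, just to show the shapes.
torch.manual_seed(0)
d_model, d_key, d_value = 64, 32, 32
x = torch.randn(8192, d_model)
out = infini_attention(x, torch.randn(d_model, d_key),
                       torch.randn(d_model, d_key),
                       torch.randn(d_model, d_value))
print(out.shape)  # torch.Size([8192, 32])
```

Because the compressive memory has a fixed size, each segment only pays for attention over `segment_len` tokens, which is what keeps the footprint manageable at 10M-token contexts.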
Quick Start & Requirements
pip install -r requirements.txt
python main.py
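main.py is the documented entry point. If the checkpoint is exported in Hugging Face format, generation could look roughly like the sketch below; the model identifier, `trust_remote_code` flag, and dtype choice are assumptions rather than documented usage.

```python
# Hypothetical usage sketch -- main.py is the documented entry point; the model
# ID and loading arguments below are assumptions, not this repository's API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/or/hub-id-of-the-10M-checkpoint"  # placeholder, not a real ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # long contexts are memory-hungry; use a compact dtype
    trust_remote_code=True,       # custom Infini-attention layers may require this
)

# In practice the prompt would be a very long document; a short string keeps the sketch simple.
prompt = "Summarize the plot of a very long novel in two sentences."
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```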
Highlighted Details
Maintenance & Community
The last commit was about a year ago and the repository is currently marked inactive.
Licensing & Compatibility
Limitations & Caveats
This is a very early checkpoint trained for only 200 steps, so model quality and robustness are likely limited. Long-term maintenance and community support are not yet established.