Gemma 2B with 10M context length using Infini-attention
Top 39.6% on sourcepulse
This repository provides a Gemma 2B language model fine-tuned to reach a 10-million-token context length using an Infini-attention mechanism. It targets researchers and developers who need to process extremely long sequences on limited hardware, offering significant memory savings over standard attention.
How It Works
The core innovation is Infini-attention, which addresses the quadratic growth in memory and compute that standard multi-head attention incurs as sequences get longer. The input is processed in fixed-size local segments: attention within each segment stays cheap, while a recurrent compressive memory carries information across segments. This keeps memory growth linear (O(N)) while still exposing the full 10M-token context to the model. The segment-level recurrence draws inspiration from Transformer-XL.
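To make this concrete, here is a minimal single-head sketch of Infini-attention-style segment recurrence: causal attention within each block, plus a compressive memory that is read before and updated after every segment. It is an illustrative toy under assumed shapes and a fixed blending gate, not the repository's implementation; names such as `infini_attention` and `segment_len` are placeholders.

```python
# Minimal sketch of Infini-attention-style segment recurrence (single head,
# illustrative shapes only -- not this repository's actual code).
import torch
import torch.nn.functional as F

def elu_plus_one(x):
    # Non-negative feature map used for the linear-attention memory terms.
    return F.elu(x) + 1.0

def infini_attention(x, w_q, w_k, w_v, segment_len=2048, gate=0.5):
    """x: (seq_len, d_model). The sequence is processed in fixed-size segments,
    so attention cost stays bounded per segment instead of growing with seq_len."""
    d_key = w_q.shape[1]
    memory = torch.zeros(d_key, w_v.shape[1])   # compressive memory M
    norm = torch.zeros(d_key)                   # normalization term z
    outputs = []

    for seg in x.split(segment_len, dim=0):
        q, k, v = seg @ w_q, seg @ w_k, seg @ w_v

        # 1) Local causal attention -- quadratic only in segment_len, not seq_len.
        scores = (q @ k.T) / d_key ** 0.5
        causal = torch.triu(torch.full_like(scores, float("-inf")), diagonal=1)
        local_out = F.softmax(scores + causal, dim=-1) @ v

        # 2) Retrieve long-range context from the recurrent compressive memory.
        sigma_q = elu_plus_one(q)
        mem_out = (sigma_q @ memory) / (sigma_q @ norm + 1e-6).unsqueeze(-1)

        # 3) Update the memory with this segment's keys/values (the recurrence).
        sigma_k = elu_plus_one(k)
        memory = memory + sigma_k.T @ v
        norm = norm + sigma_k.sum(dim=0)

        # 4) Blend local and memory-based context (a learned gate in the
        #    Infini-attention paper; a fixed scalar here for brevity).
        outputs.append(gate * mem_out + (1.0 - gate) * local_out)

    return torch.cat(outputs, dim=0)

# Example: 8K tokens of random features with tiny dims, just to show the shapes.
torch.manual_seed(0)
d_model, d_key, d_value = 64, 32, 32
x = torch.randn(8192, d_model)
out = infini_attention(x, torch.randn(d_model, d_key),
                       torch.randn(d_model, d_key),
                       torch.randn(d_model, d_value))
print(out.shape)  # torch.Size([8192, 32])
```

Because the compressive memory has a fixed size, each segment only pays for attention over `segment_len` tokens, which is what keeps the footprint manageable at 10M-token contexts.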
Quick Start & Requirements
pip install -r requirements.txt
python main.py
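main.py is the documented entry point. If the checkpoint is exported in Hugging Face format, generation could look roughly like the sketch below; the model identifier, `trust_remote_code` flag, and dtype choice are assumptions rather than documented usage.

```python
# Hypothetical usage sketch -- main.py is the documented entry point; the model
# ID and loading arguments below are assumptions, not this repository's API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/or/hub-id-of-the-10M-checkpoint"  # placeholder, not a real ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # long contexts are memory-hungry; use a compact dtype
    trust_remote_code=True,       # custom Infini-attention layers may require this
)

# In practice the prompt would be a very long document; a short string keeps the sketch simple.
prompt = "Summarize the plot of a very long novel in two sentences."
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```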
Highlighted Details
Maintenance & Community
The last commit was about a year ago and the repository is currently marked inactive.
Licensing & Compatibility
Limitations & Caveats
This is a very early checkpoint trained for only 200 steps, so model quality and robustness are likely limited. Long-term maintenance and community support are not yet established.