PyTorch implementation of Infini-attention for efficient, infinite context Transformers
This repository provides an unofficial PyTorch implementation of Infini-attention, a technique for enabling Transformers to process extremely long contexts efficiently. It targets researchers and engineers working with large language models like Gemma and Llama, offering a way to significantly extend context windows beyond standard limitations.
How It Works
InfiniTransformer implements two versions of Infini-attention. Type I modifies model and trainer configurations for maximum memory efficiency, enabling training with context lengths up to 1 million tokens on high-end hardware. Type II integrates Infini-attention solely within the attention layer, maintaining compatibility with the Hugging Face Trainer and standard configurations while offering moderate memory savings.
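Concretely, each Infini-attention layer keeps a compressive memory alongside ordinary causal attention: the memory is read with a linear-attention lookup, blended with the local attention output through a learned gate, and then updated from the current segment's keys and values. The single-head PyTorch sketch below follows the Infini-attention paper rather than this repository's code; the names (InfiniAttentionHead, segment_len) are illustrative only.

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class InfiniAttentionHead(nn.Module):
    # Single-head sketch of Infini-attention: local causal attention per segment plus a
    # compressive memory (matrix mem and normalizer z) carried across segments.
    def __init__(self, dim: int, head_dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, head_dim, bias=False)
        self.k_proj = nn.Linear(dim, head_dim, bias=False)
        self.v_proj = nn.Linear(dim, head_dim, bias=False)
        self.beta = nn.Parameter(torch.zeros(1))  # learned gate between memory and local attention
        self.head_dim = head_dim

    def forward(self, x: torch.Tensor, segment_len: int) -> torch.Tensor:
        b, _, _ = x.shape
        mem = x.new_zeros(b, self.head_dim, self.head_dim)   # compressive memory M
        z = x.new_full((b, self.head_dim, 1), 1e-6)          # normalization term
        outputs = []
        for seg in x.split(segment_len, dim=1):
            q, k, v = self.q_proj(seg), self.k_proj(seg), self.v_proj(seg)
            sq, sk = F.elu(q) + 1, F.elu(k) + 1              # positive feature map sigma(.)
            # 1) Read from the compressive memory (linear attention over all past segments).
            a_mem = (sq @ mem) / (sq @ z)
            # 2) Ordinary causal softmax attention within the current segment.
            s = seg.size(1)
            scores = q @ k.transpose(-2, -1) / math.sqrt(self.head_dim)
            mask = torch.triu(torch.ones(s, s, dtype=torch.bool, device=x.device), diagonal=1)
            a_dot = scores.masked_fill(mask, float("-inf")).softmax(dim=-1) @ v
            # 3) Blend the two read-outs with the learned gate.
            g = torch.sigmoid(self.beta)
            outputs.append(g * a_mem + (1 - g) * a_dot)
            # 4) Update the memory with the delta rule, then accumulate the normalizer.
            delta = v - (sk @ mem) / (sk @ z)
            mem = mem + sk.transpose(-2, -1) @ delta
            z = z + sk.sum(dim=1, keepdim=True).transpose(-2, -1)
        return torch.cat(outputs, dim=1)

For example, InfiniAttentionHead(512, 64)(torch.randn(2, 4096, 512), segment_len=1024) processes a 4,096-token input as four 1,024-token segments, so attention cost stays bounded by the segment length while the compressive memory summarizes everything seen so far.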
Quick Start & Requirements
Install the dependencies with pip install -r requirements.txt, then install the pinned Transformers revision with pip install -e git+https://github.com/huggingface/transformers.git@b109257f4f#egg=transformers.
Launch training with one of the provided shell scripts (e.g., ./train.llama.infini.noclm.1Mseq.sh), and run python test_basic.infini.py for a basic sanity check.
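A minimal smoke test in the spirit of test_basic.infini.py might look like the sketch below; it assumes the patched model classes keep the standard Hugging Face interface, and the model name and input length are placeholders rather than values taken from this repository.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical example: load a (patched) Gemma checkpoint and run one forward pass
# over an input longer than the model's usual context window.
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b", torch_dtype=torch.bfloat16)
model.eval()

prompt = "Long-context smoke test. " * 2000  # well beyond Gemma's usual 8K-token window
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)
print(out.logits.shape)  # (1, sequence_length, vocab_size)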
Highlighted Details
Maintenance & Community
The repository was last updated about a year ago and is currently inactive.
Licensing & Compatibility
Limitations & Caveats
Type I is not compatible with the basic Hugging Face Trainer and requires custom training code. The project is an unofficial implementation, and specific compatibility or stability guarantees are not provided.
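As a rough illustration of what that custom code involves, a Type I-style training step can be sketched as a loop over segments with a backward pass per segment; the function below is an assumption-laden outline (segment_len, loss scaling, and how the memory is carried are illustrative, not this repository's trainer).

import torch

def train_step(model, optimizer, input_ids, segment_len=2048):
    # Illustrative segment-wise step: process one segment at a time so activations for
    # the full long sequence never live in memory at once. The compressive memory that
    # links segments is assumed to be carried (and detached) inside the model.
    optimizer.zero_grad()
    segments = input_ids.split(segment_len, dim=1)
    total_loss = 0.0
    for seg in segments:
        out = model(input_ids=seg, labels=seg)   # standard causal-LM loss per segment
        loss = out.loss / len(segments)
        loss.backward()                          # frees this segment's activation graph
        total_loss += loss.item()
    optimizer.step()
    return total_loss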