InfiniTransformer by Beomi

PyTorch implementation of Infini-attention for efficient, infinite context Transformers

created 1 year ago
367 stars

Top 76.6% on SourcePulse

Project Summary

This repository provides an unofficial PyTorch implementation of Infini-attention, a technique for enabling Transformers to process extremely long contexts efficiently. It targets researchers and engineers working with large language models like Gemma and Llama, offering a way to significantly extend context windows beyond standard limitations.

How It Works

InfiniTransformer implements two versions of Infini-attention. Type I modifies model and trainer configurations for maximum memory efficiency, enabling training with context lengths up to 1 million tokens on high-end hardware. Type II integrates Infini-attention solely within the attention layer, maintaining compatibility with the Hugging Face Trainer and standard configurations while offering moderate memory savings.
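For orientation, here is a minimal sketch of the Infini-attention mechanism the repository implements, as described in the original paper: each head keeps a small compressive memory and a normalization term, retrieves from them with a non-negative feature map, blends the result with ordinary local attention via a learned gate, and then writes the current segment into the memory. The function name, shape conventions, and epsilon below are illustrative assumptions and do not mirror the repository's actual modeling code.

```python
import torch
import torch.nn.functional as F

def elu_plus_one(x):
    # Non-negative feature map used by Infini-attention: sigma(x) = ELU(x) + 1.
    return F.elu(x) + 1.0

def infini_attention_segment(q, k, v, memory, z, gate):
    """Illustrative single-segment Infini-attention step (not the repo's exact code).

    q, k, v : (batch, heads, seg_len, head_dim) projections for the current segment
    memory  : (batch, heads, head_dim, head_dim) compressive memory from past segments
    z       : (batch, heads, head_dim, 1) normalization term from past segments
    gate    : (heads,) learnable per-head scalar mixing memory and local attention
    """
    sigma_q = elu_plus_one(q)
    sigma_k = elu_plus_one(k)

    # Retrieve long-term context from the compressive memory.
    a_mem = (sigma_q @ memory) / (sigma_q @ z + 1e-6)

    # Ordinary causal softmax attention within the current segment.
    a_local = F.scaled_dot_product_attention(q, k, v, is_causal=True)

    # Blend long-term and local attention with a learned sigmoid gate.
    beta = torch.sigmoid(gate).view(1, -1, 1, 1)
    out = beta * a_mem + (1.0 - beta) * a_local

    # Write the current segment into the memory (linear update rule).
    memory = memory + sigma_k.transpose(-2, -1) @ v
    z = z + sigma_k.sum(dim=-2).unsqueeze(-1)
    return out, memory, z
```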

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt and pip install -e git+https://github.com/huggingface/transformers.git@b109257f4f#egg=transformers.
  • Requires PyTorch and the specified version of 🤗Transformers.
  • Training examples are provided via shell scripts (e.g., ./train.llama.infini.noclm.1Mseq.sh).
  • Inference and basic tests can be run with python test_basic.infini.py.
  • Official quick-start and examples are available in the repository.

Highlighted Details

  • Type I allows training Gemma-2B with 32K sequence length on 2x H100 80G, or Llama-3-8B with 1M sequence length on 2x H100 80G.
  • Achieves "infinite" context training, demonstrated with a 1M sequence length on 1x H100 80G (see the segment-wise sketch after this list).
  • Type II offers compatibility with the Hugging Face Trainer.
  • Sample generation and inference examples are provided.
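To give intuition for how such sequence lengths fit in fixed memory, the toy loop below reuses the hypothetical infini_attention_segment sketch from above and processes a long sequence in fixed-size segments, carrying only the small compressive memory between them. The dimensions are made up for illustration; this shows the general segment-wise pattern behind Infini-attention, not the repository's training scripts.

```python
import torch

# Illustrative dimensions only; not the repository's configuration.
batch, heads, head_dim, seg_len, n_segments = 1, 8, 64, 2048, 16

memory = torch.zeros(batch, heads, head_dim, head_dim)  # compressive memory
z = torch.zeros(batch, heads, head_dim, 1)              # normalization term
gate = torch.zeros(heads)                               # per-head mixing gate

for _ in range(n_segments):
    # In a real model, q, k, v are projections of the current segment's hidden states.
    q = torch.randn(batch, heads, seg_len, head_dim)
    k = torch.randn(batch, heads, seg_len, head_dim)
    v = torch.randn(batch, heads, seg_len, head_dim)

    out, memory, z = infini_attention_segment(q, k, v, memory, z, gate)

    # Detaching the memory between segments keeps activation memory proportional
    # to the segment length rather than the full (e.g. 1M-token) sequence.
    memory, z = memory.detach(), z.detach()
```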

Maintenance & Community

  • The project is unofficial and maintained by Beomi.
  • No specific community channels or roadmap links are provided in the README.

Licensing & Compatibility

  • The repository does not explicitly state a license.
  • Compatibility with commercial or closed-source projects is not specified.

Limitations & Caveats

Type I is not compatible with the basic Hugging Face Trainer and requires custom training code. The project is an unofficial implementation, and specific compatibility or stability guarantees are not provided.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 2 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Shizhe Diao (Research Scientist at NVIDIA; author of LMFlow), and 3 more.

LongLoRA by dvlab-research

LongLoRA: Efficient fine-tuning for long-context LLMs

Top 0.1% on SourcePulse · 3k stars · created 1 year ago · updated 1 year ago