yarn by jquesnelle

Context window extension method for LLMs (research paper, models)

Created 2 years ago
1,607 stars

Top 26.1% on SourcePulse

View on GitHub
Project Summary

YaRN provides an efficient method for extending the context window of Large Language Models (LLMs), addressing the limitations of fixed context lengths in processing long documents or conversations. It is targeted at researchers and developers working with LLMs who need to improve their models' ability to handle extended inputs.

How It Works

YaRN works by rescaling the model's Rotary Position Embeddings (RoPE) rather than its weights. It combines NTK-aware, per-dimension frequency scaling with a linear ramp that blends interpolation (for low-frequency dimensions) and extrapolation (for high-frequency ones), together with a mild temperature scaling of the attention logits, allowing models to generalize to sequences longer than those they were originally trained on. The approach aims to preserve short-context performance while significantly increasing the effective context window.
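As an illustration of the frequency rescaling described above, here is a minimal NumPy sketch following the paper's "NTK-by-parts" formulation (the ramp bounds alpha=1, beta=32 and the attention-temperature formula are the paper's recommended defaults). It is not the repository's actual implementation, and the function name is ours.

```python
import numpy as np

def yarn_scaled_freqs(dim=128, base=10000.0, scale=16.0,
                      orig_ctx=4096, alpha=1.0, beta=32.0):
    """Sketch of YaRN "NTK-by-parts" RoPE frequency rescaling.

    dim:        rotary embedding dimension (per attention head)
    scale:      context extension factor s (e.g. 4096 -> 65536 gives s = 16)
    orig_ctx:   context length the model was pretrained with
    alpha/beta: ramp bounds from the paper (defaults 1 and 32)
    """
    # Standard RoPE frequencies: theta_d = base ** (-2d / dim)
    theta = base ** (-np.arange(0, dim, 2) / dim)

    # Number of full rotations each dimension completes over the original
    # context window; few rotations means a wavelength longer than the context.
    rotations = orig_ctx * theta / (2 * np.pi)

    # Linear ramp: 0 -> interpolate fully (divide frequency by scale),
    #              1 -> extrapolate (keep the original frequency).
    ramp = np.clip((rotations - alpha) / (beta - alpha), 0.0, 1.0)
    theta_yarn = (1.0 - ramp) * theta / scale + ramp * theta

    # Attention temperature: sqrt(1/t) = 0.1 * ln(s) + 1, applied in practice
    # by scaling the rotary cos/sin tables (and hence the queries and keys).
    mscale = 0.1 * np.log(scale) + 1.0
    return theta_yarn, mscale
```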

Quick Start & Requirements

  • Install via pip install -e . after cloning the repository.
  • Requires Python and standard ML libraries. Specific hardware requirements (e.g., GPU, VRAM) will depend on the model size and context length being used.
  • Training requires DeepSpeed acceleration.
  • Evaluation requires lm-evaluation-harness.
  • Links: Paper, Models on Hugging Face (a minimal loading sketch follows this list).
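For inference with the published checkpoints (as opposed to reproducing training), the models load through Hugging Face transformers like any other causal LM. The sketch below is illustrative only: the model id and the long_document.txt path are placeholders to be swapped for one of the fine-tunes listed on the hub, and trust_remote_code=True is typically needed when the YaRN RoPE code ships alongside the checkpoint rather than in the installed transformers version.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model id -- substitute one of the YaRN fine-tunes published
# on the Hugging Face hub (Llama 2, Mistral, or SOLAR variants).
model_id = "NousResearch/Yarn-Llama-2-7b-128k"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # use the checkpoint's native precision
    device_map="auto",       # requires `accelerate` for placement/offload
    trust_remote_code=True,  # custom YaRN RoPE code may ship with the checkpoint
)

# Long-context generation: the extended window only matters if the prompt
# (plus generated tokens) exceeds the base model's original limit.
# "long_document.txt" is a placeholder input file.
prompt = open("long_document.txt").read()
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```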

Highlighted Details

  • Fine-tuned models available for Llama 2, Mistral, and SOLAR up to 128K context length.
  • Code and data are published for result reproduction.
  • Training utilizes DeepSpeed Zero 3.
  • Evaluation scripts are provided.

Maintenance & Community

The project is associated with the ICLR 2024 paper "YaRN: Efficient Context Window Extension of Large Language Models." Further community engagement details (e.g., Discord, Slack) are not explicitly mentioned in the README.

Licensing & Compatibility

The project's models are released under the Llama 2 license. Compatibility for commercial use or closed-source linking would be subject to the terms of the Llama 2 license.

Limitations & Caveats

The README focuses on reproduction and fine-tuned models, with less detail on using the core YaRN method for extending arbitrary existing models. Performance at extreme context lengths may vary.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

28 stars in the last 30 days

Explore Similar Projects

Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Binyuan Hui (Research Scientist at Alibaba Qwen), and 3 more.

xgen by salesforce
0.1% · 723 stars
LLM research release with 8k sequence length
Created 2 years ago · Updated 7 months ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Pawel Garbacki (Cofounder of Fireworks AI), and 4 more.

LongLoRA by dvlab-research
0.1% · 3k stars
LongLoRA: Efficient fine-tuning for long-context LLMs
Created 2 years ago · Updated 1 year ago
Starred by Phil Wang (Prolific Research Paper Implementer), Lianmin Zheng (Coauthor of SGLang, vLLM), and 6 more.

Kimi-K2 by MoonshotAI
1.7% · 8k stars
State-of-the-art MoE language model
Created 2 months ago · Updated 1 week ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), George Hotz (Author of tinygrad; Founder of the tiny corp, comma.ai), and 20 more.

TinyLlama by jzhang38
0.1% · 9k stars
Tiny pretraining project for a 1.1B Llama model
Created 2 years ago · Updated 1 year ago