yarn by jquesnelle

Context window extension method for LLMs (research paper, models)

created 2 years ago
1,542 stars

Top 27.5% on sourcepulse

Project Summary

YaRN provides an efficient method for extending the context window of Large Language Models (LLMs), addressing the limitations of fixed context lengths in processing long documents or conversations. It is targeted at researchers and developers working with LLMs who need to improve their models' ability to handle extended inputs.

How It Works

YaRN modifies the rotary positional embeddings (RoPE) rather than the attention computation itself. It interpolates RoPE frequencies on a per-dimension basis (the paper's "NTK-by-parts" scheme): high-frequency dimensions, whose wavelengths fit many times inside the original context window, are left untouched, while low-frequency dimensions are compressed by the scale factor, with a ramp function blending between the two regimes. A mild temperature applied to the attention logits compensates for the longer sequences. Together these changes let a model generalize to contexts far longer than it was trained on while preserving short-context performance, and the paper reports needing substantially fewer fine-tuning tokens and steps than earlier interpolation methods.
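
Below is a minimal NumPy sketch of the per-dimension interpolation described above. Parameter defaults follow the paper's Llama settings (alpha = 1, beta = 32); the scale factor and function names are illustrative, and the repository's actual implementation lives in its patched model code.

```python
import math

import numpy as np

def yarn_inv_freq(dim, base=10000.0, scale=16.0, orig_ctx=4096,
                  alpha=1.0, beta=32.0):
    """Sketch of YaRN's "NTK-by-parts" RoPE frequency interpolation."""
    # Standard RoPE inverse frequencies: theta_d = base^(-2d/dim).
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    # Number of full periods each dimension completes inside the
    # original context window (large ratio = high frequency).
    ratio = orig_ctx * inv_freq / (2 * math.pi)
    # Ramp: 0 -> interpolate fully (theta / scale), 1 -> leave as-is.
    gamma = np.clip((ratio - alpha) / (beta - alpha), 0.0, 1.0)
    return (1.0 - gamma) * inv_freq / scale + gamma * inv_freq

def yarn_attention_factor(scale):
    """Attention-logit temperature from the paper: sqrt(1/t) = 0.1*ln(s) + 1."""
    return 0.1 * math.log(scale) + 1.0
```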

Quick Start & Requirements

  • Install via pip install -e . after cloning the repository.
  • Requires Python and the usual ML stack (e.g., PyTorch); GPU count and VRAM needs depend on the model size and target context length.
  • Training requires DeepSpeed acceleration.
  • Evaluation requires lm-evaluation-harness.
  • Links: Paper, Models on Hugging Face (a loading sketch follows this list).
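
For the released checkpoints, no training setup is needed to run inference. Here is a hedged loading sketch using Hugging Face transformers, assuming one of the published 128K fine-tunes (the model ID below is an assumption; substitute the checkpoint you actually want, and budget VRAM generously for long prompts):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# One of the published YaRN fine-tunes (assumed ID; swap as needed).
model_id = "NousResearch/Yarn-Mistral-7b-128k"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # keep the checkpoint's native precision
    device_map="auto",       # shard across available GPUs (needs accelerate)
    trust_remote_code=True,  # some uploads ship custom YaRN RoPE code
)

prompt = "Summarize the following document:\n..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```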

Highlighted Details

  • Fine-tuned models available for Llama 2, Mistral, and SOLAR up to 128K context length.
  • Code and data are published for result reproduction.
  • Training utilizes DeepSpeed Zero 3.
  • Evaluation scripts are provided.

Maintenance & Community

The project accompanies the ICLR 2024 paper "YaRN: Efficient Context Window Extension of Large Language Models." The README does not mention community channels such as Discord or Slack.

Licensing & Compatibility

The fine-tuned models are released under the Llama 2 license; commercial use or closed-source linking is therefore subject to its terms.

Limitations & Caveats

The README focuses on reproduction and fine-tuned models, with less detail on using the core YaRN method for extending arbitrary existing models. Performance at extreme context lengths may vary.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 76 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Georgios Konstantopoulos (CTO, General Partner at Paradigm).

LongLoRA by dvlab-research

  • Top 0.1% on sourcepulse • 3k stars
  • LongLoRA: Efficient fine-tuning for long-context LLMs
  • created 1 year ago, updated 11 months ago
  • Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Ying Sheng (Author of SGLang), and 9 more.

alpaca-lora by tloen

  • Top 0.0% on sourcepulse • 19k stars
  • LoRA fine-tuning for LLaMA
  • created 2 years ago, updated 1 year ago