yarn by jquesnelle

Context window extension method for LLMs (research paper, models)

Created 2 years ago
1,607 stars

Top 26.1% on SourcePulse

View on GitHub
Project Summary

YaRN provides an efficient method for extending the context window of Large Language Models (LLMs), addressing the limitations of fixed context lengths in processing long documents or conversations. It is targeted at researchers and developers working with LLMs who need to improve their models' ability to handle extended inputs.

How It Works

YaRN works by rescaling the model's Rotary Position Embeddings (RoPE) rather than its weights. It combines NTK-aware, per-dimension frequency scaling with a linear ramp that blends interpolation (for low-frequency dimensions) and extrapolation (for high-frequency ones), together with a mild temperature scaling of the attention logits, allowing models to generalize to sequences longer than those they were originally trained on. The approach aims to preserve short-context performance while significantly increasing the effective context window.
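As an illustration of the frequency rescaling described above, here is a minimal NumPy sketch following the paper's "NTK-by-parts" formulation (the ramp bounds alpha=1, beta=32 and the attention-temperature formula are the paper's recommended defaults). It is not the repository's actual implementation, and the function name is ours.

```python
import numpy as np

def yarn_scaled_freqs(dim=128, base=10000.0, scale=16.0,
                      orig_ctx=4096, alpha=1.0, beta=32.0):
    """Sketch of YaRN "NTK-by-parts" RoPE frequency rescaling.

    dim:        rotary embedding dimension (per attention head)
    scale:      context extension factor s (e.g. 4096 -> 65536 gives s = 16)
    orig_ctx:   context length the model was pretrained with
    alpha/beta: ramp bounds from the paper (defaults 1 and 32)
    """
    # Standard RoPE frequencies: theta_d = base ** (-2d / dim)
    theta = base ** (-np.arange(0, dim, 2) / dim)

    # Number of full rotations each dimension completes over the original
    # context window; few rotations means a wavelength longer than the context.
    rotations = orig_ctx * theta / (2 * np.pi)

    # Linear ramp: 0 -> interpolate fully (divide frequency by scale),
    #              1 -> extrapolate (keep the original frequency).
    ramp = np.clip((rotations - alpha) / (beta - alpha), 0.0, 1.0)
    theta_yarn = (1.0 - ramp) * theta / scale + ramp * theta

    # Attention temperature: sqrt(1/t) = 0.1 * ln(s) + 1, applied in practice
    # by scaling the rotary cos/sin tables (and hence the queries and keys).
    mscale = 0.1 * np.log(scale) + 1.0
    return theta_yarn, mscale
```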

Quick Start & Requirements

  • Install via pip install -e . after cloning the repository.
  • Requires Python and standard ML libraries. Specific hardware requirements (e.g., GPU, VRAM) will depend on the model size and context length being used.
  • Training requires DeepSpeed acceleration.
  • Evaluation requires lm-evaluation-harness.
  • Links: Paper, Models on Hugging Face (a minimal loading sketch follows this list).
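For inference with the published checkpoints (as opposed to reproducing training), the models load through Hugging Face transformers like any other causal LM. The sketch below is illustrative only: the model id and the long_document.txt path are placeholders to be swapped for one of the fine-tunes listed on the hub, and trust_remote_code=True is typically needed when the YaRN RoPE code ships alongside the checkpoint rather than in the installed transformers version.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model id -- substitute one of the YaRN fine-tunes published
# on the Hugging Face hub (Llama 2, Mistral, or SOLAR variants).
model_id = "NousResearch/Yarn-Llama-2-7b-128k"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # use the checkpoint's native precision
    device_map="auto",       # requires `accelerate` for placement/offload
    trust_remote_code=True,  # custom YaRN RoPE code may ship with the checkpoint
)

# Long-context generation: the extended window only matters if the prompt
# (plus generated tokens) exceeds the base model's original limit.
# "long_document.txt" is a placeholder input file.
prompt = open("long_document.txt").read()
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```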

Highlighted Details

  • Fine-tuned models available for Llama 2, Mistral, and SOLAR up to 128K context length.
  • Code and data are published for result reproduction.
  • Training utilizes DeepSpeed Zero 3.
  • Evaluation scripts are provided.

Maintenance & Community

The project is associated with the ICLR 2024 paper "YaRN: Efficient Context Window Extension of Large Language Models." Further community engagement details (e.g., Discord, Slack) are not explicitly mentioned in the README.

Licensing & Compatibility

The project's models are released under the Llama 2 license. Compatibility for commercial use or closed-source linking would be subject to the terms of the Llama 2 license.

Limitations & Caveats

The README focuses on reproduction and fine-tuned models, with less detail on using the core YaRN method for extending arbitrary existing models. Performance at extreme context lengths may vary.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

28 stars in the last 30 days

Explore Similar Projects

Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Binyuan Hui (Research Scientist at Alibaba Qwen), and 3 more.

xgen by salesforce
0.1% · 723 stars
LLM research release with 8k sequence length
Created 2 years ago · Updated 7 months ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Pawel Garbacki (Cofounder of Fireworks AI), and 4 more.

LongLoRA by dvlab-research
0.1% · 3k stars
LongLoRA: Efficient fine-tuning for long-context LLMs
Created 2 years ago · Updated 1 year ago
Starred by Phil Wang (Prolific Research Paper Implementer), Lianmin Zheng (Coauthor of SGLang, vLLM), and 6 more.

Kimi-K2 by MoonshotAI
1.7% · 8k stars
State-of-the-art MoE language model
Created 2 months ago · Updated 1 week ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), George Hotz (Author of tinygrad; Founder of the tiny corp, comma.ai), and 20 more.

TinyLlama by jzhang38
0.1% · 9k stars
Tiny pretraining project for a 1.1B Llama model
Created 2 years ago · Updated 1 year ago