LongRoPE by microsoft

Positional embedding rescaling for extended LLM context

Created 1 year ago
264 stars

Top 96.7% on SourcePulse

View on GitHub
Project Summary

LongRoPE addresses the critical limitation of fixed context windows in Large Language Models (LLMs), enabling them to process significantly longer sequences of text. Targeted at researchers and developers seeking to enhance LLM capabilities for tasks requiring extensive context, it offers a method to extend context windows to over 2 million tokens, demonstrated effectively in Microsoft's Phi-3 models.

How It Works

The core innovation lies in non-uniformly rescaling Rotary Positional Embeddings (RoPE). LongRoPE employs an efficient search to identify optimal rescaling parameters, facilitating up to an 8x extension without fine-tuning. A progressive extension strategy further boosts capabilities, involving initial fine-tuning to 256k tokens followed by positional interpolation to achieve a 2048k context window. It also includes mechanisms to recover short-context performance by readjusting scaling factors and retained start tokens.
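To make the rescaling concrete, here is a minimal PyTorch sketch of non-uniform RoPE interpolation with retained start tokens. The function names, the per-dimension factors `scales`, and the retained-token count `n_start` are illustrative placeholders for values LongRoPE finds via its search; none of this is taken from the repository's code.

```python
import torch

def rope_inv_freq(dim: int, base: float = 10000.0) -> torch.Tensor:
    # Standard RoPE inverse frequencies: theta_i = base^(-2i/dim).
    return base ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)

def longrope_angles(seq_len: int, dim: int, scales: torch.Tensor,
                    n_start: int = 0, base: float = 10000.0) -> torch.Tensor:
    # Non-uniform interpolation: each dimension pair i is slowed down by
    # its own factor scales[i]. Uniform positional interpolation is the
    # special case where all factors are equal.
    inv_freq = rope_inv_freq(dim, base)                  # (dim/2,)
    pos = torch.arange(seq_len, dtype=torch.float32)
    scaled = torch.outer(pos, inv_freq / scales)         # (seq_len, dim/2)
    # The first n_start tokens keep their original, unscaled angles,
    # which helps preserve short-context behavior.
    scaled[:n_start] = torch.outer(pos[:n_start], inv_freq)
    return scaled

# Illustrative use: a uniform 4x stretch with 4 retained start tokens.
angles = longrope_angles(seq_len=4096, dim=128,
                         scales=torch.full((64,), 4.0), n_start=4)
cos, sin = angles.cos(), angles.sin()  # fed into the usual RoPE rotation
```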

Quick Start & Requirements

Setup involves creating a Conda environment with Python 3.10, activating it, and installing dependencies via requirements.txt. flash-attn requires CUDA version 11.7 or higher. Data tokenization and evaluation scripts are provided within the examples/llama3/ directory. Key resources include official documentation links and example scripts for evolution search and evaluation.
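A rough sketch of those setup steps follows; the environment name "longrope" is an assumption, not taken from the README.

```bash
# Create and activate the Python 3.10 environment described above.
conda create -n longrope python=3.10 -y
conda activate longrope
nvcc --version   # flash-attn requires CUDA 11.7 or higher
pip install -r requirements.txt
```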

Highlighted Details

  • Accepted at ICML 2024 and integrated into Microsoft's Phi-3 family (mini, small, medium, vision) supporting 128k context windows.
  • Demonstrates strong performance across various LLMs on long-context code understanding (RepoQA) and standard benchmarks (MMLU, GSM8K), with Phi3-mini-128k achieving 84.5% average on RepoQA at 128k context.
  • Supports multi-modality long context tasks, exemplified by Phi3-vision 128k-instruct.
  • Benchmark tables compare LongRoPE's effectiveness against models like Gemini-1.5-pro and GPT-4 across different context lengths and tasks.

Maintenance & Community

The project is authored by researchers from Microsoft, including Yiran Ding, Li Lyna Zhang, Chengruidong Zhang, Yuanyuan Xu, Ning Shang, Jiahang Xu, Fan Yang, and Mao Yang. No specific community channels (e.g., Discord, Slack) or roadmap links were detailed in the provided README snippet.

Licensing & Compatibility

The provided README snippet does not specify a software license. This lack of licensing information presents a significant barrier for potential adopters, particularly for commercial use or integration into closed-source projects.

Limitations & Caveats

Due to policy restrictions, only the evolution search component of LongRoPE is currently released. The README suggests that other LLM training techniques (like EasyContext, nnScaler) are necessary for the fine-tuning stages, implying the repository may not provide a complete end-to-end solution for extending any LLM's context window out-of-the-box.
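For intuition about what that released component does, here is a toy evolutionary search over per-dimension rescale factors. The `evaluate` callback (e.g., long-context perplexity, lower is better) and all hyperparameters are illustrative; the repository's actual search is considerably more sophisticated.

```python
import random

def evolve_scales(dim_pairs, evaluate, generations=10, pop=16, mutate_p=0.3):
    # Random initial population of per-dimension rescale-factor vectors.
    population = [[random.uniform(1.0, 8.0) for _ in range(dim_pairs)]
                  for _ in range(pop)]
    for _ in range(generations):
        scored = sorted(population, key=evaluate)
        parents = scored[: pop // 4]              # keep the best quarter
        children = []
        while len(children) < pop - len(parents):
            a, b = random.sample(parents, 2)
            # Crossover: pick each factor from one of the two parents.
            child = [random.choice(pair) for pair in zip(a, b)]
            # Mutation: jitter some factors, keeping them >= 1.0.
            for i in range(dim_pairs):
                if random.random() < mutate_p:
                    child[i] = max(1.0, child[i] + random.gauss(0, 0.5))
            children.append(child)
        population = parents + children
    return min(population, key=evaluate)
```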

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 3
  • Issues (30d): 1
  • Star History: 4 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems") and Luis Capelo (cofounder of Lightning AI).

LongLM by datamllab

Self-Extend: LLM context window extension via self-attention

Created 1 year ago
Updated 1 year ago
661 stars