LongRoPE by microsoft

Positional embedding rescaling for extended LLM context

Created 1 year ago
264 stars

Top 96.7% on SourcePulse

View on GitHub
Project Summary

LongRoPE addresses the critical limitation of fixed context windows in Large Language Models (LLMs), enabling them to process significantly longer sequences of text. Targeted at researchers and developers seeking to enhance LLM capabilities for tasks requiring extensive context, it offers a method to extend context windows to over 2 million tokens, demonstrated effectively in Microsoft's Phi-3 models.

How It Works

The core innovation lies in non-uniformly rescaling Rotary Positional Embeddings (RoPE). LongRoPE employs an efficient search to identify optimal rescaling parameters, facilitating up to an 8x extension without fine-tuning. A progressive extension strategy further boosts capabilities, involving initial fine-tuning to 256k tokens followed by positional interpolation to achieve a 2048k context window. It also includes mechanisms to recover short-context performance by readjusting scaling factors and retained start tokens.
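To make the rescaling concrete, here is a minimal PyTorch sketch of non-uniform RoPE interpolation with retained start tokens. The function names, the per-dimension factors `scales`, and the retained-token count `n_start` are illustrative placeholders for values LongRoPE finds via its search; none of this is taken from the repository's code.

```python
import torch

def rope_inv_freq(dim: int, base: float = 10000.0) -> torch.Tensor:
    # Standard RoPE inverse frequencies: theta_i = base^(-2i/dim).
    return base ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)

def longrope_angles(seq_len: int, dim: int, scales: torch.Tensor,
                    n_start: int = 0, base: float = 10000.0) -> torch.Tensor:
    # Non-uniform interpolation: each dimension pair i is slowed down by
    # its own factor scales[i]. Uniform positional interpolation is the
    # special case where all factors are equal.
    inv_freq = rope_inv_freq(dim, base)                  # (dim/2,)
    pos = torch.arange(seq_len, dtype=torch.float32)
    scaled = torch.outer(pos, inv_freq / scales)         # (seq_len, dim/2)
    # The first n_start tokens keep their original, unscaled angles,
    # which helps preserve short-context behavior.
    scaled[:n_start] = torch.outer(pos[:n_start], inv_freq)
    return scaled

# Illustrative use: a uniform 4x stretch with 4 retained start tokens.
angles = longrope_angles(seq_len=4096, dim=128,
                         scales=torch.full((64,), 4.0), n_start=4)
cos, sin = angles.cos(), angles.sin()  # fed into the usual RoPE rotation
```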

Quick Start & Requirements

Setup involves creating a Conda environment with Python 3.10, activating it, and installing dependencies via requirements.txt. flash-attn requires CUDA version 11.7 or higher. Data tokenization and evaluation scripts are provided within the examples/llama3/ directory. Key resources include official documentation links and example scripts for evolution search and evaluation.
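A rough sketch of those setup steps follows; the environment name "longrope" is an assumption, not taken from the README.

```bash
# Create and activate the Python 3.10 environment described above.
conda create -n longrope python=3.10 -y
conda activate longrope
nvcc --version   # flash-attn requires CUDA 11.7 or higher
pip install -r requirements.txt
```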

Highlighted Details

  • Accepted at ICML 2024 and integrated into Microsoft's Phi-3 family (mini, small, medium, vision) supporting 128k context windows.
  • Demonstrates strong performance across various LLMs on long-context code understanding (RepoQA) and standard benchmarks (MMLU, GSM8K), with Phi3-mini-128k achieving 84.5% average on RepoQA at 128k context.
  • Supports multi-modality long context tasks, exemplified by Phi3-vision 128k-instruct.
  • Benchmark tables compare LongRoPE's effectiveness against models like Gemini-1.5-pro and GPT-4 across different context lengths and tasks.

Maintenance & Community

The project is authored by researchers from Microsoft, including Yiran Ding, Li Lyna Zhang, Chengruidong Zhang, Yuanyuan Xu, Ning Shang, Jiahang Xu, Fan Yang, and Mao Yang. No specific community channels (e.g., Discord, Slack) or roadmap links were detailed in the provided README snippet.

Licensing & Compatibility

The provided README snippet does not specify a software license. This lack of licensing information presents a significant barrier for potential adopters, particularly for commercial use or integration into closed-source projects.

Limitations & Caveats

Due to policy restrictions, only the evolution search component of LongRoPE is currently released. The README suggests that other LLM training techniques (like EasyContext, nnScaler) are necessary for the fine-tuning stages, implying the repository may not provide a complete end-to-end solution for extending any LLM's context window out-of-the-box.
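For intuition about what that released component does, here is a toy evolutionary search over per-dimension rescale factors. The `evaluate` callback (e.g., long-context perplexity, lower is better) and all hyperparameters are illustrative; the repository's actual search is considerably more sophisticated.

```python
import random

def evolve_scales(dim_pairs, evaluate, generations=10, pop=16, mutate_p=0.3):
    # Random initial population of per-dimension rescale-factor vectors.
    population = [[random.uniform(1.0, 8.0) for _ in range(dim_pairs)]
                  for _ in range(pop)]
    for _ in range(generations):
        scored = sorted(population, key=evaluate)
        parents = scored[: pop // 4]              # keep the best quarter
        children = []
        while len(children) < pop - len(parents):
            a, b = random.sample(parents, 2)
            # Crossover: pick each factor from one of the two parents.
            child = [random.choice(pair) for pair in zip(a, b)]
            # Mutation: jitter some factors, keeping them >= 1.0.
            for i in range(dim_pairs):
                if random.random() < mutate_p:
                    child[i] = max(1.0, child[i] + random.gauss(0, 0.5))
            children.append(child)
        population = parents + children
    return min(population, key=evaluate)
```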

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 3
  • Issues (30d): 1
  • Star History: 4 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems") and Luis Capelo (cofounder of Lightning AI).

LongLM by datamllab

Self-Extend: LLM context window extension via self-attention

Created 1 year ago
Updated 1 year ago
661 stars