microsoft/LongRoPE: Positional embedding rescaling for extended LLM context
LongRoPE addresses the critical limitation of fixed context windows in Large Language Models (LLMs), enabling them to process significantly longer sequences of text. Targeted at researchers and developers seeking to enhance LLM capabilities for tasks requiring extensive context, it offers a method to extend context windows to over 2 million tokens, demonstrated effectively in Microsoft's Phi-3 models.
How It Works
The core innovation lies in non-uniformly rescaling Rotary Positional Embeddings (RoPE). LongRoPE employs an efficient evolutionary search to identify optimal per-dimension rescaling factors, enabling up to an 8x context extension without any fine-tuning. A progressive extension strategy pushes further: the model is first fine-tuned to a 256k-token context, then positional interpolation extends it to a 2048k-token window. LongRoPE also recovers short-context performance by readjusting the rescaling factors and the number of retained start tokens for shorter sequence lengths.
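The idea of non-uniform rescaling can be sketched as follows. This is an illustrative toy, not the repository's implementation: standard RoPE rotates each pair of dimensions at frequency base^(-2i/d), uniform positional interpolation divides every position by the same factor, and a LongRoPE-style scheme instead divides by a per-dimension factor (found by search in the real method; the linspace schedule below is a made-up stand-in).

```python
import numpy as np

def rope_angles(positions, dim, base=10000.0, scales=None):
    """RoPE rotation angles with optional per-dimension rescaling.

    Standard RoPE: angle[p, i] = p * base**(-2i/dim).
    Rescaled:      angle[p, i] = p * base**(-2i/dim) / scales[i].
    """
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)   # shape (dim/2,)
    if scales is None:
        scales = np.ones_like(inv_freq)                # no rescaling
    # positions: (seq_len,) -> angles: (seq_len, dim/2)
    return np.outer(positions, inv_freq / scales)

pos = np.arange(8192)
# Uniform 4x interpolation: every dimension shares one scale factor.
uniform = rope_angles(pos, dim=64, scales=np.full(32, 4.0))
# Toy non-uniform schedule: fast-rotating low dimensions are rescaled
# less, slow high dimensions more (hypothetical, not searched values).
nonuniform = rope_angles(pos, dim=64, scales=np.linspace(1.0, 4.0, 32))
```

Uniform rescaling by 4 is equivalent to evaluating RoPE at positions p/4; the non-uniform variant breaks that equivalence per dimension, which is the degree of freedom LongRoPE's search exploits.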
Quick Start & Requirements
Setup involves creating a Conda environment with Python 3.10, activating it, and installing dependencies via requirements.txt. flash-attn requires CUDA version 11.7 or higher. Data tokenization and evaluation scripts are provided within the examples/llama3/ directory. Key resources include official documentation links and example scripts for evolution search and evaluation.
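The setup described above amounts to something like the following; the environment name is illustrative, and the exact commands should be checked against the repository's README:

```shell
# Create and activate a Python 3.10 Conda environment (name is hypothetical)
conda create -n longrope python=3.10 -y
conda activate longrope

# Install dependencies; flash-attn requires CUDA 11.7 or higher
pip install -r requirements.txt
```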
Maintenance & Community
The project is authored by researchers from Microsoft, including Yiran Ding, Li Lyna Zhang, Chengruidong Zhang, Yuanyuan Xu, Ning Shang, Jiahang Xu, Fan Yang, and Mao Yang. No specific community channels (e.g., Discord, Slack) or roadmap links were detailed in the provided README snippet.
Licensing & Compatibility
The provided README snippet does not specify a software license. This lack of licensing information presents a significant barrier for potential adopters, particularly for commercial use or integration into closed-source projects.
Limitations & Caveats
Due to policy restrictions, only the evolution search component of LongRoPE is currently released. The README points to other LLM training frameworks (such as EasyContext and nnScaler) for the fine-tuning stages, so the repository does not provide a complete end-to-end pipeline for extending an arbitrary LLM's context window out of the box.