rerope by bojone

Position embeddings research paper

created 2 years ago · 375 stars · Top 76.8% on sourcepulse

View on GitHub
Project Summary

This repository introduces Rectified Rotary Position Embeddings (ReRoPE), a method to extend the context length of Large Language Models (LLMs) without requiring fine-tuning. It is targeted at LLM researchers and practitioners seeking to improve model performance on longer sequences. ReRoPE offers a way to achieve lower loss with increased context length, outperforming standard RoPE and NTK-RoPE in benchmarks.

How It Works

ReRoPE modifies the original Rotary Position Embeddings (RoPE) by introducing a "rectification" mechanism: relative positions beyond the training window are clamped, so distant token pairs still receive position signals the model has seen during training. This preserves the benefits of RoPE while mitigating the performance degradation normally observed when extending context length, thereby restoring the "longer context, lower loss" property. Implementation details are documented in the linked blog posts and code modifications.
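The clamping idea can be sketched in a few lines of NumPy. This is an illustrative sketch, not the repository's actual implementation; the function name and the NumPy formulation are assumptions made here for clarity.

```python
import numpy as np

def rerope_positions(seq_len, window):
    """Sketch of ReRoPE's rectified relative-position matrix.

    Within the training window, the usual RoPE relative position i - j
    is kept; beyond it, the distance is clamped ("rectified") to `window`,
    so far-apart tokens still see an in-distribution position signal.
    """
    i = np.arange(seq_len)[:, None]  # query positions (column vector)
    j = np.arange(seq_len)[None, :]  # key positions (row vector)
    rel = i - j                      # standard RoPE relative positions
    return np.minimum(rel, window)   # rectification: clamp at the window
```

Because RoPE rotates queries and keys by absolute position rather than operating on attention scores directly, realizing this clamped position matrix exactly typically requires combining two attention computations, which is presumably where the repository's Triton kernel helps.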

Quick Start & Requirements

Requires the Hugging Face Transformers library; usage is demonstrated through the repository's code modifications and the linked blog posts.

Highlighted Details

  • Achieves lower loss at extended context lengths (e.g., 8k, 16k) compared to original RoPE and NTK-RoPE.
  • Demonstrates minimal performance degradation at the original training length (4k).
  • Offers a "longer context, lower loss" property.
  • Includes a Triton implementation for potential performance gains.

Maintenance & Community

  • Primary contributor: Jianlin Su.
  • Communication: QQ group 67729435.

Licensing & Compatibility

  • License: Not explicitly stated in the README.
  • Compatibility: Designed for use with the Hugging Face Transformers library.

Limitations & Caveats

The README does not explicitly state the license, which may impact commercial use. While the method is presented as an alternative to fine-tuning, its effectiveness across all LLM architectures and tasks is not detailed.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 8 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Jeff Hammerbacher (cofounder of Cloudera), and 1 more.

yarn by jquesnelle

Context window extension method for LLMs (research paper, models)
1.0% · 2k stars · created 2 years ago · updated 1 year ago