rerope by bojone

Position embeddings research paper

created 2 years ago · 375 stars · Top 76.8% on sourcepulse

View on GitHub
Project Summary

This repository introduces Rectified Rotary Position Embeddings (ReRoPE), a method to extend the context length of Large Language Models (LLMs) without requiring fine-tuning. It is targeted at LLM researchers and practitioners seeking to improve model performance on longer sequences. ReRoPE offers a way to achieve lower loss with increased context length, outperforming standard RoPE and NTK-RoPE in benchmarks.

How It Works

ReRoPE modifies the original Rotary Position Embeddings (RoPE) by introducing a "rectification" mechanism: relative positions beyond the training window are clamped, so distant token pairs still receive position signals the model has seen during training. This preserves the benefits of RoPE while mitigating the performance degradation normally observed when extending context length, thereby restoring the "longer context, lower loss" property. Implementation details are documented in the linked blog posts and code modifications.
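The clamping idea can be sketched in a few lines of NumPy. This is an illustrative sketch, not the repository's actual implementation; the function name and the NumPy formulation are assumptions made here for clarity.

```python
import numpy as np

def rerope_positions(seq_len, window):
    """Sketch of ReRoPE's rectified relative-position matrix.

    Within the training window, the usual RoPE relative position i - j
    is kept; beyond it, the distance is clamped ("rectified") to `window`,
    so far-apart tokens still see an in-distribution position signal.
    """
    i = np.arange(seq_len)[:, None]  # query positions (column vector)
    j = np.arange(seq_len)[None, :]  # key positions (row vector)
    rel = i - j                      # standard RoPE relative positions
    return np.minimum(rel, window)   # rectification: clamp at the window
```

Because RoPE rotates queries and keys by absolute position rather than operating on attention scores directly, realizing this clamped position matrix exactly typically requires combining two attention computations, which is presumably where the repository's Triton kernel helps.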

Quick Start & Requirements

Requires the Hugging Face Transformers library; usage is demonstrated through the repository's code modifications and the linked blog posts.

Highlighted Details

  • Achieves lower loss at extended context lengths (e.g., 8k, 16k) compared to original RoPE and NTK-RoPE.
  • Demonstrates minimal performance degradation at the original training length (4k).
  • Offers a "longer context, lower loss" property.
  • Includes a Triton implementation for potential performance gains.

Maintenance & Community

  • Primary contributor: Jianlin Su.
  • Communication: QQ group 67729435.

Licensing & Compatibility

  • License: Not explicitly stated in the README.
  • Compatibility: Designed for use with the Hugging Face Transformers library.

Limitations & Caveats

The README does not explicitly state the license, which may impact commercial use. While the method is presented as an alternative to fine-tuning, its effectiveness across all LLM architectures and tasks is not detailed.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 8 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Jeff Hammerbacher (cofounder of Cloudera), and 1 more.

yarn by jquesnelle

Context window extension method for LLMs (research paper, models)
1.0% · 2k stars · created 2 years ago · updated 1 year ago