Long-Context by abacusai

LLM context expansion via RoPE encoding modifications

created 2 years ago
591 stars

Top 55.8% on sourcepulse

View on GitHub
Project Summary

This repository provides code, tooling, and experimental results for extending the context length of Large Language Models (LLMs), specifically Llama. It targets researchers and practitioners aiming to improve LLM performance on tasks requiring long-range information retrieval and understanding. The primary benefit is enabling LLMs to process and reason over significantly larger input contexts than their original pre-training limits.

How It Works

The project explores several methods for extending LLM context length, all based on modifying Rotary Position Embeddings (RoPE): linear scaling of positions, scaling the RoPE Fourier basis, truncating the Fourier basis, and randomizing position vectors. These techniques are combined with fine-tuning on datasets like RedPajama and instruction-tuning with Vicuna. Linear scaling, particularly when combined with instruction fine-tuning (IFT), emerged as the most robust method, maintaining non-zero retrieval accuracy at context lengths up to 20k.
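
The linear-scaling variant is the simplest to illustrate. The sketch below is a minimal rendering of the idea rather than the repository's actual implementation: positions fed into RoPE are divided by a scale factor so that a longer sequence maps back into the position range seen during pre-training.

```python
import torch

def rope_inverse_frequencies(head_dim: int, base: float = 10000.0) -> torch.Tensor:
    """Standard RoPE inverse frequencies (the 'Fourier basis')."""
    return 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))

def rope_angles(seq_len: int, head_dim: int, scale: float = 1.0) -> torch.Tensor:
    """Rotation angles with linear position scaling.

    scale > 1 compresses positions, e.g. scale=4 maps an 8k sequence into the
    0..2k position range a 2k-context model saw during pre-training.
    """
    inv_freq = rope_inverse_frequencies(head_dim)
    positions = torch.arange(seq_len).float() / scale   # linear interpolation of positions
    return torch.outer(positions, inv_freq)              # (seq_len, head_dim // 2)

def apply_rope(x: torch.Tensor, angles: torch.Tensor) -> torch.Tensor:
    """Rotate query/key vectors x of shape (seq_len, head_dim) by the given angles."""
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    rotated = torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)
    return rotated.flatten(-2)
```

The other variants listed above would instead rescale or truncate `inv_freq`, or randomize `positions`, while keeping the same rotation machinery.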

Quick Start & Requirements

  • Install: Code is provided for fine-tuning and evaluation; specific commands depend on the chosen experiment (a minimal loading sketch follows this list).
  • Prerequisites: Python, PyTorch, Hugging Face Transformers, and potentially CUDA for GPU acceleration.
  • Resources: Fine-tuning and evaluation on long contexts will require significant GPU memory and compute.
  • Links:
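
The repository's own scripts are not reproduced here. As a rough starting point, the sketch below assumes Hugging Face Transformers 4.31 or later, which exposes linear RoPE scaling on Llama models; the model id and scale factor are placeholders, not values taken from the repository.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint; substitute the Llama variant being extended.
model_id = "meta-llama/Llama-2-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    rope_scaling={"type": "linear", "factor": 4.0},  # 4x linear position interpolation
    torch_dtype="auto",
)
```

With a factor of 4.0, a model pre-trained on 2k positions is interpolated to cover roughly 8k tokens; fine-tuning at the longer length, as this repository does, is still needed for good accuracy.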

Highlighted Details

  • Linear scaling with IFT shows robustness for context lengths up to 16k, with potential for 20-24k.
  • Evaluation methodologies significantly impact the ranking of different context extension approaches.
  • Instruction fine-tuning improves retrieval accuracy but does not fundamentally extend the model's inherent context handling limits.
  • Custom datasets (WikiQA FFQA and AltQA) are provided for evaluating long-context retrieval and robustness against memorization (a generic evaluation sketch follows this list).
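
For context, an evaluation of this kind generally amounts to prompting the model with a long document plus a question and checking whether the answer is retrieved, repeated at increasing context lengths. The loop below is an illustrative, generic sketch; the example fields and prompt format are assumptions, not the released datasets' actual schema.

```python
import torch

def retrieval_accuracy(model, tokenizer, examples, max_new_tokens=32):
    """Fraction of examples whose greedy generation contains the gold answer.

    Each example is assumed to look like:
    {"context": <long document>, "question": <str>, "answer": <str>}
    """
    hits = 0
    for ex in examples:
        prompt = f"{ex['context']}\n\nQuestion: {ex['question']}\nAnswer:"
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        with torch.no_grad():
            out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
        completion = tokenizer.decode(
            out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True
        )
        hits += int(ex["answer"].lower() in completion.lower())
    return hits / len(examples)
```

Running the same loop at several target context lengths (e.g. 2k, 4k, 8k, 16k) yields the accuracy-versus-length comparisons summarized above.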

Maintenance & Community

  • The project is from Abacus.AI.
  • Further details on community engagement or roadmap are not explicitly provided in the README.

Licensing & Compatibility

  • The repository's code is likely under a permissive license (e.g., MIT, Apache 2.0), but specific licensing for shared model weights or datasets should be verified.
  • Compatibility for commercial use depends on the underlying Llama model license and any specific terms for shared weights.

Limitations & Caveats

  • While linear scaling shows promise, it doesn't fully extrapolate to the theoretical maximum context length (e.g., a scale factor of 16 applied to Llama's 2k pre-training context should in principle reach 32k, but does not in practice).
  • Some explored methods, like xPos, showed convergence issues, potentially due to precision limitations or fundamental differences from base RoPE.
  • The effectiveness of different methods can vary significantly based on the evaluation task.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 4 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (Cofounder of Cloudera), and 1 more.

yarn by jquesnelle

1.0% · 2k stars
Context window extension method for LLMs (research paper, models)
created 2 years ago · updated 1 year ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Georgios Konstantopoulos (CTO, General Partner at Paradigm).

LongLoRA by dvlab-research

0.1% · 3k stars
LongLoRA: Efficient fine-tuning for long-context LLMs
created 1 year ago · updated 11 months ago