LongLoRA: Efficient fine-tuning for long-context LLMs
This repository provides LongLoRA, an efficient fine-tuning method for extending the context length of Large Language Models (LLMs). It addresses the challenge of processing long documents by enabling models to handle contexts up to 100k tokens, benefiting researchers and developers working with extensive text data.
How It Works
LongLoRA employs "shifted sparse attention" (S²-Attn), which restricts attention to local groups of tokens during fine-tuning and shifts the grouping in half of the attention heads so information flows between neighboring groups. The mechanism is compatible with Flash-Attention and is used only during training; at inference the model falls back to standard full attention with no architectural modification. This allows LLMs to be fine-tuned to significantly longer context windows at a fraction of the computational and memory cost of full-attention fine-tuning.
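The grouping-and-shifting idea can be illustrated with a minimal toy sketch. This is not the repository's implementation (which operates on batched multi-head tensors and integrates with Flash-Attention); it is a single-head numpy illustration, and the function name `shifted_group_attention` is hypothetical:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def shifted_group_attention(q, k, v, group_size, shift=False):
    """Toy sketch of group-wise attention as used in shifted sparse attention.

    q, k, v: (seq_len, dim) arrays; attention is computed only within
    fixed-size groups of tokens. When shift=True, tokens are rolled by
    half a group so the group boundaries land in different places,
    letting information cross between adjacent groups.
    """
    seq_len, dim = q.shape
    assert seq_len % group_size == 0, "toy sketch assumes even division"
    roll = group_size // 2
    if shift:
        # shift tokens before grouping (in LongLoRA, half the heads do this)
        q, k, v = (np.roll(t, -roll, axis=0) for t in (q, k, v))
    out = np.empty_like(v)
    for start in range(0, seq_len, group_size):
        sl = slice(start, start + group_size)
        scores = q[sl] @ k[sl].T / np.sqrt(dim)  # attention within the group
        out[sl] = softmax(scores) @ v[sl]
    if shift:
        # undo the shift so outputs align with the original token order
        out = np.roll(out, roll, axis=0)
    return out
```

In the full method, half the heads use `shift=False` and half use `shift=True`, so each group's cost stays quadratic only in `group_size` while the shifted heads stitch neighboring groups together.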
Quick Start & Requirements
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The data and model weights are licensed strictly for research use; commercial applications are prohibited. Models trained on this dataset must likewise remain within research-only boundaries.