LongLoRA: Efficient fine-tuning for long-context LLMs
This repository provides LongLoRA, an efficient fine-tuning method for extending the context length of Large Language Models (LLMs). It addresses the challenge of processing long documents by enabling models to handle contexts up to 100k tokens, benefiting researchers and developers working with extensive text data.
How It Works
LongLoRA employs "shifted sparse attention" (S²-Attn), which restricts attention to local groups of tokens during fine-tuning and shifts the grouping in half of the attention heads so information flows between neighboring groups. The mechanism is compatible with Flash-Attention and is used only during training; at inference the model falls back to standard full attention with no architectural modification. This allows LLMs to be fine-tuned to significantly longer context windows at a fraction of the computational and memory cost of full-attention fine-tuning.
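The grouping-and-shifting idea can be illustrated with a minimal toy sketch. This is not the repository's implementation (which operates on batched multi-head tensors and integrates with Flash-Attention); it is a single-head numpy illustration, and the function name `shifted_group_attention` is hypothetical:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def shifted_group_attention(q, k, v, group_size, shift=False):
    """Toy sketch of group-wise attention as used in shifted sparse attention.

    q, k, v: (seq_len, dim) arrays; attention is computed only within
    fixed-size groups of tokens. When shift=True, tokens are rolled by
    half a group so the group boundaries land in different places,
    letting information cross between adjacent groups.
    """
    seq_len, dim = q.shape
    assert seq_len % group_size == 0, "toy sketch assumes even division"
    roll = group_size // 2
    if shift:
        # shift tokens before grouping (in LongLoRA, half the heads do this)
        q, k, v = (np.roll(t, -roll, axis=0) for t in (q, k, v))
    out = np.empty_like(v)
    for start in range(0, seq_len, group_size):
        sl = slice(start, start + group_size)
        scores = q[sl] @ k[sl].T / np.sqrt(dim)  # attention within the group
        out[sl] = softmax(scores) @ v[sl]
    if shift:
        # undo the shift so outputs align with the original token order
        out = np.roll(out, roll, axis=0)
    return out
```

In the full method, half the heads use `shift=False` and half use `shift=True`, so each group's cost stays quadratic only in `group_size` while the shifted heads stitch neighboring groups together.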
Quick Start & Requirements
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The data and model weights are licensed strictly for research use; commercial applications are prohibited. Models trained on this dataset must likewise remain within research-only boundaries.