Training-free method for extending LLM context windows
This repository provides ChunkLlama, a training-free method for extending the context window of Large Language Models (LLMs) by over 8x. It targets researchers and practitioners seeking to improve LLM performance on long-context tasks without costly retraining. ChunkLlama integrates seamlessly with existing inference libraries and positional encoding methods, enabling significant context scaling for models like Llama-2/3 and Mistral.
How It Works
ChunkLlama implements a dual chunk attention (DCA) mechanism. DCA splits the attention computation over a long sequence into chunk-wise components (attention within a chunk and attention across chunks), remapping relative positions so they stay within the range seen during pretraining; this lets the model process sequences far longer than its original pretraining length. Because no additional training is required, it is a highly efficient way to gain long-context capability. DCA is compatible with popular extrapolation techniques such as Positional Interpolation (PI) and NTK-Aware RoPE, and with memory-efficient inference libraries such as FlashAttention and vLLM.
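To make the idea concrete, here is a toy sketch (a simplified illustration, not the repository's implementation) of how relative position indices can be kept inside the pretraining window: exact distances are used within a chunk, and cross-chunk distances are clamped.

```python
import numpy as np

def chunked_relative_positions(seq_len: int, chunk_size: int) -> np.ndarray:
    """Relative query-key distances for causal attention where within-chunk
    pairs keep their true distance and cross-chunk pairs are clamped to
    chunk_size - 1, so no index ever exceeds the pretraining range.
    (Simplified: real dual chunk attention also treats the immediately
    preceding chunk specially to preserve locality.)"""
    pos = np.arange(seq_len)
    dist = pos[:, None] - pos[None, :]                    # true query-key distances
    same_chunk = (pos[:, None] // chunk_size) == (pos[None, :] // chunk_size)
    rel = np.where(same_chunk, dist, np.minimum(dist, chunk_size - 1))
    return np.where(dist >= 0, rel, -1)                   # -1 marks masked future keys

# A 16-token sequence with chunks of 4 positions never yields a relative
# index above 3, no matter how long the sequence grows.
print(chunked_relative_positions(16, 4).max())            # -> 3
```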
Quick Start & Requirements
Install with `pip install -e .` inside the `vllm` directory. Dependencies: `transformers` and `flash-attn` (>= 2.5.0, < 2.6.0). A GPU with sufficient VRAM is recommended for longer contexts (e.g., an 80GB A100 for a 90k-token context with Llama 2 7B). Setup involves modifying `config.json` and integrating the provided Python code snippets for inference.
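A minimal inference sketch is shown below. It assumes the repository exposes a monkey-patch helper, here called `replace_with_chunkllama` in a module `chunkllama_attn_replace`; both names are assumptions, so check the repository for the actual entry point and arguments.

```python
# Minimal sketch, assuming the repo ships a patch module named
# `chunkllama_attn_replace` with a `replace_with_chunkllama` helper
# (check the repository for the actual entry point and arguments).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

from chunkllama_attn_replace import replace_with_chunkllama  # assumed name
replace_with_chunkllama(pretraining_length=4096)  # original Llama-2 context length

model_name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # requires flash-attn >= 2.5.0, < 2.6.0
    device_map="auto",
)

long_prompt = open("long_document.txt").read() + "\n\nQuestion: Summarize the document."
inputs = tokenizer(long_prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```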
Highlighted Details
Maintenance & Community
The project is associated with HKUNLP and acknowledges contributions from Fei Huang. Further community engagement details are not explicitly provided in the README.
Licensing & Compatibility
The data and weights are licensed for non-commercial, research-only use, which is a significant restriction for commercial applications.
Limitations & Caveats
While 7B models can achieve low perplexity on long inputs, they may still struggle with practical long-context tasks; the larger 13B/70B models are recommended for higher accuracy.