long_llama by CStanKonrad

LLM for long context handling, fine-tuned with Focused Transformer

Created 2 years ago
1,463 stars

Top 28.0% on SourcePulse

View on GitHub
Project Summary

LongLLaMA is a suite of large language models designed to handle significantly extended context lengths, up to 256k tokens and beyond. Built upon OpenLLaMA and Code Llama foundations, it employs the Focused Transformer (FoT) method for context scaling. This approach is beneficial for tasks requiring comprehension and generation over lengthy documents or conversations.

How It Works

LongLLaMA uses the Focused Transformer (FoT) method, which extends context handling by letting a subset of attention layers attend to a memory cache of (key, value) pairs. FoT's novelty is its contrastive training procedure: memory attention layers are exposed to both relevant keys and irrelevant keys drawn from unrelated documents, teaching the model to distinguish keys tied to semantically diverse values. This lets the effective context length extrapolate far beyond the lengths seen during training.
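
As an illustration only (not the released LongLLaMA implementation, which ships as custom Hugging Face modeling code plus a JAX/Flax training codebase), a memory attention layer can be sketched as ordinary scaled dot-product attention whose keys and values are the local context concatenated with a cache of past (key, value) pairs; during FoT training that cache mixes entries from the current document with entries from unrelated documents, which forces the layer to learn to pick out the relevant ones. The single-head layout and shapes below are simplifying assumptions.

```python
import torch
import torch.nn.functional as F

def memory_attention(q, local_k, local_v, mem_k, mem_v):
    """Single-head sketch of attention over a memory cache plus the local context.

    q, local_k, local_v: [T, d] tensors for the current context window.
    mem_k, mem_v: [M, d] cached (key, value) pairs from earlier context
    (and, during contrastive training, from unrelated documents).
    Causal masking is omitted for brevity.
    """
    k = torch.cat([mem_k, local_k], dim=0)      # [M + T, d], memory keys first
    v = torch.cat([mem_v, local_v], dim=0)      # [M + T, d]
    scores = (q @ k.T) / (q.shape[-1] ** 0.5)   # [T, M + T] scaled dot-product scores
    weights = F.softmax(scores, dim=-1)         # each query attends over memory + local entries
    return weights @ v                          # [T, d]
```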

Quick Start & Requirements

  • Install: pip install transformers==4.33.2 sentencepiece accelerate
  • Requirements: Python 3.x, PyTorch, Hugging Face transformers.
  • Usage: Load models via Hugging Face AutoModelForCausalLM; see the Colab examples for detailed usage and the loading sketch below.
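
A minimal loading and generation sketch, assuming the syzymon/long_llama_3b checkpoint on the Hugging Face Hub and the custom FoT modeling code bundled with it (hence trust_remote_code=True); the exact checkpoint names and memory-related keyword arguments are documented in the repository's README and Colab notebooks.

```python
import torch
from transformers import LlamaTokenizer, AutoModelForCausalLM

# Checkpoint name is an assumption; see the repository for the released model list.
MODEL_PATH = "syzymon/long_llama_3b"

tokenizer = LlamaTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.float32,
    trust_remote_code=True,  # pulls in the custom FoT model class shipped with the checkpoint
)

prompt = "My name is Julien and I like to"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
output = model.generate(input_ids=input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```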

Highlighted Details

  • Achieves 256k token context length on passkey retrieval tasks.
  • Demonstrates performance improvements on TREC and WebQS with increased context.
  • Offers base (Apache 2.0) and instruction-tuned variants.
  • Includes code for FoT continued pretraining (JAX/Flax) and instruction tuning (PyTorch).

Maintenance & Community

  • Developed by Szymon Tworkowski, Konrad Staniszewski, and others.
  • Based on OpenLLaMA and Code Llama projects.
  • Citation available via provided BibTeX entry.

Licensing & Compatibility

  • Base LongLLaMA 3B models and source code are licensed under Apache License 2.0.
  • Instruction/chat-tuned models are for research purposes only.
  • LongLLaMA-Code 7B inherits the Code Llama license.

Limitations & Caveats

  • Instruction/chat-tuned models are restricted to research use.
  • Using the checkpoints as a drop-in replacement in standard LLaMA code limits the context to 2048 tokens (see the sketch after this list).
  • Future work may explore KNN search for further context scaling.
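
For the drop-in case noted above, a sketch (same assumed checkpoint name as in the Quick Start example) of loading the weights with the stock Hugging Face LLaMA classes; without trust_remote_code the FoT memory layers are not used and the usable context stays at the original 2048 tokens.

```python
import torch
from transformers import LlamaTokenizer, LlamaForCausalLM

# Assumed checkpoint name; loaded as a plain LLaMA model, so the extended
# context mechanism is inactive and inputs are limited to 2048 tokens.
MODEL_PATH = "syzymon/long_llama_3b"

tokenizer = LlamaTokenizer.from_pretrained(MODEL_PATH)
model = LlamaForCausalLM.from_pretrained(MODEL_PATH, torch_dtype=torch.float32)
```
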
Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 3 stars in the last 30 days

Explore Similar Projects

LongLM by datamllab (661 stars, 0%)
  • Self-Extend: LLM context window extension via self-attention
  • Created 1 year ago, updated 1 year ago
  • Starred by Chip Huyen (author of "AI Engineering", "Designing Machine Learning Systems") and Luis Capelo (cofounder of Lightning AI)

LongLoRA by dvlab-research (3k stars, 0.1%)
  • LongLoRA: Efficient fine-tuning for long-context LLMs
  • Created 2 years ago, updated 1 year ago
  • Starred by Chip Huyen (author of "AI Engineering", "Designing Machine Learning Systems"), Pawel Garbacki (cofounder of Fireworks AI), and 4 more