long_llama  by CStanKonrad

LLM for long context handling, fine-tuned with Focused Transformer

created 2 years ago
1,461 stars

Top 28.6% on sourcepulse

View on GitHub
Project Summary

LongLLaMA is a suite of large language models designed to handle significantly extended context lengths, up to 256k tokens and beyond. Built upon OpenLLaMA and Code Llama foundations, it employs the Focused Transformer (FoT) method for context scaling. This approach is beneficial for tasks requiring comprehension and generation over lengthy documents or conversations.

How It Works

LongLLaMA utilizes the Focused Transformer (FoT) method, which enhances context handling by allowing a subset of attention layers to access a memory cache of key-value pairs. FoT's novelty lies in its contrastive training procedure, in which the memory attention layers are exposed to both relevant and irrelevant keys. This trains the model to distinguish keys tied to semantically diverse values, allowing the effective context length to extrapolate far beyond the context seen during training.
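
As a rough illustration (not the repository's implementation), a memory attention layer can be sketched as ordinary scaled dot-product attention whose keys and values are extended by a cache of pairs from earlier chunks; during FoT training this cache also contains keys from unrelated documents, and the contrastive objective teaches the layer to ignore such distractors:

```python
import torch
import torch.nn.functional as F

def memory_attention(q, k_local, v_local, k_mem, v_mem):
    """Sketch: attention over the local window plus a memory cache of (key, value) pairs."""
    # q:       (batch, heads, q_len, head_dim)
    # k_local: (batch, heads, local_len, head_dim); v_local has the same shape
    # k_mem:   (batch, heads, mem_len, head_dim);   v_mem has the same shape
    k = torch.cat([k_mem, k_local], dim=2)  # cached keys extend the attention span
    v = torch.cat([v_mem, v_local], dim=2)
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v  # (batch, heads, q_len, head_dim)

# Toy shapes: a 16-token query window attending over 128 cached memory positions.
b, h, d = 1, 4, 32
q = torch.randn(b, h, 16, d)
k_loc, v_loc = torch.randn(b, h, 16, d), torch.randn(b, h, 16, d)
k_mem, v_mem = torch.randn(b, h, 128, d), torch.randn(b, h, 128, d)
out = memory_attention(q, k_loc, v_loc, k_mem, v_mem)
```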

Quick Start & Requirements

  • Install: pip install transformers==4.33.2 sentencepiece accelerate
  • Requirements: Python 3.x, PyTorch, Hugging Face transformers.
  • Usage: Load models via Hugging Face AutoModelForCausalLM (a minimal sketch follows this list). See the Colab examples for detailed usage.
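
A minimal loading sketch, assuming the Hugging Face Hub checkpoint id syzymon/long_llama_3b (verify the exact checkpoint ids and recommended dtypes against the repository README and Colab notebooks):

```python
import torch
from transformers import LlamaTokenizer, AutoModelForCausalLM

MODEL_ID = "syzymon/long_llama_3b"  # assumed checkpoint id; check the repo README

tokenizer = LlamaTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float32,
    trust_remote_code=True,  # loads the custom FoT memory layers needed for long-context inference
)

inputs = tokenizer("The Focused Transformer extends context by", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```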

Highlighted Details

  • Achieves 256k token context length on passkey retrieval tasks.
  • Demonstrates performance improvements on TREC and WebQS with increased context.
  • Offers base (Apache 2.0) and instruction-tuned variants.
  • Includes code for FoT continued pretraining (JAX/Flax) and instruction tuning (PyTorch).

Maintenance & Community

  • Developed by Szymon Tworkowski, Konrad Staniszewski, and others.
  • Based on OpenLLaMA and Code Llama projects.
  • Citation available via provided BibTeX entry.

Licensing & Compatibility

  • Base LongLLaMA 3B models and source code are licensed under Apache License 2.0.
  • Instruction/chat tuned models are for research purposes only.
  • LongLLaMA-Code 7B inherits Code Llama license.

Limitations & Caveats

  • Instruction/chat tuned models are restricted to research use.
  • Using the checkpoints as a drop-in replacement in standard LLaMA code caps the context at 2048 tokens (see the sketch after this list).
  • Future work may explore KNN search for further context scaling.
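
A hedged sketch of the drop-in path (assuming the same checkpoint id as above): loading with the stock LlamaForCausalLM class works, but without the FoT memory layers the context stays at the original 2048 tokens.

```python
import torch
from transformers import LlamaTokenizer, LlamaForCausalLM

MODEL_ID = "syzymon/long_llama_3b"  # assumed checkpoint id; check the repo README

# No trust_remote_code here, so the stock LLaMA implementation is used and the
# extended FoT memory is unavailable; context is capped at 2048 tokens.
tokenizer = LlamaTokenizer.from_pretrained(MODEL_ID)
model = LlamaForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.float32)
```
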
Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 14 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (Cofounder of Cloudera), and 1 more.

yarn by jquesnelle

1.0%
2k
Context window extension method for LLMs (research paper, models)
created 2 years ago
updated 1 year ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Georgios Konstantopoulos (CTO, General Partner at Paradigm).

LongLoRA by dvlab-research

0.1%
3k
LongLoRA: Efficient fine-tuning for long-context LLMs
created 1 year ago
updated 11 months ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Travis Fischer (Founder of Agentic), and 6 more.

codellama by meta-llama

0.1%
16k
Inference code for CodeLlama models
created 1 year ago
updated 11 months ago