An LLM for long-context handling, fine-tuned with the Focused Transformer (FoT) method.
LongLLaMA is a suite of large language models designed to handle significantly extended context lengths of 256k tokens and beyond. Built on OpenLLaMA and Code Llama foundations, it uses the Focused Transformer (FoT) method to scale context, which benefits tasks that require comprehension and generation over lengthy documents or conversations.
How It Works
LongLLaMA uses the Focused Transformer (FoT) method, in which a subset of attention layers gains access to an external memory cache of key-value pairs. FoT's novelty lies in its contrastive training procedure: the memory attention layers are exposed to both relevant keys and irrelevant keys drawn from unrelated documents, which teaches the model to distinguish keys tied to semantically relevant values from distractors. This lets the effective context length extrapolate far beyond the lengths seen during training; a minimal sketch of such a memory attention layer follows.
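The sketch below is illustrative only, not the official FoT implementation: a single-head attention layer whose queries attend jointly over the local context and an external key-value cache. All names and shapes are assumptions made for the example.

    import torch
    import torch.nn.functional as F

    def memory_attention(q, k_local, v_local, k_mem, v_mem):
        # q: (seq_len, d); local and memory keys/values: (n, d).
        # Memory K/V are concatenated with the local K/V so a query can
        # retrieve information stored far outside the current context window.
        k = torch.cat([k_mem, k_local], dim=0)
        v = torch.cat([v_mem, v_local], dim=0)
        scores = q @ k.T / k.shape[-1] ** 0.5   # (seq_len, n_mem + n_local)
        weights = F.softmax(scores, dim=-1)
        return weights @ v                       # (seq_len, d)

    # Toy usage: 8 local tokens plus a 32-entry memory cache, head dim 64.
    # During FoT training the cache deliberately mixes entries from the current
    # document with entries from unrelated documents, so the layer learns to
    # attend to relevant keys and ignore the distractors.
    d = 64
    q = torch.randn(8, d)
    k_local, v_local = torch.randn(8, d), torch.randn(8, d)
    k_mem, v_mem = torch.randn(32, d), torch.randn(32, d)
    print(memory_attention(q, k_local, v_local, k_mem, v_mem).shape)  # torch.Size([8, 64])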
Quick Start & Requirements
pip install transformers==4.33.2 sentencepiece accelerate
Models are loaded with transformers.AutoModelForCausalLM; a minimal loading sketch is shown below.
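The snippet below is a minimal loading and generation sketch. It assumes the syzymon/long_llama_3b checkpoint id on the Hugging Face Hub and the custom modeling code shipped with it (hence trust_remote_code=True); substitute the LongLLaMA variant you intend to use.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumed checkpoint id; replace with the desired LongLLaMA variant.
    MODEL_ID = "syzymon/long_llama_3b"

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.float32,
        trust_remote_code=True,  # LongLLaMA ships custom code for its FoT memory layers
    )

    prompt = "My name is Julien and I like to"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))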
See the Colab examples for detailed usage.

Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats