long_llama  by CStanKonrad

LLM for long context handling, fine-tuned with Focused Transformer

created 2 years ago
1,461 stars

Top 28.6% on sourcepulse

View on GitHub
Project Summary

LongLLaMA is a suite of large language models designed to handle significantly extended context lengths, up to 256k tokens and beyond. Built upon OpenLLaMA and Code Llama foundations, it employs the Focused Transformer (FoT) method for context scaling. This approach is beneficial for tasks requiring comprehension and generation over lengthy documents or conversations.

How It Works

LongLLaMA utilizes the Focused Transformer (FoT) method, which enhances context handling by allowing a subset of attention layers to access a memory cache of key-value pairs. FoT's novelty lies in its contrastive training procedure, in which the memory attention layers are exposed to both relevant and irrelevant keys. This trains the model to distinguish keys tied to semantically diverse values, allowing the effective context length to extrapolate far beyond the context seen during training.
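
As a rough illustration (not the repository's implementation), a memory attention layer can be sketched as ordinary scaled dot-product attention whose keys and values are extended by a cache of pairs from earlier chunks; during FoT training this cache also contains keys from unrelated documents, and the contrastive objective teaches the layer to ignore such distractors:

```python
import torch
import torch.nn.functional as F

def memory_attention(q, k_local, v_local, k_mem, v_mem):
    """Sketch: attention over the local window plus a memory cache of (key, value) pairs."""
    # q:       (batch, heads, q_len, head_dim)
    # k_local: (batch, heads, local_len, head_dim); v_local has the same shape
    # k_mem:   (batch, heads, mem_len, head_dim);   v_mem has the same shape
    k = torch.cat([k_mem, k_local], dim=2)  # cached keys extend the attention span
    v = torch.cat([v_mem, v_local], dim=2)
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v  # (batch, heads, q_len, head_dim)

# Toy shapes: a 16-token query window attending over 128 cached memory positions.
b, h, d = 1, 4, 32
q = torch.randn(b, h, 16, d)
k_loc, v_loc = torch.randn(b, h, 16, d), torch.randn(b, h, 16, d)
k_mem, v_mem = torch.randn(b, h, 128, d), torch.randn(b, h, 128, d)
out = memory_attention(q, k_loc, v_loc, k_mem, v_mem)
```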

Quick Start & Requirements

  • Install: pip install transformers==4.33.2 sentencepiece accelerate
  • Requirements: Python 3.x, PyTorch, Hugging Face transformers.
  • Usage: Load models via Hugging Face AutoModelForCausalLM (a minimal sketch follows this list). See the Colab examples for detailed usage.
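
A minimal loading sketch, assuming the Hugging Face Hub checkpoint id syzymon/long_llama_3b (verify the exact checkpoint ids and recommended dtypes against the repository README and Colab notebooks):

```python
import torch
from transformers import LlamaTokenizer, AutoModelForCausalLM

MODEL_ID = "syzymon/long_llama_3b"  # assumed checkpoint id; check the repo README

tokenizer = LlamaTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float32,
    trust_remote_code=True,  # loads the custom FoT memory layers needed for long-context inference
)

inputs = tokenizer("The Focused Transformer extends context by", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```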

Highlighted Details

  • Achieves 256k token context length on passkey retrieval tasks.
  • Demonstrates performance improvements on TREC and WebQS with increased context.
  • Offers base (Apache 2.0) and instruction-tuned variants.
  • Includes code for FoT continued pretraining (JAX/Flax) and instruction tuning (PyTorch).

Maintenance & Community

  • Developed by Szymon Tworkowski, Konrad Staniszewski, and others.
  • Based on OpenLLaMA and Code Llama projects.
  • Citation available via provided BibTeX entry.

Licensing & Compatibility

  • Base LongLLaMA 3B models and source code are licensed under Apache License 2.0.
  • Instruction/chat tuned models are for research purposes only.
  • LongLLaMA-Code 7B inherits Code Llama license.

Limitations & Caveats

  • Instruction/chat tuned models are restricted to research use.
  • Using the checkpoints as a drop-in replacement in standard LLaMA code caps the context at 2048 tokens (see the sketch after this list).
  • Future work may explore KNN search for further context scaling.
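
A hedged sketch of the drop-in path (assuming the same checkpoint id as above): loading with the stock LlamaForCausalLM class works, but without the FoT memory layers the context stays at the original 2048 tokens.

```python
import torch
from transformers import LlamaTokenizer, LlamaForCausalLM

MODEL_ID = "syzymon/long_llama_3b"  # assumed checkpoint id; check the repo README

# No trust_remote_code here, so the stock LLaMA implementation is used and the
# extended FoT memory is unavailable; context is capped at 2048 tokens.
tokenizer = LlamaTokenizer.from_pretrained(MODEL_ID)
model = LlamaForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.float32)
```
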
Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 14 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (Cofounder of Cloudera), and 1 more.

yarn by jquesnelle

1.0%
2k
Context window extension method for LLMs (research paper, models)
created 2 years ago
updated 1 year ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Georgios Konstantopoulos (CTO, General Partner at Paradigm).

LongLoRA by dvlab-research

0.1%
3k
LongLoRA: Efficient fine-tuning for long-context LLMs
created 1 year ago
updated 11 months ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Travis Fischer (Founder of Agentic), and 6 more.

codellama by meta-llama

0.1%
16k
Inference code for CodeLlama models
created 1 year ago
updated 11 months ago