InfLLM by thunlp

Research paper code for long-sequence LLM processing via training-free memory

created 1 year ago
373 stars

Top 77.1% on sourcepulse

View on GitHub
Project Summary

InfLLM enables Large Language Models (LLMs) to process extremely long sequences without retraining, addressing the limitations of standard LLMs on inputs exceeding their training context length. This training-free memory-based method is designed for researchers and developers working with LLM-driven agents or applications requiring analysis of lengthy streaming data.

How It Works

InfLLM stores distant context in memory units and uses an efficient lookup mechanism to retrieve the units relevant to the current tokens for attention computation. This lets LLMs capture long-distance dependencies, overcoming the limitations of methods that simply discard distant tokens. The system supports configurable memory-unit retrieval strategies (e.g., LRU, FIFO) and can optionally leverage FAISS for faster retrieval.
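The lookup can be pictured as a block-level key/value memory: distant keys and values are packed into fixed-size units, each unit keeps a handful of representative keys, and the current query scores units by those representatives so that only the top-k units join the local window for attention. The sketch below only illustrates that idea; the class name, the norm-based representative selection, and the default sizes are assumptions for clarity and do not reproduce the repository's implementation.

    # Minimal sketch of InfLLM-style block memory (illustrative, not the repo's code).
    import torch

    class BlockMemory:
        def __init__(self, block_size=128, n_repr=4, topk=4):
            self.block_size, self.n_repr, self.topk = block_size, n_repr, topk
            self.units = []  # each unit: (representative keys, keys, values)

        def append_block(self, keys, values):
            # keys/values: (block_size, d). Choose representative keys; the paper
            # scores tokens by the attention they receive, so the norm heuristic
            # here is only a simple stand-in.
            idx = keys.norm(dim=-1).topk(min(self.n_repr, keys.size(0))).indices
            self.units.append((keys[idx], keys, values))

        def lookup(self, query):
            # query: (d,). Return keys/values of the top-k most relevant units,
            # ready to be concatenated with the local window before attention.
            if not self.units:
                return torch.empty(0, query.size(0)), torch.empty(0, query.size(0))
            unit_scores = torch.stack([(r @ query).max() for r, _, _ in self.units])
            chosen = unit_scores.topk(min(self.topk, len(self.units))).indices.tolist()
            keys = torch.cat([self.units[i][1] for i in chosen])
            values = torch.cat([self.units[i][2] for i in chosen])
            return keys, values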

Quick Start & Requirements

  • Install: pip install -r requirements.txt (from the project root)
  • Prerequisites: PyTorch >= 1.13.1, Transformers >= 4.37.2, and Flash-Attention; FAISS is optional.
  • Usage: Evaluate with bash scripts/[infinitebench,longbench].sh, or run a chatbot with python -m inf_llm.chat --model-path <model_path> --inf-llm-config-path <config_path.yaml> (an illustrative config sketch follows this list).
  • Resources: Requires significant GPU memory for long sequences; specific requirements depend on model size and sequence length.
  • Links: Paper, Code
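The YAML passed via --inf-llm-config-path controls the method's hyperparameters described in the paper: initial tokens, local window size, memory-unit size, representative tokens per unit, number of retrieved units, and GPU cache size. The names and values below are an illustrative sketch only; consult the YAML files shipped under config/ in the repository for the actual schema.

    # Illustrative InfLLM hyperparameters, expressed as a Python dict.
    # Field names and defaults are assumptions for explanation; the real
    # schema lives in the repository's config/*.yaml files.
    example_inf_llm_config = {
        "n_init": 128,           # initial ("sink") tokens always kept in context
        "n_local": 4096,         # size of the sliding local attention window
        "block_size": 128,       # tokens grouped into one memory unit
        "repr_topk": 4,          # representative tokens scored per unit
        "topk": 16,              # memory units retrieved for each lookup
        "max_cached_block": 32,  # units kept on GPU before eviction/offload
    }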

Highlighted Details

  • Outperforms baselines that continually train LLMs on long sequences, despite requiring no additional training.
  • Effectively captures long-distance dependencies even when sequence lengths scale to 1,024K tokens.
  • Supports multiple LLM architectures and conversation types, with recent additions for LLaMA 3.
  • Offers configurable memory-management strategies (LRU, FIFO, LRU-S) and retrieval mechanisms; a minimal cache-eviction sketch follows this list.
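The cache-management strategies decide which memory units stay resident on the GPU once the cache fills up. Below is a minimal, generic LRU sketch to make the idea concrete; the class and helper names are hypothetical, and the repository's LRU-S scoring is not reproduced here.

    # Minimal LRU eviction policy for GPU-resident memory units (illustrative).
    # FIFO would evict by insertion order instead; LRU-S (as named in the repo)
    # additionally weights units by usage scores.
    from collections import OrderedDict

    class LRUUnitCache:
        def __init__(self, max_units):
            self.max_units = max_units
            self.resident = OrderedDict()  # unit_id -> tensors kept on GPU

        def touch(self, unit_id, load_fn, offload_fn):
            # load_fn(unit_id) fetches a unit to GPU; offload_fn(unit_id, tensors)
            # moves it back to CPU. Both are placeholders for the real transfers.
            if unit_id in self.resident:
                self.resident.move_to_end(unit_id)       # mark as recently used
                return self.resident[unit_id]
            if len(self.resident) >= self.max_units:
                victim, tensors = self.resident.popitem(last=False)
                offload_fn(victim, tensors)              # evict least-recently-used unit
            self.resident[unit_id] = load_fn(unit_id)
            return self.resident[unit_id]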

Maintenance & Community

  • Initial code release on March 3, 2024, with subsequent refactors for speed and memory efficiency.
  • Recent updates added FAISS-based top-k retrieval and support for LLaMA 3.
  • No explicit community links (Discord/Slack) are provided in the README.

Licensing & Compatibility

  • The README does not explicitly state a license. The code is released for research purposes.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

  • The perhead attention option is noted as very time-consuming and intended for research use only.
  • FAISS integration increases inference time.
  • The async_global_stream option may not be compatible with all configurations.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 1

Star History

  • 21 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Zhuohan Li (Author of vLLM), and 1 more.

Consistency_LLM by hao-ai-lab

0%
397
Parallel decoder for efficient LLM inference
created 1 year ago
updated 8 months ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Ying Sheng (Author of SGLang), and 1 more.

LookaheadDecoding by hao-ai-lab

0.1%
1k
Parallel decoding algorithm for faster LLM inference
created 1 year ago
updated 4 months ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Georgios Konstantopoulos (CTO, General Partner at Paradigm).

LongLoRA by dvlab-research

0.1%
3k
LongLoRA: Efficient fine-tuning for long-context LLMs
created 1 year ago
updated 11 months ago