Research paper code for long-sequence LLM processing via training-free memory
InfLLM enables Large Language Models (LLMs) to process extremely long sequences without retraining, addressing the limitations of standard LLMs on inputs exceeding their training context length. This training-free memory-based method is designed for researchers and developers working with LLM-driven agents or applications requiring analysis of lengthy streaming data.
How It Works
InfLLM stores distant context in memory units and uses an efficient lookup mechanism to retrieve the units relevant to each attention computation. This lets LLMs capture long-distance dependencies, overcoming the limitations of methods that simply discard distant tokens. The system supports configurable memory-unit retrieval strategies (e.g., LRU, FIFO) and can optionally use FAISS for faster retrieval.
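To make the lookup concrete, here is a minimal Python sketch of unit-based retrieval, assuming each memory unit is summarized by a few representative keys. `MemoryUnit`, `retrieve_units`, and the norm-based choice of representatives are illustrative assumptions, not InfLLM's actual implementation (the paper selects representatives by how much attention they receive).

```python
# Minimal sketch of memory-unit retrieval (illustrative only; not the repo's API).
import numpy as np

class MemoryUnit:
    """A contiguous block of evicted context, summarized by representative keys."""
    def __init__(self, keys, values, repr_topk=4):
        self.keys, self.values = keys, values          # (block_size, d) each
        # Assumption: pick the largest-norm keys as representatives for simplicity;
        # InfLLM instead chooses tokens by the attention they receive.
        idx = np.argsort(np.linalg.norm(keys, axis=-1))[-repr_topk:]
        self.repr_keys = keys[idx]                     # (repr_topk, d)

def retrieve_units(query, units, topk=2):
    """Score each unit by its best query-representative match; return the top-k units."""
    scores = [float((unit.repr_keys @ query).max()) for unit in units]
    best = np.argsort(scores)[-topk:]
    return [units[i] for i in best]

# Usage: distant context lives in units; only the retrieved units rejoin attention.
rng = np.random.default_rng(0)
d, block = 64, 128
units = [MemoryUnit(rng.standard_normal((block, d)),
                    rng.standard_normal((block, d))) for _ in range(16)]
query = rng.standard_normal(d)
relevant = retrieve_units(query, units, topk=2)
keys = np.concatenate([u.keys for u in relevant])      # fed into attention
```

Only the retrieved units and the local context enter the attention computation, so per-step cost stays bounded regardless of total sequence length.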
Quick Start & Requirements
Install dependencies: `pip install -r requirements.txt`
From the project root, run an evaluation script: `bash scripts/[infinitebench,longbench].sh`
Or run a chatbot: `python -m inf_llm.chat --model-path <model_path> --inf-llm-config-path <config_path.yaml>`
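The file passed via `--inf-llm-config-path` controls the memory behavior. Purely as an illustration, such a config might look like the sketch below; every field name and value here is an assumption modeled on the example configs shipped with the repository, not a verified schema.

```yaml
# Hypothetical InfLLM config sketch -- field names are assumptions,
# modeled on the repo's example configs, not a verified schema.
model:
  type: inf-llm            # enable the training-free memory attention
  path: mistralai/Mistral-7B-Instruct-v0.2
  block_size: 128          # tokens per memory unit
  n_init: 128              # initial tokens always kept in attention
  n_local: 4096            # local sliding-window size
  topk: 16                 # memory units retrieved per lookup
  max_cached_block: 32     # units kept on GPU (LRU/FIFO-style cache)

max_len: 2147483647
chunk_size: 8192
```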
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The `perhead` attention option is noted as very time-consuming and intended for research use only. The `async_global_stream` option may not be compatible.