Research paper implementation for augmenting language models with long-term memory
Top 44.9% on sourcepulse
LongMem provides the official implementation for augmenting language models with long-term memory, addressing the limitations of fixed context windows in Transformers. It is targeted at researchers and practitioners in NLP seeking to improve LLM performance on tasks requiring extended context. The primary benefit is enabling models to access and utilize information beyond their immediate input sequence.
How It Works
LongMem augments a standard Transformer with a dynamic long-term memory bank that caches key-value pairs from previously processed inputs. A decoupled SideNetwork retrieves relevant entries from this bank and fuses them with the current input through a joint attention mechanism, while the backbone LLM remains frozen. This gives the model efficient access to a much larger effective context than its fixed input window allows.
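The fusion step can be pictured as attention over the concatenation of local and retrieved key-value pairs. The sketch below is a minimal illustration under that reading, not the repository's SideNetwork code; the function and argument names (joint_attention, local_kv, memory_kv) are placeholders, and the real implementation operates per attention head inside fairseq modules.

import torch
import torch.nn.functional as F

def joint_attention(query, local_kv, memory_kv):
    # query:     (batch, q_len, d) hidden states of the current segment
    # local_kv:  (keys, values) for the current segment, each (batch, k_len, d)
    # memory_kv: (keys, values) retrieved from the long-term memory bank,
    #            each (batch, m_len, d)
    local_k, local_v = local_kv
    mem_k, mem_v = memory_kv
    # Concatenating local and memory keys/values lets each token attend
    # jointly to the current input and to distant, cached context.
    keys = torch.cat([local_k, mem_k], dim=1)
    values = torch.cat([local_v, mem_v], dim=1)
    scores = query @ keys.transpose(-2, -1) / query.size(-1) ** 0.5
    return F.softmax(scores, dim=-1) @ values

# Toy shapes: 2 sequences, 4 query tokens, 4 local and 16 memory slots, dim 32.
q = torch.randn(2, 4, 32)
local = (torch.randn(2, 4, 32), torch.randn(2, 4, 32))
memory = (torch.randn(2, 16, 32), torch.randn(2, 16, 32))
print(joint_attention(q, local, memory).shape)  # torch.Size([2, 4, 32])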
Quick Start & Requirements
pip install --editable ./fairseq
pip install faiss-gpu  # or: conda install faiss-gpu cudatoolkit=11.0 -c pytorch (for A100/A6000; faiss-gpu has known A100 compatibility issues)
pip install -r requirements.txt
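To sanity-check the faiss install, the kind of lookup LongMem's memory bank performs (finding the nearest cached keys to a query vector) can be exercised in a few lines; the dimension, corpus size, and random data below are placeholders, not values from the repository.

import numpy as np
import faiss  # faiss-gpu exposes the same Python API as faiss-cpu

d = 64                                                    # key dimensionality (placeholder)
cached_keys = np.random.rand(10000, d).astype("float32")  # stands in for cached attention keys
index = faiss.IndexFlatIP(d)                              # exact inner-product search
index.add(cached_keys)                                    # populate the memory bank
# On a working faiss-gpu build: index = faiss.index_cpu_to_all_gpus(index)

query = np.random.rand(1, d).astype("float32")
scores, ids = index.search(query, 8)                      # top-8 nearest memory entries
print(ids)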
Maintenance & Community
The project is based on the fairseq library. Credits are given to eleuther.ai for The Pile dataset. Further community or maintenance details are not explicitly provided in the README.
Licensing & Compatibility
The README does not state a license. Its reliance on fairseq and on eleuther.ai's The Pile dataset suggests the code is suited to research and non-commercial use; commercial use would require verifying the licenses of all underlying components.
Limitations & Caveats
The README notes potential compatibility issues with faiss-gpu on A100 GPUs and directs users to the faiss GitHub issues for troubleshooting. As the official implementation of a research paper, the project is likely oriented toward research reproducibility rather than production readiness.