Reference implementation for memory layers research paper
This repository provides a reference implementation for "Memory Layers at Scale," a technique that enhances large language models by incorporating trainable key-value lookup mechanisms. This approach allows models to store and retrieve information efficiently without increasing computational cost (FLOPs), benefiting researchers and engineers working on large-scale language modeling.
How It Works
Memory layers augment dense feed-forward networks with sparse activation capabilities, enabling dedicated capacity for information storage and retrieval. The core implementation resides in lingua/product_key, featuring memory.py for the central logic, colwise_embeddingbag.py for memory parallelization, and xformer_embeddingbag.py for optimized embedding lookups. This design aims to complement compute-intensive layers by providing a cost-effective way to manage and access information.
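To make the sparse key-value lookup concrete, here is a minimal pure-Python sketch of a product-key style lookup, the scheme suggested by the lingua/product_key directory name. It is an illustration under stated assumptions, not the repository's memory.py: the function names (product_key_lookup, top_k, dot), the toy sizes, and the random data are all hypothetical, and the real code runs batched on GPU with trained parameters.

```python
import math
import random


def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def top_k(scores, k):
    """Return (index, score) pairs for the k largest scores."""
    return sorted(enumerate(scores), key=lambda p: p[1], reverse=True)[:k]


def product_key_lookup(query, sub_keys_1, sub_keys_2, values, k):
    """Sparse lookup over len(sub_keys_1) * len(sub_keys_2) memory slots.

    Hypothetical helper: each full key is the concatenation of one sub-key
    from each set, so its score is the sum of the two half scores.
    """
    half = len(query) // 2
    q1, q2 = query[:half], query[half:]
    # Score each query half against its own small set of sub-keys.
    best1 = top_k([dot(q1, key) for key in sub_keys_1], k)
    best2 = top_k([dot(q2, key) for key in sub_keys_2], k)
    # Only k * k candidate slots are scored instead of every slot.
    n2 = len(sub_keys_2)
    candidates = [(i * n2 + j, s1 + s2) for i, s1 in best1 for j, s2 in best2]
    selected = sorted(candidates, key=lambda p: p[1], reverse=True)[:k]
    # Softmax over the selected scores only.
    peak = max(score for _, score in selected)
    exps = [math.exp(score - peak) for _, score in selected]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the selected value rows.
    out = [0.0] * len(values[0])
    for (idx, _), w in zip(selected, weights):
        for d, v in enumerate(values[idx]):
            out[d] += w * v
    return out, [idx for idx, _ in selected], weights


# Toy setup (all sizes are assumptions): 8 x 8 = 64 memory slots.
rng = random.Random(0)
n, half_dim, value_dim, k = 8, 4, 6, 3
sub_keys_1 = [[rng.gauss(0, 1) for _ in range(half_dim)] for _ in range(n)]
sub_keys_2 = [[rng.gauss(0, 1) for _ in range(half_dim)] for _ in range(n)]
values = [[rng.gauss(0, 1) for _ in range(value_dim)] for _ in range(n * n)]
query = [rng.gauss(0, 1) for _ in range(2 * half_dim)]

output, indices, weights = product_key_lookup(query, sub_keys_1, sub_keys_2, values, k)
print(indices, [round(w, 3) for w in weights])
```

Only k of the 64 value rows contribute to the output, which is why capacity can grow with the number of slots while the per-token arithmetic stays roughly tied to k and the sub-key count.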
Quick Start & Requirements
Create the environment with bash setup/create_env.sh, or sbatch setup/create_env.sh for SLURM. Activate the environment with conda activate lingua_.
Run python setup/download_prepare_hf_data.py <dataset_name> (e.g., fineweb_edu) to download and prepare data.
Run python setup/download_tokenizer.py <tokenizer_name> (e.g., llama3) to download the tokenizer.
Launch training with torchrun locally (e.g., torchrun --nproc-per-node 8 -m apps.main.train config=apps/main/configs/pkplus_373m_1024k.yaml) or via SLURM using python -m lingua.stool.
Highlighted Details
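The local launch command from the Quick Start can also be assembled programmatically, for example when sweeping over configs. The sketch below is only an illustration: build_torchrun_command is a hypothetical helper that is not part of the repository, and it only builds and prints the command without executing it.

```python
import shlex


def build_torchrun_command(config_path, nproc_per_node=8, module="apps.main.train"):
    """Return the argv list for a local torchrun launch (hypothetical helper)."""
    return [
        "torchrun",
        "--nproc-per-node", str(nproc_per_node),
        "-m", module,
        f"config={config_path}",
    ]


# Config path taken from the Quick Start example above.
command = build_torchrun_command("apps/main/configs/pkplus_373m_1024k.yaml")
print(shlex.join(command))
```

Keeping the command as a list means it can be passed directly to subprocess.run without shell quoting concerns; paths inside the config still need to be adapted to the local environment.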
Maintenance & Community
The project is from Meta AI (facebookresearch). It is based on the Meta Lingua codebase. Further community interaction details are not explicitly provided in the README.
Licensing & Compatibility
Licensed under the CC-BY-NC license. This license restricts commercial use and derivative works intended for commercial purposes.
Limitations & Caveats
The provided configurations are templates requiring user adaptation for specific environments and data paths. The CC-BY-NC license prohibits commercial use, limiting its applicability for many industry applications.