memory by facebookresearch

Reference implementation for memory layers research paper

Created 11 months ago
353 stars

Top 79.0% on SourcePulse

Project Summary

This repository provides a reference implementation for "Memory Layers at Scale," a technique that enhances large language models by incorporating trainable key-value lookup mechanisms. This approach allows models to store and retrieve information efficiently without increasing computational cost (FLOPs), benefiting researchers and engineers working on large-scale language modeling.

How It Works

Memory layers augment dense feed-forward networks with sparse activation capabilities, enabling dedicated capacity for information storage and retrieval. The core implementation resides in lingua/product_key, featuring memory.py for the central logic, colwise_embeddingbag.py for memory parallelization, and xformer_embeddingbag.py for optimized embedding lookups. This design aims to complement compute-intensive layers by providing a cost-effective way to manage and access information.
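The product-key lookup that gives these layers sparse, FLOP-cheap retrieval can be sketched minimally in NumPy. This is an illustrative toy, not the repository's actual API: all names and dimensions below are invented for the example. A query is split in half, each half is scored against a small set of sub-keys, and the Cartesian product of the per-half top-k yields candidate memory slots, so only k² of the n² slots are ever scored.

```python
import numpy as np

rng = np.random.default_rng(0)

d, half = 8, 4     # query dim and per-half dim (illustrative sizes)
n_sub = 16         # sub-keys per half -> 16 * 16 = 256 memory slots
k = 4              # slots retrieved per query
d_val = 8          # value embedding dim

# Two sub-key tables and one flat value table (trainable in a real layer)
sub_keys1 = rng.standard_normal((n_sub, half))
sub_keys2 = rng.standard_normal((n_sub, half))
values = rng.standard_normal((n_sub * n_sub, d_val))

def memory_lookup(q):
    q1, q2 = q[:half], q[half:]
    s1, s2 = sub_keys1 @ q1, sub_keys2 @ q2     # score each half
    top1 = np.argsort(s1)[-k:]                  # top-k sub-keys per half
    top2 = np.argsort(s2)[-k:]
    # Cartesian product: k*k candidate slots with additive scores
    scores = (s1[top1][:, None] + s2[top2][None, :]).ravel()
    slots = (top1[:, None] * n_sub + top2[None, :]).ravel()
    best = np.argsort(scores)[-k:]              # final top-k of k*k candidates
    w = np.exp(scores[best] - scores[best].max())
    w /= w.sum()                                # softmax over selected slots
    return w @ values[slots[best]]              # sparse weighted sum of values

out = memory_lookup(rng.standard_normal(d))
```

The final weighted gather over a handful of value rows is, roughly, the step that the repository's optimized embedding-bag implementations accelerate and parallelize.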

Quick Start & Requirements

  • Installation: Clone the repository and run bash setup/create_env.sh or sbatch setup/create_env.sh for SLURM. Activate the environment with conda activate lingua_.
  • Data Preparation: Use python setup/download_prepare_hf_data.py <dataset_name> (e.g., fineweb_edu) to download and prepare data.
  • Tokenizer: Download tokenizers with python setup/download_tokenizer.py <tokenizer_name> (e.g., llama3).
  • Training: Launch jobs using torchrun locally (e.g., torchrun --nproc-per-node 8 -m apps.main.train config=apps/main/configs/pkplus_373m_1024k.yaml) or via SLURM using python -m lingua.stool.
  • Prerequisites: Python and, for distributed training, a SLURM cluster; the provided YAML templates are starting points and must be adapted to your hardware and data paths.
  • Documentation: Refer to the Meta Lingua README for more instructions.
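Put together, a local single-node run might look like the sketch below. The dataset, tokenizer, and config names are the examples from the bullets above; the clone URL, GPU count, and exact environment name are assumptions to adapt to your setup (the commands require conda, CUDA GPUs, and network access, so they are not runnable as-is everywhere).

```shell
# Assumes a machine with 8 GPUs, conda, and network access.
git clone https://github.com/facebookresearch/memory.git  # assumed URL
cd memory
bash setup/create_env.sh          # or: sbatch setup/create_env.sh on SLURM
conda activate lingua_            # environment name may carry a suffix
python setup/download_prepare_hf_data.py fineweb_edu
python setup/download_tokenizer.py llama3
torchrun --nproc-per-node 8 -m apps.main.train \
    config=apps/main/configs/pkplus_373m_1024k.yaml
```

On a SLURM cluster, the last step is launched via `python -m lingua.stool` instead of `torchrun`.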

Highlighted Details

  • Implements trainable key-value lookup mechanisms to augment LLMs.
  • Offers a cost-effective method for information storage and retrieval without increasing FLOPs.
  • Provides parallelization and optimized embedding bag implementations.
  • Includes scripts for environment setup, data preparation, and tokenizer downloads.

Maintenance & Community

The project is from Meta AI (facebookresearch). It is based on the Meta Lingua codebase. Further community interaction details are not explicitly provided in the README.

Licensing & Compatibility

Licensed under the CC-BY-NC license, which prohibits commercial use of the code and of any derivative works.

Limitations & Caveats

The provided configurations are templates that require adaptation to your environment and data paths. The CC-BY-NC license prohibits commercial use, which rules out most industry deployments.

Health Check

  • Last Commit: 10 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 11 stars in the last 30 days
