memory by facebookresearch

Reference implementation for memory layers research paper

created 8 months ago
343 stars

Top 81.8% on sourcepulse

View on GitHub
1 Expert Loves This Project
Project Summary

This repository provides a reference implementation for "Memory Layers at Scale," a technique that enhances large language models by incorporating trainable key-value lookup mechanisms. This approach allows models to store and retrieve information efficiently without increasing computational cost (FLOPs), benefiting researchers and engineers working on large-scale language modeling.

How It Works

Memory layers augment a transformer's dense feed-forward layers with a large, sparsely activated key-value memory: each query is matched against trainable product keys, and only the values at the top-scoring slots are retrieved and summed, giving the model dedicated capacity for storing and recalling information at little extra compute. The core implementation resides in lingua/product_key, with memory.py for the central logic, colwise_embeddingbag.py for memory parallelization, and xformer_embeddingbag.py for optimized embedding lookups. This design complements compute-intensive dense layers with a cost-effective way to store and access information.
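
To make the lookup concrete, here is a minimal, self-contained PyTorch sketch of a product-key memory. It is illustrative only, not the repository's code: the class and parameter names (ProductKeyMemory, n_keys, topk) are invented for this example, and the actual implementation in lingua/product_key adds column-wise memory parallelism and optimized embedding-bag kernels.

    # Illustrative product-key memory lookup (not the repo's implementation).
    import torch
    import torch.nn as nn

    class ProductKeyMemory(nn.Module):
        def __init__(self, dim: int, n_keys: int = 128, topk: int = 4):
            super().__init__()
            # Two half-width key tables; their Cartesian product addresses n_keys**2 value slots.
            self.half = dim // 2
            self.keys1 = nn.Parameter(torch.randn(n_keys, self.half))
            self.keys2 = nn.Parameter(torch.randn(n_keys, self.half))
            self.values = nn.EmbeddingBag(n_keys * n_keys, dim, mode="sum")
            self.n_keys, self.topk = n_keys, topk

        def forward(self, query: torch.Tensor) -> torch.Tensor:
            # query: (batch, dim), split into two half-queries.
            q1, q2 = query[:, : self.half], query[:, self.half :]
            s1, i1 = (q1 @ self.keys1.t()).topk(self.topk, dim=-1)  # (batch, topk)
            s2, i2 = (q2 @ self.keys2.t()).topk(self.topk, dim=-1)
            # Combine the two candidate sets into topk**2 full-key scores and slot indices.
            scores = (s1[:, :, None] + s2[:, None, :]).flatten(1)
            slots = (i1[:, :, None] * self.n_keys + i2[:, None, :]).flatten(1)
            # Keep the overall top-k slots and sum their values, weighted by softmaxed scores.
            best_scores, best_pos = scores.topk(self.topk, dim=-1)
            best_slots = slots.gather(1, best_pos)
            return self.values(best_slots, per_sample_weights=best_scores.softmax(dim=-1))

    # Example: a batch of 8 token representations of width 64.
    out = ProductKeyMemory(dim=64)(torch.randn(8, 64))
    print(out.shape)  # torch.Size([8, 64])

Splitting the query in half is what keeps the search cheap: two top-k searches over n_keys candidates each stand in for a search over all n_keys**2 memory slots.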

Quick Start & Requirements

  • Installation: Clone the repository and run bash setup/create_env.sh or sbatch setup/create_env.sh for SLURM. Activate the environment with conda activate lingua_.
  • Data Preparation: Use python setup/download_prepare_hf_data.py <dataset_name> (e.g., fineweb_edu) to download and prepare data.
  • Tokenizer: Download tokenizers with python setup/download_tokenizer.py <tokenizer_name> (e.g., llama3).
  • Training: Launch jobs locally with torchrun (e.g., torchrun --nproc-per-node 8 -m apps.main.train config=apps/main/configs/pkplus_373m_1024k.yaml) or on SLURM via python -m lingua.stool; a consolidated local run is sketched after this list.
  • Prerequisites: Python, plus a multi-GPU machine for local torchrun runs or a SLURM cluster for multi-node jobs; the provided YAML configs are templates that must be adapted to your hardware and data paths.
  • Documentation: Refer to the Meta Lingua README for more instructions.
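
Assembled from the steps above, a typical local run might look like the following sketch; the dataset, tokenizer, and config names are just the examples given in this list and should be replaced with your own.

    # Create and activate the environment (use sbatch setup/create_env.sh on a SLURM cluster)
    bash setup/create_env.sh
    conda activate lingua_

    # Download and prepare the example dataset and tokenizer
    python setup/download_prepare_hf_data.py fineweb_edu
    python setup/download_tokenizer.py llama3

    # Launch training locally on 8 GPUs with an example config
    torchrun --nproc-per-node 8 -m apps.main.train config=apps/main/configs/pkplus_373m_1024k.yaml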

Highlighted Details

  • Implements trainable key-value lookup mechanisms to augment LLMs.
  • Offers a cost-effective method for information storage and retrieval without increasing FLOPs.
  • Provides parallelization and optimized embedding bag implementations.
  • Includes scripts for environment setup, data preparation, and tokenizer downloads.

Maintenance & Community

The project comes from Meta AI (facebookresearch) and is built on the Meta Lingua codebase. The README does not document further community channels.

Licensing & Compatibility

Licensed under CC BY-NC, which prohibits commercial use of the code and of derivative works built on it.

Limitations & Caveats

The provided configurations are templates requiring user adaptation for specific environments and data paths. The CC-BY-NC license prohibits commercial use, limiting its applicability for many industry applications.

Health Check

  • Last commit: 7 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 24 stars in the last 90 days

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Alex Cheema (Cofounder of EXO Labs), and 1 more.

Explore Similar Projects

recurrent-pretraining by seal-rg

Top 0.1% on sourcepulse
806 stars
Pretraining code for depth-recurrent language model research
created 5 months ago
updated 2 weeks ago
Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake) and Travis Fischer (Founder of Agentic).

lingua by facebookresearch

Top 0.1% on sourcepulse
5k stars
LLM research codebase for training and inference
created 9 months ago
updated 2 weeks ago