Research paper code for universal transformer memory
This repository provides code for training and evaluating Neural Attention Memory Models (NAMMs), which are designed as universally applicable memory systems for transformer architectures. It targets researchers and practitioners working with long-context transformers, offering a method to enhance those models' memory capabilities.
How It Works
The project implements an "evolved" transformer memory: lightweight Neural Attention Memory Models read the attention values that each cached token receives and decide which entries of the KV cache to keep and which to evict, with their parameters optimized by evolutionary computation rather than gradients, since hard eviction decisions are not differentiable. The goal is a more general and efficient memory than retaining the full cache of a standard transformer, potentially improving performance and reducing memory use on tasks requiring long-range dependencies.
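For intuition only, the toy sketch below illustrates the kind of KV-cache pruning such a memory model performs: a small scoring network reads per-token attention statistics and evicts cache entries with negative scores. This is not the repository's code; the network architecture, feature sizes, and function names are assumptions made for the example.

```python
# Illustrative sketch only -- not the repository's implementation.
# Assumption: a tiny scoring network reads per-token attention statistics
# and returns one score per KV-cache token; negative scores mean "evict".
import torch
import torch.nn as nn


class ToyMemoryScorer(nn.Module):
    def __init__(self, n_features: int = 32):
        super().__init__()
        # Small MLP; in an evolved setup its weights would be searched with a
        # black-box optimizer (e.g., CMA-ES) rather than trained by backprop.
        self.net = nn.Sequential(
            nn.Linear(n_features, 16), nn.Tanh(), nn.Linear(16, 1)
        )

    def forward(self, attn_history: torch.Tensor) -> torch.Tensor:
        # attn_history: (num_tokens, n_features), e.g. a window of recent
        # attention weights each cached token received from new queries.
        return self.net(attn_history).squeeze(-1)  # (num_tokens,)


def prune_kv_cache(keys, values, attn_history, scorer):
    """Keep only the KV-cache entries whose score is non-negative."""
    with torch.no_grad():
        keep = scorer(attn_history) >= 0.0
    return keys[keep], values[keep]


# Toy usage: 128 cached tokens, 64-dim keys/values, 32 attention features each.
scorer = ToyMemoryScorer(n_features=32)
keys, values = torch.randn(128, 64), torch.randn(128, 64)
attn_history = torch.rand(128, 32)
keys, values = prune_kv_cache(keys, values, attn_history, scorer)
print(keys.shape, values.shape)
```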
Quick Start & Requirements
Create the environment with conda env create --file=env.yaml (full dependencies) or conda env create --file=env_minimal.yaml (minimal dependencies). The workflow uses torchrun for distributed training, wandb for logging (requires wandb login), and Hugging Face Hub authentication (huggingface-cli login) for gated models. Training is launched with torchrun and Hydra configuration files (e.g., namm_bam_i1.yaml, namm_bam_i2.yaml, namm_bam_i3.yaml). Evaluation commands are also provided for the LongBench and ChouBun tasks.
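The commands below sketch one possible end-to-end run. The environment-creation and login commands are those listed above; the training-script placeholder, GPU count, and Hydra flag usage are assumptions and should be checked against the repository's README.

```bash
# Environment setup and authentication (commands listed in this summary).
conda env create --file=env.yaml        # or: conda env create --file=env_minimal.yaml
wandb login
huggingface-cli login

# Hypothetical multi-GPU training launch: the script placeholder, GPU count,
# and Hydra flag are assumptions; only the config names (namm_bam_i1/i2/i3)
# come from this summary.
torchrun --nproc_per_node=4 <training_script.py> --config-name namm_bam_i1
```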
Highlighted Details
Uses wandb for experiment tracking and the Hugging Face Hub for model access.
Maintenance & Community
The project is associated with SakanaAI. Further community or maintenance details are not explicitly provided in the README.
Licensing & Compatibility
The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The project requires significant computational resources for training, as indicated by the use of torchrun and multiple GPUs. Access to gated models necessitates Hugging Face authentication. The multi-stage training process implies a complex setup.