evo-memory by SakanaAI

Research paper code for universal transformer memory

created 9 months ago
316 stars

Top 86.7% on sourcepulse

Project Summary

This repository provides code for training and evaluating Neural Attention Memory Models (NAMMs), designed to create universally applicable memory systems for transformer architectures. It targets researchers and practitioners working with long-context transformers, offering a method to enhance their memory capabilities.

How It Works

The project implements an "evolved" transformer memory system: rather than hand-designing how a transformer manages its context memory, it uses evolutionary computation to optimize Neural Attention Memory Models that operate on the model's attention values. The approach aims to produce a memory component that is more general and efficient than standard transformer context handling, potentially improving performance on tasks requiring long-range dependencies.

Quick Start & Requirements

  • Installation: conda env create --file=env.yaml (full dependencies) or conda env create --file=env_minimal.yaml (minimal dependencies).
  • Prerequisites: torchrun for distributed training, wandb for experiment logging (run wandb login first), and Hugging Face Hub authentication (huggingface-cli login) for gated models.
  • Usage: Training runs in multiple stages via torchrun with Hydra configuration files (e.g., namm_bam_i1.yaml, namm_bam_i2.yaml, namm_bam_i3.yaml); evaluation commands are provided for the LongBench and ChouBun benchmarks. A sketch of the full workflow follows this list.
  • Links: Paper, Hugging Face, Dataset
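
The sketch below strings the documented commands into one workflow. The environment files, login steps, and Hydra config names come from the README; the environment name, the entry-point script names (train.py, eval.py), and the GPU count are illustrative assumptions, so check the repository's scripts for the exact invocations.

  # Install dependencies (full or minimal environment).
  conda env create --file=env.yaml          # or: conda env create --file=env_minimal.yaml
  conda activate evo-memory                 # environment name is an assumption

  # Authenticate for experiment logging and for gated Hugging Face models.
  wandb login
  huggingface-cli login

  # Multi-stage training; train.py and the GPU count are placeholders,
  # while the namm_bam_i*.yaml Hydra configs are named in the README.
  torchrun --nproc_per_node=8 train.py --config-name namm_bam_i1
  torchrun --nproc_per_node=8 train.py --config-name namm_bam_i2
  torchrun --nproc_per_node=8 train.py --config-name namm_bam_i3

  # Long-context evaluation; eval.py and these config names are placeholders.
  torchrun --nproc_per_node=8 eval.py --config-name eval_longbench
  torchrun --nproc_per_node=8 eval.py --config-name eval_choubun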

Highlighted Details

  • Implements an "Evolved Universal Transformer Memory" as described in the accompanying paper.
  • Supports multi-stage training for incremental memory system development.
  • Provides evaluation scripts for standard long-context benchmarks like LongBench and ChouBun.
  • Integrates with wandb for experiment tracking and Hugging Face for model access.

Maintenance & Community

The project is associated with SakanaAI. Further community or maintenance details are not explicitly provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Training requires significant computational resources, as indicated by the use of torchrun for multi-GPU distributed runs. Access to gated models requires Hugging Face Hub authentication, and the multi-stage training pipeline adds setup complexity.

Health Check

  • Last commit: 9 months ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 14 stars in the last 90 days

Explore Similar Projects

Starred by Stas Bekman (author of the Machine Learning Engineering Open Book; research engineer at Snowflake) and Travis Fischer (founder of Agentic).

lingua by facebookresearch

LLM research codebase for training and inference

Top 0.1% on sourcepulse
5k stars
created 9 months ago
updated 2 weeks ago