evo-memory by SakanaAI

Research paper code for universal transformer memory

created 9 months ago
316 stars

Top 86.7% on sourcepulse

Project Summary

This repository provides code for training and evaluating Neural Attention Memory Models (NAMMs), designed to create universally applicable memory systems for transformer architectures. It targets researchers and practitioners working with long-context transformers, offering a method to enhance their memory capabilities.

How It Works

The project implements an "evolved" transformer memory system: rather than hand-designing how a transformer manages its context memory, it uses evolutionary computation to optimize Neural Attention Memory Models that operate on the model's attention values. The approach aims to produce a memory component that is more general and efficient than standard transformer context handling, potentially improving performance on tasks requiring long-range dependencies.

Quick Start & Requirements

  • Installation: conda env create --file=env.yaml (full dependencies) or conda env create --file=env_minimal.yaml (minimal dependencies).
  • Prerequisites: torchrun for distributed training, wandb for experiment logging (run wandb login first), and Hugging Face Hub authentication (huggingface-cli login) for gated models.
  • Usage: Training runs in multiple stages via torchrun with Hydra configuration files (e.g., namm_bam_i1.yaml, namm_bam_i2.yaml, namm_bam_i3.yaml); evaluation commands are provided for the LongBench and ChouBun benchmarks. A sketch of the full workflow follows this list.
  • Links: Paper, Hugging Face, Dataset
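
The sketch below strings the documented commands into one workflow. The environment files, login steps, and Hydra config names come from the README; the environment name, the entry-point script names (train.py, eval.py), and the GPU count are illustrative assumptions, so check the repository's scripts for the exact invocations.

  # Install dependencies (full or minimal environment).
  conda env create --file=env.yaml          # or: conda env create --file=env_minimal.yaml
  conda activate evo-memory                 # environment name is an assumption

  # Authenticate for experiment logging and for gated Hugging Face models.
  wandb login
  huggingface-cli login

  # Multi-stage training; train.py and the GPU count are placeholders,
  # while the namm_bam_i*.yaml Hydra configs are named in the README.
  torchrun --nproc_per_node=8 train.py --config-name namm_bam_i1
  torchrun --nproc_per_node=8 train.py --config-name namm_bam_i2
  torchrun --nproc_per_node=8 train.py --config-name namm_bam_i3

  # Long-context evaluation; eval.py and these config names are placeholders.
  torchrun --nproc_per_node=8 eval.py --config-name eval_longbench
  torchrun --nproc_per_node=8 eval.py --config-name eval_choubun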

Highlighted Details

  • Implements an "Evolved Universal Transformer Memory" as described in the accompanying paper.
  • Supports multi-stage training for incremental memory system development.
  • Provides evaluation scripts for standard long-context benchmarks like LongBench and ChouBun.
  • Integrates with wandb for experiment tracking and Hugging Face for model access.

Maintenance & Community

The project is associated with SakanaAI. Further community or maintenance details are not explicitly provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Training requires significant computational resources, as indicated by the use of torchrun for multi-GPU distributed runs. Access to gated models requires Hugging Face Hub authentication, and the multi-stage training pipeline adds setup complexity.

Health Check

  • Last commit: 9 months ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 14 stars in the last 90 days

Explore Similar Projects

Starred by Stas Bekman (author of the Machine Learning Engineering Open Book; research engineer at Snowflake) and Travis Fischer (founder of Agentic).

lingua by facebookresearch

LLM research codebase for training and inference

Top 0.1% on sourcepulse
5k stars
created 9 months ago
updated 2 weeks ago