deepseek-ai: Scalable conditional memory for large language models
Top 14.5% on SourcePulse
This repository provides the official implementation for Engram, a novel conditional memory module designed to enhance Large Language Models (LLMs) by introducing a new axis of sparsity. It addresses the inherent lack of native knowledge lookup primitives in Transformers, complementing existing Mixture-of-Experts (MoE) approaches. Engram offers a scalable lookup mechanism, enabling improved model capacity and performance under strict parameter and FLOP constraints, particularly benefiting knowledge-intensive tasks and complex reasoning.
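As a rough illustration of why a static lookup table scales capacity differently from extra neural compute, the back-of-envelope sketch below compares the parameter and per-token FLOP cost of a large embedding table against one additional FFN expert. All sizes are hypothetical and chosen for illustration, not taken from the paper.

```python
# Illustrative capacity accounting: a lookup table adds parameters with near-zero
# per-token FLOPs, while an extra FFN expert adds both parameters and compute.
hidden = 4096
table_rows = 10_000_000                      # hypothetical N-gram table size
table_params = table_rows * hidden           # parameters held in static memory
table_flops_per_token = 2 * hidden           # one row gather plus a cheap fusion, ~O(1)

ffn_mult = 4
expert_params = 2 * hidden * (ffn_mult * hidden)   # one extra FFN expert
expert_flops_per_token = 2 * expert_params         # matmul cost when the expert is active

print(f"lookup table: {table_params/1e9:.1f}B params, ~{table_flops_per_token/1e3:.0f}K FLOPs/token")
print(f"extra expert: {expert_params/1e9:.2f}B params, ~{expert_flops_per_token/1e6:.0f}M FLOPs/token")
```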
How It Works
Engram augments Transformer backbones by retrieving static N-gram memory and fusing it with the dynamic hidden states, modernizing classic N-gram embeddings for efficient O(1) lookup. The approach frames capacity allocation as a trade-off between neural computation (MoE) and static memory, guided by a U-shaped scaling law. Because addressing is deterministic, massive embedding tables can be offloaded to host memory with minimal inference overhead, improving system efficiency.
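A minimal PyTorch sketch of the mechanism described above, assuming a simple multiplicative hash for N-gram addressing and a sigmoid gate for fusing memory with hidden states. The class name, table size, and hashing scheme are illustrative assumptions, not the official implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NGramMemory(nn.Module):
    """Sketch of a conditional memory layer: look up a static embedding for the
    trailing n-gram of token IDs and fuse it with the dynamic hidden state."""

    def __init__(self, hidden_dim: int, n: int = 2, table_size: int = 1_000_003):
        super().__init__()
        self.n = n
        self.table_size = table_size
        # Large static table; because addressing is deterministic, rows could be
        # kept in host memory and gathered on demand.
        self.table = nn.Embedding(table_size, hidden_dim)
        self.gate = nn.Linear(2 * hidden_dim, hidden_dim)
        # Fixed random multipliers give a simple multiplicative hash per n-gram slot.
        self.register_buffer("hash_mult", torch.randint(1, table_size, (n,)))

    def ngram_address(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq). For each position t, hash tokens t-n+1 .. t.
        padded = F.pad(token_ids, (self.n - 1, 0), value=0)
        windows = padded.unfold(dimension=1, size=self.n, step=1)   # (batch, seq, n)
        return (windows * self.hash_mult).sum(dim=-1) % self.table_size  # O(1) per token

    def forward(self, token_ids: torch.Tensor, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, dim) dynamic states from the Transformer backbone.
        mem = self.table(self.ngram_address(token_ids))              # static memory read
        g = torch.sigmoid(self.gate(torch.cat([hidden, mem], -1)))   # learned fusion gate
        return hidden + g * mem                                      # residual fusion

# Toy usage: 2 sequences of 8 tokens with 16-dim hidden states.
layer = NGramMemory(hidden_dim=16)
ids = torch.randint(0, 32_000, (2, 8))
print(layer(ids, torch.randn(2, 8, 16)).shape)  # torch.Size([2, 8, 16])
```

Because the address depends only on token IDs, the rows needed for a batch can in principle be gathered from host memory ahead of the forward pass, which is what makes offloading cheap.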
Quick Start & Requirements
pip install torch numpy transformers sympy
A demo script, engram_demo_v1.py, is provided to illustrate the core logic, mocking standard components.
Highlighted Details
Maintenance & Community
Contact: service@deepseek.com.
Licensing & Compatibility
Limitations & Caveats