MemAgent by BytedTsinghua-SIA

Long-context LLM framework with RL-based memory

Created 2 months ago
682 stars

Top 49.7% on SourcePulse

Project Summary

MemAgent is a framework for optimizing long-context Large Language Models (LLMs) using Reinforcement Learning (RL), enabling extrapolation to significantly larger contexts with minimal performance degradation. It's designed for researchers and developers working with LLMs who need to process and understand extremely long documents or conversations.

How It Works

MemAgent introduces a memory mechanism that lets an LLM handle arbitrarily long inputs within a fixed context window: the model reads the input in segments and maintains a compact memory that it rewrites after each one. The policy is trained with an RL-driven extrapolation approach, specifically Reinforcement Learning from Verifiable Rewards (RLVR) built on an extension of the DAPO algorithm. Because each turn sees only the current segment and the memory, the multi-turn workflow is context-independent and runs in linear time with respect to text length.
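A minimal sketch of that read-and-rewrite loop, assuming a fixed character-based segment size and illustrative prompt wording (the repository's actual templates differ); `chat` stands for any single-prompt LLM call, such as the OpenAI-compatible client shown under Quick Start below.

```python
from typing import Callable

def answer_long_document(
    chat: Callable[[str], str],
    document: str,
    question: str,
    seg_chars: int = 20_000,
) -> str:
    """Answer a question about an arbitrarily long document with a fixed window."""
    memory = ""  # bounded scratchpad, rewritten after every segment
    for start in range(0, len(document), seg_chars):
        segment = document[start:start + seg_chars]
        # Each turn sees only (question, memory, segment), never the full
        # history, so per-turn context is fixed and total cost is linear
        # in document length.
        memory = chat(
            f"Question: {question}\n"
            f"Current memory:\n{memory}\n"
            f"Next segment:\n{segment}\n"
            "Rewrite the memory, keeping only what helps answer the question."
        )
    return chat(f"Question: {question}\nMemory:\n{memory}\nGive the final answer.")
```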

Quick Start & Requirements

  • Local Deployment: start a server with vllm serve BytedTsinghua-SIA/RL-MemoryAgent-14B --tensor_parallel_size 2, then run python quickstart.py --model BytedTsinghua-SIA/RL-MemoryAgent-14B.
  • Online Services: configure the URL and API_KEY environment variables (see the client sketch after this list).
  • Prerequisites: Python, vLLM, Ray, httpx==0.23.1, aiohttp. Manual download and configuration of Qwen2.5-Instruct models are required for testing.
  • Resources: Testing and training can take several days and utilize all available GPUs.
  • Links: Paper, Blog, Datasets, Weights.
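For either path, a client can talk to the model through the OpenAI-compatible API that vllm serve exposes. A minimal sketch, assuming URL and API_KEY carry an OpenAI-style base URL and key (the defaults shown target a local vLLM server):

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ.get("URL", "http://localhost:8000/v1"),
    api_key=os.environ.get("API_KEY", "EMPTY"),  # vLLM ignores the key locally
)

resp = client.chat.completions.create(
    model="BytedTsinghua-SIA/RL-MemoryAgent-14B",
    messages=[{"role": "user", "content": "Summarize: <long document here>"}],
)
print(resp.choices[0].message.content)
```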

Highlighted Details

  • Achieves <5% performance loss on 3.5M token QA tasks with a 14B model.
  • Demonstrates 95%+ accuracy on 512K RULER test tasks.
  • Achieves linear time complexity for long-text processing.
  • Supports both sync (tool-calling in a general workflow) and async (agent as a function) modes; an async sketch follows this list.
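A hypothetical illustration of what the async mode enables: since each agent invocation is an ordinary awaitable, many long-document queries can run concurrently against one server. The endpoint, prompt format, and helper names below are assumptions for illustration, not the repository's actual API:

```python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
MODEL = "BytedTsinghua-SIA/RL-MemoryAgent-14B"

async def ask(prompt: str) -> str:
    resp = await client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

async def answer_all(docs: list[str], question: str) -> list[str]:
    # One agent call per document, executed concurrently.
    return await asyncio.gather(*(ask(f"{d}\n\nQ: {question}") for d in docs))

# answers = asyncio.run(answer_all(docs, "Who is the protagonist?"))
```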

Maintenance & Community

The project is associated with BytedTsinghua-SIA. Key updates were released in June and July 2025.

Licensing & Compatibility

The repository does not explicitly state a license.

Limitations & Caveats

The validation score observed during training may differ significantly from the final score because training uses stricter verifiers to prevent reward hacking. Some model configurations require manual intervention, e.g., activating YaRN for Qwen2.5-Instruct (see the sketch below). Running all provided tests is time-intensive.
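As a concrete example of that manual step, the Qwen2.5 model cards describe enabling YaRN by adding a rope_scaling block to the downloaded checkpoint's config.json. A sketch, with a hypothetical local path and a scaling factor targeting roughly 128K tokens:

```python
import json
from pathlib import Path

cfg_path = Path("Qwen2.5-7B-Instruct/config.json")  # hypothetical local path
cfg = json.loads(cfg_path.read_text())
cfg["rope_scaling"] = {
    "type": "yarn",
    "factor": 4.0,                               # 4 x 32768 = 131072 tokens
    "original_max_position_embeddings": 32768,   # Qwen2.5 native window
}
cfg_path.write_text(json.dumps(cfg, indent=2))
```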

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 3
  • Star History: 80 stars in the last 30 days
