MemAgent by BytedTsinghua-SIA

Long-context LLM framework with RL-based memory

Created 2 months ago
682 stars

Top 49.7% on SourcePulse

Project Summary

MemAgent is a framework for optimizing long-context Large Language Models (LLMs) using Reinforcement Learning (RL), enabling extrapolation to significantly larger contexts with minimal performance degradation. It's designed for researchers and developers working with LLMs who need to process and understand extremely long documents or conversations.

How It Works

MemAgent introduces a memory mechanism that lets an LLM handle arbitrarily long inputs within a fixed context window: the model reads the input in segments and maintains a compact memory that it rewrites after each one. The policy is trained with an RL-driven extrapolation approach, specifically Reinforcement Learning from Verifiable Rewards (RLVR) built on an extension of the DAPO algorithm. Because each turn sees only the current segment and the memory, the multi-turn workflow is context-independent and runs in linear time with respect to text length.
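A minimal sketch of that read-and-rewrite loop, assuming a fixed character-based segment size and illustrative prompt wording (the repository's actual templates differ); `chat` stands for any single-prompt LLM call, such as the OpenAI-compatible client shown under Quick Start below.

```python
from typing import Callable

def answer_long_document(
    chat: Callable[[str], str],
    document: str,
    question: str,
    seg_chars: int = 20_000,
) -> str:
    """Answer a question about an arbitrarily long document with a fixed window."""
    memory = ""  # bounded scratchpad, rewritten after every segment
    for start in range(0, len(document), seg_chars):
        segment = document[start:start + seg_chars]
        # Each turn sees only (question, memory, segment), never the full
        # history, so per-turn context is fixed and total cost is linear
        # in document length.
        memory = chat(
            f"Question: {question}\n"
            f"Current memory:\n{memory}\n"
            f"Next segment:\n{segment}\n"
            "Rewrite the memory, keeping only what helps answer the question."
        )
    return chat(f"Question: {question}\nMemory:\n{memory}\nGive the final answer.")
```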

Quick Start & Requirements

  • Local Deployment: start a server with vllm serve BytedTsinghua-SIA/RL-MemoryAgent-14B --tensor_parallel_size 2, then run python quickstart.py --model BytedTsinghua-SIA/RL-MemoryAgent-14B.
  • Online Services: configure the URL and API_KEY environment variables (see the client sketch after this list).
  • Prerequisites: Python, vLLM, Ray, httpx==0.23.1, aiohttp. Manual download and configuration of Qwen2.5-Instruct models are required for testing.
  • Resources: Testing and training can take several days and utilize all available GPUs.
  • Links: Paper, Blog, Datasets, Weights.
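For either path, a client can talk to the model through the OpenAI-compatible API that vllm serve exposes. A minimal sketch, assuming URL and API_KEY carry an OpenAI-style base URL and key (the defaults shown target a local vLLM server):

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ.get("URL", "http://localhost:8000/v1"),
    api_key=os.environ.get("API_KEY", "EMPTY"),  # vLLM ignores the key locally
)

resp = client.chat.completions.create(
    model="BytedTsinghua-SIA/RL-MemoryAgent-14B",
    messages=[{"role": "user", "content": "Summarize: <long document here>"}],
)
print(resp.choices[0].message.content)
```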

Highlighted Details

  • Achieves <5% performance loss on 3.5M token QA tasks with a 14B model.
  • Demonstrates 95%+ accuracy on 512K RULER test tasks.
  • Achieves linear time complexity for long-text processing.
  • Supports both sync (tool-calling in a general workflow) and async (agent as a function) modes; an async sketch follows this list.
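A hypothetical illustration of what the async mode enables: since each agent invocation is an ordinary awaitable, many long-document queries can run concurrently against one server. The endpoint, prompt format, and helper names below are assumptions for illustration, not the repository's actual API:

```python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
MODEL = "BytedTsinghua-SIA/RL-MemoryAgent-14B"

async def ask(prompt: str) -> str:
    resp = await client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

async def answer_all(docs: list[str], question: str) -> list[str]:
    # One agent call per document, executed concurrently.
    return await asyncio.gather(*(ask(f"{d}\n\nQ: {question}") for d in docs))

# answers = asyncio.run(answer_all(docs, "Who is the protagonist?"))
```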

Maintenance & Community

The project is associated with BytedTsinghua-SIA. Key updates were released in June and July 2025.

Licensing & Compatibility

The repository does not explicitly state a license.

Limitations & Caveats

The validation score observed during training may differ significantly from the final score because training uses stricter verifiers to prevent reward hacking. Some model configurations require manual intervention, e.g., activating YaRN for Qwen2.5-Instruct (see the sketch below). Running all provided tests is time-intensive.
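As a concrete example of that manual step, the Qwen2.5 model cards describe enabling YaRN by adding a rope_scaling block to the downloaded checkpoint's config.json. A sketch, with a hypothetical local path and a scaling factor targeting roughly 128K tokens:

```python
import json
from pathlib import Path

cfg_path = Path("Qwen2.5-7B-Instruct/config.json")  # hypothetical local path
cfg = json.loads(cfg_path.read_text())
cfg["rope_scaling"] = {
    "type": "yarn",
    "factor": 4.0,                               # 4 x 32768 = 131072 tokens
    "original_max_position_embeddings": 32768,   # Qwen2.5 native window
}
cfg_path.write_text(json.dumps(cfg, indent=2))
```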

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 3
  • Star History: 80 stars in the last 30 days
