recursive-llm by ysz

LLM for unbounded context processing via recursive exploration

Created 4 months ago
459 stars

Top 65.9% on SourcePulse

Project Summary

Recursive Language Models (RLM) provides a Python implementation for processing unbounded context lengths with Large Language Models (LLMs). It addresses the "context rot" problem by storing context as Python variables rather than within prompts, enabling LLMs to recursively explore and partition vast amounts of text (100k+ tokens). This approach is beneficial for researchers and engineers needing to analyze or query extremely long documents efficiently and accurately.

How It Works

RLM operates by maintaining the context as a Python variable, allowing the LLM to interact with it programmatically. The core mechanism involves a root LLM that receives the query and instructions, while the context is explored recursively. The LLM can "peek" at context segments, perform searches (e.g., using regex), and call itself recursively on sub-sections of the context. This adaptive exploration is managed via a REPL executor that safely executes Python code using RestrictedPython, enabling dynamic context navigation without prompt inflation.
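The peek-search-recurse loop described above can be sketched as a plain divide-and-conquer pass over a context held in a Python variable. This is an illustrative toy, not the project's actual API: the function name, the fixed-size chunking, and the regex base case are all assumptions, and a real RLM would replace the base case with a sub-LLM call on the chunk.

```python
import re

def recursive_find(context: str, pattern: str, chunk_size: int = 1000) -> list[int]:
    """Toy RLM-style exploration: the context lives in a variable, and we
    recursively partition it instead of stuffing it into a prompt.
    Returns the offsets of regex matches in the full context."""
    if len(context) <= chunk_size:
        # Base case: the chunk is small enough to "read" directly.
        # A real RLM would hand this chunk to a sub-LLM call here.
        return [m.start() for m in re.finditer(pattern, context)]
    # Recursive case: split the context and explore each half.
    # (Simplification: a match spanning the split point would be missed.)
    mid = len(context) // 2
    left = recursive_find(context[:mid], pattern, chunk_size)
    right = recursive_find(context[mid:], pattern, chunk_size)
    return left + [mid + off for off in right]
```

The key property is that only chunk-sized pieces are ever "read" at once, which is what keeps per-query token usage low regardless of total context length.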

Quick Start & Requirements

  • Installation: Not yet published to PyPI. Clone the repository (git clone https://github.com/ysz/recursive-llm.git), navigate into the directory, and install via pip install -e . or pip install -e ".[dev]".
  • Prerequisites: Python 3.9 or higher. An API key for a supported LLM provider (e.g., OpenAI, Anthropic) or a local model setup (Ollama, llama.cpp).
  • Links: Paper: https://alexzhang13.github.io/blog/2025/rlm/, LiteLLM Docs: https://docs.litellm.ai/

Highlighted Details

  • Processes 100k+ tokens, with demonstrated success on 1M+ tokens.
  • Achieved 33% better performance than baseline GPT-5 on the OOLONG benchmark (132k tokens) at similar costs, according to paper results.
  • Demonstrated 80% accuracy on structured data queries across 60k token contexts, significantly outperforming direct OpenAI calls (0% accuracy).
  • Offers substantial token efficiency, using ~2-3k tokens per query compared to 95k+ for direct context feeding.
  • Supports over 100 LLM providers through the LiteLLM integration.
  • Allows for cost optimization by using a cheaper model for recursive calls.
  • Provides an asynchronous API for improved performance with parallel recursive calls.
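The asynchronous fan-out mentioned in the last bullet can be illustrated with `asyncio.gather`. This is a minimal sketch, not the library's API: the function names and chunking are assumptions, and the stub below stands in for a recursive sub-model call (which in RLM would go through LiteLLM, possibly to a cheaper model).

```python
import asyncio

async def sub_call(chunk: str) -> str:
    # Stand-in for a recursive sub-LLM call on one chunk of context.
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"{len(chunk)} chars"

async def map_context(context: str, chunk_size: int = 1000) -> list[str]:
    # Split the context into chunks and fan out the sub-calls in parallel.
    chunks = [context[i:i + chunk_size]
              for i in range(0, len(context), chunk_size)]
    return await asyncio.gather(*(sub_call(c) for c in chunks))
```

Because the sub-calls are awaited concurrently rather than sequentially, total latency is bounded by the slowest sub-call instead of the sum of all of them.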

Maintenance & Community

The project is primarily associated with Grigori Gvadzabia, based on the provided citation. While explicit community channels like Discord or Slack are not mentioned, the GitHub repository serves as the central hub for development, issues, and contributions.

Licensing & Compatibility

The project is released under the MIT License. This permissive license allows for commercial use and integration into closed-source projects without significant restrictions.

Limitations & Caveats

The REPL execution environment is currently sequential, lacking parallel code execution capabilities. Prefix caching is not yet implemented, and the recursion depth is configurable but limited. Streaming support is also not available in the current version. The package is not yet available on PyPI, requiring installation from source.

Health Check

  • Last Commit: 3 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 102 stars in the last 30 days

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems") and Luis Capelo (cofounder of Lightning AI).

Explore Similar Projects

LongLM by datamllab: Self-Extend, LLM context window extension via self-attention. 665 stars. Created 2 years ago, updated 1 year ago.