recursive-llm by ysz

LLM for unbounded context processing via recursive exploration

Created 4 months ago
459 stars

Top 65.9% on SourcePulse

Project Summary

Recursive Language Models (RLM) provides a Python implementation for processing unbounded context lengths with Large Language Models (LLMs). It addresses the "context rot" problem by storing context as Python variables rather than within prompts, enabling LLMs to recursively explore and partition vast amounts of text (100k+ tokens). This approach is beneficial for researchers and engineers needing to analyze or query extremely long documents efficiently and accurately.

How It Works

RLM operates by maintaining the context as a Python variable, allowing the LLM to interact with it programmatically. The core mechanism involves a root LLM that receives the query and instructions, while the context is explored recursively. The LLM can "peek" at context segments, perform searches (e.g., using regex), and call itself recursively on sub-sections of the context. This adaptive exploration is managed via a REPL executor that safely executes Python code using RestrictedPython, enabling dynamic context navigation without prompt inflation.
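The peek-search-recurse loop described above can be sketched as a plain divide-and-conquer pass over a context held in a Python variable. This is an illustrative toy, not the project's actual API: the function name, the fixed-size chunking, and the regex base case are all assumptions, and a real RLM would replace the base case with a sub-LLM call on the chunk.

```python
import re

def recursive_find(context: str, pattern: str, chunk_size: int = 1000) -> list[int]:
    """Toy RLM-style exploration: the context lives in a variable, and we
    recursively partition it instead of stuffing it into a prompt.
    Returns the offsets of regex matches in the full context."""
    if len(context) <= chunk_size:
        # Base case: the chunk is small enough to "read" directly.
        # A real RLM would hand this chunk to a sub-LLM call here.
        return [m.start() for m in re.finditer(pattern, context)]
    # Recursive case: split the context and explore each half.
    # (Simplification: a match spanning the split point would be missed.)
    mid = len(context) // 2
    left = recursive_find(context[:mid], pattern, chunk_size)
    right = recursive_find(context[mid:], pattern, chunk_size)
    return left + [mid + off for off in right]
```

The key property is that only chunk-sized pieces are ever "read" at once, which is what keeps per-query token usage low regardless of total context length.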

Quick Start & Requirements

  • Installation: Not yet published to PyPI. Clone the repository (git clone https://github.com/ysz/recursive-llm.git), navigate into the directory, and install via pip install -e . or pip install -e ".[dev]".
  • Prerequisites: Python 3.9 or higher. An API key for a supported LLM provider (e.g., OpenAI, Anthropic) or a local model setup (Ollama, llama.cpp).
  • Links: Paper: https://alexzhang13.github.io/blog/2025/rlm/, LiteLLM Docs: https://docs.litellm.ai/

Highlighted Details

  • Processes 100k+ tokens, with demonstrated success on 1M+ tokens.
  • Achieved 33% better performance than baseline GPT-5 on the OOLONG benchmark (132k tokens) at similar costs, according to paper results.
  • Demonstrated 80% accuracy on structured data queries across 60k token contexts, significantly outperforming direct OpenAI calls (0% accuracy).
  • Offers substantial token efficiency, using ~2-3k tokens per query compared to 95k+ for direct context feeding.
  • Supports over 100 LLM providers through the LiteLLM integration.
  • Allows for cost optimization by using a cheaper model for recursive calls.
  • Provides an asynchronous API for improved performance with parallel recursive calls.
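The asynchronous fan-out mentioned in the last bullet can be illustrated with `asyncio.gather`. This is a minimal sketch, not the library's API: the function names and chunking are assumptions, and the stub below stands in for a recursive sub-model call (which in RLM would go through LiteLLM, possibly to a cheaper model).

```python
import asyncio

async def sub_call(chunk: str) -> str:
    # Stand-in for a recursive sub-LLM call on one chunk of context.
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"{len(chunk)} chars"

async def map_context(context: str, chunk_size: int = 1000) -> list[str]:
    # Split the context into chunks and fan out the sub-calls in parallel.
    chunks = [context[i:i + chunk_size]
              for i in range(0, len(context), chunk_size)]
    return await asyncio.gather(*(sub_call(c) for c in chunks))
```

Because the sub-calls are awaited concurrently rather than sequentially, total latency is bounded by the slowest sub-call instead of the sum of all of them.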

Maintenance & Community

The project is primarily associated with Grigori Gvadzabia, based on the provided citation. While explicit community channels like Discord or Slack are not mentioned, the GitHub repository serves as the central hub for development, issues, and contributions.

Licensing & Compatibility

The project is released under the MIT License. This permissive license allows for commercial use and integration into closed-source projects without significant restrictions.

Limitations & Caveats

The REPL execution environment is currently sequential, lacking parallel code execution capabilities. Prefix caching is not yet implemented, and the recursion depth is configurable but limited. Streaming support is also not available in the current version. The package is not yet available on PyPI, requiring installation from source.

Health Check

  • Last Commit: 3 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 102 stars in the last 30 days

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems") and Luis Capelo (cofounder of Lightning AI).

Explore Similar Projects

LongLM by datamllab: Self-Extend, LLM context window extension via self-attention. 665 stars. Created 2 years ago, updated 1 year ago.