the-markovian-thinker by McGill-NLP

LLM reasoning with bounded state

Created 3 weeks ago

300 stars

Top 88.6% on SourcePulse

Project Summary

The Markovian Thinker project introduces a paradigm for efficient reasoning in Large Language Models (LLMs) trained with Reinforcement Learning (RL). It addresses the quadratic compute cost of standard long chain-of-thought (LongCoT) RL by keeping the reasoning state at a fixed size, so compute scales linearly with reasoning length. This benefits researchers and practitioners seeking to improve LLM reasoning capabilities without prohibitive computational costs.

How It Works

The core innovation is the "Markovian Thinking" paradigm, which reformulates the RL environment to maintain a bounded, fixed-size state. This is implemented via the "Delethink" approach, which processes generation in fixed-size chunks. At each chunk boundary, the context is reset to the original prompt plus a concise carryover, compelling the model to learn to make progress from that state alone. This contrasts with sequential token concatenation (e.g., LongCoT), where the state grows linearly with the number of generated tokens and attention compute therefore grows quadratically. Delethink achieves linear compute complexity and flat memory usage.
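The chunk-and-carryover loop described above can be sketched in a few lines. This is a toy illustration, not the project's implementation: `toy_generate` stands in for an LLM, and the chunk/carryover sizes are made-up values.

```python
# Hedged sketch of Delethink-style chunked generation (all names and sizes
# are hypothetical). Instead of growing the context with every generated
# token (LongCoT), the context is reset at each chunk boundary to the
# original prompt plus a short carryover, keeping the state size bounded.

CHUNK_SIZE = 8   # tokens generated per chunk (toy value)
CARRYOVER = 4    # tokens carried across each boundary (toy value)

def toy_generate(context, n_tokens):
    """Stand-in for an LLM: emits n_tokens dummy tokens."""
    return [f"t{len(context) + i}" for i in range(n_tokens)]

def markovian_think(prompt_tokens, total_tokens):
    """Generate total_tokens of reasoning while the context stays bounded."""
    produced = []
    carry = []
    max_state = 0
    while len(produced) < total_tokens:
        # Bounded state: original prompt + concise carryover only.
        context = prompt_tokens + carry
        max_state = max(max_state, len(context))
        chunk = toy_generate(context, min(CHUNK_SIZE, total_tokens - len(produced)))
        produced.extend(chunk)
        carry = chunk[-CARRYOVER:]  # context is reset at the chunk boundary
    return produced, max_state

tokens, peak = markovian_think(["p0", "p1"], 32)
```

Regardless of how long the reasoning trace grows, the peak context here never exceeds the prompt length plus the carryover size, which is the property that yields flat memory and linear compute.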

Quick Start & Requirements

  • Installation relies on the verl and SGLang frameworks, with options for a pre-built Docker image or uv-based installation (details in INSTALLATION.md).
  • Prerequisites include significant GPU resources (demonstrated with multi-GPU setups and H100s). Specific models like deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B are used for training and demos.
  • Trajectory visualization requires textual==0.52.1.
  • Links: INSTALLATION.md, verl.readthedocs.io (for Ray debugger).

Highlighted Details

  • Delethink (24K context) achieves comparable or superior accuracy to LongCoT-RL (24K context) with reduced compute.
  • The method demonstrates continued performance improvement beyond its trained context budget, unlike methods that plateau.
  • Training exhibits linear compute scaling with reasoning length, contrasting with LongCoT's quadratic scaling.
  • Large models like GPT-OSS-120B and Qwen3-30B-A3B show zero-shot Markovian Thinking capabilities.
  • The framework supports scaling reasoning to 96K tokens.
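The linear-versus-quadratic scaling contrast in the bullets above can be made concrete with a back-of-envelope count of attended token pairs. This is a toy model with illustrative chunk and state sizes, not the paper's actual settings:

```python
# Back-of-envelope attention-cost comparison (counts attended token pairs
# only; chunk/state sizes below are illustrative, not the paper's values).

def longcot_pairs(n_tokens):
    """LongCoT: full causal attention over a context that never resets.
    Token i attends to all i previous tokens -> ~n^2/2 pairs total."""
    return n_tokens * (n_tokens - 1) // 2

def delethink_pairs(n_tokens, chunk=8, carry=4, prompt=2):
    """Delethink-style: each chunk attends only to a bounded state
    (prompt + carryover + tokens emitted so far in the chunk), so the
    per-chunk cost is constant and total cost grows linearly."""
    state = prompt + carry
    per_chunk = sum(state + i for i in range(chunk))
    return (n_tokens // chunk) * per_chunk
```

Doubling `n_tokens` doubles the Delethink count but roughly quadruples the LongCoT count, which is why the gap widens as reasoning budgets grow toward 96K tokens.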

Maintenance & Community

  • Authored by researchers from McGill University, Mila, and Microsoft.
  • The codebase is built upon verl and SGLang.
  • The project was released in October 2025, with paper, models, and codebase available.
  • No explicit community channels (e.g., Discord, Slack) or roadmap links are provided in the README.

Licensing & Compatibility

  • The README does not specify a software license. This omission requires clarification for adoption decisions, especially regarding commercial use or derivative works.

Limitations & Caveats

  • The evaluation section is marked as "TBD," suggesting that comprehensive performance benchmarks or results may still be pending.
  • As a very recent release (October 2025), long-term maintenance and community support are yet to be established.

Health Check

  • Last Commit: 2 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 4
  • Star History: 302 stars in the last 27 days

Explore Similar Projects

Starred by Edward Sun (Research Scientist at Meta Superintelligence Lab), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 4 more.

batch_invariant_ops by thinking-machines-lab

  • Enhance LLM inference determinism
  • 875 stars (top 1.0%)
  • Created 1 month ago; updated 19 hours ago