McGill-NLP: LLM reasoning with bounded state
Top 88.6% on SourcePulse
The Markovian Thinker project introduces a paradigm for efficient reasoning in Large Language Models (LLMs) trained with Reinforcement Learning (RL). It addresses the quadratic computational cost of standard long chain-of-thought RL by proposing a fixed-size state representation, enabling compute that scales linearly with reasoning length. This benefits researchers and practitioners seeking to improve LLM reasoning capabilities without prohibitive computational costs.
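The linear-versus-quadratic claim can be illustrated with a toy cost model. The sketch below assumes attention cost is proportional to the context length at each generation step; the function names and the 512-token state size are illustrative, not values from the project.

```python
# Toy cost model: per-step attention work is proportional to context length.

def longcot_cost(n_tokens):
    # LongCoT: context grows by one token per step, so total cost ~ n^2 / 2.
    return sum(range(1, n_tokens + 1))

def markovian_cost(n_tokens, state_size):
    # Fixed-size state: context is capped, so total cost grows linearly.
    return sum(min(t, state_size) for t in range(1, n_tokens + 1))

for n in (1_000, 10_000):
    print(n, longcot_cost(n), markovian_cost(n, state_size=512))
```

At 10,000 reasoning tokens the capped-state cost is roughly an order of magnitude lower under this model, and the gap widens as traces get longer.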
How It Works
The core innovation is the "Markovian Thinking" paradigm, which reformulates the RL environment to maintain a bounded, fixed-size state. It is implemented via the "Delethink" approach, which processes generation in fixed-size chunks. At each chunk boundary, the context is reset to the original prompt plus a concise carryover, compelling the model to learn to make state-dependent progress. This contrasts with sequential token concatenation (e.g., LongCoT), where the context grows without bound and compute scales quadratically with reasoning length. Delethink achieves linear compute complexity and flat memory usage.
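The chunked-generation loop above can be sketched as follows. This is a minimal illustration, not the project's actual API: `generate_tokens` is a stand-in for the LLM, and the `CHUNK` and `CARRYOVER` sizes are toy values.

```python
# Hypothetical sketch of Delethink-style generation with a bounded state.
CHUNK = 8       # max new tokens generated per chunk (toy size)
CARRYOVER = 4   # tokens carried across a chunk boundary (toy size)

def generate_tokens(context, n):
    """Stand-in for an LLM: emits n placeholder tokens."""
    return [f"t{len(context) + i}" for i in range(n)]

def delethink_generate(prompt, total_tokens):
    produced, carry = [], []
    while len(produced) < total_tokens:
        # The context is always prompt + short carryover, never the full
        # trace, so per-step attention cost and memory stay flat.
        context = prompt + carry
        assert len(context) <= len(prompt) + CARRYOVER
        new = generate_tokens(context, min(CHUNK, total_tokens - len(produced)))
        produced += new
        carry = new[-CARRYOVER:]  # concise carryover replaces the full trace

    return produced

out = delethink_generate(["p0", "p1"], 20)
print(len(out))  # 20 tokens produced under a bounded context
```

Because the model only ever sees the prompt plus the carryover, it must encode its reasoning progress into that small state, which is what the RL training objective rewards.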
Quick Start & Requirements
- Built on the verl and SGLang frameworks, with options for a pre-built Docker image or uv-based installation (details in INSTALLATION.md).
- Models such as deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B are used for training and demos.
- Pinned dependency: textual==0.52.1.
- Key references: INSTALLATION.md, verl.readthedocs.io (for the Ray debugger).

Highlighted Details
Maintenance & Community
- Built on the verl and SGLang frameworks.

Licensing & Compatibility
Limitations & Caveats