microsoft/Memento: LLM reasoning extension framework
Top 70.7% on SourcePulse
Extends the effective output length of large language models by segmenting chain-of-thought reasoning into manageable blocks. This approach allows models to perform more complex reasoning tasks within fixed context window constraints, benefiting researchers and developers working with LLMs on extended generation or analysis.
How It Works
Memento implements a block-based reasoning strategy where chain-of-thought (CoT) is divided into discrete segments. After each reasoning block, a concise summary is generated, and the detailed block content is then evicted from the KV cache. The model continues its reasoning process from this summary, effectively reducing the context size and enabling deeper, multi-step computations within the original, fixed context window. This is facilitated by specialized tokens for block and summary boundaries and a modified inference engine.
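The loop described above can be sketched in a few lines. This is a minimal illustration, not Memento's actual implementation: the `generate` and `summarize` functions are hypothetical stubs standing in for the model and the learned summary mechanism, and context replacement stands in for KV-cache eviction.

```python
# Sketch of block-based reasoning with summary-only context carryover.
# All function names here are illustrative stand-ins, not Memento APIs.

def generate(context: str, block_id: int) -> str:
    """Stub: one detailed chain-of-thought block from the model."""
    return f"detailed reasoning for block {block_id} given: {context[:40]}"

def summarize(block: str) -> str:
    """Stub: the concise summary emitted at a block boundary."""
    return block[:30] + " [summary]"

def block_reasoning(question: str, num_blocks: int = 3) -> str:
    context = question
    for i in range(num_blocks):
        block = generate(context, i)   # full reasoning block
        summary = summarize(block)     # concise recap of that block
        # Drop the detailed block (analogous to KV-cache eviction);
        # only the question plus the latest summary remain in context.
        context = question + " " + summary
    return context

final_context = block_reasoning("What is 12 * 34?")
```

The point of the sketch is that context size stays roughly constant per block, so total reasoning depth is no longer bounded by a single context window.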
Quick Start & Requirements
Data pipeline: install dependencies with pip install -r data/requirements.txt. Requires an OPENAI_API_KEY or a compatible provider. Run with python run_full_pipeline.py --input ../examples/example_trace.jsonl --output-dir output/ --model gpt-4o --limit 1. See data/README.md for full documentation.

Inference: install vllm==0.13.0 and apply the overlay: pip install vllm==0.13.0, cd vllm, bash install_overlay.sh. Serve a Memento model using python -m vllm.entrypoints.openai.api_server --model /path/to/memento-checkpoint ... --block-masking-config '{...}'. Requires a Memento model checkpoint and GPU resources. See vllm/README.md for full documentation.

Highlighted Details
Uses specialized tokens (<|block_start|>, <|summary_start|>, etc.) to mark block and summary boundaries for structured reasoning and summarization.

Maintenance & Community
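To make the token layout concrete, here is a hypothetical shape a Memento-style trace might take. Only <|block_start|> and <|summary_start|> are named in the source; the closing tokens <|block_end|> and <|summary_end|> are assumed for illustration.

```python
# Hypothetical reasoning trace using block/summary boundary tokens.
# The end tokens are assumptions, not confirmed Memento vocabulary.
trace = (
    "<|block_start|>step 1: factor the expression<|block_end|>"
    "<|summary_start|>expression factored into (x-1)(x+2)<|summary_end|>"
    "<|block_start|>step 2: solve each factor<|block_end|>"
)

# A serving engine could locate evictable spans by these markers:
num_blocks = trace.count("<|block_start|>")
```

Boundary tokens like these let the inference engine identify exactly which KV-cache spans correspond to finished blocks and can be evicted, while summaries stay resident.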
No specific details regarding contributors, sponsorships, community channels (e.g., Discord/Slack), or roadmaps are provided in the README.
Licensing & Compatibility
This project is licensed under the MIT License, which generally permits commercial use and integration into closed-source projects.
Limitations & Caveats
The inference setup requires building a custom vLLM version using an overlay script and specifically depends on vllm==0.13.0. The README does not detail performance benchmarks or specific model compatibility beyond the general approach.