Open reasoning model for real-world problem solving
Top 27.9% on sourcepulse
Marco-o1 is an open-source large reasoning model (LRM) designed to tackle complex, real-world problems, particularly those with open-ended solutions where rewards are difficult to quantify. It targets researchers and developers aiming to advance LLM reasoning capabilities beyond standard benchmarks, offering a foundation for exploring novel problem-solving strategies.
How It Works
Marco-o1 enhances reasoning through a combination of Chain-of-Thought (CoT) fine-tuning, Monte Carlo Tree Search (MCTS), and a reflection mechanism. MCTS is integrated to expand the solution space by treating LLM outputs as actions and using token confidence scores to guide the search. The model also employs "mini-step" actions within MCTS for finer granularity and incorporates a self-reflection prompt to improve error detection and correction.
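The confidence-guided value estimate can be sketched as follows. This is a minimal illustration, not the project's implementation: the function names and the exact renormalization scheme (softmax over the top-k candidate logits, averaged over the rollout) are assumptions about how a token-confidence reward might be computed.

```python
import math

def token_confidence(logits, chosen_index, top_k=5):
    # Softmax probability of the chosen token, renormalized over the
    # top-k candidate logits. Assumes the chosen token is among the
    # top-k (true under greedy or beam decoding).
    top = sorted(logits, reverse=True)[:top_k]
    m = max(top)
    denom = sum(math.exp(x - m) for x in top)
    return math.exp(logits[chosen_index] - m) / denom

def node_value(steps, top_k=5):
    # MCTS value estimate for a rollout: the mean per-token confidence
    # over all (logits, chosen_index) pairs produced during the rollout.
    confs = [token_confidence(l, i, top_k) for l, i in steps]
    return sum(confs) / len(confs)
```

Averaging per-token confidence gives a dense scalar reward even for open-ended answers, which is what lets MCTS rank candidate reasoning paths when no ground-truth verifier exists.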
Quick Start & Requirements
Clone the repository, then install dependencies with pip install -r requirements.txt. Inference uses the transformers library; chat with the model via ./src/talk_with_model.py, or via ./src/talk_with_model_vllm.py for vLLM-based serving.
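A minimal setup sketch of the steps above; the repository URL is an assumption and may differ:

```shell
# Assumed repository location; adjust if the project is hosted elsewhere.
git clone https://github.com/AIDC-AI/Marco-o1.git
cd Marco-o1
pip install -r requirements.txt
# Interactive chat (transformers backend):
python ./src/talk_with_model.py
# Or the vLLM backend:
python ./src/talk_with_model_vllm.py
```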
Highlighted Details
Maintenance & Community
The project is led by the MarcoPolo Team at Alibaba International Digital Commerce. Key contributors include Yu Zhao and Huifeng Yin. The project is active, with the latest release on 2024/11/13. Further details and discussions can be found on the GitHub repository.
Licensing & Compatibility
Released under the Apache License Version 2.0. This license is permissive and generally compatible with commercial use and closed-source linking.
Limitations & Caveats
The model is explicitly stated to be inspired by OpenAI's o1 and does not yet match its performance. The current MCTS implementation shows significant randomness due to the confidence score reward, and optimal action granularity is problem-dependent. Fine-tuning on English CoT data led to a performance decrease on the Chinese MGSM dataset.