Marco-o1 by AIDC-AI

Open reasoning model for real-world problem solving

Created 9 months ago · 1,509 stars · Top 27.9% on sourcepulse

Project Summary

Marco-o1 is an open-source large reasoning model (LRM) designed to tackle complex, real-world problems, particularly those with open-ended solutions where rewards are difficult to quantify. It targets researchers and developers aiming to advance LLM reasoning capabilities beyond standard benchmarks, offering a foundation for exploring novel problem-solving strategies.

How It Works

Marco-o1 enhances reasoning through a combination of Chain-of-Thought (CoT) fine-tuning, Monte Carlo Tree Search (MCTS), and a reflection mechanism. MCTS is integrated to expand the solution space by treating LLM outputs as actions and using token confidence scores to guide the search. The model also employs "mini-step" actions within MCTS for finer granularity and incorporates a self-reflection prompt to improve error detection and correction.
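
As a rough illustration of that reward signal, the sketch below scores an MCTS rollout by the confidence of its generated tokens (a minimal sketch based on the paper's description; the function names and the top-5 renormalization window are assumptions, not the repository's exact code):

```python
import math

def token_confidence(chosen_logprob: float, top_logprobs: list[float]) -> float:
    """Confidence of one generated token: its probability renormalized
    over the top candidate tokens at that step (softmax over logprobs).
    NOTE: the top-5 window is an assumption from the paper, not repo code."""
    m = max(top_logprobs)  # subtract the max to keep the exponentials stable
    denom = sum(math.exp(lp - m) for lp in top_logprobs)
    return math.exp(chosen_logprob - m) / denom

def rollout_value(steps: list[tuple[float, list[float]]]) -> float:
    """Value assigned to an MCTS node: the average confidence over all
    tokens generated in the rollout, steering the search toward
    high-confidence reasoning paths."""
    return sum(token_confidence(c, top) for c, top in steps) / len(steps)
```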

Quick Start & Requirements

  • Install via pip install -r requirements.txt after cloning the repository.
  • Requires the transformers library.
  • Inference can be run using ./src/talk_with_model.py or ./src/talk_with_model_vllm.py.
  • Official model weights are available on Hugging Face: AIDC-AI/Marco-o1.
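
Beyond the provided scripts, the weights can also be exercised directly (a minimal sketch, assuming the standard transformers chat API and default generation settings; the prompt is illustrative and may differ from the repository's system prompt):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AIDC-AI/Marco-o1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" requires the accelerate package
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "How many 'r's are in 'strawberry'?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```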

Highlighted Details

  • Fine-tuned on a mix of filtered Open-O1 CoT data, synthetic Marco-o1 CoT data, and instruction-following datasets.
  • MCTS integration with "step" and "mini-step" (32/64 tokens) actions shows performance gains on the MGSM dataset (see the sketch after this list).
  • Demonstrated improved handling of colloquialisms in translation tasks.
  • Future work includes training outcome and process reward models (ORM, PRM) and applying reinforcement learning.
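
The mini-step idea itself is straightforward: rather than treating an entire reasoning step as one MCTS action, the generated tokens are chunked into fixed-size windows that become the tree's actions. A hedged sketch (the 32/64-token sizes come from the reported experiments; the function name is hypothetical):

```python
def split_into_mini_steps(token_ids: list[int], size: int = 64) -> list[list[int]]:
    """Chunk generated tokens into fixed-size 'mini-step' actions
    (32 or 64 tokens in the reported experiments), giving MCTS a finer
    branching granularity than whole reasoning steps."""
    return [token_ids[i:i + size] for i in range(0, len(token_ids), size)]
```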

Maintenance & Community

The project is led by the MarcoPolo Team at Alibaba International Digital Commerce. Key contributors include Yu Zhao and Huifeng Yin. The project is active, with the latest release on 2024/11/13. Further details and discussions can be found on the GitHub repository.

Licensing & Compatibility

Released under the Apache License Version 2.0. This license is permissive and generally compatible with commercial use and closed-source linking.

Limitations & Caveats

The model is explicitly stated to be inspired by OpenAI's o1 and does not yet match its performance. The current MCTS implementation shows significant randomness due to the confidence score reward, and optimal action granularity is problem-dependent. Fine-tuning on English CoT data led to a performance decrease on the Chinese MGSM dataset.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 30 stars in the last 90 days
