Mulberry by HJYao00

MLLM research paper for reasoning/reflection via collective Monte Carlo Tree Search

Created 7 months ago · 1,208 stars · Top 33.1% on sourcepulse

View on GitHub
Project Summary

Mulberry is an open-source project that enhances multimodal large language models (MLLMs) with advanced reasoning and reflection capabilities. It targets researchers and developers looking to improve MLLM performance on complex tasks requiring step-by-step problem-solving, offering a novel approach to generate and leverage reasoning data.

How It Works

Mulberry employs Collective Monte Carlo Tree Search (CoMCTS) to generate step-by-step reasoning and reflection data. CoMCTS leverages the collective knowledge of multiple MLLMs to collaboratively explore, identify, and refine effective reasoning paths. This iterative process, involving expansion, simulation and error positioning, backpropagation, and selection, aims to improve the success rate and efficiency of reasoning path searches, ultimately leading to better model performance.
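
The sketch below is a minimal, self-contained illustration of what such a collective search loop can look like; it is not the authors' implementation. The UCB selection rule, toy "models", scoring, and pruning logic are simplified stand-ins for CoMCTS's expansion, simulation and error positioning, backpropagation, and selection operations.

```python
# Illustrative sketch of a Collective MCTS-style search loop (toy stand-ins only).
import math
import random
from dataclasses import dataclass, field

@dataclass
class Node:
    steps: list                      # reasoning steps accumulated along this path
    parent: "Node" = None
    children: list = field(default_factory=list)
    visits: int = 0
    value: float = 0.0               # running mean of backpropagated scores

def ucb(child, parent, c=1.4):
    """Upper-confidence score used to decide which branch to descend."""
    if child.visits == 0:
        return float("inf")
    return child.value + c * math.sqrt(math.log(parent.visits + 1) / child.visits)

def select(root):
    """Walk down the tree by UCB until reaching a leaf."""
    node = root
    while node.children:
        node = max(node.children, key=lambda ch: ucb(ch, node))
    return node

def backpropagate(node, score):
    """Propagate a path score back up to the root."""
    while node is not None:
        node.visits += 1
        node.value += (score - node.value) / node.visits
        node = node.parent

def comcts(question, policy_models, evaluate, iterations=50, max_depth=6):
    """Toy collective search: each model expands the selected leaf, paths are
    scored jointly, erroneous suffixes are pruned, and values are backed up."""
    root = Node(steps=[])
    for _ in range(iterations):
        leaf = select(root)
        if len(leaf.steps) >= max_depth:
            backpropagate(leaf, 0.0)         # dead end; still record the visit
            continue
        # Expansion: each model in the collective proposes one candidate next step.
        for model in policy_models:
            child = Node(steps=leaf.steps + [model(question, leaf.steps)], parent=leaf)
            leaf.children.append(child)
            # Simulation + error positioning: score the candidate path and locate
            # the first flawed step; prune everything from that step onward.
            score, first_error = evaluate(question, child.steps)
            if first_error is not None:
                child.steps = child.steps[:first_error]
            # Backpropagation: update value estimates along the path to the root.
            backpropagate(child, score)
    # Selection: follow the most-visited children to extract the final path.
    node = root
    while node.children:
        node = max(node.children, key=lambda ch: ch.visits)
    return node.steps

# Toy usage: stand-in "models" emit labelled step strings; the "evaluator"
# returns a random score and reports no errors.
if __name__ == "__main__":
    toy_models = [lambda q, s, i=i: f"model-{i}: step {len(s) + 1}" for i in range(3)]
    toy_evaluate = lambda q, steps: (random.random(), None)
    print(comcts("What is 2 + 2?", toy_models, toy_evaluate))
```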

Quick Start & Requirements

  • Install/Run: Use the provided Python scripts for inference (infer.py), data construction (data_construction.py), and training (via LLaMA-Factory); a hedged inference sketch follows this list.
  • Prerequisites: Python, LLaMA-Factory, VLMEvalKit. Specific model requirements depend on the chosen base model (e.g., Llama-3.2-Vision, Qwen2-VL).
  • Resources: Requires significant computational resources for training and potentially for running larger models.
  • Links: CoMCTS Code, LLaMA-Factory, VLMEvalKit.
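
The repo's own entry point is infer.py. As a rough illustration of what inference with the released Qwen2-VL-based checkpoint looks like through Hugging Face transformers, something along these lines should work; the Hub model id, image path, and prompt below are assumptions, so check the repository's README and infer.py for the exact usage.

```python
# Hedged sketch: loading a Mulberry checkpoint with plain transformers.
# The model id below is an assumption -- the repo's infer.py is the supported path.
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model_id = "HJYao00/Mulberry_qwen2vl_7b"  # hypothetical Hub id; verify in the README
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("question.png")  # e.g. a MathVista-style figure
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Solve the problem in the image. Reason step by step."},
    ],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=1024)
# Strip the prompt tokens and decode only the newly generated reasoning steps.
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```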

Highlighted Details

  • Provides inference code for models like Mulberry_llama_11b and Mulberry_qwen2vl_7b, outputting detailed reasoning steps.
  • Releases a 260K step-by-step reasoning SFT dataset and associated training code (a sketch of one possible record layout follows this list).
  • Offers evaluation instructions and code using VLMEvalKit for benchmarking.
  • Demonstrates performance improvements over state-of-the-art models on benchmarks like MathVista.
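
For fine-tuning with LLaMA-Factory, multimodal SFT data is commonly laid out as ShareGPT-style JSON with messages and images fields. The snippet below is a hypothetical example of how one step-by-step reasoning record might be packed into that layout; the field names follow LLaMA-Factory's multimodal demo format, and the released 260K dataset may use a different schema, so check the repo's data files and LLaMA-Factory's dataset_info.json before training.

```python
# Hypothetical example of packing one step-by-step reasoning sample into a
# ShareGPT-style record accepted by LLaMA-Factory's multimodal SFT pipeline.
# The field names and step formatting are assumptions, not the released schema.
import json

sample = {
    "messages": [
        {"role": "user",
         "content": "<image>Solve the problem in the figure step by step."},
        {"role": "assistant",
         "content": "### Step 1: Read the values from the chart.\n"
                    "### Step 2: Compute the difference between the two bars.\n"
                    "### Step 3: Reflect and verify the arithmetic.\n"
                    "### Answer: 42"},
    ],
    "images": ["images/example_0001.png"],
}

with open("mulberry_sft_sample.json", "w") as f:
    json.dump([sample], f, indent=2)
```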

Maintenance & Community

The project is primarily associated with authors from Nanyang Technological University, Tsinghua University, Baidu, and Sun Yat-sen University (SYSU). Recent updates include the release of evaluation code, models (Mulberry_llama_11b, Mulberry_qwen2vl_7b), and reasoning data.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Users should verify licensing for commercial use or integration with closed-source projects.

Limitations & Caveats

The project acknowledges that hallucinations in intermediate reasoning steps can still occur, even with error detection mechanisms. Smaller models used for error localization may be less effective, and even larger models can localize errors inaccurately. Ensuring the correctness of all intermediate steps is noted as a significant challenge requiring costly human verification.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 38 stars in the last 90 days
