Mulberry by HJYao00

MLLM research paper for reasoning/reflection via collective Monte Carlo Tree Search

Created 7 months ago · 1,208 stars · Top 33.1% on sourcepulse

View on GitHub
Project Summary

Mulberry is an open-source project that enhances multimodal large language models (MLLMs) with advanced reasoning and reflection capabilities. It targets researchers and developers looking to improve MLLM performance on complex tasks requiring step-by-step problem-solving, offering a novel approach to generate and leverage reasoning data.

How It Works

Mulberry employs Collective Monte Carlo Tree Search (CoMCTS) to generate step-by-step reasoning and reflection data. CoMCTS leverages the collective knowledge of multiple MLLMs to collaboratively explore, identify, and refine effective reasoning paths. This iterative process, involving expansion, simulation and error positioning, backpropagation, and selection, aims to improve the success rate and efficiency of reasoning path searches, ultimately leading to better model performance.
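
The sketch below is a minimal, self-contained illustration of what such a collective search loop can look like; it is not the authors' implementation. The UCB selection rule, toy "models", scoring, and pruning logic are simplified stand-ins for CoMCTS's expansion, simulation and error positioning, backpropagation, and selection operations.

```python
# Illustrative sketch of a Collective MCTS-style search loop (toy stand-ins only).
import math
import random
from dataclasses import dataclass, field

@dataclass
class Node:
    steps: list                      # reasoning steps accumulated along this path
    parent: "Node" = None
    children: list = field(default_factory=list)
    visits: int = 0
    value: float = 0.0               # running mean of backpropagated scores

def ucb(child, parent, c=1.4):
    """Upper-confidence score used to decide which branch to descend."""
    if child.visits == 0:
        return float("inf")
    return child.value + c * math.sqrt(math.log(parent.visits + 1) / child.visits)

def select(root):
    """Walk down the tree by UCB until reaching a leaf."""
    node = root
    while node.children:
        node = max(node.children, key=lambda ch: ucb(ch, node))
    return node

def backpropagate(node, score):
    """Propagate a path score back up to the root."""
    while node is not None:
        node.visits += 1
        node.value += (score - node.value) / node.visits
        node = node.parent

def comcts(question, policy_models, evaluate, iterations=50, max_depth=6):
    """Toy collective search: each model expands the selected leaf, paths are
    scored jointly, erroneous suffixes are pruned, and values are backed up."""
    root = Node(steps=[])
    for _ in range(iterations):
        leaf = select(root)
        if len(leaf.steps) >= max_depth:
            backpropagate(leaf, 0.0)         # dead end; still record the visit
            continue
        # Expansion: each model in the collective proposes one candidate next step.
        for model in policy_models:
            child = Node(steps=leaf.steps + [model(question, leaf.steps)], parent=leaf)
            leaf.children.append(child)
            # Simulation + error positioning: score the candidate path and locate
            # the first flawed step; prune everything from that step onward.
            score, first_error = evaluate(question, child.steps)
            if first_error is not None:
                child.steps = child.steps[:first_error]
            # Backpropagation: update value estimates along the path to the root.
            backpropagate(child, score)
    # Selection: follow the most-visited children to extract the final path.
    node = root
    while node.children:
        node = max(node.children, key=lambda ch: ch.visits)
    return node.steps

# Toy usage: stand-in "models" emit labelled step strings; the "evaluator"
# returns a random score and reports no errors.
if __name__ == "__main__":
    toy_models = [lambda q, s, i=i: f"model-{i}: step {len(s) + 1}" for i in range(3)]
    toy_evaluate = lambda q, steps: (random.random(), None)
    print(comcts("What is 2 + 2?", toy_models, toy_evaluate))
```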

Quick Start & Requirements

  • Install/Run: Use the provided Python scripts for inference (infer.py), data construction (data_construction.py), and training (via LLaMA-Factory); a hedged inference sketch follows this list.
  • Prerequisites: Python, LLaMA-Factory, VLMEvalKit. Specific model requirements depend on the chosen base model (e.g., Llama-3.2-Vision, Qwen2-VL).
  • Resources: Requires significant computational resources for training and potentially for running larger models.
  • Links: CoMCTS Code, LLaMA-Factory, VLMEvalKit.
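
The repo's own entry point is infer.py. As a rough illustration of what inference with the released Qwen2-VL-based checkpoint looks like through Hugging Face transformers, something along these lines should work; the Hub model id, image path, and prompt below are assumptions, so check the repository's README and infer.py for the exact usage.

```python
# Hedged sketch: loading a Mulberry checkpoint with plain transformers.
# The model id below is an assumption -- the repo's infer.py is the supported path.
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model_id = "HJYao00/Mulberry_qwen2vl_7b"  # hypothetical Hub id; verify in the README
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("question.png")  # e.g. a MathVista-style figure
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Solve the problem in the image. Reason step by step."},
    ],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=1024)
# Strip the prompt tokens and decode only the newly generated reasoning steps.
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```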

Highlighted Details

  • Provides inference code for models like Mulberry_llama_11b and Mulberry_qwen2vl_7b, outputting detailed reasoning steps.
  • Releases a 260K step-by-step reasoning SFT dataset and associated training code (a sketch of one possible record layout follows this list).
  • Offers evaluation instructions and code using VLMEvalKit for benchmarking.
  • Demonstrates performance improvements over state-of-the-art models on benchmarks like MathVista.
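
For fine-tuning with LLaMA-Factory, multimodal SFT data is commonly laid out as ShareGPT-style JSON with messages and images fields. The snippet below is a hypothetical example of how one step-by-step reasoning record might be packed into that layout; the field names follow LLaMA-Factory's multimodal demo format, and the released 260K dataset may use a different schema, so check the repo's data files and LLaMA-Factory's dataset_info.json before training.

```python
# Hypothetical example of packing one step-by-step reasoning sample into a
# ShareGPT-style record accepted by LLaMA-Factory's multimodal SFT pipeline.
# The field names and step formatting are assumptions, not the released schema.
import json

sample = {
    "messages": [
        {"role": "user",
         "content": "<image>Solve the problem in the figure step by step."},
        {"role": "assistant",
         "content": "### Step 1: Read the values from the chart.\n"
                    "### Step 2: Compute the difference between the two bars.\n"
                    "### Step 3: Reflect and verify the arithmetic.\n"
                    "### Answer: 42"},
    ],
    "images": ["images/example_0001.png"],
}

with open("mulberry_sft_sample.json", "w") as f:
    json.dump([sample], f, indent=2)
```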

Maintenance & Community

The project is primarily associated with authors from Nanyang Technological University, Tsinghua University, Baidu, and Sun Yat-sen University (SYSU). Recent updates include the release of evaluation code, models (Mulberry_llama_11b, Mulberry_qwen2vl_7b), and reasoning data.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Users should verify licensing for commercial use or integration with closed-source projects.

Limitations & Caveats

The project acknowledges that hallucinations in intermediate reasoning steps can still occur, even with error detection mechanisms. Smaller models used for error localization may be less effective, and even larger models can localize errors inaccurately. Ensuring the correctness of all intermediate steps is noted as a significant challenge requiring costly human verification.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 38 stars in the last 90 days
