MiroThinker by MiroMindAI

Agentic models for deep research and complex problem-solving

Created 1 month ago
315 stars

Top 85.6% on SourcePulse

Project Summary

MiroThinker is an open-source series of agentic large language models designed for deep research and complex, long-horizon problem-solving. Built upon the Qwen3 architecture, it offers models in 8B, 14B, and 32B parameter sizes, with both Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) variants. MiroThinker excels in tasks requiring tool use, code execution, web browsing, and document processing, demonstrating state-of-the-art performance among open-source models on benchmarks like GAIA.

How It Works

MiroThinker combines task decomposition, multi-hop reasoning, and retrieval-augmented generation. It runs on the MiroFlow framework, an agent-development environment that provides conversation management, flexible tool integration (supporting both open-source and commercial tools), and benchmark evaluations. The models are trained on the MiroVerse dataset using the MiroTrain framework.
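To make the tool-use loop concrete, here is a minimal sketch of how an agent might alternate between choosing an action and reading a tool's observation. It is not MiroFlow's API: `TOOLS`, `scripted_policy`, and `run_agent` are hypothetical names, and the "model" is a hard-coded stand-in so the example runs without any model weights.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical tool registry (not MiroFlow's): tool name -> function of one string.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search": lambda query: f"(stub search results for {query!r})",
    "multiply": lambda arg: str(math.prod(int(x) for x in arg.split())),
}

History = List[Tuple[str, str]]  # (tool name, observation) pairs


def scripted_policy(history: History) -> Tuple[str, str]:
    """Stand-in for the model: returns (action, argument).

    A real agent would prompt an LLM with the task and history here.
    """
    if not history:
        return ("multiply", "6 7")  # first step: call a tool
    return ("final", f"The answer is {history[-1][1]}")  # then answer


def run_agent(policy: Callable[[History], Tuple[str, str]], max_steps: int = 5) -> str:
    """Loop: ask the policy for an action, run the tool, record the observation."""
    history: History = []
    for _ in range(max_steps):
        action, arg = policy(history)
        if action == "final":
            return arg
        observation = TOOLS[action](arg)
        history.append((action, observation))
    return "step limit reached"


print(run_agent(scripted_policy))
```

The step limit is a design choice for this sketch: long-horizon agents generally need some bound so that a policy that never emits a final answer still terminates.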

Quick Start & Requirements

Highlighted Details

  • Achieves state-of-the-art performance on the GAIA benchmark among open-source models.
  • Offers SFT and DPO variants across 8B, 14B, and 32B parameter scales.
  • Supports both open-source and commercial tools for enhanced capabilities.
  • Includes a comprehensive benchmark suite covering GAIA, HLE, BrowseComp, and more.

Maintenance & Community

  • Community: Discord server available (https://discord.com/invite/GPqEnkzQZd).
  • Updates: Recent updates include lightweight deployment options and the release of the v0.1 models, framework, and data.

Licensing & Compatibility

  • License: Apache License 2.0.
  • Compatibility: Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

The models' Chinese-language capabilities are currently limited because the MiroVerse-v0.1 dataset is predominantly English; improvements are planned for future versions. Results are reported as both "Best Pass@1" and "Pass@1 (Avg@8)", so that peak performance and run-to-run stability can be compared.
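The two reporting modes can be illustrated with a short sketch. It assumes that "Pass@1 (Avg@8)" is the mean single-attempt accuracy over 8 independent runs and that "Best Pass@1" is the highest single-run accuracy; the source does not spell out these definitions, and the function names and sample data below are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def pass_at_1(run: List[bool]) -> float:
    """Fraction of tasks solved on the first (only) attempt in one run."""
    return sum(run) / len(run)


def summarize(runs: List[List[bool]]) -> Dict[str, float]:
    """Assumed definitions: best = max over runs, avg = mean over runs."""
    scores = [pass_at_1(run) for run in runs]
    return {"best_pass_at_1": max(scores), "avg_pass_at_1": mean(scores)}


# Made-up outcomes for 8 runs over the same 4 tasks (True = solved).
runs = [
    [True, True, False, False],
    [True, True, True, False],
    [True, False, True, False],
    [False, True, True, False],
    [True, False, False, False],
    [True, True, False, False],
    [False, False, True, True],
    [True, False, False, True],
]

print(summarize(runs))
```

Under these assumptions, a large gap between the best and the averaged score indicates high variance between runs, which is why reporting both is more informative than either number alone.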

Health Check

  • Last commit: 14 hours ago
  • Responsiveness: Inactive
  • Pull requests (30d): 25
  • Issues (30d): 2
  • Star history: 78 stars in the last 30 days
