MiroThinker by MiroMindAI

Agentic models for deep research and complex problem-solving

Created 8 months ago
8,141 stars

Top 6.3% on SourcePulse

Project Summary

MiroThinker is an open-source series of agentic large language models designed for deep research and complex, long-horizon problem-solving. Built on the Qwen3 architecture, it is available in 8B, 14B, and 32B parameter sizes, each with Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) variants. It targets tasks that require tool use, code execution, web browsing, and document processing, and reports state-of-the-art performance among open-source models on benchmarks such as GAIA.

How It Works

MiroThinker combines task decomposition, multi-hop reasoning, and retrieval-augmented generation. It builds on the MiroFlow framework, an agent development environment with enhanced conversation management, flexible tool integration (supporting both open-source and commercial tools), and built-in benchmark evaluations. The models are trained on the MiroVerse dataset using the MiroTrain framework.
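
The tool-using, multi-hop loop described above can be sketched roughly as follows. This is a hypothetical illustration, not MiroFlow's API: the ToolCall/FinalAnswer types, the search tool backed by a fictional toy dictionary, and scripted_policy (which stands in for the model so the example runs without one) are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class ToolCall:
    """The model asks the harness to run a named tool on one argument."""
    tool: str
    argument: str


@dataclass
class FinalAnswer:
    """The model decides it has enough evidence and answers."""
    text: str


Step = Union[ToolCall, FinalAnswer]

# Hypothetical, fictional knowledge base standing in for a web-search tool.
KNOWLEDGE = {
    "maintainer of Project A": "Team B",
    "location of Team B": "City C",
}


def search(query: str) -> str:
    """Hypothetical search tool: exact-match lookup in the toy knowledge base."""
    return KNOWLEDGE.get(query, "no result")


TOOLS: dict[str, Callable[[str], str]] = {"search": search}


def run_agent(policy: Callable[[list[str]], Step], task: str, max_turns: int = 5) -> str:
    """Alternate between model steps and tool calls until an answer or the turn limit."""
    history = [f"TASK: {task}"]
    for _ in range(max_turns):
        step = policy(history)
        if isinstance(step, FinalAnswer):
            return step.text
        tool = TOOLS.get(step.tool)
        observation = tool(step.argument) if tool else f"unknown tool: {step.tool}"
        # Feed the observation back so the next step can build on it (multi-hop).
        history.append(f"CALL {step.tool}({step.argument!r}) -> {observation}")
    return "gave up after max_turns"


def last_observation(history: list[str]) -> str:
    """Extract the tool result from the most recent history entry."""
    return history[-1].split("-> ", 1)[1]


def scripted_policy(history: list[str]) -> Step:
    """Stand-in for the LLM: decomposes the task into two dependent lookups."""
    hops_done = len(history) - 1
    if hops_done == 0:
        return ToolCall("search", "maintainer of Project A")
    if hops_done == 1:
        return ToolCall("search", f"location of {last_observation(history)}")
    return FinalAnswer(last_observation(history))


if __name__ == "__main__":
    print(run_agent(scripted_policy, "Where is the team that maintains Project A based?"))
```

The second lookup depends on the result of the first, which is what makes the question multi-hop; a real agent would replace scripted_policy with model calls and the toy dictionary with actual tools.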

Quick Start & Requirements

Highlighted Details

  • Achieves state-of-the-art performance on the GAIA benchmark among open-source models.
  • Offers SFT and DPO variants across 8B, 14B, and 32B parameter scales.
  • Supports both open-source and commercial tools for enhanced capabilities.
  • Includes a comprehensive benchmark suite covering GAIA, HLE, BrowseComp, and more.

Maintenance & Community

  • Community: Discord server available (https://discord.com/invite/GPqEnkzQZd).
  • Updates: Recent updates include lightweight deployment options and the release of the v0.1 models, framework, and data.

Licensing & Compatibility

  • License: Apache License 2.0.
  • Compatibility: Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

The model's Chinese-language capabilities are currently limited because the MiroVerse-v0.1 dataset is predominantly English; the authors plan to improve this in future versions. Performance metrics are reported as both "Best Pass@1" (peak performance) and "Pass@1 (Avg@8)" (stability across runs).
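
Assuming the common reading of these two metrics (each of 8 independent runs is scored by Pass@1, the fraction of questions answered correctly on the first attempt; "Best" takes the maximum over runs and "Avg@8" takes the mean), they can be computed as below. The eight-run setup and the correctness data are toy assumptions, not MiroThinker results.

```python
from statistics import mean


def pass_at_1(run: list[bool]) -> float:
    """Pass@1 for one run: fraction of questions whose single answer was correct."""
    return sum(run) / len(run)


def summarize(runs: list[list[bool]]) -> tuple[float, float]:
    """Return (best Pass@1, mean Pass@1) over independent runs."""
    scores = [pass_at_1(run) for run in runs]
    return max(scores), mean(scores)


if __name__ == "__main__":
    # Toy data: 8 runs over a 4-question benchmark; each entry is the number correct.
    correct_counts = [2, 3, 2, 2, 3, 4, 2, 2]
    runs = [[i < c for i in range(4)] for c in correct_counts]
    best, avg = summarize(runs)
    print(f"Best Pass@1: {best:.3f}  Pass@1 (Avg@8): {avg:.3f}")
    # Best Pass@1: 1.000  Pass@1 (Avg@8): 0.625
```

On this toy data the best single run (1.000) overstates typical performance (0.625), which is why reporting the average alongside the best run gives a steadier picture.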

Health Check
Last Commit

1 day ago

Responsiveness

Inactive

Pull Requests (30d)
15
Issues (30d)
16
Star History
1,915 stars in the last 30 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera), Eric Zhu (Coauthor of AutoGen; Research Scientist at Microsoft Research), and 9 more.

Qwen-Agent by QwenLM

0.5%
16k
Agent framework for LLM application development
Created 2 years ago
Updated 1 month ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Abubakar Abid (Cofounder of Gradio), and 3 more.

owl by camel-ai

1.7%
20k
Multi-agent framework for real-world task automation
Created 1 year ago
Updated 2 days ago