MinerU-Document-Explorer  by opendatalab

Agent-native knowledge engine for local document analysis and knowledge base creation

Created 1 month ago
492 stars

Top 62.7% on SourcePulse

Project Summary

MinerU Document Explorer is an agent-native knowledge engine that indexes, organizes, and retrieves information from PDF, DOCX, PPTX, and Markdown documents. It gives AI agents and power users fast cross-collection search, deep reading within individual documents, and the means to build LLM-maintained knowledge bases. It targets research, project management, and study workflows, and runs locally.

How It Works

The system operates through three core tool suites: Retrieve, Deep Read, and Ingest. Retrieve offers hybrid search that combines BM25, vector embeddings, LLM reranking, and query expansion. Deep Read lets users navigate, search, and extract content from specific sections of a single document without loading the entire file. Ingest supports creating and maintaining LLM wikis, following the Karpathy LLM Wiki pattern. MinerU Document Explorer integrates with AI agents via the Model Context Protocol (MCP). Its MCP server offers both stdio and HTTP daemon modes and keeps models loaded in memory across requests.
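The summary does not say how the BM25 and vector results are merged. One common approach is reciprocal rank fusion (RRF), sketched below purely as an illustration: the function name, the file names, and the constant k=60 are assumptions, not details taken from the project.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from a keyword (BM25) pass and a vector pass.
bm25_hits = ["notes.md", "paper.pdf", "slides.pptx"]
vector_hits = ["paper.pdf", "report.docx", "notes.md"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

In a pipeline like the one described, a reranking model would typically rescore only the top of such a fused list, since reranking is the most expensive step.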

Quick Start & Requirements

  • Installation: Install globally via npm: npm install -g mineru-document-explorer. Agent skills can also be installed using qmd skill install.
  • Prerequisites: Node.js >= 22 or the Bun runtime. Python >= 3.10 for document processing, with the packages pymupdf, python-docx, and python-pptx installed. macOS users may need to run brew install sqlite.
  • Optional: A MinerU Cloud API key (MINERU_API_KEY) can be set for enhanced PDF extraction from scanned documents.
  • LLM Models: Core models (embeddinggemma-300M, qwen3-reranker-0.6b, qmd-query-expansion-1.7B) are auto-downloaded on first use for search functionalities.
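The Python-side prerequisites listed above can be checked before a first run. The script below is a minimal sketch, not part of the project: check_prerequisites is a hypothetical helper, and it only inspects the Python version, the three packages, and the optional MINERU_API_KEY variable.

```python
import importlib.util
import os
import sys

# Import names differ from the pip package names.
REQUIRED_MODULES = {
    "pymupdf": "pymupdf",  # older releases are imported as "fitz"
    "python-docx": "docx",
    "python-pptx": "pptx",
}


def check_prerequisites(min_python=(3, 10)):
    """Return a dict describing which documented prerequisites are met."""
    report = {"python_ok": sys.version_info[:2] >= min_python}
    for package, module in REQUIRED_MODULES.items():
        found = importlib.util.find_spec(module) is not None
        if package == "pymupdf" and not found:
            found = importlib.util.find_spec("fitz") is not None
        report[package] = found
    # Optional: only used for enhanced extraction from scanned PDFs.
    report["MINERU_API_KEY set"] = bool(os.environ.get("MINERU_API_KEY"))
    return report


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

A missing package can then be installed with pip under the package name shown in the report. The Node.js or Bun requirement is not covered by this check.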

Highlighted Details

  • Provides 15 tools integrated via the Model Context Protocol (MCP) for seamless AI agent interaction.
  • Features "Deep Reading" for in-document navigation and targeted search without loading entire files.
  • Enables the creation of interlinked wiki knowledge bases from raw documents.
  • Employs a hybrid search pipeline combining traditional (BM25) and modern (vector, LLM reranking) retrieval methods.
  • Designed for 100% local execution, offering a privacy-focused alternative to cloud-based solutions.
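To make the "Deep Reading" idea concrete, here is a minimal sketch of one way to read a single section without loading a whole file: scan once to build an outline of byte offsets, then seek to the wanted section. It handles Markdown only. build_outline and read_section are hypothetical names, and the project's actual implementation may work quite differently.

```python
import os
import re
import tempfile

# ATX-style Markdown headings. Headings inside fenced code blocks are not
# excluded, which a real parser would need to handle.
HEADING = re.compile(rb"^(#{1,6})\s+(.*)$")


def build_outline(path):
    """Scan the file once and record (level, title, byte offset) per heading."""
    outline = []
    offset = 0
    with open(path, "rb") as f:
        for line in f:
            m = HEADING.match(line.rstrip(b"\r\n"))
            if m:
                outline.append((len(m.group(1)), m.group(2).decode("utf-8"), offset))
            offset += len(line)
    return outline


def read_section(path, outline, title):
    """Read only the bytes from the named heading up to the next heading."""
    for i, (_, name, start) in enumerate(outline):
        if name == title:
            end = outline[i + 1][2] if i + 1 < len(outline) else os.path.getsize(path)
            with open(path, "rb") as f:
                f.seek(start)
                return f.read(end - start).decode("utf-8")
    raise KeyError(title)


# Demo on a small temporary file.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.md")
    with open(demo, "w", encoding="utf-8", newline="\n") as f:
        f.write("# Intro\nhello\n## Methods\nstep one\n## Results\ndone\n")
    outline = build_outline(demo)
    methods = read_section(demo, outline, "Methods")
```

The outline is small enough to cache or hand to an agent as a table of contents, so later requests touch only the bytes of the section they need.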

Maintenance & Community

The project is a recent rebuild (v1 released April 7, 2026) from an OpenClaw agent skill. It builds upon foundational projects like QMD and Karpathy's LLM Wiki. Specific community channels (e.g., Discord, Slack) or notable maintainer information are not detailed in the provided README.

Licensing & Compatibility

Licensed under the MIT license, which permits commercial use and integration within closed-source projects, provided the copyright and license notice are retained.

Limitations & Caveats

While the core functionality operates locally, optimal PDF extraction for scanned or complex documents relies on the optional MinerU Cloud service. As a recently rebuilt project (v1), it may still be undergoing active development and refinement.

Health Check

  • Last Commit: 6 hours ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 7

Star History
498 stars in the last 30 days
