opendatalab: Agent-native knowledge engine for local document analysis and knowledge base creation
MinerU Document Explorer is an agent-native knowledge engine that indexes, organizes, and retrieves information from PDF, DOCX, PPTX, and Markdown documents. It is aimed at AI agents and power users, supporting fast search across collections, deep reading within individual documents, and the construction of LLM-maintained knowledge bases. It targets research, project management, and study workflows, and its core functionality runs locally.
How It Works
The system is organized into three tool suites: Retrieve, Deep Read, and Ingest. Retrieve provides hybrid search that combines BM25 keyword ranking, vector embeddings, LLM reranking, and query expansion. Deep Read lets users navigate, search, and extract specific sections of a single document without loading the entire file. Ingest creates and maintains LLM wikis following Karpathy's LLM Wiki pattern. MinerU integrates with AI agents via the Model Context Protocol (MCP); its MCP server runs in either stdio or HTTP daemon mode and keeps models loaded in memory across requests.
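The summary does not say how the BM25 and vector result lists are merged. One widely used option is reciprocal rank fusion (RRF), shown here as a minimal Python sketch under that assumption. The function name, the constant k = 60, and the sample file names are illustrative, and the LLM reranking and query-expansion stages are left out.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Assumption for illustration: each document earns 1 / (k + rank) from
    every list it appears in (rank is 1-based), and documents are then
    sorted by their summed score. This is generic RRF, not MinerU's code.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a keyword (BM25) pass and a vector pass.
bm25_hits = ["notes.md", "report.pdf", "slides.pptx"]
vector_hits = ["report.pdf", "spec.docx", "notes.md"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)  # ['report.pdf', 'notes.md', 'spec.docx', 'slides.pptx']
```

RRF works on ranks rather than raw scores, so BM25 scores and cosine similarities, which live on different scales, never have to be normalized against each other. A document that appears near the top of both lists (report.pdf here) outranks one that tops only a single list.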
Quick Start & Requirements
Install globally with npm install -g mineru-document-explorer. Agent skills can also be installed using qmd skill install.
Python dependencies: pymupdf, python-docx, python-pptx.
macOS users may need brew install sqlite.
An optional MinerU Cloud API key (MINERU_API_KEY) can be set for enhanced PDF extraction from scanned documents.
Demo material is provided in the demo/ folder within the repository.
For further usage, see qmd --help or the linked documentation.
Highlighted Details
Maintenance & Community
The project is a recent rebuild (v1 released April 7, 2026) from an OpenClaw agent skill. It builds upon foundational projects like QMD and Karpathy's LLM Wiki. Specific community channels (e.g., Discord, Slack) or notable maintainer information are not detailed in the provided README.
Licensing & Compatibility
Licensed under the MIT license, which permits commercial use and integration into closed-source projects; the main condition is that the copyright and license notice be retained.
Limitations & Caveats
While the core functionality operates locally, optimal PDF extraction for scanned or complex documents relies on the optional MinerU Cloud service. As a recently rebuilt project (v1), it may still be undergoing active development and refinement.
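Because the Cloud service is optional, a script driving the tool might decide per document whether a PDF is worth sending to MinerU Cloud. The Python sketch below is a hypothetical heuristic, not the project's own logic: it treats a PDF as scanned when the average number of extractable characters per page falls below a threshold, and picks the cloud path only when MINERU_API_KEY is set. The function name, the min_chars threshold of 50, and the backend labels are assumptions; the per-page character counts would come from whatever local extractor is in use (for example pymupdf).

```python
import os
from typing import Mapping, Optional, Sequence


def choose_pdf_backend(
    chars_per_page: Sequence[int],
    env: Optional[Mapping[str, str]] = None,
    min_chars: int = 50,
) -> str:
    """Return a hypothetical backend label: "mineru-cloud" or "local".

    A PDF "looks scanned" when its pages yield little extractable text on
    average (or it has no pages at all). The cloud path is chosen only if
    the PDF looks scanned AND an API key is available; otherwise the
    local path is kept.
    """
    env = os.environ if env is None else env
    has_key = bool(env.get("MINERU_API_KEY", "").strip())

    if chars_per_page:
        looks_scanned = sum(chars_per_page) / len(chars_per_page) < min_chars
    else:
        looks_scanned = True

    return "mineru-cloud" if looks_scanned and has_key else "local"


# Text-rich PDF: local extraction is enough even when a key is present.
print(choose_pdf_backend([1200, 950], {"MINERU_API_KEY": "demo-key"}))  # local
# Near-empty text layer plus a key: route to the cloud service.
print(choose_pdf_backend([0, 12], {"MINERU_API_KEY": "demo-key"}))  # mineru-cloud
# Same scanned PDF without a key: fall back to local extraction.
print(choose_pdf_backend([0, 12], {}))  # local
```

Keeping the key check separate from the "looks scanned" check means the tool stays fully local by default and only contacts the cloud service when the user has opted in.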