Ontos-AI
Document memory infrastructure for AI agents
Top 57.4% on SourcePulse
Knowhere provides a memory layer for AI agents and RAG systems, transforming unstructured documents into persistent, navigable context. It addresses the challenge of preparing diverse data formats (PDFs, Office, images, text) for AI by parsing, extracting hierarchical structure, and constructing knowledge graphs. This enables more accurate, efficient, and semantically rich retrieval for LLM workflows, benefiting developers and researchers building advanced AI applications.
How It Works
Knowhere operates in two steps: parsing documents to build memory, and enabling agents to retrieve from it. The "Parse and Build Memory" step utilizes specialized parsers for various file types. Its proprietary tree-like algorithm reconstructs the full document hierarchy, preventing semantic fragmentation, and stores chunks, navigation trees, summaries, and graph links. The "Agentic Retrieval" step fuses multiple signals (keyword, path, semantic) and allows agents to navigate the document's section tree and cross-document graph, drilling into relevant regions for traceable, contextualized evidence. This approach offers a significant advantage over traditional flat vector lookups by mimicking human reading patterns.
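The two steps above can be illustrated with a minimal sketch. This is not Knowhere's code or API: the Section class, build_tree, fused_score, and drill_down are hypothetical names, the input is assumed to use markdown-style "#" headings, the signal weights are arbitrary, and the "semantic" signal is a simple token-overlap (Jaccard) stand-in for an embedding similarity.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Section:
    """Hypothetical node in a document's section tree."""
    title: str
    level: int
    path: str = ""
    text: str = ""
    children: list["Section"] = field(default_factory=list)


def build_tree(lines):
    """Rebuild a section hierarchy from markdown-style headings (assumed input)."""
    root = Section(title="root", level=0)
    stack = [root]
    for line in lines:
        if line.startswith("#"):
            level = len(line) - len(line.lstrip("#"))
            title = line[level:].strip()
            # Close sections at the same or a deeper level before attaching.
            while stack[-1].level >= level:
                stack.pop()
            parent = stack[-1]
            node = Section(title, level, path=f"{parent.path}/{title}".lower())
            parent.children.append(node)
            stack.append(node)
        else:
            # Body text stays with the innermost open section.
            stack[-1].text += line + " "
    return root


def tokens(s):
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def fused_score(query, node, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of keyword, path, and (placeholder) semantic signals."""
    q = tokens(query)
    if not q:
        return 0.0
    body = tokens(node.text)
    keyword = len(q & body) / len(q)
    path = len(q & tokens(node.path)) / len(q)
    both = body | tokens(node.title)
    # Jaccard overlap stands in for an embedding similarity here.
    semantic = len(q & both) / len(q | both)
    return weights[0] * keyword + weights[1] * path + weights[2] * semantic


def subtree_best(query, node):
    """Best fused score found anywhere in this node's subtree."""
    return max([fused_score(query, node)]
               + [subtree_best(query, c) for c in node.children])


def drill_down(query, root):
    """Greedily descend into the child whose subtree looks most relevant."""
    node = root
    while node.children:
        child = max(node.children, key=lambda c: subtree_best(query, c))
        score = subtree_best(query, child)
        if score <= 0 or score < fused_score(query, node):
            break
        node = child
    return node


doc = [
    "# Installation",
    "Use uv to sync packages.",
    "## Environment",
    "Copy the env example file.",
    "# Retrieval",
    "Agents navigate the section tree.",
    "## Signals",
    "Keyword, path and semantic signals are fused.",
]
root = build_tree(doc)
hit = drill_down("semantic signals fused", root)
print(hit.path)
```

Because each returned section keeps its path, the evidence stays traceable to its place in the hierarchy, which a flat list of chunks would not preserve.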
Quick Start & Requirements
Install dependencies with uv (uv sync --all-packages), copy the example environment file (cp apps/api/.env.example apps/api/.env), start the local development stack (./deploy/local-dev/start-dev.sh), then run the API (cd apps/api && uv run main.py) and the worker (cd apps/worker && uv run worker.py) in separate terminals.
Highlighted Details
Maintenance & Community
Knowhere was open-sourced on May 7, 2026. Communication channels include GitHub Discussions for general conversation and GitHub Issues for bug reports and feature requests. A Contribution Guide is available, encouraging community involvement.
Licensing & Compatibility
The project is licensed under the Apache 2.0 license. This license is permissive and generally compatible with commercial use and linking within closed-source applications.
Limitations & Caveats
Support for formats such as .epub, .html, .xml, .mp4, and .mp3 is listed as "Coming Soon". The project is actively expanding benchmarks and adding parsers, indicating ongoing development.