VectifyAI: Open-source LLM knowledge base system
OpenKB is an open-source system that compiles raw documents into a structured, interlinked wiki-style knowledge base using LLMs, powered by PageIndex for vectorless long document retrieval. It addresses the challenge of knowledge accumulation by creating a persistent, evolving wiki rather than re-deriving information on each query. This system is designed for engineers, researchers, and power users who need to manage and query extensive document collections efficiently, offering benefits like automatic cross-referencing, contradiction flagging, and multi-modal understanding.
How It Works
Unlike traditional RAG, which re-derives information on each query, OpenKB compiles knowledge into a persistent wiki. Raw documents take one of two paths. Short documents are converted to Markdown and read directly by the LLM. Long PDFs (20+ pages) go through PageIndex, which builds a hierarchical tree index with summaries; retrieval then works by reasoning over that tree rather than by querying a vector database. An LLM compiles the summaries and indexed document trees into a structured wiki, generating concept pages and cross-links. Because each new document is merged into the existing knowledge base, knowledge compounds and the synthesis improves over time.
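The two ideas in this section, routing by document length and retrieval over a tree of summaries, can be illustrated with a small sketch. This is not OpenKB's or PageIndex's code: the names `choose_pipeline`, `Node`, `relevance`, and `retrieve` are hypothetical, the 20-page threshold is the only value taken from the text, and a simple keyword-overlap score stands in for the LLM's reasoning step.

```python
from dataclasses import dataclass, field

LONG_DOC_MIN_PAGES = 20  # from the text: long PDFs are "20+ pages"


def choose_pipeline(file_name: str, page_count: int) -> str:
    """Pick a processing path. Assumption: only PDFs take the tree-index path."""
    is_pdf = file_name.lower().endswith(".pdf")
    if is_pdf and page_count >= LONG_DOC_MIN_PAGES:
        return "tree_index"
    return "direct_markdown"


@dataclass
class Node:
    """One section of a document: a title, a summary, and a page range."""
    title: str
    summary: str
    pages: tuple[int, int]
    children: list["Node"] = field(default_factory=list)


def relevance(question: str, node: Node) -> int:
    """Stand-in for LLM judgment: count longer question words found in the node text."""
    words = {w.strip("?.,").lower() for w in question.split()}
    text = f"{node.title} {node.summary}".lower()
    return sum(1 for w in words if len(w) > 3 and w in text)


def retrieve(question: str, root: Node) -> Node:
    """Walk down from the root, choosing the best-scoring child at each level."""
    node = root
    while node.children:
        node = max(node.children, key=lambda c: relevance(question, c))
    return node


# Hypothetical document tree, invented for illustration only.
report = Node("Annual report", "Company results", (1, 40), [
    Node("Financials", "Revenue, costs and profit figures", (1, 25), [
        Node("Revenue", "Revenue by region and product", (1, 10)),
        Node("Costs", "Operating costs and headcount", (11, 25)),
    ]),
    Node("Outlook", "Risks and strategy for next year", (26, 40)),
])

hit = retrieve("How did revenue by region change?", report)
print(hit.title, hit.pages)  # Revenue (1, 10)
```

The point of the sketch is that no embeddings or vector store are involved: the search narrows by reading summaries level by level, and only the selected leaf's pages would need to be read in full. A real system would replace `relevance` with a model call and could follow more than one branch.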
Quick Start & Requirements
Install with pip: pip install openkb. Alternatively, install the latest version from GitHub, or from source for development. OpenKB needs access to a Large Language Model (LLM) through LiteLLM, which supports providers such as OpenAI, Anthropic, and Gemini. Users must configure their chosen model and set their LLM API key in a .env file. The quick start has three steps: initialize a knowledge base (openkb init), add documents (openkb add <file_or_dir>), and then query (openkb query "question") or chat (openkb chat).
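For scripted use, the quick start steps can be wrapped in a few lines of Python. This is a minimal sketch that only shells out to the commands named above; the function names are hypothetical, and it assumes the model and the .env file with the API key are already set up in the working directory. It also assumes each command runs without prompting for input, which the text does not confirm.

```python
import shutil
import subprocess


def quick_start_commands(docs_path: str, question: str) -> list[list[str]]:
    """Build the init / add / query commands described in the quick start."""
    return [
        ["openkb", "init"],
        ["openkb", "add", docs_path],
        ["openkb", "query", question],
    ]


def run_quick_start(docs_path: str, question: str) -> bool:
    """Run the commands in order; return False if the CLI is not installed."""
    if shutil.which("openkb") is None:
        print("openkb not found on PATH; install it with: pip install openkb")
        return False
    for cmd in quick_start_commands(docs_path, question):
        # check=True stops at the first command that exits with an error.
        subprocess.run(cmd, check=True)
    return True
```

Calling `run_quick_start("docs/", "What are the main findings?")` would then initialize a knowledge base, ingest the `docs/` directory, and ask one question. Passing arguments as a list avoids shell quoting problems in the question text.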
Highlighted Details
Wiki pages link to each other with [[wikilinks]], allowing seamless integration with Obsidian for graph visualization and browsing.
[...] raw/ directory.
Maintenance & Community
The project is associated with VectifyAI and PageIndexAI, with links provided to their respective X (Twitter) and LinkedIn profiles for updates and engagement. A roadmap is outlined in the README detailing future development plans. Contributions are welcomed via pull requests or issue reports.
Licensing & Compatibility
OpenKB is licensed under the Apache 2.0 license. This permissive license allows commercial use and integration into closed-source projects, provided the license text and attribution notices are preserved.
Limitations & Caveats
Current development focuses on CLI-based interaction, with a web UI planned for future releases. While long document handling is robust for PDFs via PageIndex, extending this capability to other formats is listed as a future roadmap item. Scaling to extremely large document collections and implementing hierarchical concept indexing are also areas slated for future enhancement. The optional PageIndex Cloud offers advanced features like OCR for scanned PDFs, but requires an API key.