OpenKB by VectifyAI

Open-source LLM knowledge base system

Created 3 weeks ago

491 stars

Top 62.8% on SourcePulse

Project Summary

OpenKB is an open-source system that uses LLMs to compile raw documents into a structured, interlinked, wiki-style knowledge base, with PageIndex handling vectorless retrieval over long documents. Rather than re-deriving information from the raw sources on every query, it accumulates knowledge in a persistent wiki that evolves as documents are added. It targets engineers, researchers, and power users who manage and query large document collections, and its features include automatic cross-referencing, contradiction flagging, and multi-modal understanding.

How It Works

Unlike traditional retrieval-augmented generation (RAG), which retrieves raw text at query time, OpenKB compiles knowledge into a persistent wiki. Ingestion depends on document length: short documents are converted to Markdown and read directly by the LLM, while long PDFs (20+ pages) are indexed by PageIndex, which builds a hierarchical tree index with summaries. Retrieval then reasons over this tree instead of searching a vector database. An LLM compiles the summaries and indexed document trees into a structured wiki, generating concept pages and cross-links. Because each new document is merged into the existing wiki, knowledge compounds over time: later documents enrich and synthesize the pages that earlier ones created.
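
The routing and tree-search steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not OpenKB's or PageIndex's actual code: `route_document`, `TreeNode`, `relevance`, and `navigate` are hypothetical names, the 20-page threshold comes from the description above, and the LLM's reasoning step is replaced by a simple word-overlap score so the example runs offline.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Threshold taken from the description above: PDFs of 20+ pages go to PageIndex.
LONG_PDF_PAGES = 20


def route_document(filename: str, page_count: int) -> str:
    """Hypothetical router deciding how one raw document is ingested."""
    if filename.lower().endswith(".pdf") and page_count >= LONG_PDF_PAGES:
        return "pageindex"  # build a hierarchical tree index with summaries
    return "markdown"  # convert to Markdown; the LLM reads it directly


@dataclass
class TreeNode:
    """One node of a document tree: a section title, its summary, and sub-sections."""
    title: str
    summary: str
    pages: tuple[int, int]  # page range the section covers (what a real system would fetch)
    children: list[TreeNode] = field(default_factory=list)


def _words(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def relevance(question: str, node: TreeNode) -> int:
    # Stand-in for the LLM's reasoning step: count words shared with title + summary.
    return len(_words(question) & _words(f"{node.title} {node.summary}"))


def navigate(question: str, root: TreeNode) -> TreeNode:
    """Walk down from the root, choosing the most relevant child at each level.

    No embeddings or vector store are involved: the search follows the
    document's own section structure.
    """
    node = root
    while node.children:
        node = max(node.children, key=lambda child: relevance(question, child))
    return node


report = TreeNode("Annual report", "Company overview, finances and risks", (1, 120), [
    TreeNode("Financial statements", "Revenue, costs and cash flow by quarter", (10, 60)),
    TreeNode("Risk factors", "Regulatory and supply chain risks", (61, 120)),
])
print(route_document("report.pdf", 120))                         # pageindex
print(navigate("What was revenue last quarter?", report).title)  # Financial statements
```

In a real system the `relevance` call would be an LLM judging which section likely answers the question, which is what makes the retrieval reasoning-based rather than similarity-based.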

Quick Start & Requirements

Install with pip (pip install openkb), or install the latest version from GitHub, or from source for development. OpenKB reaches a Large Language Model (LLM) through LiteLLM, supporting providers such as OpenAI, Anthropic, and Gemini. Users must configure their chosen model and put the corresponding LLM API key in a .env file. A typical first session initializes a knowledge base (openkb init), adds documents (openkb add <file_or_dir>), and then either queries it (openkb query "question") or starts an interactive chat (openkb chat).
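
As a sketch of that setup flow, the helper below checks a .env file for a provider key and then runs the quick-start commands listed above in order. It is not part of OpenKB, and several details are assumptions: the function names are hypothetical, the key names are LiteLLM's standard provider environment variables rather than names confirmed from OpenKB's documentation, and the parser handles only simple `KEY=value` lines.

```python
import subprocess
from pathlib import Path

# Assumption: these are LiteLLM's standard provider variables; OpenKB's README is
# the authority on which variable and model settings it actually expects.
PROVIDER_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GEMINI_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")  # drop surrounding quotes
    return values


def has_provider_key(env: dict[str, str]) -> bool:
    """True if at least one known provider key is set to a non-empty value."""
    return any(env.get(key) for key in PROVIDER_KEYS)


def quickstart_commands(source: str, question: str) -> list[list[str]]:
    """The quick-start sequence described above: init, add, then query."""
    return [
        ["openkb", "init"],
        ["openkb", "add", source],
        ["openkb", "query", question],
    ]


def run_quickstart(env_file: Path, source: str, question: str) -> None:
    """Check the .env file, then run each step, stopping at the first failure."""
    env = parse_env(env_file.read_text(encoding="utf-8"))
    if not has_provider_key(env):
        raise SystemExit(f"No LLM API key found in {env_file}")
    for cmd in quickstart_commands(source, question):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Calling `run_quickstart(Path(".env"), "docs/", "What are the main findings?")` from the directory that should hold the knowledge base would run the three steps; `docs/` and the question are placeholders.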

Highlighted Details

  • Broad Format Support: Ingests and processes a wide array of document types including PDF, Word, Markdown, PowerPoint, HTML, Excel, and plain text.
  • Vectorless Long Document Retrieval: Utilizes PageIndex to handle long and complex documents efficiently without traditional vector embeddings.
  • Native Multi-Modality: Capable of retrieving and understanding information from figures, tables, and images within documents, not just text.
  • Obsidian Compatibility: Generates a wiki composed of plain Markdown files with [[wikilinks]], allowing seamless integration with Obsidian for graph visualization and browsing.
  • Automated Updates: Features a 'watch' mode that automatically compiles new files dropped into the raw/ directory.
  • Knowledge Linting: Includes a 'lint' command to perform health checks, identifying contradictions, knowledge gaps, orphaned content, and stale information within the wiki.
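
Because the wiki is plain Markdown with [[wikilinks]], a check such as finding orphaned content reduces to simple link analysis. The sketch below is a hypothetical illustration of one such check, not OpenKB's `lint` implementation: it assumes one .md file per page with the file name (without extension) as the page name, matches names exactly, treats `[[Page|alias]]` and `[[Page#Heading]]` as links to `Page` (as Obsidian does), and flags pages that no other page links to.

```python
import re
from pathlib import Path

# Matches [[Target]], [[Target|alias]] and [[Target#Heading]]; captures "Target".
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:[#|][^\]]*)?\]\]")


def extract_links(markdown: str) -> set[str]:
    """Return the page names referenced by [[wikilinks]] in one page."""
    return {m.group(1).strip() for m in WIKILINK.finditer(markdown)}


def find_orphans(pages: dict[str, str]) -> list[str]:
    """Pages (name -> Markdown body) that no *other* page links to (exact name match)."""
    linked: set[str] = set()
    for name, body in pages.items():
        linked |= extract_links(body) - {name}  # ignore self-links
    return sorted(name for name in pages if name not in linked)


def load_wiki(wiki_dir: Path) -> dict[str, str]:
    """Hypothetical layout: every *.md file under wiki_dir is one page."""
    return {p.stem: p.read_text(encoding="utf-8") for p in wiki_dir.rglob("*.md")}


pages = {
    "PageIndex": "A tree index used for [[Vectorless retrieval]].",
    "Vectorless retrieval": "See [[PageIndex|the tree index]] and [[RAG#Limits]].",
    "RAG": "Retrieval-augmented generation.",
    "Draft notes": "Nothing links here yet.",
}
print(find_orphans(pages))  # ['Draft notes']
```

The same link graph is what Obsidian's graph view draws, so pages this check flags would appear as unconnected nodes there.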

Maintenance & Community

The project is associated with VectifyAI and PageIndexAI; links to both organizations' X (Twitter) and LinkedIn profiles are provided for updates. The README outlines a roadmap of planned development, and contributions are welcome via pull requests or issue reports.

Licensing & Compatibility

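OpenKB is licensed under the Apache 2.0 license. This permissive license allows commercial use and integration into closed-source projects, provided the license text and any copyright and NOTICE attributions are retained and modified files carry a notice stating that they were changed.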

Limitations & Caveats

Interaction is currently through the CLI; a web UI is planned for a future release. PageIndex-based long-document handling currently applies only to PDFs, and extending it to other formats is a roadmap item. Scaling to very large document collections and hierarchical concept indexing are also listed as future work. The optional PageIndex Cloud adds features such as OCR for scanned PDFs, but requires an API key.

Health Check

Last Commit: 1 week ago
Responsiveness: Inactive
Pull Requests (30d): 23
Issues (30d): 5
Star History: 495 stars in the last 22 days
