llm-wiki-compiler  by atomicmemory

Knowledge compiler for persistent, interlinked wikis from raw sources

Created 6 days ago

291 stars

Top 90.6% on SourcePulse

Project Summary

This project compiles raw text sources into an interlinked Markdown wiki, inspired by the LLM Wiki pattern. Instead of letting knowledge be lost or re-discovered at query time, it produces a persistent, browsable artifact that compounds over time. It is aimed at AI researchers, engineers building knowledge bases, and technical writers who want a structured, evolving knowledge base that complements traditional RAG approaches.

How It Works

The system uses a two-phase pipeline. Phase 1 ingests sources, checks SHA-256 hashes to detect changes for incremental updates, and uses an LLM to extract concepts. Phase 2 generates wiki pages, resolves [[wikilink]]s, and creates an index.md. Separating extraction from page generation eliminates order dependence, catches failures early, merges shared concepts, and ensures that only changed sources are re-processed by the LLM. Queries can be saved with --save, in which case their answers become new wiki pages that enrich the knowledge base for future queries.
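
The hash check in Phase 1 can be illustrated with a minimal Python sketch. It is not the project's code (the tool itself is distributed via npm); the manifest file, its JSON layout, and the function names `sha256_of` and `changed_sources` are assumptions made for illustration. The idea is to store one SHA-256 digest per source and hand only sources with a new or different digest to the expensive LLM step.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_sources(source_dir: Path, manifest_path: Path) -> list[Path]:
    """Return sources whose hash differs from the one recorded last run.

    The manifest is a hypothetical JSON file mapping a relative source
    path to its last-seen digest; it is rewritten on every call.
    """
    previous = {}
    if manifest_path.exists():
        previous = json.loads(manifest_path.read_text())

    current = {}
    changed = []
    for path in sorted(source_dir.rglob("*.md")):
        key = path.relative_to(source_dir).as_posix()
        digest = sha256_of(path)
        current[key] = digest
        # New files have no previous digest; edited files have a different one.
        if previous.get(key) != digest:
            changed.append(path)

    manifest_path.write_text(json.dumps(current, indent=2))
    return changed
```

On a first run every source counts as changed; an immediate second run returns an empty list, so no LLM calls would be needed.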

Quick Start & Requirements

  • Install: npm install -g llm-wiki-compiler
  • Prerequisites: Node.js >= 18, an Anthropic API key (set via export ANTHROPIC_API_KEY=sk-...).
  • Usage: llmwiki ingest <url|file>, llmwiki compile, llmwiki query "question" [--save].
  • Examples: See examples/basic/ in the repository for pre-generated output.

Highlighted Details

  • Compounding Queries: Answers to queries with the --save flag are added as new wiki pages, enhancing the knowledge base for subsequent queries.
  • Incremental Compilation: Utilizes hash-based change detection to efficiently re-process only modified source files.
  • Obsidian Compatibility: Generates [[wikilinks]] that resolve to concept titles, making the output compatible with Obsidian.
  • Source Attribution: Includes YAML frontmatter and ^[filename.md] markers to link generated content back to its original sources.
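
As a rough illustration of what resolving [[wikilinks]] to concept titles can involve, here is a small Python sketch. It is an assumption-laden stand-in, not the project's implementation: the case-insensitive matching, the `slugify` filename scheme, and the function names are all hypothetical.

```python
import re

# Matches [[Target]] and [[Target|display text]].
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:\|([^\[\]]+))?\]\]")


def slugify(title: str) -> str:
    """Hypothetical filename scheme: lowercase words joined by hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def resolve_wikilinks(text: str, titles: set[str]) -> tuple[dict[str, str], list[str]]:
    """Map each link target to a page file and list targets with no page.

    Matching is case-insensitive here, which is an assumption of this sketch.
    """
    by_key = {title.casefold(): title for title in titles}
    resolved: dict[str, str] = {}
    unresolved: list[str] = []
    for match in WIKILINK.finditer(text):
        target = match.group(1).strip()
        title = by_key.get(target.casefold())
        if title is None:
            if target not in unresolved:
                unresolved.append(target)
        else:
            resolved[target] = slugify(title) + ".md"
    return resolved, unresolved
```

Collecting unresolved targets separately makes dangling links visible at compile time instead of leaving them to be found while browsing.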

Maintenance & Community

The project welcomes issues and PRs. The roadmap lists planned enhancements such as improved provenance, linting, multi-provider LLM support, semantic search, and agent integration. No community channels (e.g., Discord, Slack) are listed.

Licensing & Compatibility

The project is licensed under the MIT license, which permits commercial use and integration into closed-source projects.

Limitations & Caveats

This is early-stage software, best suited for small, high-signal corpora (up to a few dozen sources). It currently supports only Anthropic models. Sources exceeding token limits are truncated during ingest, with indicators provided in the frontmatter. Image support and Marp slide generation are not yet implemented.
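
The truncation behavior described above might look roughly like the following Python sketch. The frontmatter field names (`source`, `truncated`, `original_chars`) and the character-based limit are assumptions for illustration; the project's actual fields and its token-based limit are not documented here.

```python
import json


def ingest_with_truncation(body: str, source: str, max_chars: int) -> str:
    """Return a Markdown document whose frontmatter records any truncation.

    A character budget stands in for a real token limit in this sketch.
    """
    truncated = len(body) > max_chars
    kept = body[:max_chars] if truncated else body
    frontmatter = [
        "---",
        # json.dumps yields a double-quoted string, which is also valid YAML.
        f"source: {json.dumps(source)}",
        f"truncated: {'true' if truncated else 'false'}",
        f"original_chars: {len(body)}",
        "---",
    ]
    return "\n".join(frontmatter) + "\n\n" + kept
```

Recording the original length next to the flag lets a reader judge how much of a source was dropped.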

Health Check

  • Last Commit: 3 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 3
  • Issues (30d): 1
  • Star History: 310 stars in the last 6 days
