llm-wiki-compiler by atomicstrata

Knowledge compiler for persistent, interlinked wikis from raw sources

Created 3 months ago

1,721 stars

Top 23.9% on SourcePulse

Project Summary

This project compiles raw text sources into an interlinked Markdown wiki, inspired by the LLM Wiki pattern. Rather than rediscovering knowledge at query time, it produces a persistent, browsable artifact that compounds as sources and saved answers accumulate. It is aimed at AI researchers, engineers building knowledge bases, and technical writers who want a structured, evolving knowledge base that complements traditional RAG approaches.

How It Works

The system uses a two-phase pipeline. Phase 1 ingests sources, checks SHA-256 hashes to detect which sources changed, and uses an LLM to extract concepts. Phase 2 generates wiki pages, resolves [[wikilink]]s, and creates an index.md. Because all concepts are extracted before any page is written, the result does not depend on source order, failures surface early, concepts shared across sources are merged, and only changed sources are sent back to the LLM. Queries run with --save have their answers written as new wiki pages, which enrich the knowledge base for future queries.
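The hash check in Phase 1 can be illustrated with a minimal Python sketch. This is not the project's code (the tool itself runs on Node.js); the function names and the name-to-hash manifest layout are assumptions made for illustration only.

```python
import hashlib


def sha256_of(text: str) -> str:
    """Return the hex SHA-256 digest of a source's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def changed_sources(sources: dict[str, str], manifest: dict[str, str]) -> list[str]:
    """Return the names of sources that are new or whose content changed.

    `sources` maps a source name to its current text. `manifest` maps a
    source name to the hash recorded at the last compile (hypothetical layout).
    """
    return sorted(
        name
        for name, text in sources.items()
        if manifest.get(name) != sha256_of(text)
    )


def update_manifest(sources: dict[str, str]) -> dict[str, str]:
    """Build a fresh manifest to persist after a successful compile."""
    return {name: sha256_of(text) for name, text in sources.items()}
```

Only the names returned by `changed_sources` would need a new concept-extraction call; unchanged sources can reuse their earlier results.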

Quick Start & Requirements

Install: npm install -g llm-wiki-compiler
Prerequisites: Node.js >= 18, an Anthropic API key (set via export ANTHROPIC_API_KEY=sk-...).
Usage: llmwiki ingest <url|file>, llmwiki compile, llmwiki query "question" [--save].
Examples: See examples/basic/ in the repository for pre-generated output.

Highlighted Details

Compounding Queries: Answers to queries with the --save flag are added as new wiki pages, enhancing the knowledge base for subsequent queries.
Incremental Compilation: Uses hash-based change detection to re-process only modified source files.
Obsidian Compatibility: Generates [[wikilinks]] that resolve to concept titles, making the output compatible with Obsidian.
Source Attribution: Includes YAML frontmatter and ^[filename.md] markers to link generated content back to its original sources.
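As a rough illustration of resolving [[wikilinks]] to concept titles, the Python sketch below matches link targets against known page titles, ignoring case and extra whitespace. The matching rule, the function names, and the choice to degrade unresolved links to plain text are all assumptions; the project's actual resolution logic may differ.

```python
import re

# Matches [[Target]] and the Obsidian alias form [[Target|label]].
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def normalize(title: str) -> str:
    """Collapse whitespace and ignore case so near-identical titles match."""
    return " ".join(title.split()).casefold()


def resolve_wikilinks(body: str, titles: list[str]) -> tuple[str, list[str]]:
    """Rewrite links to canonical page titles.

    Returns the rewritten text and the list of targets with no matching page.
    """
    canonical = {normalize(t): t for t in titles}
    unresolved: list[str] = []

    def replace(match: re.Match) -> str:
        target, label = match.group(1), match.group(2)
        found = canonical.get(normalize(target))
        if found is None:
            # Assumption: an unresolved link degrades to plain text.
            unresolved.append(target.strip())
            return label or target.strip()
        return f"[[{found}|{label}]]" if label else f"[[{found}]]"

    return WIKILINK.sub(replace, body), unresolved
```

Collecting unresolved targets instead of raising keeps page generation going and leaves a list that a later lint step could report.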

Maintenance & Community

The project welcomes issues and PRs. The roadmap lists planned enhancements: improved provenance, linting, multi-provider LLM support, semantic search, and agent integration. No community channels (e.g., Discord, Slack) are listed.

Licensing & Compatibility

The project is licensed under the MIT license, which is generally permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

This is early-stage software, best suited for small, high-signal corpora (up to a few dozen sources). It currently supports only Anthropic models. Sources exceeding token limits are truncated during ingest, with indicators provided in the frontmatter. Image support and Marp slide generation are not yet implemented.
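The truncation behavior described above might look roughly like the following Python sketch. It is a simplified illustration only: it counts characters rather than tokens, and the frontmatter field names (`truncated`, `original_chars`, `kept_chars`) are hypothetical, not the ones the tool writes.

```python
def truncate_source(text: str, max_chars: int) -> tuple[str, dict[str, object]]:
    """Cut a source to at most `max_chars` characters and record what happened.

    Assumption: a character budget stands in for the real token limit.
    """
    truncated = len(text) > max_chars
    kept = text[:max_chars] if truncated else text
    meta = {
        "truncated": truncated,
        "original_chars": len(text),
        "kept_chars": len(kept),
    }
    return kept, meta


def render_frontmatter(meta: dict[str, object]) -> str:
    """Render flat metadata as a YAML frontmatter block."""
    lines = ["---"]
    for key, value in meta.items():
        if isinstance(value, bool):
            value = "true" if value else "false"  # YAML-style booleans
        lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines)
```

Recording the truncation in the page metadata lets a reader see when a page was compiled from a partial source.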

Health Check

Last Commit: 23 hours ago

Responsiveness: Inactive

Star History

240 stars in the last 30 days