llm_wiki by nashsu

A desktop app for building self-updating LLM-powered wikis from documents

Created 3 days ago · New!

621 stars · Top 53.0% on SourcePulse

View on GitHub
Project Summary

LLM Wiki is a cross-platform desktop application that transforms user documents into an organized, interlinked, and persistent knowledge base using Large Language Models (LLMs). It diverges from traditional Retrieval-Augmented Generation (RAG) by incrementally building and maintaining a wiki, so knowledge is compiled once and kept current rather than re-derived on each query. The result is a durable, explorable knowledge base for research, reading, and personal knowledge management.

How It Works

The project implements a methodology inspired by Andrej Karpathy's LLM Wiki pattern, featuring a three-layer architecture (Raw Sources, Wiki, Schema) and core operations (Ingest, Query, Lint). A key innovation is the Two-Step Chain-of-Thought Ingest: an LLM first analyzes source documents to identify entities, concepts, and connections, then generates wiki pages based on this analysis. This sequential approach improves output quality over single-step methods. The system builds a Knowledge Graph using a 4-signal relevance model (direct link, source overlap, Adamic-Adar, type affinity) and employs Louvain Community Detection to automatically cluster related knowledge, providing insights into surprising connections and potential knowledge gaps.
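The 4-signal relevance model is named but not specified in detail. A minimal sketch of how the signals might combine, assuming equal weights, a toy link graph, and made-up page and source names (only the signal names come from the README; the Adamic-Adar index is the standard graph-similarity formulation):

```python
import math

# Toy wiki link graph: page -> pages it links to (kept symmetric here).
links = {
    "transformers": {"attention", "karpathy", "llm"},
    "rag":          {"attention", "llm", "retrieval"},
    "attention":    {"transformers", "rag"},
    "llm":          {"transformers", "rag"},
    "karpathy":     {"transformers"},
    "retrieval":    {"rag"},
}

def adamic_adar(a: str, b: str) -> float:
    """Sum 1/log(degree) over common neighbors: rare shared links count more."""
    common = links[a] & links[b]
    return sum(1.0 / math.log(len(links[n]))
               for n in common if len(links[n]) > 1)

def relevance(a: str, b: str,
              sources: dict[str, set[str]],
              kinds: dict[str, str]) -> float:
    """Combine the four signals; the equal weighting is hypothetical."""
    direct = 1.0 if b in links[a] else 0.0                    # direct link
    overlap = len(sources[a] & sources[b]) / len(sources[a] | sources[b])  # source overlap (Jaccard)
    aa = adamic_adar(a, b)                                    # Adamic-Adar
    affinity = 1.0 if kinds[a] == kinds[b] else 0.0           # type affinity
    return 0.25 * (direct + overlap + aa / (1.0 + aa) + affinity)

score = relevance("transformers", "rag",
                  {"transformers": {"lecture1"}, "rag": {"lecture1", "lecture2"}},
                  {"transformers": "concept", "rag": "concept"})
```

Here "transformers" and "rag" are never directly linked, yet shared neighbors and a shared source still give them a substantial relevance score, which is exactly the kind of "surprising connection" the graph analysis surfaces.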

Quick Start & Requirements

  • Primary install: Download pre-built binaries from the Releases page (macOS .dmg, Windows .msi, Linux .deb/.AppImage).
  • Build from Source: Requires Node.js 20+ and Rust 1.70+. Clone the repository, run npm install, then npm run tauri dev (development) or npm run tauri build (production).
  • Prerequisites: An LLM provider configuration (OpenAI, Anthropic, Google, Ollama, or Custom) with API keys is essential. The Tavily API is required for web search in the Deep Research feature.
  • Chrome Extension: A companion browser extension is available; enable "Developer mode" in Chrome and load the unpacked extension directory.
  • Links: GitHub Releases page (implied by the README).

Highlighted Details

  • Two-Step Chain-of-Thought Ingest: Enhances quality with separate analysis and generation steps, featuring SHA256 incremental caching and source traceability via YAML frontmatter.
  • Knowledge Graph & Community Detection: Visualizes knowledge connections with a 4-signal relevance model and uses the Louvain algorithm to automatically discover knowledge clusters and score their cohesion.
  • Deep Research & Graph Insights: Automatically surfaces "surprising connections" and "knowledge gaps" within the graph, enabling LLM-optimized web research triggered by graph analysis.
  • Multi-Format Document Support: Ingests PDF, DOCX, PPTX, XLSX, images, video, audio, and web content (via Chrome Clipper using Readability.js and Turndown.js).
  • Optimized Query Retrieval: Employs a 4-phase pipeline (Tokenized Search, Graph Expansion, Budget Control, Context Assembly) for efficient information retrieval.
  • Cross-Platform Desktop App: Built with Tauri v2, offering a native experience on macOS, Windows, and Linux with features like multi-conversation chat persistence and configurable context windows.
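The README names SHA256 incremental caching but not its mechanics. A minimal sketch of the likely idea, where the cache shape and the digest-comparison logic are assumptions, not the project's actual implementation:

```python
import hashlib

def sha256_digest(data: bytes) -> str:
    """Content fingerprint of a source document."""
    return hashlib.sha256(data).hexdigest()

def needs_ingest(doc_id: str, data: bytes, cache: dict[str, str]) -> bool:
    """Return True only if the document changed since its cached digest.

    Unchanged documents are skipped, so re-running Ingest over a large
    document set only pays LLM costs for new or edited sources.
    """
    digest = sha256_digest(data)
    if cache.get(doc_id) == digest:
        return False  # content unchanged: keep the existing wiki pages
    cache[doc_id] = digest
    return True

cache: dict[str, str] = {}  # doc_id -> last ingested digest (hypothetical shape)
```

In this sketch the cache would be persisted alongside the wiki, and each generated page's YAML frontmatter would record the source digest for traceability.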
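The Budget Control and Context Assembly phases of the query pipeline suggest a token-budgeted selection over the pages surfaced by search and graph expansion. A sketch under that assumption, with hypothetical page names, scores, and token counts (the project's actual selection strategy is not documented):

```python
def assemble_context(pages: list[tuple[str, float, int]], budget: int) -> list[str]:
    """Greedy budget control: take pages in descending relevance order,
    skipping any page that would overflow the remaining token budget.

    Each entry is (page_name, relevance_score, token_count).
    """
    chosen: list[str] = []
    remaining = budget
    for name, _score, tokens in sorted(pages, key=lambda p: -p[1]):
        if tokens <= remaining:
            chosen.append(name)
            remaining -= tokens
    return chosen
```

For example, with a 1,000-token budget a high-relevance 800-token page can be skipped in favor of two smaller pages that fit, keeping the assembled context inside the model's configurable context window.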

Maintenance & Community

The README does not specify notable contributors, sponsorships, or community channels (e.g., Discord, Slack). The project is noted as being based on Andrej Karpathy's foundational methodology.

Licensing & Compatibility

This project is licensed under the GNU General Public License v3.0 (GPL-3.0). As a strong copyleft license, GPL-3.0 requires derivative works to be distributed under the same license, potentially restricting commercial use or integration into closed-source proprietary software.

Limitations & Caveats

The application's functionality is dependent on external LLM APIs, which may incur costs and introduce latency. The effectiveness of knowledge extraction and organization relies on the quality of the chosen LLM. The Deep Research feature requires integration with the Tavily API, subject to its own usage policies and potential costs. Building from source requires specific development environment configurations (Node.js, Rust).

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 7
  • Star History: 633 stars in the last 3 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Casper Hansen (author of AutoAWQ), and 8 more.

storm by stanford-oval — 28k stars, top 0.1%

LLM system for automated knowledge curation and article generation

Created 2 years ago · Updated 6 months ago