swarmvault  by swarmclawai

Local-first RAG knowledge compiler and wiki builder

Created 2 weeks ago

263 stars

Top 96.7% on SourcePulse

Project Summary

SwarmVault is a local-first knowledge compiler inspired by Andrej Karpathy's LLM Wiki pattern. It transforms raw research, documents, and code into a persistent, compounding markdown wiki and knowledge graph that LLMs maintain. It targets researchers, developers, and power users who want durable, interconnected knowledge bases rather than ephemeral Q&A, and its local-first design keeps their data private.

How It Works

SwarmVault employs a three-layer architecture: immutable raw sources, a markdown wiki written by both LLMs and humans, and a co-evolved schema (swarmvault.schema.md) that defines domain conventions. LLMs maintain the connections and summaries within the wiki, treating the relationships between sources as being as valuable as the sources themselves. The idea is akin to Vannevar Bush's Memex, but with automated upkeep. Knowledge compounds over time through provenance-tracked edges and automatic contradiction detection.
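The three layers and the contradiction check can be pictured with a small data model. The sketch below is illustrative only: every type and function name is hypothetical, and the conflict rule (same subject and predicate, different value) is a deliberately naive stand-in, not SwarmVault's actual detection logic.

```typescript
// Hypothetical model of the three layers; all names are illustrative.

/** Layer 1: an immutable raw source. */
interface RawSource {
  readonly id: string;
  readonly path: string;
}

/** Layer 2: a markdown wiki page, authored by an LLM or a human. */
interface WikiPage {
  slug: string;
  author: "llm" | "human";
  markdown: string;
}

/** A graph edge that records which raw sources justify it. */
interface Edge {
  from: string; // slug of the originating page
  to: string; // slug of the target page
  relation: string; // e.g. "cites", "extends"
  provenance: string[]; // ids of the RawSource entries backing this edge
}

/** The vault ties the layers together; layer 3 is the schema file. */
interface Vault {
  sources: RawSource[];
  pages: WikiPage[];
  edges: Edge[];
  schemaPath: string; // e.g. "swarmvault.schema.md"
}

/** A single claim extracted from one source. */
interface Claim {
  subject: string;
  predicate: string;
  value: string;
  sourceId: string;
}

/**
 * Naive contradiction check: claims conflict when they share a subject and
 * predicate but disagree on the value. Returns one group per conflict.
 */
function findConflicts(claims: Claim[]): Claim[][] {
  const groups = new Map<string, Claim[]>();
  for (const claim of claims) {
    const key = `${claim.subject}\u0000${claim.predicate}`;
    const group = groups.get(key) ?? [];
    group.push(claim);
    groups.set(key, group);
  }
  return Array.from(groups.values()).filter(
    (group) => new Set(group.map((c) => c.value)).size > 1,
  );
}
```

Because each claim keeps its sourceId, a flagged group points straight back to the raw sources that disagree, which is the practical value of tracking provenance.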

Quick Start & Requirements

  • Installation: npm install -g @swarmvaultai/cli (requires Node.js >= 24). A desktop app is also available for macOS, Windows, and Linux.
  • Basic Usage: swarmvault scan ./your-repo to process a local directory, or swarmvault demo for a quick start with sample data.
  • Prerequisites: Node.js >= 24. Optional: Ollama for local LLM inference (e.g., Gemma) for enhanced extraction.
  • Links: Website: https://www.swarmvault.ai/, Docs: https://www.swarmvault.ai/docs.
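Since the CLI requires Node.js >= 24, a setup script might check the runtime version before attempting the install. The helper below is a minimal sketch of such a preflight check; meetsMinimumNode is a hypothetical name and is not part of SwarmVault.

```typescript
/**
 * Return true when a Node.js version string such as "24.1.0" or "v24.1.0"
 * has a major version at or above `minMajor`.
 */
function meetsMinimumNode(version: string, minMajor: number): boolean {
  const major = Number.parseInt(version.replace(/^v/, "").split(".")[0], 10);
  // An unparseable string yields NaN, which fails the integer check.
  return Number.isInteger(major) && major >= minMajor;
}
```

In a Node.js script, the current runtime can be checked by passing process.versions.node as the first argument and 24 as the second.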

Highlighted Details

  • Knowledge Graph: Generates a typed knowledge graph with provenance-tracked edges, identifying god nodes and communities.
  • Contradiction Detection: Automatically flags conflicting claims across sources and offers lint --conflicts for audits.
  • Hybrid Search: Merges SQLite full-text search with semantic embeddings for efficient querying.
  • Extensive Input Support: Handles over 30 input formats, including code (via tree-sitter AST), documents (PDF, DOCX), audio (via local Whisper), and web content.
  • Agent Integrations: Supports 16 agent integrations (e.g., Claude Code, Codex, Copilot) with optional graph-first hooks.
  • Offline First: Built-in heuristic provider runs fully offline without API keys; local LLMs via Ollama are recommended for sharper results.
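The listing does not say how the full-text and embedding results are merged for hybrid search. One common technique for combining two ranked lists is reciprocal rank fusion, sketched below as an assumption rather than a description of SwarmVault's implementation. The file names and the constant k = 60 are illustrative.

```typescript
/**
 * Reciprocal rank fusion: each document scores the sum of 1 / (k + rank)
 * over every ranked list it appears in, with rank starting at 1.
 * Returns document ids ordered from highest to lowest fused score.
 */
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((docId, index) => {
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + index + 1));
    });
  }
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([docId]) => docId);
}

// Hypothetical result lists: one from full-text search, one from embeddings.
const fullText = ["setup.md", "graph.md", "faq.md"];
const semantic = ["graph.md", "schema.md", "setup.md"];
const merged = reciprocalRankFusion([fullText, semantic]);
console.log(merged);
```

Documents that rank well in both lists rise to the top, while a document found by only one method still appears in the merged results.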

Maintenance & Community

The project is hosted on GitHub (swarmclawai/swarmvault) with issue tracking, a dedicated website, and a documentation portal. The README does not highlight specific contributors or community channels such as Discord or Slack.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: The MIT license is permissive and generally compatible with commercial use and closed-source linking.

Limitations & Caveats

The local-whisper provider for audio transcription is marked as experimental. While stable surfaces follow semantic versioning post-1.0.0, experimental features may change in minor releases. The project emphasizes a local-first approach, requiring local setup for full offline functionality.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 6
  • Star History: 265 stars in the last 19 days
