vault  by vaultmcp

Runtime security proxy for AI agents

Created 3 weeks ago

614 stars

Top 53.0% on SourcePulse

Project Summary

Vault is a production-grade prompt-injection firewall that protects AI agents by scanning tool responses before they reach the agent's context. It targets injected instructions hidden in tool output, guarding against data exfiltration, unauthorized actions, and agent compromise. Aimed at engineers and researchers deploying AI agents in production, Vault provides a layered defense with minimal impact on agent workflows.

How It Works

Vault uses a three-layer detection pipeline. Layer 1 applies fast regex and heuristic pattern matching for common injection techniques. Layer 2 computes on-device embedding similarity (bge-small) against a curated corpus of known attacks. Borderline Layer 2 cases are escalated to Layer 3, an LLM judge (e.g., Anthropic's Claude Haiku 4.5), which resolves the ambiguity. This layering yields high detection rates (100% TPR on the v2 holdout dataset with Layer 3 enabled) while keeping LLM costs low, because Layer 3 is invoked only for uncertain L2 cases (typically <5% of requests).
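
The escalation logic described above can be sketched roughly as follows. This is a minimal illustration, not Vault's actual code: the type names, the two regex patterns, and the 0.6 / 0.85 thresholds are all assumptions, and the similarity and judge functions are injected stubs standing in for the bge-small embedding lookup and the LLM call (which would be asynchronous in practice).

```typescript
// Hypothetical sketch of a three-layer scan; names and thresholds are assumed.
type Verdict = "allow" | "block";

interface Layers {
  // Layer 2: similarity in [0, 1] against a corpus of known attacks.
  similarity: (text: string) => number;
  // Layer 3: LLM judge, consulted only for borderline Layer 2 scores.
  judge: (text: string) => Verdict;
}

// Layer 1: a couple of illustrative patterns (not Vault's real rule set).
const L1_PATTERNS: RegExp[] = [
  /ignore (all )?(previous|prior) instructions/i,
  /reveal (your )?system prompt/i,
];

// Assumed bounds of the "borderline" band that triggers escalation.
const BLOCK_AT = 0.85;
const ALLOW_BELOW = 0.6;

function scanToolResponse(
  text: string,
  layers: Layers,
): { verdict: Verdict; layer: 1 | 2 | 3 } {
  // Layer 1: cheap pattern match, no model involved.
  if (L1_PATTERNS.some((p) => p.test(text))) {
    return { verdict: "block", layer: 1 };
  }
  // Layer 2: confident scores are decided locally.
  const score = layers.similarity(text);
  if (score >= BLOCK_AT) return { verdict: "block", layer: 2 };
  if (score < ALLOW_BELOW) return { verdict: "allow", layer: 2 };
  // Layer 3: only the uncertain middle band pays for an LLM call.
  return { verdict: layers.judge(text), layer: 3 };
}
```

Because the judge is reached only when the Layer 2 score falls inside the middle band, the share of requests that incur LLM cost is controlled by how wide that band is set.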

Quick Start & Requirements

  • Install/Run: npx @aimcpvault/mcp-proxy (no global install required).
  • Prerequisites:
    • Node.js (v20+ recommended), pnpm (v9+ recommended).
    • Layer 3 LLM: Required for full effectiveness. Options include:
      • Anthropic (e.g., claude-haiku-4-5-20251001) via ANTHROPIC_API_KEY.
      • OpenAI-compatible (e.g., gpt-4o-mini, self-hosted vLLM/llama.cpp) via OPENAI_API_KEY.
      • Ollama (local, air-gapped) via OLLAMA_HOST=http://localhost:11434.
    • Offline Mode: L1+L2 operate without an LLM API key, but with significantly reduced detection capabilities.
  • Resource Footprint: ~135–180 MB steady-state memory (RSS); L1 latency <1ms, L2 ~8ms, L3 ~1s (when invoked).
  • Links: vaultmcp.io, GitHub Repo.
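
As a rough illustration of how the Layer 3 options above might be resolved from the environment, consider the sketch below. Only the variable names ANTHROPIC_API_KEY, OPENAI_API_KEY, and OLLAMA_HOST come from the list; the `JudgeConfig` type, the `selectJudge` function, and the precedence order are assumptions, and the real proxy may choose differently.

```typescript
// Hypothetical provider selection; the precedence order is an assumption.
type Env = Record<string, string | undefined>;

type JudgeConfig =
  | { provider: "anthropic"; apiKey: string }
  | { provider: "openai-compatible"; apiKey: string }
  | { provider: "ollama"; host: string }
  | { provider: "none" }; // offline mode: L1 + L2 only

function selectJudge(env: Env): JudgeConfig {
  const anthropicKey = env["ANTHROPIC_API_KEY"];
  const openaiKey = env["OPENAI_API_KEY"];
  const ollamaHost = env["OLLAMA_HOST"];

  if (anthropicKey) return { provider: "anthropic", apiKey: anthropicKey };
  if (openaiKey) return { provider: "openai-compatible", apiKey: openaiKey };
  if (ollamaHost) return { provider: "ollama", host: ollamaHost };
  // No judge configured: detection falls back to Layers 1 and 2.
  return { provider: "none" };
}
```

In a Node.js process this would typically be called as `selectJudge(process.env)`; taking the environment as a parameter keeps the function easy to test.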

Highlighted Details

  • Achieves 100% True Positive Rate (TPR) and 0.0% False Positive Rate (FPR) on the v2 holdout dataset with Layer 3 enabled.
  • Features a capability firewall for taint tracking, gating sensitive tool calls based on previously seen context.
  • Includes manifest verification to detect drift in MCP server tool definitions, preventing supply-chain attacks.
  • Supports optional on-chain reputation attestation via EAS on Base, building a public trust score for MCP servers.
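
Two of the ideas above, taint-based gating and manifest drift detection, can be sketched in a few lines. Everything here is hypothetical: `TaintTracker`, `pinManifest`, `driftedTools`, and the `ToolDef` shape are illustrative names, and the policy shown (block sensitive tools once any flagged content has been seen in the session) is one assumed reading of "taint tracking", not Vault's documented behavior.

```typescript
// Hypothetical sketch; none of these names are Vault's API.
interface ToolDef {
  name: string;
  [key: string]: unknown;
}

// Taint tracking: once flagged content is seen, sensitive tools are gated.
class TaintTracker {
  private tainted = false;
  private readonly sensitive: Set<string>;

  constructor(sensitiveTools: string[]) {
    this.sensitive = new Set(sensitiveTools);
  }

  // Record the scan result for each tool response in the session.
  observe(flagged: boolean): void {
    if (flagged) this.tainted = true;
  }

  mayCall(tool: string): boolean {
    return !(this.tainted && this.sensitive.has(tool));
  }
}

// Key-order-independent serialization of JSON-like values.
function canonical(value: unknown): string {
  if (Array.isArray(value)) return "[" + value.map((v) => canonical(v)).join(",") + "]";
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((k) => JSON.stringify(k) + ":" + canonical(obj[k]));
    return "{" + entries.join(",") + "}";
  }
  return JSON.stringify(value);
}

// Pin the tool definitions seen at first connection.
function pinManifest(tools: ToolDef[]): Map<string, string> {
  return new Map(tools.map((t): [string, string] => [t.name, canonical(t)]));
}

// Names of tools that are new or whose definition changed since pinning.
// (Removed tools are not reported in this sketch.)
function driftedTools(pinned: Map<string, string>, current: ToolDef[]): string[] {
  return current.filter((t) => pinned.get(t.name) !== canonical(t)).map((t) => t.name);
}
```

Comparing canonical serializations means a server that merely reorders keys is not reported, while a changed description or schema is. A real implementation would more likely store a cryptographic hash of the canonical form than the string itself.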

Maintenance & Community

The project is actively developed, with plans for v0.3 including a continuous attestation feed, public reputation registry, and signed releases. Community interaction is primarily via X (@vaultmcp).

Licensing & Compatibility

  • License: MIT.
  • Compatibility: The permissive MIT license allows for commercial use and integration into closed-source projects. Full functionality requires API keys for supported LLM providers.

Limitations & Caveats

Vault is text-only and does not scan binary content (images, audio, PDFs) within tool responses. Multi-turn attacks that span different sessions and user-initiated jailbreaks are out of scope. In offline mode (L1+L2 only), TPR drops significantly, especially for novel, out-of-distribution attacks. Without L3 enabled, protocol-encoded data may also produce a higher false positive rate.

Health Check

  • Last Commit: 2 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 614 stars in the last 24 days
