rebuff by protectai

SDK for LLM prompt injection detection

Created 2 years ago

1,394 stars

Top 28.9% on SourcePulse

View on GitHub

5 Experts Love This Project

Chip Huyen

Author of "AI Engineering", "Designing Machine Learning Systems"

Cofounder of Cloudera

and 1 more!

Project Summary

Rebuff is an open-source prompt injection detector designed to protect AI applications from malicious inputs. It targets developers and security professionals seeking to safeguard LLM-based systems. Rebuff offers a multi-layered defense strategy to identify and mitigate prompt injection attacks.

How It Works

Rebuff employs a four-layer defense mechanism: heuristic filtering of suspicious inputs, LLM-based analysis of prompts, a vector database to store and recognize embeddings of known attacks, and canary tokens to detect data leakage. This layered approach aims to provide robust protection by combining signature-based detection with behavioral analysis and proactive monitoring.

Quick Start & Requirements

Install via pip: pip install rebuff
Requires OpenAI API key, Pinecone API key, and Pinecone index name.
Supports OpenAI models (defaults to gpt-3.5-turbo).
Official Docs: https://docs.rebuff.ai
Playground: https://playground.rebuff.ai

Highlighted Details

Detects prompt injection and canary word leakage.
Implements attack signature learning via a vector database.
Offers JavaScript/TypeScript SDK; Python SDK is planned.
Self-hosting requires Supabase, OpenAI, and a vector database (Pinecone/Chroma).

Maintenance & Community

Active development with ongoing roadmap items.
Discord community available: https://discord.gg/R3U2XVNKeE
GitHub Actions for JavaScript and Python tests indicate CI integration.

Licensing & Compatibility

License not explicitly stated in the README.
Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Rebuff is a prototype and does not guarantee 100% protection against prompt injection attacks. A Python SDK is still under development, and features like local-only mode and user-defined detection strategies are planned.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Pull Requests (30d)

Issues (30d)

Star History

9 stars in the last 30 days