prompt-injection-defenses by tldrsec

Collection of prompt injection defenses

Created 1 year ago

574 stars

Top 56.2% on SourcePulse

Project Summary

This repository serves as a comprehensive catalog of practical and proposed defenses against prompt injection attacks targeting Large Language Models (LLMs). It is intended for security researchers, LLM developers, and engineers seeking to understand and implement robust security measures for LLM-powered applications. The primary benefit is a centralized, categorized collection of mitigation strategies, research papers, and tools.

How It Works

The repository groups defenses into nine categories: Blast Radius Reduction, Input Pre-processing, Guardrails & Overseers, Taint Tracking, Secure Threads/Dual LLM, Ensemble Decisions, Prompt Engineering/Instructional Defense, Robustness/Finetuning, and Preflight Injection Tests. Each category lists specific techniques, usually with references to academic papers or practical implementations that explain how the technique works and the theory behind it. For example, Input Pre-processing includes methods such as paraphrasing and retokenization, which rewrite the input to disrupt adversarial prompts, while Guardrails filter and monitor model inputs and outputs.
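To make two of these categories concrete, here is a minimal Python sketch of an input pre-processing step followed by input and output guardrails. It is not taken from the repository or from any tool it lists. The patterns, the function names (`preprocess`, `check_input`, `check_output`, `guarded_call`), and the `model` callable are all hypothetical. A simple character and whitespace normalization stands in for the paraphrasing or retokenization steps mentioned above, and a pattern list this small would be easy to bypass in practice.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical patterns, chosen only for illustration.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?(previous|prior|above)\s+instructions", re.IGNORECASE),
    re.compile(r"reveal\s+(your\s+)?system\s+prompt", re.IGNORECASE),
]


@dataclass
class Verdict:
    allowed: bool
    reason: str


def preprocess(text: str) -> str:
    """Input pre-processing stand-in: drop non-printable characters
    (for example zero-width spaces) and collapse runs of whitespace."""
    kept = "".join(ch for ch in text if ch.isprintable() or ch.isspace())
    return " ".join(kept.split())


def check_input(text: str) -> Verdict:
    """Input guardrail: reject text that matches a known-bad pattern."""
    for pattern in SUSPICIOUS_PATTERNS:
        if pattern.search(text):
            return Verdict(False, f"matched {pattern.pattern!r}")
    return Verdict(True, "no pattern matched")


def check_output(output: str, protected: str) -> Verdict:
    """Output guardrail: reject responses that contain protected text."""
    if protected in output:
        return Verdict(False, "output contains protected text")
    return Verdict(True, "ok")


def guarded_call(user_text: str, model: Callable[[str], str], protected: str) -> str:
    """Run pre-processing, the input check, the model, then the output check."""
    cleaned = preprocess(user_text)
    if not check_input(cleaned).allowed:
        return "[blocked: input]"
    output = model(cleaned)
    if not check_output(output, protected).allowed:
        return "[blocked: output]"
    return output
```

Here `model` is any function that maps a prompt string to a response string, so the wrapper can be exercised with a stub before being connected to a real LLM client. Normalizing before the input check means the pattern match sees the same text the model will receive.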

Quick Start & Requirements

This repository is a curated collection of information, so there is nothing to install or run. Reading the README requires only a web browser. Links to external tools and papers are provided for further investigation.

Highlighted Details

Comprehensive categorization of over a dozen distinct defense strategies.
Extensive list of references to academic papers and security blogs.
Inclusion of specific tools like Llama Guard, NeMo Guardrails, and Rebuff.
Discussion of critiques and limitations of various defense mechanisms.

Maintenance & Community

The repository is maintained by tldrsec. Community engagement and further contributions are encouraged through GitHub. Specific community links (Discord/Slack) are not explicitly provided in the README.

Licensing & Compatibility

The repository itself does not appear to specify a license. It aggregates information and links to various tools and papers, each with its own license. Users must consult the licenses of the individual referenced components for compatibility and usage restrictions.

Limitations & Caveats

The repository is a survey of existing and proposed defenses, not a ready-to-deploy solution. The effectiveness of individual defenses can vary significantly based on the LLM, attack vector, and implementation details. Some techniques are still in the research phase and may not be production-ready.
