prompt-injection-defenses  by tldrsec

Collection of prompt injection defenses

created 1 year ago
502 stars

Top 62.8% on sourcepulse

GitHubView on GitHub
Project Summary

This repository serves as a comprehensive catalog of practical and proposed defenses against prompt injection attacks targeting Large Language Models (LLMs). It is intended for security researchers, LLM developers, and engineers seeking to understand and implement robust security measures for LLM-powered applications. The primary benefit is a centralized, categorized collection of mitigation strategies, research papers, and tools.

How It Works

The repository categorizes defenses into several key areas: Blast Radius Reduction, Input Pre-processing, Guardrails & Overseers, Taint Tracking, Secure Threads/Dual LLM, Ensemble Decisions, Prompt Engineering/Instructional Defense, Robustness/Finetuning, and Preflight Injection Tests. Each category details specific techniques, often referencing academic papers or practical implementations, to explain how they work and their theoretical underpinnings. For example, Input Pre-processing includes methods like paraphrasing and retokenization to disrupt adversarial prompts, while Guardrails employ input/output filtering and monitoring.

Quick Start & Requirements

This repository is a curated collection of information and does not have a direct installation or execution command. It requires a web browser to access and read the README. Links to external tools and papers are provided for further investigation.

Highlighted Details

  • Comprehensive categorization of over a dozen distinct defense strategies.
  • Extensive list of references to academic papers and security blogs.
  • Inclusion of specific tools like Llama Guard, NeMo Guardrails, and Rebuff.
  • Discussion of critiques and limitations of various defense mechanisms.

Maintenance & Community

The repository is maintained by tldrsec. Community engagement and further contributions are encouraged through GitHub. Specific community links (Discord/Slack) are not explicitly provided in the README.

Licensing & Compatibility

The repository itself appears to be under an unspecified license, but it aggregates information and links to various tools and papers, each with their own licenses. Users must consult the licenses of individual referenced components for compatibility and usage restrictions.

Limitations & Caveats

The repository is a survey of existing and proposed defenses, not a ready-to-deploy solution. The effectiveness of individual defenses can vary significantly based on the LLM, attack vector, and implementation details. Some techniques are still in the research phase and may not be production-ready.

Health Check
Last commit

5 months ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
61 stars in the last 90 days

Explore Similar Projects

Feedback? Help us improve.