aegis  by automorphic-ai

LLM firewall for adversarial attack protection

created 2 years ago
265 stars

Top 97.2% on sourcepulse

GitHubView on GitHub
Project Summary

Aegis provides a self-hardening firewall to protect large language models (LLMs) from adversarial attacks like prompt injection, PII leakage, and toxic language. It is designed for developers and researchers working with LLMs who need to secure their applications and users.

How It Works

Aegis employs a classification model trained on a diverse dataset of prompt injection and leakage attacks. This model, combined with traditional firewall heuristics, analyzes both incoming prompts and outgoing model responses to identify malicious activity. A key feature is its self-hardening capability, allowing it to learn from observed attacks and improve its detection over time.

Quick Start & Requirements

  • Install via pip: pip install git+https://github.com/automorphic-ai/aegis.git
  • Requires an API key from automorphic.ai.
  • See the playground for experimentation: automorphic.ai

Highlighted Details

  • Detects prompt injection, toxic language, and PII leakage.
  • Features attack signature learning for continuous improvement.
  • Offers a bug bounty of $100 for successful firewall breaches.

Maintenance & Community

  • Active development with a roadmap including honey prompt generation.
  • Community channels available via Discord and email.
  • Updates shared on Twitter.

Licensing & Compatibility

  • The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is in active development, with features like honey prompt generation still on the roadmap. The license and its implications for commercial use are not clearly defined in the provided README.

Health Check
Last commit

1 year ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
2 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen Chip Huyen(Author of AI Engineering, Designing Machine Learning Systems), Michele Castata Michele Castata(President of Replit), and
2 more.

rebuff by protectai

0.4%
1k
SDK for LLM prompt injection detection
created 2 years ago
updated 1 year ago
Starred by Chip Huyen Chip Huyen(Author of AI Engineering, Designing Machine Learning Systems), Carol Willing Carol Willing(Core Contributor to CPython, Jupyter), and
2 more.

llm-security by greshake

0.2%
2k
Research paper on indirect prompt injection attacks targeting app-integrated LLMs
created 2 years ago
updated 2 weeks ago
Feedback? Help us improve.