aegis by automorphic-ai

LLM firewall for adversarial attack protection

Created 2 years ago

267 stars

Top 96.1% on SourcePulse

View on GitHub

2 Experts Love This Project

Project Summary

Aegis provides a self-hardening firewall to protect large language models (LLMs) from adversarial attacks like prompt injection, PII leakage, and toxic language. It is designed for developers and researchers working with LLMs who need to secure their applications and users.

How It Works

Aegis employs a classification model trained on a diverse dataset of prompt injection and leakage attacks. This model, combined with traditional firewall heuristics, analyzes both incoming prompts and outgoing model responses to identify malicious activity. A key feature is its self-hardening capability, allowing it to learn from observed attacks and improve its detection over time.

Quick Start & Requirements

Install via pip: pip install git+https://github.com/automorphic-ai/aegis.git
Requires an API key from automorphic.ai.
See the playground for experimentation: automorphic.ai

Highlighted Details

Detects prompt injection, toxic language, and PII leakage.
Features attack signature learning for continuous improvement.
Offers a bug bounty of $100 for successful firewall breaches.

Maintenance & Community

Active development with a roadmap including honey prompt generation.
Community channels available via Discord and email.
Updates shared on Twitter.

Licensing & Compatibility

The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is in active development, with features like honey prompt generation still on the roadmap. The license and its implications for commercial use are not clearly defined in the provided README.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Pull Requests (30d)

Issues (30d)

Star History

1 stars in the last 30 days