LLM firewall for adversarial attack protection
Top 97.2% on sourcepulse
Aegis provides a self-hardening firewall to protect large language models (LLMs) from adversarial attacks like prompt injection, PII leakage, and toxic language. It is designed for developers and researchers working with LLMs who need to secure their applications and users.
How It Works
Aegis employs a classification model trained on a diverse dataset of prompt injection and leakage attacks. This model, combined with traditional firewall heuristics, analyzes both incoming prompts and outgoing model responses to identify malicious activity. A key feature is its self-hardening capability, allowing it to learn from observed attacks and improve its detection over time.
Quick Start & Requirements
pip install git+https://github.com/automorphic-ai/aegis.git
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The project is in active development, with features like honey prompt generation still on the roadmap. The license and its implications for commercial use are not clearly defined in the provided README.
1 year ago
Inactive