PurpleLlama by meta-llama

LLM security toolkit for assessing/improving generative AI models

Created 2 years ago

3,968 stars

Top 12.2% on SourcePulse

View on GitHub

7 Experts Love This Project

Dan Guido

Cofounder of Trail of Bits

Chip Huyen

Author of "AI Engineering", "Designing Machine Learning Systems"

Eugene Yan

AI Scientist at AWS

Shizhe Diao

Author of LMFlow; Research Scientist at NVIDIA

and 3 more!

Project Summary

Purple Llama is an umbrella project providing tools and evaluations to enhance the security and responsible development of open generative AI models. It targets developers and researchers seeking to mitigate risks associated with LLMs, offering both offensive (red team) and defensive (blue team) capabilities for comprehensive security assessment.

How It Works

The project employs a "purple teaming" approach, combining red and blue team strategies to identify and address generative AI risks. Key components include Llama Guard for input/output moderation, Prompt Guard against prompt injection and jailbreaking, and Code Shield for filtering insecure LLM-generated code. These safeguards are built upon Meta's Llama models, with Llama Guard 3 specifically fine-tuned for hazard detection and cyberattack response mitigation.

Quick Start & Requirements

Integration with the Llama reference system is recommended.
Resources for safeguards are available in the Llama-recipe GitHub repository.
Specific model requirements (e.g., Llama 3.2 Community License) and dependencies are detailed within component documentation.

Highlighted Details

CyberSec Eval: A suite of benchmarks (v1, v2, v3) assessing LLM cybersecurity risks, including insecure code suggestions, malicious code generation, code interpreter abuse, prompt injection, and offensive cyber capabilities.
Llama Guard 3: Optimized for detecting MLCommons standard hazards, supporting 7 new languages, a 128k context window, and image reasoning, with a focus on preventing malicious code execution.
Prompt Guard: Specifically designed to counter prompt injection and jailbreaking attacks, ensuring LLM application security.
Code Shield: Provides inference-time filtering for insecure code, mitigating risks of code interpreter abuse and insecure command execution.

Maintenance & Community

The project encourages community contributions via the CONTRIBUTING.md file.
Further information and FAQs are available on the Meta AI Llama FAQ.

Licensing & Compatibility

Evals/Benchmarks are MIT licensed.
Safeguards (Llama Guard, Prompt Guard) use the corresponding Llama Community Licenses (Llama 2, Llama 3, Llama 3.2).
Code Shield is MIT licensed.
Permissive licensing enables research and commercial usage.

Limitations & Caveats

The project is an evolving umbrella initiative with components being added over time. Specific model versions and their associated licenses should be carefully reviewed for compatibility.

Health Check

Last Commit

2 days ago

Responsiveness

1+ week

Pull Requests (30d)

Issues (30d)

Star History

49 stars in the last 30 days