PurpleLlama  by meta-llama

LLM security toolkit for assessing/improving generative AI models

created 1 year ago
3,641 stars

Top 13.6% on sourcepulse

GitHubView on GitHub
Project Summary

Purple Llama is an umbrella project providing tools and evaluations to enhance the security and responsible development of open generative AI models. It targets developers and researchers seeking to mitigate risks associated with LLMs, offering both offensive (red team) and defensive (blue team) capabilities for comprehensive security assessment.

How It Works

The project employs a "purple teaming" approach, combining red and blue team strategies to identify and address generative AI risks. Key components include Llama Guard for input/output moderation, Prompt Guard against prompt injection and jailbreaking, and Code Shield for filtering insecure LLM-generated code. These safeguards are built upon Meta's Llama models, with Llama Guard 3 specifically fine-tuned for hazard detection and cyberattack response mitigation.

Quick Start & Requirements

  • Integration with the Llama reference system is recommended.
  • Resources for safeguards are available in the Llama-recipe GitHub repository.
  • Specific model requirements (e.g., Llama 3.2 Community License) and dependencies are detailed within component documentation.

Highlighted Details

  • CyberSec Eval: A suite of benchmarks (v1, v2, v3) assessing LLM cybersecurity risks, including insecure code suggestions, malicious code generation, code interpreter abuse, prompt injection, and offensive cyber capabilities.
  • Llama Guard 3: Optimized for detecting MLCommons standard hazards, supporting 7 new languages, a 128k context window, and image reasoning, with a focus on preventing malicious code execution.
  • Prompt Guard: Specifically designed to counter prompt injection and jailbreaking attacks, ensuring LLM application security.
  • Code Shield: Provides inference-time filtering for insecure code, mitigating risks of code interpreter abuse and insecure command execution.

Maintenance & Community

  • The project encourages community contributions via the CONTRIBUTING.md file.
  • Further information and FAQs are available on the Meta AI Llama FAQ.

Licensing & Compatibility

  • Evals/Benchmarks are MIT licensed.
  • Safeguards (Llama Guard, Prompt Guard) use the corresponding Llama Community Licenses (Llama 2, Llama 3, Llama 3.2).
  • Code Shield is MIT licensed.
  • Permissive licensing enables research and commercial usage.

Limitations & Caveats

The project is an evolving umbrella initiative with components being added over time. Specific model versions and their associated licenses should be carefully reviewed for compatibility.

Health Check
Last commit

1 week ago

Responsiveness

1+ week

Pull Requests (30d)
1
Issues (30d)
3
Star History
461 stars in the last 90 days

Explore Similar Projects

Feedback? Help us improve.