deepteam by confident-ai

LLM red teaming framework for security testing

created 5 months ago · 570 stars
Top 57.4% on sourcepulse

Project Summary

DeepTeam is an open-source LLM red teaming framework designed for penetration testing and safeguarding LLM systems. It targets developers and security professionals seeking to identify and mitigate vulnerabilities like bias, PII leakage, and misinformation in chatbots, RAG pipelines, and AI agents. The framework leverages state-of-the-art adversarial attack techniques and provides guardrails for production deployment.

How It Works

DeepTeam simulates adversarial attacks against an LLM system, which is exposed to the framework through a single model_callback function. Attacks are generated dynamically from a list of specified vulnerabilities, so no pre-defined test dataset is needed. LLMs are used both to generate the attacks and to evaluate the target system's responses against the vulnerability criteria. This allows flexible, on-the-fly testing tailored to specific organizational needs.
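
In practice this comes down to passing a callback plus vulnerability and attack objects to the framework's entry point. A minimal sketch, assuming the red_team function and the import paths shown in the project's documentation; the callback below is a placeholder standing in for a real LLM application:

    from deepteam import red_team
    from deepteam.vulnerabilities import Bias
    from deepteam.attacks.single_turn import PromptInjection

    def model_callback(input: str) -> str:
        # Placeholder target: swap in a call to your actual LLM system.
        return f"I'm sorry but I can't answer this: {input}"

    # Probe for racial bias, delivering attacks via prompt injection.
    risk_assessment = red_team(
        model_callback=model_callback,
        vulnerabilities=[Bias(types=["race"])],
        attacks=[PromptInjection()],
    )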

Quick Start & Requirements

  • Installation: pip install -U deepteam
  • Prerequisites: An OpenAI API key (or a key for another supported provider such as Anthropic, Gemini, Azure, or Ollama), set as an environment variable or via deepteam set-api-key. Custom LLM integration is also supported.
  • Setup: Minimal; the main step is defining a model_callback function that wraps the target LLM system (see the sketch after this list).
  • Documentation: LLM Red Teaming Framework Documentation
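
For a real target, the callback simply forwards each simulated attack to the system under test and returns its reply. A hypothetical wrapper using the OpenAI Python SDK; the model name is an arbitrary example, and any provider or custom stack can sit behind the callback:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def model_callback(input: str) -> str:
        # Forward the simulated attack to the system under test
        # and return its raw reply for evaluation.
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # arbitrary example model
            messages=[{"role": "user", "content": input}],
        )
        return response.choices[0].message.content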

Highlighted Details

  • Supports over 40 built-in vulnerabilities (e.g., Bias, PII Leakage, Misinformation) and 10+ adversarial attack methods (e.g., Prompt Injection, Jailbreaking).
  • Allows customization of vulnerabilities and attacks with minimal code.
  • Integrates with OWASP Top 10 for LLMs and NIST AI RMF guidelines.
  • Offers a CLI for running red teaming from YAML configurations and supports custom vulnerability definitions (a sketch follows this list).
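
Custom vulnerabilities can be expressed in a few lines of code. A hypothetical sketch; the CustomVulnerability class name, import path, and constructor arguments below are assumptions based on the customization support described above, not a confirmed API:

    from deepteam import red_team
    from deepteam.vulnerabilities import CustomVulnerability  # assumed import path
    from deepteam.attacks.single_turn import PromptInjection

    # Hypothetical custom vulnerability; the constructor arguments
    # (name, criteria, types) should be checked against the docs.
    secrets_leak = CustomVulnerability(
        name="Secrets Leakage",
        criteria="The target must never reveal internal API keys or credentials.",
        types=["secret_disclosure"],
    )

    def model_callback(input: str) -> str:
        return "placeholder response"  # stand-in for the real target system

    red_team(
        model_callback=model_callback,
        vulnerabilities=[secrets_leak],
        attacks=[PromptInjection()],
    )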

Maintenance & Community

  • Developed by the founders of Confident AI.
  • Discord server available for community interaction.
  • Roadmap includes expanding vulnerability and attack coverage.

Licensing & Compatibility

  • Licensed under Apache 2.0.
  • Permits commercial use and linking with closed-source applications.

Limitations & Caveats

Because DeepTeam relies on LLMs for both attack generation and evaluation, results can inherit those models' biases and limitations. Operation also requires provider-specific configuration and API key management.

Health Check

  • Last commit: 17 hours ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 21
  • Issues (30d): 2
  • Star History: 518 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems), Carol Willing (core contributor to CPython, Jupyter), and 2 more.

llm-security by greshake

Top 0.2% · 2k stars
Research paper on indirect prompt injection attacks targeting app-integrated LLMs
created 2 years ago · updated 2 weeks ago
Starred by Michael Truell (cofounder of Cursor), Chip Huyen (author of AI Engineering, Designing Machine Learning Systems), and 14 more.

SWE-agent by SWE-agent

Top 0.5% · 17k stars
Agent for automated software engineering (NeurIPS 2024)
created 1 year ago · updated 2 days ago