OpenRT by AI45Lab

Open-source framework for multimodal LLM red teaming

Created 6 months ago

256 stars

Top 98.5% on SourcePulse

Project Summary

Summary

OpenRT is an open-source red teaming framework designed to systematically test the safety and robustness of Multimodal Large Language Models (MLLMs). It provides researchers and engineers with a comprehensive suite of over 40 attack methods, enabling the discovery of vulnerabilities and the enhancement of LLM defenses against adversarial inputs.

How It Works

The framework employs a modular, plugin-based architecture, allowing for flexible composition of components. It supports a wide array of attack strategies, encompassing both black-box and white-box methodologies, and extends capabilities to multimodal inputs, including text and images. Experiments are driven by YAML configuration files, streamlining the definition and execution of red teaming scenarios, while integrated evaluation mechanisms, including keyword matching and LLM judges, provide quantitative and qualitative assessments.

Quick Start & Requirements

Installation can be performed via PyPI (pip install openrt) or from source (git clone https://github.com/AI45Lab/OpenRT.git, cd OpenRT, pip install -e .). Users must configure API keys for LLM interactions, typically via environment variables (export OPENAI_API_KEY="..."). The framework supports running individual attack examples or full experiments using YAML configurations. Batch evaluations can be executed via the eval.py script with extensive command-line arguments for customization. Official documentation and demos are available via the project page 🌐 Project Page.

Highlighted Details

Features 42+ distinct attack methods, categorized into black-box (optimization, fuzzing, LLM-driven refinement, linguistic, contextual deception, multimodal, multi-agent) and white-box approaches.
Native support for multimodal LLMs, enabling attacks via text and image vectors.
A highly extensible, plugin-based architecture facilitates the addition of new attack methods and components.
Configuration-driven experiment setup and advanced batch evaluation capabilities via YAML and CLI.

Maintenance & Community

The README does not detail specific maintenance schedules, notable contributors, sponsorships, or community channels such as Discord or Slack. Primary community engagement points are the project page and the arXiv preprint arXiv.

Licensing & Compatibility

OpenRT is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0) License. This is a strong copyleft license that requires any modifications or derivative works to be made available under the same license, which may impose restrictions on integration into proprietary software or commercial closed-source products.

Limitations & Caveats

The provided README does not explicitly mention any limitations, known bugs, alpha status, or unsupported platforms. The framework is presented as a robust, released tool.

OpenRT by AI45Lab

Explore Similar Projects

promptbeat by tophant-ai

Awesome-Multimodal-Jailbreak by liuxuannan

bingo by bingook

dspy-redteam by haizelabs

Visual-Adversarial-Examples-Jailbreak-Large-Language-Models by Unispac

augustus by praetorian-inc

AI-penetration-testing by Mr-Infect

HarmBench by centerforaisafety

agentic_security by msoedov

deepteam by confident-ai

garak by NVIDIA

deepeval by confident-ai