Awesome-Multimodal-Jailbreak by liuxuannan

Multimodal generative model jailbreaking: a survey

Created 1 year ago
254 stars

Top 99.1% on SourcePulse

Project Summary

This repository serves as a comprehensive survey of jailbreak attacks and defense mechanisms targeting multimodal generative models. It addresses the critical need for understanding and mitigating vulnerabilities in AI systems that process diverse data types like text, images, and audio. Aimed at researchers, engineers, and security professionals, it offers a structured overview of the evolving landscape, enabling rapid assessment of current threats and solutions in multimodal AI security.

How It Works

The project systematically categorizes multimodal jailbreak vulnerabilities and defenses across four distinct lifecycle levels: input, encoder, generator, and output. It provides a detailed taxonomy of attack methods and defense strategies, covering various input-output modalities such as Any-to-Text, Any-to-Vision, and Any-to-Any. This structured approach allows for a granular understanding of how attacks are formulated and how defenses can be implemented at different stages of the generative process.
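The four lifecycle levels above can be pictured as a simple lookup table. This is an illustrative sketch only: the level names (input, encoder, generator, output) come from the survey's taxonomy, but the descriptions and all identifiers (`JAILBREAK_LEVELS`, `describe_level`) are hypothetical placeholders, not code from the repository.

```python
# Illustrative sketch of the survey's four-level jailbreak taxonomy.
# Level names follow the survey; descriptions are placeholder summaries.
JAILBREAK_LEVELS = {
    "input": "manipulating prompts or media before the model ingests them",
    "encoder": "perturbations targeting a modality encoder (e.g., vision)",
    "generator": "exploits against the generative backbone during decoding",
    "output": "attacks or filters applied to the generated text/image/audio",
}

# Input-output modality axes the survey covers.
MODALITY_AXES = ("Any-to-Text", "Any-to-Vision", "Any-to-Any")


def describe_level(level: str) -> str:
    """Look up a lifecycle level, case-insensitively."""
    return JAILBREAK_LEVELS.get(level.strip().lower(), "unknown level")
```

A defense surveyed at a given stage (say, an output-level content filter) would map to exactly one key in such a table, which is what makes the taxonomy useful for placing both attacks and defenses side by side.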

Quick Start & Requirements

This repository is a curated collection of research papers and resources rather than deployable software. A detailed table of contents guides readers to sections on models, attacks, defenses, and evaluation. No installation or computational requirements apply, as it is purely an informational resource.

Highlighted Details

  • Comprehensive taxonomy detailing the four levels of multimodal jailbreak (Input, Encoder, Generator, Output).
  • Extensive tables categorizing multimodal generative models (e.g., LLaVA, Stable Diffusion, GPT-4o) by modality and architecture.
  • A vast compilation of research papers on jailbreak attacks and defenses, with links to venues, dates, and code repositories where available.
  • Detailed sections on evaluation datasets and methodologies used in the field.

Maintenance & Community

The repository is described as "constantly updated" to ensure the inclusion of the most current information. Specific community channels or active maintenance team details are not provided.

Licensing & Compatibility

No specific open-source license or compatibility information is mentioned within the provided text.

Limitations & Caveats

As a survey, the repository is a snapshot of research and may not encompass all emerging threats or defenses. It focuses on academic and research resources, not on providing a ready-to-use security tool or framework.

Health Check

  • Last Commit: 17 hours ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 23 stars in the last 30 days

Explore Similar Projects

Starred by Elie Bursztein (Cybersecurity Lead at Google DeepMind), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 6 more.

llm-attacks by llm-attacks

0.3% · 4k stars
Attack framework for aligned LLMs, based on a research paper
Created 2 years ago · Updated 1 year ago
Starred by Dan Guido (Cofounder of Trail of Bits), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 5 more.

PurpleLlama by meta-llama

0.3% · 4k stars
LLM security toolkit for assessing/improving generative AI models
Created 1 year ago · Updated 23 hours ago