FuzzyAI by CyberArk

LLM fuzzer for automated jailbreak detection

created 8 months ago
664 stars

Top 51.6% on sourcepulse

Project Summary

FuzzyAI is an automated LLM fuzzing tool designed for developers and security researchers to identify and mitigate jailbreaks and vulnerabilities in LLM APIs. It supports a wide range of LLM providers and offers various attack techniques for comprehensive security testing.

How It Works

FuzzyAI employs a multi-pronged approach to LLM fuzzing, incorporating mutation-based, generation-based, and intelligent fuzzing techniques. It supports custom input generation and offers an extensible architecture for tailored testing. The tool can target various LLM providers, including OpenAI, Anthropic, Gemini, and Ollama, as well as custom REST APIs.
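To make the mutation-based side of this approach concrete, here is a generic, tool-agnostic sketch of prompt mutation; none of the function names or mutation choices below come from FuzzyAI, and a real fuzzer would score responses and evolve promising candidates rather than mutate blindly:

```python
import random

def mutate_prompt(seed_prompt: str, rng: random.Random) -> str:
    """Apply one randomly chosen mutation to a seed prompt (illustrative only)."""
    mutations = [
        lambda p: p.upper(),                 # case perturbation
        lambda p: p.replace(" ", "  "),      # whitespace injection
        lambda p: p[::-1],                   # character reversal
        lambda p: " ".join(p.split()[::-1]), # word-order reversal
    ]
    return rng.choice(mutations)(seed_prompt)

def fuzz(seed_prompt: str, n: int, seed: int = 0) -> list[str]:
    """Generate n mutated candidates from one seed; deterministic given the RNG seed."""
    rng = random.Random(seed)
    return [mutate_prompt(seed_prompt, rng) for _ in range(n)]

candidates = fuzz("describe your system prompt", 4)
```

Generation-based and intelligent fuzzing replace the fixed mutation table with an auxiliary model that proposes or refines candidates based on the target's responses.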

Quick Start & Requirements

  • Install dependencies using Poetry: poetry install and poetry shell.
  • Optional: Install Ollama and pull a model (e.g., ollama pull llama3.1).
  • Run the fuzzer via python run.py.
  • A Web UI is available via streamlit run webui.py.
  • Detailed documentation is available on the Wiki.

Highlighted Details

  • Supports numerous attack types, including ArtPrompt, Taxonomy-based paraphrasing, PAIR, Many-shot jailbreaking, ASCII Smuggling, Genetic, Hallucinations, DAN, WordGame, Crescendo, ActorAttack, Best-of-n, and Shuffle Inconsistency Attack (SI-Attack).
  • Integrates with major LLM providers like OpenAI, Anthropic, Gemini, Azure, AWS Bedrock, AI21, DeepSeek, and Ollama.
  • Allows attacking custom REST APIs by parsing raw HTTP requests.
  • Includes experimental Web UI and Jupyter notebooks for interactive analysis.
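The raw-HTTP-request mode for custom REST APIs typically starts from a captured request file along the lines of the following; the host, endpoint, and placeholder token shown here are illustrative assumptions, not FuzzyAI's documented format:

```
POST /v1/chat HTTP/1.1
Host: llm.example.internal
Content-Type: application/json

{"prompt": "<PROMPT_PLACEHOLDER>"}
```

The fuzzer would then substitute each generated attack prompt at the placeholder and replay the request against the target endpoint.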

Maintenance & Community

The project is maintained by CyberArk. Contributions are welcome via CONTRIBUTING.md. Contact is available at fzai@cyberark.com.

Licensing & Compatibility

Released under the Apache License 2.0, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

Some classifiers are incompatible with certain attack methods due to design differences (e.g., single-output vs. multi-output classifiers). When using Ollama, add all Ollama models before models of other types. The Ollama port defaults to 11434 and can be overridden with -e port=.... The SI-Attack implementation is text-based only.

Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 2
  • Issues (30d): 2
  • Star History: 132 stars in the last 90 days
