UltraBr3aks by SlowLow999

AI jailbreak techniques for bypassing LLM guardrails

Created 9 months ago
281 stars

Top 92.6% on SourcePulse

View on GitHub
Project Summary

This repository, UltraBr3aks, offers a curated collection of advanced "jailbreak" prompts designed to bypass safety guardrails in large language models (LLMs) from multiple vendors. It targets researchers and power users interested in probing LLM alignment vulnerabilities and exploring unrestricted generation capabilities, providing novel attack vectors for testing model security.

How It Works

UltraBr3aks employs diverse prompting strategies to circumvent LLM safety mechanisms. Key techniques include "Attention-Breaking," which manipulates Transformer self-attention to disrupt guardrail focus by embedding harmful requests within formatting tasks or unformatted text. Other methods rely on persona adoption with double encoding ("1Shot-Puppetry"), injection of custom tokens via model instructions ("!Special_Token"), or routing requests through external APIs and internal model features to sidestep direct safety filters. Each approach exploits specific architectural weaknesses or flaws in prompt interpretation.

Quick Start & Requirements

The repository focuses on prompt engineering; there are no installation or runtime commands. Users apply the described prompt techniques directly in the target LLM's interface or API. The only stated requirement is access to the specific LLM versions mentioned (e.g., GPT 5.1, Claude 4.5/4.6, Gemini 3/2.5 Pro, OpenAI OSS models).

Highlighted Details

  • Attention-Breaking: Targets Transformer self-attention to disrupt guardrail focus via specific token patterns and contextual noise.
  • 1Shot-Puppetry: Universal attack using role-play, persona adoption, and double encoding (Leet + Base64) across major LLMs (Claude 4.5, GPT-5, Gemini 2.5 Pro).
  • API & Artifact Exploitation: Techniques like C0d33X3 and Cl4ud33X3 leverage external APIs (Pollinations) or internal model features (Artifacts) to bypass direct safety filters.
  • Policy & Input Routing: Methods like "Policy Injection" (N3w P0l!cy) weaponize a model's own safety guidelines, while "Smart Input Routing" (SIR) categorizes requests to activate specific personas.

Maintenance & Community

Maintained by SlowLow999. Community interaction via Discord: @ultrazartrex.

Licensing & Compatibility

No license is specified, so standard copyright restrictions apply by default. The content is explicitly marked for "educational/research use only," implying potential restrictions on commercial application.

Limitations & Caveats

Content is strictly for educational/research purposes, with an emphasis on responsible use. The prompts target specific LLM versions from OpenAI, Anthropic, and Google and may not transfer to other models or future updates, as effectiveness can degrade quickly once vendors patch the underlying weaknesses.

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 21 stars in the last 30 days

Explore Similar Projects

  • rebuff by protectai: SDK for LLM prompt injection detection. 1k stars; created 3 years ago, updated 1 year ago. Starred by Chip Huyen (author of "AI Engineering", "Designing Machine Learning Systems"), Michele Catasta (President of Replit), and 3 more.
  • llm-guard by protectai: Security toolkit for LLM interactions. 3k stars; created 2 years ago, updated 4 months ago. Starred by Chip Huyen, Elie Bursztein (Cybersecurity Lead at Google DeepMind), and 3 more.
  • PurpleLlama by meta-llama: LLM security toolkit for assessing/improving generative AI models. 4k stars; created 2 years ago, updated 4 days ago. Starred by Dan Guido (Cofounder of Trail of Bits), Chip Huyen, and 5 more.