UltraBr3aks by SlowLow999

AI jailbreak techniques for bypassing LLM guardrails

Created 9 months ago
281 stars

Top 92.6% on SourcePulse

View on GitHub
Project Summary

This repository, UltraBr3aks, offers a curated collection of advanced "jailbreak" prompts designed to bypass safety guardrails in large language models (LLMs) from multiple vendors. It targets researchers and power users interested in probing LLM alignment vulnerabilities and exploring unrestricted generation capabilities, providing novel attack vectors for testing model security.

How It Works

UltraBr3aks employs diverse prompting strategies to circumvent LLM safety mechanisms. Key techniques include "Attention-Breaking," which manipulates Transformer self-attention to disrupt guardrail focus by embedding harmful requests within formatting tasks or unformatted text. Other methods rely on persona adoption with double encoding ("1Shot-Puppetry"), injection of custom tokens via model instructions ("!Special_Token"), or routing requests through external APIs and internal model features to sidestep direct safety filters. Each approach exploits specific architectural weaknesses or flaws in prompt interpretation.

Quick Start & Requirements

The repository focuses on prompt engineering; there are no installation or runtime commands. Users apply the described prompt techniques directly in the target LLM's interface or API. The only stated requirement is access to the specific LLM versions mentioned (e.g., GPT 5.1, Claude 4.5/4.6, Gemini 3/2.5 Pro, OpenAI OSS models).

Highlighted Details

  • Attention-Breaking: Targets Transformer self-attention to disrupt guardrail focus via specific token patterns and contextual noise.
  • 1Shot-Puppetry: Universal attack using role-play, persona adoption, and double encoding (Leet + Base64) across major LLMs (Claude 4.5, GPT-5, Gemini 2.5 Pro).
  • API & Artifact Exploitation: Techniques like C0d33X3 and Cl4ud33X3 leverage external APIs (Pollinations) or internal model features (Artifacts) to bypass direct safety filters.
  • Policy & Input Routing: Methods like "Policy Injection" (N3w P0l!cy) weaponize a model's own safety guidelines, while "Smart Input Routing" (SIR) categorizes requests to activate specific personas.

Maintenance & Community

Maintained by SlowLow999. Community interaction via Discord: @ultrazartrex.

Licensing & Compatibility

No license is specified, so standard copyright restrictions apply by default. The content is explicitly marked for "educational/research use only," implying potential restrictions on commercial application.

Limitations & Caveats

Content is strictly for educational/research purposes, with an emphasis on responsible use. The prompts target specific LLM versions from OpenAI, Anthropic, and Google and may not transfer to other models or future updates, as effectiveness can degrade quickly once vendors patch the underlying weaknesses.

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 21 stars in the last 30 days

Explore Similar Projects

  • rebuff by protectai: SDK for LLM prompt injection detection. 1k stars; created 3 years ago, updated 1 year ago. Starred by Chip Huyen (author of "AI Engineering", "Designing Machine Learning Systems"), Michele Catasta (President of Replit), and 3 more.
  • llm-guard by protectai: Security toolkit for LLM interactions. 3k stars; created 2 years ago, updated 4 months ago. Starred by Chip Huyen, Elie Bursztein (Cybersecurity Lead at Google DeepMind), and 3 more.
  • PurpleLlama by meta-llama: LLM security toolkit for assessing/improving generative AI models. 4k stars; created 2 years ago, updated 4 days ago. Starred by Dan Guido (Cofounder of Trail of Bits), Chip Huyen, and 5 more.