Chatbot exploit list for adversarial prompt engineering
This repository is a curated collection of prompts and techniques for probing and exploiting vulnerabilities in large language models (LLMs) and chatbots. It is aimed at AI researchers, security professionals, and developers who want to understand and mitigate the risks of chatbot interactions, and its primary value is as a practical resource for identifying and addressing prompt injection and social engineering vulnerabilities.
How It Works
The repository catalogs a range of attack vectors, including command-injection keywords, emoji obfuscation, character encodings (ASCII, hex, Base64, Unicode, and others), zero-width characters, and social engineering tactics. These methods aim to bypass chatbot safety filters, elicit unintended responses, or manipulate an LLM's behavior by exploiting how it processes and interprets diverse input formats. Its main advantage is a comprehensive, structured catalog of obfuscation techniques for discovering LLM weaknesses.
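To make the obfuscation categories concrete, here is a minimal Python sketch (not taken from the repository itself; the function names are illustrative) showing how a plain-text prompt can be re-encoded in a few of the formats listed above. Such transformations are useful for testing whether a safety filter normalizes input before matching against blocked keywords.

```python
import base64

# Zero-width space: renders as nothing, but breaks naive substring
# matching when interleaved between the characters of a trigger word.
ZERO_WIDTH_SPACE = "\u200b"


def to_base64(prompt: str) -> str:
    """Encode the prompt as Base64, a common transport obfuscation."""
    return base64.b64encode(prompt.encode("utf-8")).decode("ascii")


def to_hex(prompt: str) -> str:
    """Encode the prompt as a hexadecimal string."""
    return prompt.encode("utf-8").hex()


def insert_zero_width(prompt: str) -> str:
    """Interleave zero-width spaces so the text displays identically
    but no longer matches a simple keyword filter."""
    return ZERO_WIDTH_SPACE.join(prompt)


if __name__ == "__main__":
    sample = "example test string"
    print("base64:", to_base64(sample))
    print("hex:   ", to_hex(sample))
    print("zwsp:  ", insert_zero_width(sample))
```

A filter that decodes or strips these encodings before matching will catch all three variants; one that matches only on raw input will not, which is the class of weakness the catalog is meant to surface.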
Maintenance & Community
The project is a work in progress, and community contributions are invited via issues and pull requests. The repository does not name maintainers or sponsors, and there is no formal community channel such as Discord or Slack.
Licensing & Compatibility
The repository does not state a license. Under default copyright, an unlicensed repository reserves all rights, so it cannot be assumed safe to redistribute or modify without explicit permission. Users should exercise particular caution before commercial use or integration into closed-source projects.
Limitations & Caveats
The effectiveness of these exploits varies significantly across LLM models, versions, and updates; the repository is a collection of examples, and successful use typically requires experimentation and adaptation. The missing license may also pose legal or compatibility issues for some use cases.