Collection of jailbreak methods on LLMs
This repository is a curated collection of state-of-the-art research on jailbreaking Large Language Models (LLMs). It is aimed at researchers, security engineers, and practitioners who want to understand and mitigate LLM vulnerabilities, and serves as a reference for work on LLM safety and security.
How It Works
The collection categorizes jailbreak methods into attack types (e.g., black-box, white-box, multi-turn, multimodal) and defense strategies (learning-based, strategy-based, guard models). It compiles relevant papers, code repositories, datasets, and evaluation methodologies, offering a structured overview of the evolving landscape of LLM security research.
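The categorization described above could be modeled as a small data structure. The sketch below is purely illustrative: the `Entry` type, field names, and example entries are hypothetical and are not taken from the repository itself.

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    """One item in a curated collection: a paper plus its resources."""
    title: str
    kind: str                 # "attack" or "defense"
    category: str             # e.g. "black-box", "white-box", "multi-turn", "guard-model"
    links: dict = field(default_factory=dict)  # e.g. {"paper": ..., "code": ...}

def by_category(entries, kind, category):
    """Return all entries matching a given kind and category."""
    return [e for e in entries if e.kind == kind and e.category == category]

# Placeholder entries, not real papers from the list.
entries = [
    Entry("Example black-box attack", "attack", "black-box"),
    Entry("Example multi-turn attack", "attack", "multi-turn"),
    Entry("Example guard model", "defense", "guard-model"),
]

print([e.title for e in by_category(entries, "attack", "black-box")])
# → ['Example black-box attack']
```

A flat list of tagged entries like this mirrors how "awesome"-style collections are usually organized: one section per `(kind, category)` pair.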
Quick Start & Requirements
This repository is a curated list of research papers and code; there is nothing to install or run directly. For concrete implementations, follow the links to individual papers and their associated code repositories.
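Since the repository is a list of links rather than a tool, one plausible programmatic use is extracting the paper and code URLs from its README. The sketch below is a hypothetical helper, assuming the list uses standard Markdown link syntax; the sample line is illustrative, not copied from the repository.

```python
import re

# Matches Markdown links of the form [text](http...url).
LINK_RE = re.compile(r"\[([^\[\]]+)\]\((https?://[^)]+)\)")

def extract_links(markdown: str):
    """Return (text, url) pairs for every Markdown link in the input."""
    return LINK_RE.findall(markdown)

# Illustrative line in the style of an "awesome"-list entry.
sample = "- [Example Paper](https://arxiv.org/abs/0000.00000) [[code](https://example.com/repo)]"
print(extract_links(sample))
# → [('Example Paper', 'https://arxiv.org/abs/0000.00000'), ('code', 'https://example.com/repo')]
```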
Maintenance & Community
The repository is maintained by yueliu1999, with contributions welcomed via PRs and issues. Contact is available via email for specific inquiries. The project encourages citation of its featured papers.
Licensing & Compatibility
The repository itself is a collection of links and does not declare a license of its own. Each linked paper and code repository carries its own license, which users must adhere to.
Limitations & Caveats
This is a curated list and does not provide a unified framework or tool for performing jailbreaks or defenses. Users must navigate to individual resources for implementation details and potential dependencies. The rapid pace of LLM research means the content may require frequent updates to remain fully comprehensive.