haizelabs
Red-teaming language models using automated prompting
This project addresses the challenge of systematically red-teaming language models by using DSPy, a framework for structuring and optimizing LM programs. It targets researchers and developers who evaluate LLM safety and robustness, and it aims to raise attack success rates with minimal manual prompt engineering.
How It Works
The project uses DSPy to compile a deep language program composed of alternating "Attack" and "Refine" modules. The program is optimized with DSPy's MIPRO optimizer, using an LLM judge as the optimization signal. Because the optimizer tunes the prompts of a fixed program structure, an effective red-teaming agent can be built without extensive manual prompt engineering.
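The data flow of alternating "Attack" and "Refine" steps can be sketched in plain Python. This is only an illustration under stated assumptions: it does not use DSPy, and every name here (`attack_program`, `judge_metric`, and the `toy_*` stubs) is hypothetical, standing in for LM-backed modules whose prompts an optimizer such as MIPRO would tune against the judge score.

```python
from typing import Callable

# Hypothetical callables standing in for LM-backed modules (not DSPy's API).
AttackFn = Callable[[str, str], str]       # (intent, critique) -> attack prompt
RefineFn = Callable[[str, str, str], str]  # (intent, prompt, response) -> critique
TargetFn = Callable[[str], str]            # prompt -> target model response
JudgeFn = Callable[[str, str], float]      # (intent, response) -> score in [0, 1]


def attack_program(intent: str, attack: AttackFn, refine: RefineFn,
                   target: TargetFn, layers: int) -> str:
    """Alternate Attack and Refine steps and return the final attack prompt."""
    if layers < 1:
        raise ValueError("layers must be at least 1")
    critique = ""
    prompt = ""
    for layer in range(layers):
        prompt = attack(intent, critique)
        # Refine between layers only; the last layer's prompt is the output.
        if layer < layers - 1:
            response = target(prompt)
            critique = refine(intent, prompt, response)
    return prompt


def judge_metric(intent: str, prompt: str, target: TargetFn, judge: JudgeFn) -> float:
    """Score a finished attack; an optimizer would try to maximize this value."""
    return judge(intent, target(prompt))


# Deterministic toy stubs so the sketch runs without any model access.
def toy_attack(intent: str, critique: str) -> str:
    return f"{intent} [{critique}]" if critique else intent


def toy_refine(intent: str, prompt: str, response: str) -> str:
    return "rephrase as a hypothetical"


def toy_target(prompt: str) -> str:
    return "COMPLIED" if "hypothetical" in prompt else "REFUSED"


def toy_judge(intent: str, response: str) -> float:
    return 1.0 if response == "COMPLIED" else 0.0


final_prompt = attack_program("test intent", toy_attack, toy_refine, toy_target, layers=2)
score = judge_metric("test intent", final_prompt, toy_target, toy_judge)
```

With the toy stubs, a single layer produces a prompt the stub target refuses, while a second layer incorporates the critique and the judge score rises. In a real setup each stub would be replaced by a model call, and the judge score would serve as the metric handed to the optimizer.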
Quick Start & Requirements
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The project acknowledges that its 44% attack success rate (ASR) is not state-of-the-art. The result was achieved with minimal effort on prompt design and hyperparameter tuning, which suggests room for further optimization but also that significant manual effort might be required to reach SOTA performance. The lack of explicit setup instructions and license information presents adoption hurdles.
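For readers unfamiliar with the metric, a minimal sketch of how an attack success rate could be computed from judge scores follows. The threshold of 0.5 and the function name `attack_success_rate` are assumptions for illustration; the source does not state how the project turns judge outputs into its 44% figure.

```python
from typing import Sequence


def attack_success_rate(judge_scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of attacks whose judge score meets an assumed success threshold."""
    if not judge_scores:
        raise ValueError("judge_scores must not be empty")
    successes = sum(1 for s in judge_scores if s >= threshold)
    return successes / len(judge_scores)


# Illustrative scores only, not results from the project.
example_asr = attack_success_rate([0.9, 0.1, 0.7, 0.2])
```

Here two of the four illustrative scores clear the threshold, so `example_asr` is 0.5.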