Discover and explore top open-source AI tools and projects—updated daily.
google-researchResearch code for defeating prompt injections in language models
Top 96.7% on SourcePulse
This repository provides the code for the research paper "Defeating Prompt Injections by Design." It offers a defense mechanism against prompt injection attacks in large language models, targeting researchers and developers working with LLMs who need to secure their applications against adversarial inputs. The primary benefit is enabling the reproduction of experimental results and providing a foundation for understanding and implementing robust prompt injection defenses.
How It Works
The project implements a defense strategy designed to mitigate prompt injection vulnerabilities. While specific algorithmic details are not elaborated in the README, the code likely orchestrates model interactions and input processing to identify and neutralize malicious prompts before they can compromise the LLM's intended behavior. The approach aims to defeat prompt injections "by design," suggesting a proactive rather than reactive security posture.
Quick Start & Requirements
uv from official instructions. Rename .env.example to .env and add API keys. Dependencies are installed via uv run ....uv package manager, API keys for language models, Python environment.uv run --env-file .env main.py MODEL_NAME [--use-original] [--ad_defense] [--reasoning-effort] [--thinking_budget_tokens] [--run-attack] [--replay-with-policies] [--eval_mode]. Detailed arguments are available via uv run main.py --help.models.py by updating _supported_model_names and _oai_thinking_models variables.Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
8 months ago
Inactive
protectai
protectai
meta-llama