camel-prompt-injection by google-research

Research code for defeating prompt injections in language models

Created 9 months ago
264 stars

Top 96.7% on SourcePulse

Project Summary

This repository provides the code for the research paper "Defeating Prompt Injections by Design." It implements a defense against prompt injection attacks on large language models, aimed at researchers and developers who need to secure LLM applications against adversarial inputs. Its primary value is enabling reproduction of the paper's experimental results and providing a foundation for understanding and implementing robust prompt injection defenses.

How It Works

The project implements a defense strategy designed to mitigate prompt injection vulnerabilities. The README does not spell out the algorithm, but the codebase includes an interpreter that mediates model interactions and input processing so that malicious instructions embedded in untrusted inputs cannot redirect the LLM's intended behavior. As the paper's title suggests, the approach aims to defeat prompt injections "by design": a proactive security posture rather than reactive detection and filtering.
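The "by design" idea can be illustrated with a toy sketch (entirely hypothetical, not the repository's implementation): derive the control flow only from trusted input, and treat untrusted text strictly as data, so injected instructions cannot add or reorder steps.

```python
# Toy illustration of a by-design injection defense. All names here are
# invented for this sketch; this is NOT the repository's implementation.
from dataclasses import dataclass


@dataclass(frozen=True)
class Tainted:
    """Wraps untrusted text so it is never interpreted as instructions."""
    text: str


def plan(user_query: str) -> list[str]:
    # A "privileged" planner sees only the trusted user query, never the
    # contents of untrusted documents, so injected text cannot change
    # which steps run.
    return ["fetch_document", "summarize"]


def run(user_query: str, untrusted_doc: str) -> str:
    doc = Tainted(untrusted_doc)   # quarantine the untrusted input
    steps = plan(user_query)       # control flow is fixed up front
    out = ""
    for step in steps:
        if step == "fetch_document":
            out = doc.text         # data flows through, but adds no steps
        elif step == "summarize":
            out = out[:20]         # stand-in for a real model call
    return out
```

Even if the untrusted document contains "IGNORE PREVIOUS INSTRUCTIONS", the plan is computed before that text is ever seen, so the injected string is only ever summarized, never obeyed.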

Quick Start & Requirements

  • Installation: Install uv following the official instructions. Rename .env.example to .env and add your API keys. Dependencies are installed automatically via uv run ....
  • Prerequisites: uv package manager, API keys for language models, Python environment.
  • Running the defense: uv run --env-file .env main.py MODEL_NAME [--use-original] [--ad_defense] [--reasoning-effort] [--thinking_budget_tokens] [--run-attack] [--replay-with-policies] [--eval_mode]. Detailed arguments are available via uv run main.py --help.
  • Adding Models: New models can be added to models.py by updating _supported_model_names and _oai_thinking_models variables.
  • Links: No direct links to demos or official docs are provided, but CLI help is available.

Highlighted Details

  • Provides code to reproduce experimental results from the paper "Defeating Prompt Injections by Design."
  • Focuses on a "by design" approach to counter prompt injection vulnerabilities.

Maintenance & Community

  • This is a research artifact, not a supported product.
  • No maintenance or support is planned.
  • Bug fixes are not anticipated.
  • For questions, users are directed to open an issue in the repository.

Licensing & Compatibility

  • No explicit license is mentioned in the README.
  • The project is explicitly stated as "not a Google product" and "not an officially supported Google product."
  • It is not eligible for the Google Open Source Software Vulnerability Rewards Program.
  • Compatibility for commercial use or closed-source linking is undetermined due to the lack of a license.

Limitations & Caveats

  • The interpreter implementation may contain bugs and is not guaranteed to be fully secure.
  • This codebase is intended solely for research artifact reproduction and is not production-ready.
  • No support or maintenance will be provided.
Health Check

  • Last Commit: 8 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 1
  • Star History: 45 stars in the last 30 days

Explore Similar Projects

rebuff by protectai

  • 0.3% · 1k stars · SDK for LLM prompt injection detection
  • Created 2 years ago · Updated 1 year ago
  • Starred by Chip Huyen (author of "AI Engineering", "Designing Machine Learning Systems"), Michele Castata (President of Replit), and 3 more.

llm-guard by protectai

  • 1.0% · 3k stars · Security toolkit for LLM interactions
  • Created 2 years ago · Updated 2 months ago
  • Starred by Chip Huyen (author of "AI Engineering", "Designing Machine Learning Systems"), Elie Bursztein (Cybersecurity Lead at Google DeepMind), and 3 more.

PurpleLlama by meta-llama

  • 0.3% · 4k stars · LLM security toolkit for assessing/improving generative AI models
  • Created 2 years ago · Updated 6 days ago
  • Starred by Dan Guido (cofounder of Trail of Bits), Chip Huyen (author of "AI Engineering", "Designing Machine Learning Systems"), and 5 more.