camel-prompt-injection by google-research

Research code for defeating prompt injections in language models

Created 9 months ago
264 stars

Top 96.7% on SourcePulse

Project Summary

This repository provides the code for the research paper "Defeating Prompt Injections by Design." It implements a defense against prompt injection attacks on large language models, aimed at researchers and developers who need to secure LLM applications against adversarial inputs. Its primary value is enabling reproduction of the paper's experimental results and providing a foundation for understanding and implementing robust prompt injection defenses.

How It Works

The project implements a defense strategy designed to mitigate prompt injection vulnerabilities. The README does not spell out the algorithm, but the codebase includes an interpreter that mediates model interactions and input processing so that malicious instructions embedded in untrusted inputs cannot redirect the LLM's intended behavior. As the paper's title suggests, the approach aims to defeat prompt injections "by design": a proactive security posture rather than reactive detection and filtering.
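The "by design" idea can be illustrated with a toy sketch (entirely hypothetical, not the repository's implementation): derive the control flow only from trusted input, and treat untrusted text strictly as data, so injected instructions cannot add or reorder steps.

```python
# Toy illustration of a by-design injection defense. All names here are
# invented for this sketch; this is NOT the repository's implementation.
from dataclasses import dataclass


@dataclass(frozen=True)
class Tainted:
    """Wraps untrusted text so it is never interpreted as instructions."""
    text: str


def plan(user_query: str) -> list[str]:
    # A "privileged" planner sees only the trusted user query, never the
    # contents of untrusted documents, so injected text cannot change
    # which steps run.
    return ["fetch_document", "summarize"]


def run(user_query: str, untrusted_doc: str) -> str:
    doc = Tainted(untrusted_doc)   # quarantine the untrusted input
    steps = plan(user_query)       # control flow is fixed up front
    out = ""
    for step in steps:
        if step == "fetch_document":
            out = doc.text         # data flows through, but adds no steps
        elif step == "summarize":
            out = out[:20]         # stand-in for a real model call
    return out
```

Even if the untrusted document contains "IGNORE PREVIOUS INSTRUCTIONS", the plan is computed before that text is ever seen, so the injected string is only ever summarized, never obeyed.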

Quick Start & Requirements

  • Installation: Install uv following the official instructions. Rename .env.example to .env and add your API keys. Dependencies are installed automatically via uv run ....
  • Prerequisites: uv package manager, API keys for language models, Python environment.
  • Running the defense: uv run --env-file .env main.py MODEL_NAME [--use-original] [--ad_defense] [--reasoning-effort] [--thinking_budget_tokens] [--run-attack] [--replay-with-policies] [--eval_mode]. Detailed arguments are available via uv run main.py --help.
  • Adding Models: New models can be added to models.py by updating _supported_model_names and _oai_thinking_models variables.
  • Links: No direct links to demos or official docs are provided, but CLI help is available.

Highlighted Details

  • Provides code to reproduce experimental results from the paper "Defeating Prompt Injections by Design."
  • Focuses on a "by design" approach to counter prompt injection vulnerabilities.

Maintenance & Community

  • This is a research artifact, not a supported product.
  • No maintenance or support is planned.
  • Bug fixes are not anticipated.
  • For questions, users are directed to open an issue in the repository.

Licensing & Compatibility

  • No explicit license is mentioned in the README.
  • The project is explicitly stated as "not a Google product" and "not an officially supported Google product."
  • It is not eligible for the Google Open Source Software Vulnerability Rewards Program.
  • Compatibility for commercial use or closed-source linking is undetermined due to the lack of a license.

Limitations & Caveats

  • The interpreter implementation may contain bugs and is not guaranteed to be fully secure.
  • This codebase is intended solely for research artifact reproduction and is not production-ready.
  • No support or maintenance will be provided.
Health Check

  • Last Commit: 8 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 1
  • Star History: 45 stars in the last 30 days

Explore Similar Projects

rebuff by protectai

  • 0.3% · 1k stars · SDK for LLM prompt injection detection
  • Created 2 years ago · Updated 1 year ago
  • Starred by Chip Huyen (author of "AI Engineering", "Designing Machine Learning Systems"), Michele Castata (President of Replit), and 3 more.

llm-guard by protectai

  • 1.0% · 3k stars · Security toolkit for LLM interactions
  • Created 2 years ago · Updated 2 months ago
  • Starred by Chip Huyen (author of "AI Engineering", "Designing Machine Learning Systems"), Elie Bursztein (Cybersecurity Lead at Google DeepMind), and 3 more.

PurpleLlama by meta-llama

  • 0.3% · 4k stars · LLM security toolkit for assessing/improving generative AI models
  • Created 2 years ago · Updated 6 days ago
  • Starred by Dan Guido (cofounder of Trail of Bits), Chip Huyen (author of "AI Engineering", "Designing Machine Learning Systems"), and 5 more.