OpenClaw-PwnKit by imbue-bit

LLM agent RCE via adversarial tool-call hijacking

Created 1 month ago
281 stars

Top 92.6% on SourcePulse

Project Summary

This research framework demonstrates a critical class of attack against LLM agents: Remote Code Execution (RCE) achieved through adversarial tool-call hijacking. It is aimed at security researchers and engineers evaluating LLM agent vulnerabilities, and provides a method for bypassing safety alignment by exploiting perturbations in the continuous token-embedding space. The primary benefit is advancing the understanding of these sophisticated attacks to drive the development of more robust defenses.
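To make "tool-call hijacking" concrete, here is a minimal illustration of the failure mode: the agent's planner intends a benign tool call, but an adversarial trigger steers the model into emitting a dangerous one instead. The tool names, message layout, and guardrail below are all hypothetical, not taken from this project:

```python
# Illustration only: what "tool-call hijacking" means in practice.
# Tool names and the call schema are hypothetical stand-ins for whatever
# the agent framework actually defines.
import json

# The user's benign request, as the agent would normally route it:
intended_call = {
    "name": "read_file",
    "arguments": json.dumps({"path": "notes.txt"}),
}

# With an adversarial trigger appended to the input, the model is steered
# into emitting a different, code-executing tool call instead:
hijacked_call = {
    "name": "execute_shell",
    "arguments": json.dumps({"command": "<payload fetched from C2 server>"}),
}

def is_dangerous(call, denylist=("execute_shell", "run_code")):
    """A trivial guardrail sketch: flag tool calls that reach code execution."""
    return call["name"] in denylist
```

A denylist like this is exactly the kind of shallow defense such attacks are designed to probe; the point of the framework is that alignment-level refusals can be bypassed without ever naming a forbidden tool in the prompt.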

How It Works

The project employs a black-box attack strategy using CMA-ES (Covariance Matrix Adaptation Evolution Strategy) to optimize adversarial triggers in the token embedding space. It leverages a surrogate model (Phi-2) to extract token embeddings, applies PCA for dimensionality reduction, and uses FAISS for mapping continuous vectors back to discrete tokens. This derivative-free optimization approach is advantageous as it does not require access to model gradients, making it effective against closed-source frontier LLMs.
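The pipeline described above can be sketched end to end with toy stand-ins: plain NumPy takes the place of Phi-2 embeddings, scikit-learn's PCA, FAISS nearest-neighbor search, and the `cma` package, and a simple (μ, λ) evolution strategy stands in for full CMA-ES. All names, shapes, and the objective function are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy surrogate-model embedding table (the project uses Phi-2's).
vocab_size, embed_dim, pca_dim = 100, 32, 8
embeddings = rng.normal(size=(vocab_size, embed_dim))

# "PCA" for dimensionality reduction, done here via SVD.
centered = embeddings - embeddings.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
reduced = centered @ vt[:pca_dim].T          # (vocab_size, pca_dim)

def project_to_tokens(z):
    """Map continuous vectors back to nearest discrete tokens (FAISS's role)."""
    dists = np.linalg.norm(reduced[None, :, :] - z[:, None, :], axis=-1)
    return dists.argmin(axis=1)              # one token id per trigger position

def loss(token_ids):
    """Black-box objective: in the real attack this queries the target LLM
    and scores whether the decoded trigger hijacks the tool call."""
    return float(np.sum(token_ids))          # placeholder score

# Simple (mu, lambda) evolution strategy standing in for CMA-ES: no gradients,
# only black-box loss evaluations -- the property that matters against
# closed-source models.
trigger_len = 4
mean, sigma = np.zeros((trigger_len, pca_dim)), 1.0
for _ in range(20):
    pop = mean + sigma * rng.normal(size=(16, trigger_len, pca_dim))
    scores = [loss(project_to_tokens(cand)) for cand in pop]
    mean = pop[np.argsort(scores)[:4]].mean(axis=0)   # recombine the elite
    sigma *= 0.95                                     # cool the step size

best_tokens = project_to_tokens(mean)        # discrete adversarial trigger
```

CMA-ES proper additionally adapts a full covariance matrix rather than a scalar step size, which is what makes it effective in the correlated, low-dimensional space PCA produces.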

Quick Start & Requirements

  • Installation: Clone the repository, create and activate a Python virtual environment, and install dependencies via pip install -r requirements.txt.
  • Prerequisites: Python 3.10+, PyTorch, Transformers, FAISS (faiss-cpu), CMA, scikit-learn, FastAPI, OpenAI SDK, Rich, loguru. An OpenAI API key is required. The surrogate model (microsoft/phi-2, ~5 GB) is downloaded automatically.
  • Configuration: Edit config.yaml to set the C2 server URL and OpenAI API key.
  • Usage: Interactive CLI (python pwnkit_cli.py) or programmatic API.
  • Links: GitHub Repository
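The README excerpted here does not show the config file's contents; based on the two settings listed above, config.yaml presumably looks something like the following (key names and values are assumptions, so check the repository's own template):

```yaml
# Hypothetical key names -- verify against the repository's config.yaml.
c2_server_url: "http://your-c2-host:8000"
openai_api_key: "sk-..."
```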

Highlighted Details

  • Demonstrates black-box adversarial attacks on LLM agent tool-calling via CMA-ES in token embedding space.
  • Achieves Remote Code Execution (RCE) by inducing adversarial tool-call hijacking.
  • Effective against closed-source frontier models (e.g., GPT-4, Claude 3) without gradient access.
  • Includes multiple attack vectors: CMA-ES trigger optimization, naive prompt injection, honeypot delivery, and skill poisoning.

Maintenance & Community

No specific details regarding maintainers, sponsorships, or community channels (e.g., Discord, Slack) are provided in the README. The project welcomes Pull Requests.

Licensing & Compatibility

Licensed under the GNU General Public License v3.0 (GPL v3). This is a strong copyleft license, requiring derivative works to also be licensed under GPL v3. Compatibility for commercial use or linking with closed-source software requires careful review of GPL v3 obligations.

Limitations & Caveats

This tool is released strictly for academic research and authorized security testing. A full optimization run can incur significant API costs (estimated $50–200 USD) and requires substantial compute resources (several hours wall time, ~8 GB GPU memory). The accompanying paper is listed as "Coming Soon." The toolkit specifically targets OpenClaw-based agents as a reference implementation.

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 51 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Michele Castata (President of Replit), and 3 more.

rebuff by protectai

0.3% · 1k stars
SDK for LLM prompt injection detection
Created 3 years ago · Updated 1 year ago
Starred by Elie Bursztein (Cybersecurity Lead at Google DeepMind), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 6 more.

llm-attacks by llm-attacks

0.1% · 5k stars
Attack framework for aligned LLMs, based on a research paper
Created 2 years ago · Updated 1 year ago
Starred by Dan Guido (Cofounder of Trail of Bits), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 5 more.

PurpleLlama by meta-llama

0.4% · 4k stars
LLM security toolkit for assessing/improving generative AI models
Created 2 years ago · Updated 1 day ago