imbue-bit / LLM agent RCE via adversarial tool-call hijacking
This research framework demonstrates a critical security threat against LLM agents, enabling Remote Code Execution (RCE) through adversarial tool-call hijacking. It targets security researchers and engineers evaluating LLM agent vulnerabilities, providing a method to bypass safety alignments by exploiting continuous embedding space perturbations. The primary benefit is advancing the understanding of these sophisticated attacks to drive the development of more robust defenses.
How It Works
The project employs a black-box attack strategy using CMA-ES (Covariance Matrix Adaptation Evolution Strategy) to optimize adversarial triggers in the token embedding space. It leverages a surrogate model (Phi-2) to extract token embeddings, applies PCA for dimensionality reduction, and uses FAISS for mapping continuous vectors back to discrete tokens. This derivative-free optimization approach is advantageous as it does not require access to model gradients, making it effective against closed-source frontier LLMs.
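The pipeline described above can be sketched with NumPy alone. In this minimal sketch, a random matrix stands in for the Phi-2 embedding table, brute-force nearest-neighbor search stands in for FAISS, and a simplified (mu, lambda) evolution strategy stands in for CMA-ES; the dimensions and the toy objective are illustrative assumptions, not the project's actual values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy surrogate embedding table: 1000 "tokens", 64-dim embeddings.
# (The project extracts real embeddings from Phi-2; these are random stand-ins.)
vocab, dim = 1000, 64
E = rng.normal(size=(vocab, dim))

# PCA via SVD: restrict the search to a low-dimensional subspace.
k = 8
mu = E.mean(axis=0)
_, _, Vt = np.linalg.svd(E - mu, full_matrices=False)
P = Vt[:k]                       # k x dim projection matrix

def to_token(z):
    """Map a continuous point in PCA space back to a discrete token id
    by nearest-neighbor search (FAISS would replace this at scale)."""
    v = z @ P + mu               # lift back to embedding space
    return int(np.argmin(np.linalg.norm(E - v, axis=1)))

def objective(z):
    """Stand-in attack objective: distance to a fixed 'target' embedding.
    A real run would instead score the victim model's response."""
    return float(np.linalg.norm((z @ P + mu) - E[42]))

# Simplified (mu, lambda) evolution strategy standing in for CMA-ES:
# derivative-free, so no gradients from the victim model are needed.
z = rng.normal(size=k)
start = objective(z)
sigma = 1.0
for gen in range(200):
    pop = z + sigma * rng.normal(size=(16, k))   # sample candidates
    scores = np.array([objective(p) for p in pop])
    elite = pop[np.argsort(scores)[:4]]          # keep the best 4
    z = elite.mean(axis=0)                       # recombine
    sigma *= 0.97                                # simple cooling schedule
final = objective(z)

print(f"objective: {start:.3f} -> {final:.3f}")
print("decoded token id:", to_token(z))
```

The key property the sketch illustrates is that the whole loop treats the scoring function as a black box: only objective values, never gradients, drive the search, which is why the approach transfers to closed-source frontier LLMs.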
Quick Start & Requirements
- Install dependencies: `pip install -r requirements.txt`
- Edit `config.yaml` to set the C2 server URL and OpenAI API key.
- Run via the CLI (`python pwnkit_cli.py`) or the programmatic API.
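The README references a `config.yaml` holding the C2 server URL and OpenAI API key. A minimal sketch of what such a file might contain (the key names below are hypothetical, for illustration only; consult the repository's template for the actual schema):

```yaml
# Hypothetical key names -- check the repository's own config.yaml template.
c2_server_url: "https://your-c2-host.example.com"
openai_api_key: "sk-..."
```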
Maintenance & Community
No specific details regarding maintainers, sponsorships, or community channels (e.g., Discord, Slack) are provided in the README. The project welcomes Pull Requests.
Licensing & Compatibility
Licensed under the GNU General Public License v3.0 (GPL v3). This is a strong copyleft license, requiring derivative works to also be licensed under GPL v3. Compatibility for commercial use or linking with closed-source software requires careful review of GPL v3 obligations.
Limitations & Caveats
This tool is released strictly for academic research and authorized security testing. A full optimization run can incur significant API costs (estimated $50–200 USD) and requires substantial compute resources (several hours wall time, ~8 GB GPU memory). The accompanying paper is listed as "Coming Soon." The toolkit specifically targets OpenClaw-based agents as a reference implementation.