Discover and explore top open-source AI tools and projects—updated daily.
agiresearchLLM agent security benchmarking framework
Top 99.5% on SourcePulse
Agent Security Bench (ASB) formalizes and benchmarks adversarial attacks and defenses for Large Language Model (LLM)-based agents. It provides a systematic evaluation framework across 10 diverse scenarios, enabling researchers and engineers to assess the security posture of LLM agents against various threats. The primary benefit is a standardized methodology for understanding and mitigating security vulnerabilities in agentic AI systems.
How It Works
ASB is built upon the AIOS framework and systematically evaluates a spectrum of adversarial attacks, including Direct Prompt Injection (DPI), Observation Prompt Injection (OPI), Plan-of-Thought (PoT) Backdoor, and Memory Poisoning. It benchmarks these attacks against numerous LLM backbones and evaluates the efficacy of corresponding defense mechanisms like Delimiters, Sandwich Prevention, and Paraphrasing. This approach offers a comprehensive dataset for understanding attack vectors and defense performance.
Quick Start & Requirements
conda create -n ASB python=3.11, source activate ASB), and install dependencies (pip install -r requirements.txt).python scripts/agent_attack.py --cfg_path config/<ATTACK_TYPE>.yml or python scripts/agent_attack_pot.py. Configuration is managed via YAML files in the config/ directory.Highlighted Details
Maintenance & Community
This project serves as the official code release for the ICLR 2025 paper "Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents." Key contributors include Hanrong Zhang, Jingyuan Huang, Kai Mei, Yifei Yao, Zhenting Wang, Chenlu Zhan, Hongwei Wang, and Yongfeng Zhang. Further details are available on the project website.
Licensing & Compatibility
The provided README content does not specify a software license. This omission requires clarification regarding usage rights, distribution, and commercial compatibility.
Limitations & Caveats
Defenses against Memory Poisoning attacks are reported as largely ineffective, with high average False Negative Rates (FNR) and False Positive Rates (FPR). While some defenses show promise against PoT Backdoor attacks, overall effectiveness varies significantly across LLMs and defense types. OPI defenses also exhibit mixed results, with certain LLMs remaining vulnerable.
1 month ago
Inactive
protectai