ASB by agiresearch

LLM agent security benchmarking framework

Created 1 year ago

272 stars

Top 94.6% on SourcePulse

Project Summary

Agent Security Bench (ASB) formalizes and benchmarks adversarial attacks and defenses for Large Language Model (LLM)-based agents. It provides a systematic evaluation framework across 10 diverse scenarios, enabling researchers and engineers to assess the security posture of LLM agents against various threats. The primary benefit is a standardized methodology for understanding and mitigating security vulnerabilities in agentic AI systems.

How It Works

ASB is built upon the AIOS framework and systematically evaluates a spectrum of adversarial attacks, including Direct Prompt Injection (DPI), Observation Prompt Injection (OPI), Plan-of-Thought (PoT) Backdoor, and Memory Poisoning. It benchmarks these attacks against numerous LLM backbones and evaluates the efficacy of corresponding defense mechanisms like Delimiters, Sandwich Prevention, and Paraphrasing. This approach offers a comprehensive dataset for understanding attack vectors and defense performance.

Quick Start & Requirements

Installation: Clone the repository, create a Conda environment (conda create -n ASB python=3.11, source activate ASB), and install dependencies (pip install -r requirements.txt).
Prerequisites: Python 3.11, Ollama (for running open-source LLMs locally; supports CPU-only). CUDA is optional.
Running: Execute attacks via python scripts/agent_attack.py --cfg_path config/<ATTACK_TYPE>.yml or python scripts/agent_attack_pot.py. Configuration is managed via YAML files in the config/ directory.
Links:
- Website: https://luckfort.github.io/ASBench/
- Paper: https://openreview.net/forum?id=V4y0CpX4hK (ICLR 2025)

Highlighted Details

Evaluates four primary attack types: DPI, OPI, PoT Backdoor, and Memory Poisoning.
Tests against 13 LLM backbones, including Gemma, LLaMA, Mixtral, Qwen, Claude, and GPT series.
Benchmarks multiple defense strategies against specific attack types.
Provides extensive experimental results detailing attack success rates (ASR) and success rates (RR) across LLMs and defenses.

Maintenance & Community

This project serves as the official code release for the ICLR 2025 paper "Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents." Key contributors include Hanrong Zhang, Jingyuan Huang, Kai Mei, Yifei Yao, Zhenting Wang, Chenlu Zhan, Hongwei Wang, and Yongfeng Zhang. Further details are available on the project website.

Licensing & Compatibility

The provided README content does not specify a software license. This omission requires clarification regarding usage rights, distribution, and commercial compatibility.

Limitations & Caveats

Defenses against Memory Poisoning attacks are reported as largely ineffective, with high average False Negative Rates (FNR) and False Positive Rates (FPR). While some defenses show promise against PoT Backdoor attacks, overall effectiveness varies significantly across LLMs and defense types. OPI defenses also exhibit mixed results, with certain LLMs remaining vulnerable.

Health Check

Last Commit

3 months ago

Responsiveness

Inactive

Pull Requests (30d)

Issues (30d)

Star History

11 stars in the last 30 days