agent-threat-rules  by Agent-Threat-Rule

Open standard for AI agent threat detection

Created 3 months ago
281 stars

Top 92.6% on SourcePulse

GitHubView on GitHub
Project Summary

Summary

ATR (Agent Threat Rules) addresses the emerging security threat landscape of AI agents, vulnerable to attacks like prompt injection and tool poisoning. It provides an open, vendor-neutral, machine-readable rule format, analogous to Sigma for SIEMs, enabling standardized detection of AI agent threats. This benefits engineers and security professionals by offering a peer-reviewable standard to identify and mitigate risks in AI agent infrastructure.

How It Works

ATR rules are YAML documents conforming to a versioned schema (ATR-YYYY-NNNNN), specifying attack patterns, target input fields (LLM I/O, tool arguments), testing, and mappings to taxonomies like OWASP Agentic and MITRE ATLAS. The project includes a reference TypeScript engine and a Python wrapper (pyatr), designed with a narrow schema for broad engine compatibility across languages (Go, Rust). This offers a standardized, unambiguous method for AI agent threat detection.

Quick Start & Requirements

Highlighted Details

  • Production Adoption: Integrated into Microsoft AGT, Cisco AI Defense, MISP, and Gen Digital Sage.
  • Broad Coverage: Maps to 10/10 OWASP Agentic Top 10 categories and achieves 91.8% SAFE-MCP coverage.
  • High Efficacy: Demonstrates strong performance, including 98.0% recall on NVIDIA garak "in-the-wild" jailbreaks and 99.1% recall on Anthropic's hh-rlhf dataset.
  • Standardization Efforts: Developing proposal-stage scaffolding for potential OASIS Open Project submission.

Maintenance & Community

The project is transitioning from a single-maintainer (BDFL, Adam Lin) to a Technical Steering Committee (TSC). Maintenance is supported by community sponsorship via Open Collective. Contributions are welcomed via a streamlined issue-to-proposal process.

Licensing & Compatibility

Licensed under the permissive MIT License, allowing commercial use and integration into closed-source projects without copyleft restrictions.

Limitations & Caveats

The ATR rule format is a Working Draft (v3.0.0-alpha.1), and standardization is proposed. The regex-based detection misses paraphrased attacks and novel patterns, evidenced by zero recall on academic paraphrase corpora. ATR is a content layer; pair with sandboxing and human oversight for high-risk actions. The project currently has a single primary maintainer, with plans to increase this to mitigate bus factor.

Health Check
Last Commit

1 day ago

Responsiveness

Inactive

Pull Requests (30d)
145
Issues (30d)
4
Star History
55 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen Chip Huyen(Author of "AI Engineering", "Designing Machine Learning Systems").

codegate by stacklok

0%
788
AI agent security and management tool
Created 1 year ago
Updated 1 year ago
Feedback? Help us improve.