ai-goat by dhammon

AI security CTF for learning LLM vulnerabilities

Created 2 years ago

305 stars

Top 87.6% on SourcePulse

View on GitHub

1 Expert Loves This Project

Dan Guido

Cofounder of Trail of Bits

Project Summary

AI Goat provides a local, hands-on learning environment for AI security through vulnerable Large Language Model (LLM) Capture The Flag (CTF) challenges. It targets security professionals and enthusiasts seeking practical experience with emerging LLM threats like prompt injection and insecure output handling, offering a cost-free, self-contained alternative to cloud-based training.

How It Works

AI Goat leverages the Vicuna LLM, a derivative of Meta's LLaMA, which is downloaded locally. Challenges are constructed by concatenating instructions, user questions, and response directives into a prompt for the LLM. A pre-built Docker image, ai-base, contains necessary libraries, and docker-compose orchestrates individual challenges, attaching the LLM binary and exposing specific ports. An optional ai-ctfd container provides a web interface for challenge tracking and flag submission.

Quick Start & Requirements

Install: git clone https://github.com/dhammon/ai-goat, cd ai-goat, pip3 install -r requirements.txt, chmod +x ai-goat.py, ./ai-goat.py --install
Prerequisites: Git, Python 3, pip3, Docker, Docker Compose, user in docker group.
Resources: ~8GB disk space, minimum 16GB system RAM (8GB dedicated to LLM).
Docs: AI Goat GitHub

Highlighted Details

Focuses on OWASP Top 10 LLM Application Security risks.
Utilizes Vicuna LLM (derived from LLaMA) via llama-cpp-python.
Offers an optional CTFd instance for challenge management and submission.
Challenges are run via ./ai-goat.py --run <CHALLENGE_NUMBER>.

Maintenance & Community

Project maintained by rootcauz.
No explicit community links (Discord/Slack) or roadmap mentioned in the README.

Licensing & Compatibility

License not explicitly stated in the README.
Compatibility for commercial or closed-source use is not specified.

Limitations & Caveats

LLM responses can take up to 30 seconds. The README notes that LLMs may occasionally "make up" flag values, requiring verification against the CTFd instance. Flag values may need manual synchronization between challenge source code and the CTFd admin panel.

Health Check

Last Commit

1 year ago

Responsiveness

1 week

Pull Requests (30d)

Issues (30d)

Star History

6 stars in the last 30 days