cybergym by sunblaze-ucb

Cybersecurity AI agent evaluation framework

Created 1 year ago

442 stars

Top 67.1% on SourcePulse

Project Summary

CyberGym is a large-scale, high-quality cybersecurity evaluation framework for rigorously assessing AI agents on real-world vulnerability analysis tasks. It provides a platform for testing how well agents discover and analyze software vulnerabilities. It is aimed at AI researchers and cybersecurity professionals working to advance automated vulnerability discovery.

How It Works

CyberGym evaluates AI agents on realistic vulnerability analysis scenarios. Agents work in either full dynamic compilation environments or a lighter binary-only mode. A key feature is its domain-allowlist proxy firewall, which restricts agent network access to approved domains and keeps the testing environment controlled and secure. Together, these choices aim to support scalable, rigorous assessment of agents on complex vulnerability discovery tasks.
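The summary does not describe how the proxy matches domains. The following is a minimal sketch of the general idea behind a domain allowlist: a request passes only if its host is an allowed domain or a subdomain of one. The allowlist contents and the `is_allowed` function are hypothetical and are not CyberGym's actual configuration or code.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist for illustration; CyberGym's real list is not given here.
ALLOWED_DOMAINS = {"pypi.org", "files.pythonhosted.org", "github.com"}


def is_allowed(url: str, allowed: set = ALLOWED_DOMAINS) -> bool:
    """Return True if the URL's host is an allowed domain or a subdomain of one."""
    # urlsplit().hostname is already lowercased; strip a trailing root dot.
    host = (urlsplit(url).hostname or "").rstrip(".")
    if not host:
        return False  # unparseable or host-less URLs are denied
    # Match on a dot boundary so "evilgithub.com" does not match "github.com".
    return any(host == d or host.endswith("." + d) for d in allowed)


if __name__ == "__main__":
    for u in ["https://pypi.org/simple/requests/",
              "https://api.github.com/repos",
              "https://evilgithub.com/",
              "http://example.com/"]:
        print(u, "->", "allow" if is_allowed(u) else "deny")
```

Matching on the dot boundary, instead of using a plain suffix check, prevents look-alike domains from slipping through.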

Quick Start & Requirements

Installation: From the repository root, install dependencies with pip3 install -e '.[dev,server]'.
Prerequisites: Python and a Docker environment are required.
Data: Benchmark data downloads are large, ranging from ~130GB (binary-only) to ~10TB (full data); a smaller subset is also available.
Links:
- Benchmark Data: https://huggingface.co/datasets/sunblaze-ucb/cybergym
- Paper: https://arxiv.org/abs/2506.02548
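Because the data sizes differ by nearly two orders of magnitude, it may help to check free space before downloading. The following minimal sketch uses Python's standard `shutil.disk_usage`. The size constants are the rough figures quoted above, read as decimal units. The mode keys and the `has_room` function are hypothetical. Actual on-disk usage after extraction may differ.

```python
import shutil

# Approximate download sizes quoted in this summary (decimal units assumed).
# The subset's size is not stated, so it is left out.
APPROX_SIZES_BYTES = {
    "binary-only": 130 * 10**9,  # ~130GB
    "full": 10 * 10**12,         # ~10TB
}


def has_room(mode: str, path: str = ".") -> bool:
    """Return True if the filesystem containing `path` has at least the
    approximate free space needed for `mode`. Raises KeyError for unknown modes."""
    needed = APPROX_SIZES_BYTES[mode]
    return shutil.disk_usage(path).free >= needed


if __name__ == "__main__":
    for mode, size in APPROX_SIZES_BYTES.items():
        print(f"{mode}: needs ~{size / 10**9:,.0f} GB, enough free space: {has_room(mode)}")
```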

Highlighted Details

Large-scale framework for evaluating AI agents on real-world cybersecurity tasks.
Features a domain-allowlist proxy firewall for controlled agent network access.
Includes an example agent suite and a submission server for Proof-of-Concept (PoC) verification.
Supports both full dynamic compilation environments and a lighter binary-only mode.
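The summary does not say how the submission server decides whether a PoC is valid. The sketch below assumes one plausible criterion, which this summary does not confirm (the paper is the authority): the PoC must crash the pre-patch build and must not crash the post-patch build. The `crashes` and `poc_verified` functions and the stand-in "targets" are hypothetical. The stand-ins are tiny Python one-liners used in place of real fuzz harnesses so the example runs anywhere.

```python
import os
import subprocess
import sys
import tempfile
from typing import Sequence


def crashes(cmd: Sequence[str], poc_path: str, timeout: float = 10.0) -> bool:
    """Run `cmd poc_path`; treat any non-zero exit status (e.g. a sanitizer
    abort, or a negative code for a signal) as a crash. A timeout is treated
    as "no crash" here, which is a simplifying assumption."""
    try:
        result = subprocess.run([*cmd, poc_path], capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode != 0


def poc_verified(pre_patch_cmd: Sequence[str], post_patch_cmd: Sequence[str], poc_path: str) -> bool:
    """Assumed criterion: the PoC crashes the pre-patch build but not the post-patch build."""
    return crashes(pre_patch_cmd, poc_path) and not crashes(post_patch_cmd, poc_path)


# Stand-in "targets" so the sketch runs without real binaries:
# the pre-patch build "crashes" (exits 1) on inputs starting with b"BOOM";
# the post-patch build always exits 0.
PRE_PATCH = [sys.executable, "-c",
             "import sys; sys.exit(1 if open(sys.argv[1], 'rb').read().startswith(b'BOOM') else 0)"]
POST_PATCH = [sys.executable, "-c", "import sys; sys.exit(0)"]


def write_poc(data: bytes) -> str:
    """Write PoC bytes to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".poc")
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path


if __name__ == "__main__":
    good, bad = write_poc(b"BOOM!"), write_poc(b"hello")
    print("BOOM! verified:", poc_verified(PRE_PATCH, POST_PATCH, good))  # True
    print("hello verified:", poc_verified(PRE_PATCH, POST_PATCH, bad))   # False
    os.remove(good)
    os.remove(bad)
```

Checking both builds guards against PoCs that crash the program for reasons unrelated to the patched vulnerability.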

Maintenance & Community

The README provides no specific details on active maintenance, notable contributors, or community channels such as Discord or Slack. The project's association with research is indicated by its arXiv citation.

Licensing & Compatibility

The project is licensed under the Apache License 2.0, a permissive license that is generally compatible with commercial use and closed-source linking.

Limitations & Caveats

The primary adoption barrier is the disk space needed for benchmark data, which can reach about 10TB for the full dataset. Setup also requires a Docker environment. Beyond the research paper, the README gives no details on community support channels and no signals of active development.


Explore Similar Projects

Awesome-AI-for-cybersecurity by Billy1900

ai-web3-security by pashov

Dark-Moon by ASCIT31

Awesome-AI-Hacking-Agents by EvanThomasLuke

Tsec-Hackathon by Yeti-791

awesome-cybersecurity-agentic-ai by raphabot

hackerai by hackerai-tech

defenseclaw by cisco-ai-defense

agentshield by affaan-m

pentest-ai-agents by 0xSteph

cai by aliasrobotics

Anthropic-Cybersecurity-Skills by mukul975