cybergym  by sunblaze-ucb

Cybersecurity AI agent evaluation framework

Created 11 months ago
289 stars

Top 90.8% on SourcePulse

Project Summary

CyberGym is a large-scale cybersecurity evaluation framework for assessing AI agents on real-world vulnerability analysis tasks. It tests agents' ability to discover and analyze software vulnerabilities, and is aimed at AI researchers and cybersecurity professionals working on automated vulnerability discovery.

How It Works

CyberGym runs AI agents against realistic vulnerability analysis tasks. Tasks can use full dynamic compilation environments or a lighter binary-only mode. A domain-allowlist proxy firewall restricts agent network access to approved domains, which keeps each run controlled and isolated. A submission server verifies Proof-of-Concept (PoC) submissions, so results can be checked consistently across many tasks.
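To make the domain-allowlist idea concrete, here is a minimal Python sketch of the kind of check such a proxy might apply to each outbound request. It is not CyberGym's implementation: the `ALLOWED_DOMAINS` set, the `is_allowed` function, and the example domains are all assumptions chosen for illustration.

```python
from typing import Iterable
from urllib.parse import urlsplit

# Hypothetical allowlist for illustration; CyberGym's real configuration may differ.
ALLOWED_DOMAINS = {"pypi.org", "files.pythonhosted.org"}


def is_allowed(url: str, allowed: Iterable[str] = ALLOWED_DOMAINS) -> bool:
    """Return True if the URL's host is an allowed domain or a subdomain of one."""
    # hostname is None when the URL has no network location (e.g. no scheme).
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    # Match on whole labels so "evil-pypi.org" does not pass as "pypi.org".
    return any(host == d or host.endswith("." + d) for d in allowed)
```

Matching on a leading dot, rather than a plain suffix, is the design choice that matters here: it rejects lookalike hosts such as `evil-pypi.org` and `pypi.org.evil.com` while still accepting real subdomains.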

Quick Start & Requirements

  • Installation: Install dependencies via pip3 install -e '.[dev,server]'.
  • Prerequisites: Python and a Docker environment are required.
  • Data: Benchmark data downloads are substantial, ranging from ~130GB (binary-only) to ~10TB (full data), with a subset also available.
  • Links:
    • Benchmark Data: https://huggingface.co/datasets/sunblaze-ucb/cybergym
    • Paper: https://arxiv.org/abs/2506.02548
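
Given the data sizes listed above, it may be worth checking free disk space before starting a download. The following Python sketch does that with the standard library only. The mode labels and the `enough_space` and `check_path` helpers are hypothetical names for illustration and are not part of CyberGym; the thresholds are the approximate figures quoted above, in decimal gigabytes.

```python
import shutil

# Approximate sizes from the figures above, in decimal gigabytes.
# The mode names are illustrative labels, not CyberGym options.
REQUIRED_GB = {"binary-only": 130, "full": 10_000}


def enough_space(free_bytes: int, mode: str) -> bool:
    """Return True if free_bytes covers the approximate size for the given mode."""
    return free_bytes / 1e9 >= REQUIRED_GB[mode]


def check_path(path: str, mode: str) -> bool:
    """Check the free space on the filesystem that contains path."""
    return enough_space(shutil.disk_usage(path).free, mode)
```

For example, `check_path(".", "binary-only")` reports whether the current filesystem has room for the smaller download.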

Highlighted Details

  • Large-scale framework for evaluating AI agents on real-world cybersecurity tasks.
  • Features a domain-allowlist proxy firewall for controlled agent network access.
  • Includes an example agent suite and a submission server for Proof-of-Concept (PoC) verification.
  • Supports both full dynamic compilation environments and a lighter binary-only mode.

Maintenance & Community

The README provides no specific details on active maintenance, notable contributors, or community channels such as Discord or Slack. The project's association with research is indicated by its arXiv citation.

Licensing & Compatibility

The project is licensed under the Apache License 2.0, a permissive license that is generally compatible with commercial use and with use in closed-source software.

Limitations & Caveats

The main adoption barrier is disk space: benchmark data ranges from ~130GB (binary-only) to ~10TB (full data). Setup also requires a Docker environment. Beyond the research paper, the README gives no signals of community support or active development.

Health Check

  • Last Commit: 3 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 0
  • Star History: 82 stars in the last 30 days
