sunblaze-ucb
Cybersecurity AI agent evaluation framework
Top 90.8% on SourcePulse
CyberGym is a large-scale, high-quality cybersecurity evaluation framework designed to rigorously assess AI agents on real-world vulnerability analysis tasks. It offers a robust platform for testing AI agents' capabilities in discovering and analyzing software vulnerabilities, benefiting AI researchers and cybersecurity professionals aiming to advance automated vulnerability discovery.
How It Works
CyberGym evaluates AI agents against realistic cybersecurity challenges using a framework that incorporates compilation environments, potentially including dynamic compilation, to simulate vulnerability analysis scenarios. A key feature is its domain-allowlist proxy firewall, which restricts agent network access, ensuring a controlled and secure testing environment. This design enables scalable and rigorous assessment of AI agents' performance on complex vulnerability discovery tasks.
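The idea behind a domain-allowlist firewall can be illustrated with a minimal sketch. This is not CyberGym's implementation: the function name `is_allowed` and the domains in `ALLOWED_DOMAINS` are hypothetical, chosen only to show how a proxy might decide whether an outbound request is permitted.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist for illustration; the domains CyberGym actually
# permits are not stated in the README summary above.
ALLOWED_DOMAINS = frozenset({"huggingface.co", "pypi.org"})


def is_allowed(url: str, allowed=ALLOWED_DOMAINS) -> bool:
    """Return True if the URL's host is an allowed domain or a subdomain of one."""
    host = urlsplit(url).hostname
    if host is None:
        # No parseable host (for example, a URL without a scheme): deny by default.
        return False
    host = host.rstrip(".").lower()
    # Match the domain exactly or as a dot-separated suffix, so that
    # "evilhuggingface.co" does not pass as "huggingface.co".
    return any(host == d or host.endswith("." + d) for d in allowed)
```

A proxy built on this check would forward requests for which `is_allowed` returns True and reject everything else, so an agent cannot reach arbitrary hosts during evaluation.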
Quick Start & Requirements
Install: pip3 install -e '.[dev,server]'
Dataset: https://huggingface.co/datasets/sunblaze-ucb/cybergym
Paper: https://arxiv.org/abs/2506.02548
Highlighted Details
Maintenance & Community
The README provides no specific details on active maintenance, notable contributors, or community channels such as Discord or Slack. The project's association with research is indicated by its arXiv citation.
Licensing & Compatibility
Licensed under the Apache License 2.0, this project is permissive and generally compatible with commercial use and closed-source linking.
Limitations & Caveats
The primary adoption barrier is the disk space required for benchmark data, which may reach 10 TB for the full dataset. Setup also requires a Docker environment. Beyond the research paper, the README gives no signals of community support or active development.
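Before downloading the data, it can help to confirm that the host meets both requirements. The following is a minimal preflight sketch, not part of CyberGym: the helper names are hypothetical, and the 10 TB default is taken from the upper estimate above rather than from a documented minimum.

```python
import shutil

TB = 10**12  # decimal terabyte, in bytes


def has_free_space(path: str, required_bytes: int = 10 * TB) -> bool:
    """Return True if the filesystem containing `path` has enough free space."""
    return shutil.disk_usage(path).free >= required_bytes


def docker_available() -> bool:
    """Return True if a `docker` executable is found on PATH."""
    return shutil.which("docker") is not None


if __name__ == "__main__":
    print("Enough disk space:", has_free_space("."))
    print("Docker on PATH:", docker_available())
```

Finding `docker` on PATH does not guarantee that the daemon is running; it only catches the case where Docker is not installed at all.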