CTFKnow by tszdanger

Framework for evaluating LLMs on cybersecurity CTF challenges

Created 3 months ago
255 stars

Top 98.9% on SourcePulse

Project Summary

CTFKnow is a research framework for measuring and improving how well large language models (LLMs) solve cybersecurity Capture-the-Flag (CTF) challenges. It automates data collection, knowledge extraction, question generation, and model evaluation, giving cybersecurity AI researchers and educators a single end-to-end pipeline.

How It Works

The framework operates through a multi-stage pipeline. It begins by scraping CTF write-ups from CTFtime.org, prioritizing high-quality content. Next, it employs LLMs to extract universal cybersecurity knowledge and practical exploitation examples from these write-ups. This extracted knowledge is then used to automatically generate multiple-choice and open-ended questions, tailored to the difficulty of the original challenges. Finally, it evaluates LLM performance on these generated questions using various metrics.
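The stages described above can be sketched roughly as follows. This is a minimal offline illustration, not the project's actual code: every name here (WriteUp, Question, extract_knowledge, generate_question, evaluate, run_pipeline) is hypothetical, the LLM calls are replaced by simple string stand-ins, and plain accuracy is assumed as the metric because the summary does not name the metrics used.

```python
from dataclasses import dataclass
from typing import Callable, List

# All names in this sketch are hypothetical; none come from the CTFKnow codebase.


@dataclass
class WriteUp:
    challenge: str
    category: str
    text: str


@dataclass
class Question:
    prompt: str
    choices: List[str]
    answer_index: int


def extract_knowledge(writeup: WriteUp) -> str:
    """Stand-in for the LLM extraction step: keep the first sentence."""
    return writeup.text.split(".")[0].strip()


def generate_question(writeup: WriteUp, knowledge: str) -> Question:
    """Stand-in for LLM question generation: build one multiple-choice item."""
    return Question(
        prompt=f"Which technique applies to '{writeup.challenge}' ({writeup.category})?",
        choices=[knowledge, "None of the above"],
        answer_index=0,
    )


def evaluate(model: Callable[[Question], int], questions: List[Question]) -> float:
    """Return the fraction of questions the model answers correctly."""
    if not questions:
        return 0.0
    correct = sum(1 for q in questions if model(q) == q.answer_index)
    return correct / len(questions)


def run_pipeline(writeups: List[WriteUp], model: Callable[[Question], int]) -> float:
    """Chain extraction, question generation, and evaluation."""
    questions = [generate_question(w, extract_knowledge(w)) for w in writeups]
    return evaluate(model, questions)
```

A model is represented here as any callable that maps a Question to the index of its chosen answer, which keeps the evaluation step independent of whichever LLM provider sits behind it.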

Quick Start & Requirements

  • Installation: Clone the repository and install dependencies using pip install -r requirements.txt.
  • Prerequisites: Python 3.8+ and an OpenAI API key (or the equivalent credential for another supported provider).
  • Setup: Set your API key via environment variables (e.g., export OPENAI_API_KEY="your-openai-api-key").
  • Resources: Pre-collected data is available, but scraping may require time and network resources. LLM inference for extraction and evaluation will incur API costs.
  • Documentation: The README provides a detailed overview and usage examples.
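Because every LLM-backed stage needs the key, it can help to check the environment before starting a long run. The snippet below is a small sketch of such a check; the helper name get_api_key is hypothetical and not part of the project, and only the OPENAI_API_KEY variable name comes from the setup step above.

```python
import os


def get_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; not part of CTFKnow.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it before running the pipeline."
        )
    return key
```

Failing early like this avoids discovering a missing key only after the scraping stage has already finished.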

Highlighted Details

  • Supports multiple LLM providers (OpenAI, Replicate, etc.).
  • Includes a dataset of 13,000+ CTF challenges across 6 categories (Web, Pwn, Reverse, Crypto, Forensics, Misc) from 2019-2024.
  • Enables customization of LLM prompts and data processing workflows.
  • Offers a full pipeline for research, including data collection, knowledge extraction, question generation, and model evaluation.
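One common way to support several LLM providers is to hide each behind a shared interface. The sketch below shows that pattern under stated assumptions: ChatProvider, EchoProvider, PROVIDERS, and ask are all hypothetical names, and the summary does not say how CTFKnow itself structures its provider support. An offline echo class stands in for real OpenAI or Replicate clients so the example runs without network access.

```python
from typing import Dict, Protocol


class ChatProvider(Protocol):
    """Minimal interface any provider wrapper would need to satisfy (hypothetical)."""

    def complete(self, prompt: str) -> str: ...


class EchoProvider:
    """Offline stand-in used here instead of a real OpenAI or Replicate client."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


# Registry mapping a provider name to an object implementing ChatProvider.
PROVIDERS: Dict[str, ChatProvider] = {"echo": EchoProvider()}


def ask(provider_name: str, prompt: str) -> str:
    """Send a prompt to the named provider and return its reply."""
    try:
        provider = PROVIDERS[provider_name]
    except KeyError:
        raise ValueError(f"Unknown provider: {provider_name!r}") from None
    return provider.complete(prompt)
```

With this layout, adding a provider means registering one more object in the dictionary, and the extraction, generation, and evaluation stages never need to know which backend is in use.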

Maintenance & Community

The project is associated with a research paper and welcomes contributions via pull requests. Development setup instructions are provided, including pre-commit hooks.

Licensing & Compatibility

This project is licensed under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The framework relies heavily on LLM APIs, so it incurs usage costs and depends on the performance and availability of those services. The quality of the extracted knowledge and generated questions depends on both the LLM used and the quality of the scraped write-ups.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 0 stars in the last 30 days
