CTFKnow by tszdanger

Framework for evaluating LLMs on cybersecurity CTF challenges

Created 6 months ago

290 stars

Top 91.0% on SourcePulse

Project Summary

CTFKnow is a research framework for measuring and improving how well large language models (LLMs) solve cybersecurity Capture-the-Flag (CTF) challenges. It automates data collection, knowledge extraction, question generation, and model evaluation, giving cybersecurity AI research and education a single end-to-end pipeline.

How It Works

The framework operates through a multi-stage pipeline. It begins by scraping CTF write-ups from CTFtime.org, prioritizing high-quality content. Next, it employs LLMs to extract universal cybersecurity knowledge and practical exploitation examples from these write-ups. This extracted knowledge is then used to automatically generate multiple-choice and open-ended questions, tailored to the difficulty of the original challenges. Finally, it evaluates LLM performance on these generated questions using various metrics.
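
The four stages can be sketched roughly as follows. This is a minimal illustration, not CTFKnow's actual code: every function name, the Question type, and the stubbed model are hypothetical, and the scraping and LLM calls are replaced by in-memory stand-ins so the example runs offline. The only metric shown is plain accuracy on multiple-choice questions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Question:
    """A multiple-choice question derived from one write-up (hypothetical shape)."""
    prompt: str
    choices: List[str]
    answer: str  # letter of the correct choice, e.g. "B"


def collect_writeups() -> List[str]:
    # Stage 1 stand-in: the real pipeline scrapes CTFtime.org.
    return ["The flag was recovered by exploiting a SQL injection in the login form."]


def extract_knowledge(writeup: str) -> str:
    # Stage 2 stand-in: the real pipeline asks an LLM to summarize the technique.
    return "SQL injection" if "SQL injection" in writeup else "unknown"


def generate_question(knowledge: str) -> Question:
    # Stage 3 stand-in: the real pipeline asks an LLM to write the question.
    return Question(
        prompt=f"Which vulnerability class does '{knowledge}' belong to?",
        choices=["A. Memory corruption", "B. Injection", "C. Side channel", "D. Race condition"],
        answer="B",
    )


def evaluate(model: Callable[[Question], str], questions: List[Question]) -> float:
    # Stage 4: fraction of questions where the model's letter matches the key.
    if not questions:
        return 0.0
    correct = sum(model(q).strip().upper() == q.answer for q in questions)
    return correct / len(questions)


def always_b(question: Question) -> str:
    # Stub "model" that always answers B, standing in for an LLM call.
    return "b"


questions = [generate_question(extract_knowledge(w)) for w in collect_writeups()]
accuracy = evaluate(always_b, questions)
print(f"accuracy = {accuracy:.2f}")
```

Passing the model in as a plain callable keeps the scoring step independent of any particular provider, so the same evaluate function could score a real LLM client or a stub.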

Quick Start & Requirements

Installation: Clone the repository, then run pip install -r requirements.txt to install the dependencies.
Prerequisites: Python 3.8+, an OpenAI API key (or equivalent for other providers).
Setup: Set your API key via environment variables (e.g., export OPENAI_API_KEY="your-openai-api-key").
Resources: Pre-collected data is available; scraping fresh write-ups takes time and network bandwidth. LLM inference for extraction and evaluation incurs API costs.
Documentation: The README provides a detailed overview and usage examples.
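
A script that depends on the key can fail early with a clear message when the variable is missing. The sketch below assumes only the OPENAI_API_KEY variable name from the setup step; the load_api_key helper is hypothetical and is not part of CTFKnow.

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None, name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is missing or blank."""
    # Default to the real process environment; a dict can be passed in for testing.
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the pipeline")
    return key
```

Calling load_api_key() with no arguments reads the real environment, while passing a dictionary makes the check easy to exercise without touching a real key.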

Highlighted Details

Supports multiple LLM providers (OpenAI, Replicate, etc.).
Includes a dataset of 13,000+ CTF challenges from 2019 to 2024, spanning 6 categories (Web, Pwn, Reverse, Crypto, Forensics, Misc).
Enables customization of LLM prompts and data processing workflows.
Offers a full pipeline for research, including data collection, knowledge extraction, question generation, and model evaluation.
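
One common way to support several LLM providers behind a single call is a small registry keyed by provider name. The sketch below illustrates that pattern only; the PROVIDERS registry, the register decorator, and the echo provider are hypothetical and do not reflect how CTFKnow wires up OpenAI or Replicate.

```python
from typing import Callable, Dict

# Registry mapping a provider name to a function that turns a prompt into text.
PROVIDERS: Dict[str, Callable[[str], str]] = {}


def register(name: str):
    """Decorator that adds a completion function to the registry under `name`."""
    def wrap(fn: Callable[[str], str]) -> Callable[[str], str]:
        PROVIDERS[name] = fn
        return fn
    return wrap


@register("echo")
def echo_provider(prompt: str) -> str:
    # Offline stand-in; a real entry would call a provider's client library here.
    return f"echo: {prompt}"


def complete(provider: str, prompt: str) -> str:
    """Dispatch a prompt to the named provider, rejecting unknown names."""
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider {provider!r}; known: {sorted(PROVIDERS)}")
    return PROVIDERS[provider](prompt)


print(complete("echo", "Explain a format string bug."))
```

With this layout, adding a provider means registering one more function, and the rest of the pipeline keeps calling complete with a different name.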

Maintenance & Community

The project is associated with a research paper and welcomes contributions via pull requests. Development setup instructions are provided, including pre-commit hooks.

Licensing & Compatibility

This project is licensed under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The framework relies heavily on LLM APIs, so running it costs money and is subject to the performance and availability of those services. The quality of the extracted knowledge and generated questions depends on the LLM used and on the quality of the scraped write-ups.
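
Transient API failures are often handled by retrying with exponential backoff. The following is a generic sketch of that technique under stated assumptions, not something taken from CTFKnow: the with_retries helper and its parameters are hypothetical, and it retries on any exception, which a real client would likely narrow to rate-limit and network errors.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(call: Callable[[], T], attempts: int = 3, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, retrying on any exception with exponentially growing delays."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each round: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

The sleep function is injectable so the backoff schedule can be checked without actually waiting.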
