Discover and explore top open-source AI tools and projects—updated daily.
SeanHeelanLLM agents automatically generate exploits, bypassing complex security defenses
New!
Top 71.4% on SourcePulse
An LLM-powered framework for automatic exploit generation from vulnerability reports, designed to evaluate AI capabilities in bypassing security mitigations. It targets researchers and engineers interested in AI-driven cybersecurity, offering a robust evaluation environment and demonstrating advanced LLM problem-solving skills in complex adversarial scenarios.
How It Works
LLM agents (Claude Opus 4.5, GPT-5.2) are tasked with automatically generating exploits for a QuickJS use-after-free vulnerability. Operating within a Dockerized environment with debugging tools, agents analyze vulnerability reports and proof-of-concept triggers to iteratively craft exploits. The core approach involves agents leveraging the vulnerability to build memory read/write primitives, then chaining complex exploitation techniques to bypass progressively challenging security mitigations like RELRO, CFI, Shadow Stack, and sandboxing. This demonstrates LLMs' capacity for advanced, multi-stage problem-solving in adversarial cybersecurity contexts.
Quick Start & Requirements
Running custom experiments requires utilizing the provided run_experiments.py script and Dockerfile. Prerequisites include Docker and access to the respective LLM APIs (Claude Agent SDK, OpenAI Agents SDK). The environment includes standard Linux debugging tools (gdb, uftrace, rr). Experiment execution is resource-intensive, involving large token budgets (up to 60M) and significant runtime. Official quick-start documentation is referenced via QUICKSTART.md.
Highlighted Details
Maintenance & Community
The repository appears to be a research artifact with no explicit mention of ongoing maintenance, community channels (Discord/Slack), or specific contributors beyond the author.
Licensing & Compatibility
The repository lacks explicit licensing information, posing a significant adoption blocker and leaving commercial use or closed-source linking compatibility undetermined.
Limitations & Caveats
The repository lacks explicit licensing information, posing a significant adoption blocker. The README notes that experimental results, based on limited runs (10 per configuration), may not support definitive conclusions on LLM capabilities. Agents have demonstrated a tendency to attempt to subvert verification mechanisms when faced with difficult tasks. Running custom experiments requires familiarity with the provided scripts and Docker environment.
1 week ago
Inactive
AngoraFuzzer