RealMythos  by tszdanger

Open cybersecurity reasoning stack for vulnerability analysis and PoC generation

Created 1 month ago
359 stars

Top 77.9% on SourcePulse

Project Summary

RealMythos aims to publicly reconstruct Claude Mythos as an open cybersecurity reasoning stack, addressing the concentration of advanced security tools behind proprietary gates. It provides datasets, models, and infrastructure for researchers, defenders, and educators to build, inspect, and improve security reasoning systems, promoting fairness and broader usability.

How It Works

The project follows a staged, layered approach to reconstruct a security reasoning architecture: real vulnerability data -> reasoning dataset -> open security reasoning model -> reproducible environments -> multi-agent trace collection. This deliberate staging allows the community to inspect and improve each layer, starting with data and progressing to models and infrastructure, fostering transparency and reproducibility.
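As a rough illustration of the staging described above, the layers can be modeled as an ordered list with a release flag. The stage names and release status follow this summary; the `Stage` class and helper functions are hypothetical and do not come from the RealMythos repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One layer of the staged release (illustrative type, not project code)."""
    number: int
    artifact: str
    released: bool


# Names and status as reported in this summary.
STAGES = [
    Stage(1, "reasoning dataset built from real vulnerability data", True),
    Stage(2, "open security reasoning model (pocwriter-v1)", True),
    Stage(3, "reproducible software environments", False),
    Stage(4, "scaffold-based trace collection", False),
]


def released(stages):
    """Return the stages whose artifacts are already public."""
    return [s for s in stages if s.released]


def next_planned(stages):
    """Return the lowest-numbered unreleased stage, or None if all are out."""
    pending = [s for s in stages if not s.released]
    return min(pending, key=lambda s: s.number) if pending else None
```

Keeping the stages in one ordered structure mirrors the idea that each layer can be inspected on its own before the next one builds on it.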

Quick Start & Requirements

  • Stage 1 Dataset: Available on Hugging Face (RealMythos/RealMythosReasoning).
  • Stage 2 Model: pocwriter-v1 available on Hugging Face (RealMythos/pocwriter-v1).
  • Reproducibility Code: Pipeline for Stage 1 dataset is in stage1-dataset/pipeline/.
  • Prerequisites: Python environment for pipeline execution. The pocwriter-v1 model is a fine-tune of Qwen/Qwen3.5-9B.
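The two published artifacts listed above could be fetched with the `huggingface_hub` client, as in the minimal sketch below. The repository IDs are the ones named in this summary; the `ARTIFACTS` mapping and the `fetch` helper are hypothetical, and the sketch assumes `huggingface_hub` is installed, network access is available, and both repositories are publicly readable.

```python
# Repository IDs come from the list above; everything else is illustrative.
ARTIFACTS = {
    "stage1_dataset": ("RealMythos/RealMythosReasoning", "dataset"),
    "stage2_model": ("RealMythos/pocwriter-v1", "model"),
}


def fetch(name: str) -> str:
    """Download one artifact snapshot and return its local directory."""
    # Imported lazily so this module loads even without the dependency.
    from huggingface_hub import snapshot_download

    repo_id, repo_type = ARTIFACTS[name]
    return snapshot_download(repo_id=repo_id, repo_type=repo_type)


# Example usage (requires network access):
# dataset_dir = fetch("stage1_dataset")
# model_dir = fetch("stage2_model")
```

Downloading raw snapshots avoids guessing the dataset schema or the model class; how the files are loaded afterwards depends on the formats the authors actually published.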

Highlighted Details

  • Stage 1 Dataset: Contains 6,159 CVE-linked C/C++ security reasoning records derived from real-world vulnerabilities, focusing on root cause, impact, and Proof-of-Concept (PoC) generation. It emphasizes patch-unaware reasoning and includes quality signals.
  • Stage 2 Model (pocwriter-v1): A full-parameter supervised fine-tune of Qwen3.5-9B on the Stage 1 dataset, specialized for C/C++ vulnerability analysis and assisting with PoC drafting.
  • Staged Release Philosophy: Artifacts are released in layers (data, models, environments, traces) to enable community inspection, reproduction, and iterative improvement.
  • Research Lineage: Builds upon prior work in real-world vulnerability collection (Reef) and API-guided dataset synthesis.
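To make the "patch-unaware" idea in the Stage 1 bullet concrete, the sketch below builds a model input that withholds the fix. This reads "patch-unaware" as meaning the patch is kept out of the model's input, which is an interpretation rather than something this summary spells out. The field names (`cve_id`, `vulnerable_code`, `patch_diff`) and the prompt wording are hypothetical; the real dataset schema may differ.

```python
def build_prompt(record: dict) -> str:
    """Build an analysis prompt that never exposes the fix."""
    # Hypothetical field names; only these two are allowed into the prompt.
    allowed = ("cve_id", "vulnerable_code")
    visible = {key: record[key] for key in allowed if key in record}
    return (
        f"CVE: {visible.get('cve_id', 'unknown')}\n"
        "Analyze the root cause and impact of the vulnerability "
        "in this C/C++ code:\n"
        f"{visible.get('vulnerable_code', '')}"
    )


# Made-up record for illustration only; not taken from the dataset.
example = {
    "cve_id": "CVE-0000-0000",
    "vulnerable_code": "char buf[8]; strcpy(buf, input);",
    "patch_diff": "+ strncpy(buf, input, sizeof(buf) - 1);",
}
prompt = build_prompt(example)
```

Using an allow-list rather than deleting the patch field means any extra fix-related fields that appear in a record are also kept out by default.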

Maintenance & Community

RealMythos is an independent open project developed by its authors in their personal capacity; it is not affiliated with Anthropic or with other Mythos-branded projects. Key contributors include Zongjie Li (project lead), Liwen Wang, Chaozheng Wang, and Zimo Ji, who are affiliated with HKUST and CUHK. No formal community channels (e.g., Discord, Slack) are indicated.

Licensing & Compatibility

The Stage 2 model (pocwriter-v1) is released under the Apache-2.0 license, which permits commercial use. The dataset's license is not explicitly stated, although the dataset is intended for open research.

Limitations & Caveats

The pocwriter-v1 model is an early public checkpoint, and its outputs require manual verification. Generated PoCs should be used only in authorized, controlled environments. Stage 3 (reproducible software environments) and Stage 4 (scaffold-based trace collection) are still under development.

Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: Inactive
  • Pull requests (30d): 1
  • Issues (30d): 0
  • Star history: 292 stars in the last 30 days
