kong by amruth-sn

Agentic reverse engineer for binaries

Created 4 months ago

1,041 stars

Top 35.4% on SourcePulse

Project Summary

Kong automates reverse engineering of stripped binaries by recovering the context that stripping removes, such as function names, type information, and symbols. It is aimed at reverse engineers and security researchers. It speeds up analysis, including analysis of obfuscated code, by combining LLM orchestration with what the author presents as a novel agentic deobfuscation pipeline.

How It Works

Kong runs a five-phase pipeline coordinated by a supervisor: triage, analysis, cleanup, synthesis, and export. Before querying large language models (LLMs), it builds a context window for each function from Ghidra's program database that includes the decompiled code, cross-references, and data flow. Functions are analyzed bottom-up in call-graph order, so by the time a caller is analyzed, its callees have already been resolved and their context is available. An agentic deobfuscation pipeline identifies and removes various obfuscation techniques. The synthesis phase then unifies naming conventions across the binary and synthesizes struct definitions, and all recovered information is exported back into Ghidra's program database.
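The bottom-up ordering described above can be illustrated with a short sketch. It assumes the call graph is a plain dict mapping each function name to the names it calls, and the function name bottom_up_order is hypothetical; this is not Kong's code. It uses Tarjan's strongly connected components algorithm, which emits every component only after all components reachable from it, so with caller-to-callee edges the output is already callees-first. Mutually recursive functions end up in the same component, since neither can strictly precede the other.

```python
from typing import Dict, List, Set


def bottom_up_order(call_graph: Dict[str, List[str]]) -> List[List[str]]:
    """Hypothetical helper: return SCCs of the call graph, callees before callers.

    Illustrative sketch only, not Kong's implementation. Edges point from
    caller to callee; Tarjan's algorithm emits an SCC only after every SCC it
    can reach, which yields a bottom-up (reverse topological) order.
    """
    index_of: Dict[str, int] = {}
    lowlink: Dict[str, int] = {}
    on_stack: Set[str] = set()
    stack: List[str] = []
    order: List[List[str]] = []
    counter = 0

    # Include functions that only ever appear as callees (e.g. leaf imports).
    nodes = set(call_graph)
    for callees in call_graph.values():
        nodes.update(callees)

    def visit(fn: str) -> None:
        nonlocal counter
        index_of[fn] = lowlink[fn] = counter
        counter += 1
        stack.append(fn)
        on_stack.add(fn)
        for callee in call_graph.get(fn, []):
            if callee not in index_of:
                visit(callee)
                lowlink[fn] = min(lowlink[fn], lowlink[callee])
            elif callee in on_stack:
                # Back edge into the current SCC (recursion).
                lowlink[fn] = min(lowlink[fn], index_of[callee])
        if lowlink[fn] == index_of[fn]:
            # fn is the root of an SCC: pop all of its members.
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == fn:
                    break
            order.append(sorted(component))

    for fn in sorted(nodes):
        if fn not in index_of:
            visit(fn)
    return order


# Toy call graph with a mutually recursive pair (helper_a <-> helper_b).
graph = {
    "main": ["parse_args", "run"],
    "run": ["helper_a", "helper_b"],
    "helper_a": ["helper_b"],
    "helper_b": ["helper_a"],
    "parse_args": [],
}
print(bottom_up_order(graph))
```

In this toy graph, the recursive pair is analyzed first as one unit, and main comes last, after everything it calls. The recursive version is kept for readability; a real tool handling large binaries would likely need an iterative traversal to avoid Python's recursion limit.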

Quick Start & Requirements

Installation: Install with the uv package manager: uv pip install kong-re. Alternatively, clone the repository and run uv sync.
Prerequisites: Python 3.11+, uv package manager, Ghidra (NSA's reverse engineering framework), JDK 21+, and at least one LLM API key (Anthropic Claude or OpenAI GPT-4o).
Setup: Run the interactive kong setup wizard for initial configuration.
Analysis: Execute analysis with kong analyze ./path/to/stripped_binary.
Links: GitHub Repo, PyPI

Highlighted Details

Fully Autonomous Pipeline: A single command initiates the complete analysis workflow from triage to export.
In-Process Ghidra Integration: Leverages PyGhidra and JPype for direct Ghidra database manipulation, avoiding RPC overhead.
Call-Graph-Ordered Analysis: Functions are processed bottom-up, ensuring callees' resolved context is available for callers.
Agentic Deobfuscation: A pipeline, described by the author as first-of-its-kind, that identifies and removes obfuscation techniques.
Eval Framework: Includes a built-in harness for scoring analysis output against ground-truth source code.
Multi-Provider LLM Support: Works with Anthropic and OpenAI models and includes cost tracking.
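The summary does not say how the eval framework scores output against ground-truth source. As one possible illustration, the sketch below scores recovered function names with a token-level F1 metric. The metric, the address-keyed dictionaries, and all function names here are assumptions for illustration, not Kong's actual harness. Identifiers are split into words so that parseHttpHeader and parse_http_header count as the same name.

```python
import re
from collections import Counter
from typing import Dict, List


def split_identifier(name: str) -> List[str]:
    """Split snake_case / camelCase / ACRONYMCase identifiers into lowercase words."""
    words: List[str] = []
    for part in re.split(r"[_\W]+", name):
        # Acronym runs, capitalized or lowercase words, and digit runs.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [w.lower() for w in words]


def name_f1(predicted: str, truth: str) -> float:
    """Token-level F1 between a recovered name and the ground-truth name."""
    p, t = Counter(split_identifier(predicted)), Counter(split_identifier(truth))
    overlap = sum((p & t).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(p.values())
    recall = overlap / sum(t.values())
    return 2 * precision * recall / (precision + recall)


def score(recovered: Dict[str, str], ground_truth: Dict[str, str]) -> float:
    """Mean F1 over ground-truth functions; unrecovered functions score 0."""
    if not ground_truth:
        return 0.0
    total = sum(name_f1(recovered.get(addr, ""), name) for addr, name in ground_truth.items())
    return total / len(ground_truth)


# Hypothetical addresses and names, for illustration only.
ground_truth = {"0x401000": "parse_http_header", "0x401200": "free_buffer"}
recovered = {"0x401000": "parseHttpHeader", "0x401200": "release_buf"}
print(score(recovered, ground_truth))
```

Token overlap gives partial credit (parse_header against parse_http_header scores 0.8), but it does not recognize synonyms such as release_buf for free_buffer. A real harness might add semantic matching for that case.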

Maintenance & Community

The project is maintained by amruth-sn. Bug reports and discussion go through GitHub Issues, and the author can also be reached on X (formerly Twitter) and LinkedIn.

Licensing & Compatibility

Kong is licensed under the Apache License 2.0. This license is compatible with Ghidra's licensing and explicitly permits commercial use.

Limitations & Caveats

Confidence varies by target, with lower confidence noted for Rust and Go binaries on ARM, MIPS, and PowerPC. LLM cost and time to completion grow with the number of functions in the binary, while analysis confidence tends to decrease as binaries get larger.
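The cost and runtime scaling described above can be estimated with a back-of-the-envelope calculation. The sketch below assumes one LLM call per function, processed sequentially, with average token counts and call latency supplied by the user. The prices are placeholders, not real provider rates, and TokenPricing and estimate_run are hypothetical names rather than part of Kong.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TokenPricing:
    """Placeholder prices in USD per million tokens (hypothetical, not real rates)."""
    input_per_mtok: float
    output_per_mtok: float


def estimate_run(
    n_functions: int,
    avg_input_tokens: float,
    avg_output_tokens: float,
    pricing: TokenPricing,
    seconds_per_call: float,
) -> Tuple[float, float]:
    """Return (estimated USD cost, estimated wall-clock hours).

    Assumes one LLM call per function, made sequentially, so both figures
    grow linearly with the number of functions.
    """
    per_call_cost = (
        avg_input_tokens * pricing.input_per_mtok
        + avg_output_tokens * pricing.output_per_mtok
    ) / 1_000_000
    cost = n_functions * per_call_cost
    hours = n_functions * seconds_per_call / 3600
    return cost, hours


pricing = TokenPricing(input_per_mtok=1.0, output_per_mtok=2.0)
small = estimate_run(1_000, 4_000, 500, pricing, seconds_per_call=9)
large = estimate_run(10_000, 4_000, 500, pricing, seconds_per_call=9)
print(small, large)
```

Under these assumptions, a binary with ten times as many functions costs about ten times as much and takes about ten times as long. Real runs may deviate, for example if larger functions produce larger context windows or if calls run in parallel.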

Health Check

Last Commit

3 months ago

Responsiveness

Inactive

Star History

354 stars in the last 30 days