binlex by c3rb3ru5d3d53c

Binary pattern analyzer for malware research, reverse engineering, and threat hunting

Created 4 years ago
515 stars

Top 60.8% on SourcePulse

Project Summary

Binlex is a framework for malware analysis and reverse engineering that treats binary code as "DNA" by breaking it down into hierarchical genetic traits: genomes, chromosomes, allele pairs, and genes. It enables pattern detection, similarity analysis, and threat hunting across large malware datasets, offering a fast, flexible, and extensible alternative to Python-only tools.

How It Works

Binlex disassembles binary files (PE, MachO, ELF) into instructions, basic blocks, and functions, and represents each of these as a "genome." Within each genome it extracts patterns as "chromosomes," which are composed of "allele pairs" (bytes) and "genes" (nibbles). This structured representation supports similarity hashing (MinHash, TLSH, SHA256) and feature extraction, making it easier to identify what malware samples have in common and where they differ. Binlex also includes a vector database with a Graph Neural Network (GNN) for function similarity matching.
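
The byte-to-trait breakdown described above can be illustrated with a minimal Python sketch. It assumes a chromosome is just a contiguous byte string (real binlex chromosomes may also carry wildcards and other metadata), and the `Chromosome` class and its methods are hypothetical names, not the binlex API.

```python
from dataclasses import dataclass


@dataclass
class Chromosome:
    """Hypothetical container for one extracted byte pattern (not the binlex API)."""
    data: bytes

    def allele_pairs(self) -> list[str]:
        # One allele pair per byte, written as two lowercase hex digits.
        return [f"{b:02x}" for b in self.data]

    def genes(self) -> list[str]:
        # Each allele pair splits into two genes: the high nibble, then the low nibble.
        return [gene for pair in self.allele_pairs() for gene in pair]


# 48 8b 05 begins a RIP-relative "mov rax, [rip+disp32]" on AMD64.
chromosome = Chromosome(bytes([0x48, 0x8B, 0x05]))
print(chromosome.allele_pairs())  # ['48', '8b', '05']
print(chromosome.genes())         # ['4', '8', '8', 'b', '0', '5']
```

Working at the nibble level is what lets a pattern vary in half a byte (for example, a register encoding) while the rest of the chromosome still matches.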

Quick Start & Requirements

  • Install: Build from source with cargo build --release. The Python bindings are built with maturin.
  • Prerequisites: Rust toolchain; Python 3 for the bindings; a GPU is recommended for GNN inference.
  • IDA Plugin: Copy the plugin directory to ~/.idapro/plugins/ and install its requirements, then run docker-compose up -d to start the binlex server.
  • Docs: Build and open the API documentation locally with cargo doc --open.
  • Links: Official Docs, IDA Plugin Setup, Binlex Server

Highlighted Details

  • Supports Windows, macOS, and Linux for PE, MachO, and ELF formats across AMD64, I386, and CIL architectures.
  • Uses multi-threading for efficient analysis, with performance tunable through configuration.
  • Includes similarity hashing (MinHash, TLSH, SHA256) and a vector database with a GNN for function identification.
  • Offers Rust and Python APIs for custom tooling and integrates with IDA Pro, Ghidra, and Rizin.
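
The MinHash side of similarity hashing can be sketched in a few lines of Python. This is an illustration of the general technique under stated assumptions, not binlex's implementation: it uses overlapping 4-byte shingles and 64 hash functions built by salting SHA-256 with a seed, and the function names are hypothetical. Binlex's actual shingle size, hash family, and signature length may differ.

```python
import hashlib


def shingles(data: bytes, k: int = 4) -> set[bytes]:
    """Split a byte sequence into overlapping k-byte windows (assumed k=4)."""
    if len(data) < k:
        return {data} if data else set()
    return {data[i:i + k] for i in range(len(data) - k + 1)}


def minhash_signature(data: bytes, num_hashes: int = 64, k: int = 4) -> list[int]:
    """For each seeded hash function, keep the minimum hash over all shingles."""
    windows = shingles(data, k)
    if not windows:
        raise ValueError("cannot sign empty input")
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(4, "little")  # fixed-width seed acts as the hash-function index
        signature.append(min(
            int.from_bytes(hashlib.sha256(salt + w).digest()[:8], "little")
            for w in windows
        ))
    return signature


def estimate_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of signature positions where the two inputs agree."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


# Two x86-64 prologues that differ only in the stack-frame size (0x20 vs 0x30).
a = bytes.fromhex("554889e54883ec20897dfc8b45fc")
b = bytes.fromhex("554889e54883ec30897dfc8b45fc")
print(estimate_jaccard(minhash_signature(a), minhash_signature(b)))
```

The exact Jaccard similarity of these two shingle sets is 7/15 (about 0.47), so the estimate should land near that value; more hash functions tighten the estimate at the cost of larger signatures.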

Maintenance & Community

  • Developed by c3rb3ru5d3d53c.
  • Community support channels are not explicitly mentioned in the README.

Licensing & Compatibility

  • Licensed under the MIT license, which permits commercial and personal use, modification, and redistribution, provided the copyright and license notice are retained.

Limitations & Caveats

  • The README mentions that at least 75% of a non-contiguous function's data must be hashable for similarity analysis.
  • GPU is recommended for faster GNN inference but not strictly required.
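
The 75% rule for non-contiguous functions can be pictured as a simple gate. This is a hypothetical Python sketch: it assumes a function is a list of address-ordered chunks, that a chunk is "hashable" when its bytes are available (a `None` payload marks bytes that are not), and that the hash is SHA-256 over the hashable chunks joined in order. Binlex's actual definition of hashable data and its hashing order may differ.

```python
import hashlib
from typing import Optional

HASHABLE_THRESHOLD = 0.75  # minimum share of hashable bytes, per the README


def hash_non_contiguous(chunks: list[tuple[int, Optional[bytes]]]) -> Optional[str]:
    """Return a SHA-256 hex digest, or None if too little of the function is hashable.

    Each chunk is (size, payload); payload is None when those bytes cannot be hashed.
    """
    total = sum(size for size, _ in chunks)
    if total == 0:
        return None
    hashable_size = sum(size for size, payload in chunks if payload is not None)
    if hashable_size / total < HASHABLE_THRESHOLD:
        return None
    # Join the hashable chunks in their original (address) order.
    joined = b"".join(payload for _, payload in chunks if payload is not None)
    return hashlib.sha256(joined).hexdigest()


print(hash_non_contiguous([(80, b"\x90" * 80), (20, None)]))  # 80% hashable: digest
print(hash_non_contiguous([(70, b"\x90" * 70), (30, None)]))  # 70% hashable: None
```

Returning `None` rather than a partial hash keeps poorly covered functions out of similarity comparisons instead of letting them produce misleading matches.
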

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull requests (last 30 days): 0
  • Issues (last 30 days): 0
  • Star history: 2 stars in the last 30 days
