gym-malware by endgameinc

OpenAI Gym environment for malware manipulation research

Created 8 years ago

629 stars

Top 52.7% on SourcePulse

1 Expert Loves This Project

Clarence Chio

Cofounder of Coverbase, Unit21

Project Summary

This repository provides a malware manipulation environment for OpenAI Gym, enabling reinforcement learning agents to learn functionality-preserving transformations on PE files to evade machine learning-based malware detection. It is targeted at researchers and developers in cybersecurity and AI, offering a framework to train agents that bypass static analysis malware detectors.

How It Works

The project uses the OpenAI Gym interface to define a reinforcement learning environment built around a PE file; the agent is an algorithm that chooses which binary manipulations to apply to that file. After each manipulation, the agent receives an observation describing the modified malware sample and a reward signal based on whether the sample now bypasses a classifier. The environment uses the LIEF library to modify binaries on the fly, supporting actions such as appending data, repacking with UPX, changing section names, and modifying headers.
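The observe, act, reward loop described above can be sketched in a few lines. This is a toy stand-in and not the repository's environment: the class name `ToyEvasionEnv`, the two actions, the byte-counting "classifier", the threshold, and the reward value are all hypothetical, chosen only so the example runs without Gym or LIEF. It operates on an in-memory byte string rather than a real PE file.

```python
import random


class ToyEvasionEnv:
    """Gym-style reset/step interface over a byte string (illustrative only)."""

    # Hypothetical action names; the real project exposes its own action set.
    ACTIONS = ("append_zero_bytes", "append_random_bytes")

    def __init__(self, sample, threshold=0.5, max_turns=10, seed=0):
        self.original = sample
        self.threshold = threshold
        self.max_turns = max_turns
        self.rng = random.Random(seed)
        self.reset()

    def _score(self, data):
        # Stand-in "classifier": fraction of 0xCC bytes. A real detector
        # would be a trained model scoring extracted features.
        return data.count(0xCC) / max(len(data), 1)

    def _observe(self):
        # Observation is (length, detector score); a real environment
        # would return a feature vector.
        return (len(self.data), self._score(self.data))

    def reset(self):
        self.data = self.original
        self.turns = 0
        return self._observe()

    def step(self, action):
        if self.ACTIONS[action] == "append_zero_bytes":
            self.data += bytes(64)
        else:
            self.data += bytes(self.rng.randrange(256) for _ in range(64))
        self.turns += 1
        evaded = self._score(self.data) < self.threshold
        done = evaded or self.turns >= self.max_turns
        reward = 1.0 if evaded else 0.0  # arbitrary reward value
        return self._observe(), reward, done, {"evaded": evaded}


def run_episode(env, seed=0):
    """Run one episode with a uniformly random agent."""
    rng = random.Random(seed)
    env.reset()
    total, done = 0.0, False
    while not done:
        _, reward, done, _ = env.step(rng.randrange(len(env.ACTIONS)))
        total += reward
    return total, env.turns
```

A learning agent would replace the random action choice in `run_episode` with a policy that is updated from the observed rewards; the episode ends either when the stand-in detector is evaded or when the turn budget runs out.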

Quick Start & Requirements

Install: pip install -r requirements.txt
Prerequisites: Python 3.6, LIEF (pre-built packages for Linux/OSX are provided), and malware samples placed in gym_malware/gym_malware/envs/utils/samples/. Downloading samples with download_samples.py requires a VirusTotal API key.
Setup: Requires a virtualenv and a LIEF installation.
More Info: LIEF, chainerrl, keras-rl.
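Before training, it can help to confirm that the prerequisites listed above are in place. The following is a minimal sketch of such a check, not something the repository ships: the function name `check_setup` and its messages are hypothetical, and treating Python 3.6 as a minimum version (rather than an exact requirement) is an assumption.

```python
import importlib.util
import sys
from pathlib import Path


def check_setup(samples_dir, min_python=(3, 6)):
    """Return a list of human-readable problems; empty means all checks passed."""
    problems = []

    # Assumption: 3.6 is treated as a lower bound here.
    if sys.version_info[:2] < min_python:
        problems.append("Python %d.%d or newer is expected" % min_python)

    # find_spec reports whether the module can be located without importing it.
    if importlib.util.find_spec("lief") is None:
        problems.append("LIEF is not importable in this interpreter")

    # The environment needs at least one sample file to manipulate.
    directory = Path(samples_dir)
    if not directory.is_dir():
        problems.append("samples directory is missing: %s" % directory)
    elif not any(p.is_file() for p in directory.iterdir()):
        problems.append("no samples found in: %s" % directory)

    return problems
```

Calling `check_setup("gym_malware/gym_malware/envs/utils/samples/")` from the repository root returns an empty list when every check passes.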

Highlighted Details

Supports 11 distinct binary manipulation actions for agents.
Includes a default gradient-boosted decision tree classifier trained on 100k samples.
Feature extraction covers byte-level data, headers, sections, and imports/exports.
Cites research paper "Learning to Evade Static PE Machine Learning Malware Models via Reinforcement Learning".
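As an illustration of what a byte-level feature can look like, the sketch below computes a normalized byte histogram over raw file contents. This is one common choice assumed for the example; the function name `byte_histogram` is hypothetical, and the summary above does not say which byte-level features the project actually extracts.

```python
from collections import Counter


def byte_histogram(data):
    """Return 256 values: the fraction of the input made up by each byte value."""
    counts = Counter(data)      # iterating over bytes yields integers 0..255
    total = len(data) or 1      # avoid division by zero on empty input
    return [counts.get(value, 0) / total for value in range(256)]
```

The result has a fixed length regardless of file size, which is what lets a classifier consume files of different lengths as equally sized feature vectors.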

Maintenance & Community

No specific community links (Discord/Slack) or roadmap are provided in the README. The project is associated with Hyrum S. Anderson and David Evans.

Licensing & Compatibility

The repository does not explicitly state a license. The associated research paper is available on arXiv. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project targets a specific Python version (3.6) and requires manual LIEF setup. Acquiring malware samples requires a VirusTotal API key. Only one default classifier is included, and its effectiveness against diverse malware families is not detailed.

Health Check

Last Commit

2 years ago

Responsiveness

Inactive

Star History

0 stars in the last 30 days