gym-malware by endgameinc

OpenAI Gym environment for malware manipulation research

created 8 years ago
625 stars

Top 53.7% on sourcepulse

Project Summary

This repository provides a malware manipulation environment for OpenAI Gym, enabling reinforcement learning agents to learn functionality-preserving transformations on PE files to evade machine learning-based malware detection. It is targeted at researchers and developers in cybersecurity and AI, offering a framework to train agents that bypass static analysis malware detectors.

How It Works

The core approach uses OpenAI Gym to define a reinforcement learning environment where the "environment" is a PE file and the "agent" is an algorithm that applies binary manipulations. The agent receives observations about the malware sample and a reward signal based on its success in bypassing a classifier. The environment leverages the LIEF library for on-the-fly binary modification, supporting actions like appending data, repacking with UPX, changing section names, and modifying headers.
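The agent/environment loop described above can be sketched in a few lines. This is a toy illustration only, not the gym-malware API: the class name `ToyEvasionEnv`, its two actions, the marker-byte "detector", the 0.5 threshold, and the reward values are all hypothetical stand-ins, and plain byte appending replaces the LIEF-based modifications the real environment performs. It assumes the classic Gym convention where `step` returns `(observation, reward, done, info)`.

```python
class ToyEvasionEnv:
    """Toy Gym-style environment. Hypothetical; not the gym-malware API."""

    # Hypothetical action set; the real project exposes PE-specific actions.
    ACTIONS = ("append_padding", "noop")

    def __init__(self, sample, threshold=0.5, max_turns=5):
        if not sample:
            raise ValueError("sample must be non-empty")
        self.sample = bytes(sample)
        self.threshold = threshold  # assumed detection threshold
        self.max_turns = max_turns  # episode length cap
        self.reset()

    def reset(self):
        """Start a new episode from the unmodified sample."""
        self.data = bytearray(self.sample)
        self.turns = 0
        return self._observe()

    def _observe(self):
        # Toy observation: file size and fraction of zero bytes.
        return [len(self.data), self.data.count(0) / len(self.data)]

    def _detector_score(self):
        # Stand-in "classifier": fraction of 0x90 bytes in the file.
        return self.data.count(0x90) / len(self.data)

    def step(self, action):
        """Apply one manipulation and report whether the detector was evaded."""
        name = self.ACTIONS[action]
        if name == "append_padding":
            self.data.extend(b"\x00" * 32)  # payload bytes are left untouched
        self.turns += 1
        score = self._detector_score()
        evaded = score < self.threshold
        reward = 1.0 if evaded else 0.0  # assumed reward scheme
        done = evaded or self.turns >= self.max_turns
        return self._observe(), reward, done, {"score": score, "action": name}


# A fixed "always append" policy, run for one episode.
env = ToyEvasionEnv(b"\x90" * 60 + b"\x00" * 40)
observation = env.reset()
done = False
while not done:
    observation, reward, done, info = env.step(0)
    print(info["action"], round(info["score"], 3), reward)
```

A learning agent would replace the fixed policy with one that picks actions from the observation and updates itself from the reward.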

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Prerequisites: Python 3.6, LIEF (pre-built packages for Linux/OSX provided), malware samples in gym_malware/gym_malware/envs/utils/samples/. A VirusTotal API key is needed to download samples using download_samples.py.
  • Setup: Requires a virtualenv and a LIEF installation.
  • More Info: LIEF, chainerrl, keras-rl.

Highlighted Details

  • Supports 11 distinct binary manipulation actions for agents.
  • Includes a default gradient boosted decision trees classifier trained on 100k samples.
  • Feature extraction covers byte-level data, headers, sections, and imports/exports.
  • Cites the research paper "Learning to Evade Static PE Machine Learning Malware Models via Reinforcement Learning".
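To make "byte-level data" concrete, here is a minimal sketch of two common byte-level features: a normalized byte histogram and the Shannon entropy derived from it. The function names are illustrative assumptions, and this is not the repository's feature extractor, whose exact feature set is not detailed here.

```python
import math
from collections import Counter


def byte_histogram(data):
    """Return 256 relative frequencies, one per possible byte value."""
    if not data:
        raise ValueError("data must be non-empty")
    counts = Counter(data)  # iterating bytes yields ints in 0..255
    total = len(data)
    return [counts.get(value, 0) / total for value in range(256)]


def byte_entropy(data):
    """Shannon entropy of the byte distribution, in bits (0.0 to 8.0)."""
    return -sum(p * math.log2(p) for p in byte_histogram(data) if p > 0)


# Usage: uniform bytes have maximal entropy, constant bytes have none.
uniform = bytes(range(256))
constant = b"\x00" * 64
print(byte_entropy(uniform), byte_entropy(constant))
```

Packed or encrypted sections tend to show entropy close to 8 bits, which is one reason features of this kind are useful to static detectors.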

Maintenance & Community

No specific community links (Discord/Slack) or roadmap are provided in the README. The project is associated with Hyrum S. Anderson and David Evans.

Licensing & Compatibility

The repository does not explicitly state a license. The associated research paper is available on arXiv. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project requires Python 3.6 and a manual LIEF setup. Downloading malware samples with download_samples.py requires a VirusTotal API key. Only one default classifier is included, and its effectiveness against diverse malware families is not detailed.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0

Star History

  • 3 stars in the last 90 days
