Malware detection via ML experiments
This repository explores malware detection and classification using machine learning, targeting security researchers and developers. It provides a framework for feature engineering, selection, and model training on large malware datasets, aiming to improve detection accuracy.
How It Works
The project employs a multi-stage approach to malware analysis. It begins with feature extraction from disassembled binaries (ASM), file metadata, and call graphs. Techniques like chi-squared tests are used for feature selection, prioritizing features with high variance and predictive power. Various classifiers, including ExtraTreesClassifier, XGBoost, and LightGBM, are evaluated, with ensemble methods and stacked models explored for enhanced performance.
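The feature extraction and chi-squared selection steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the repository's actual code: the simplified `.text:ADDRESS opcode operands` listing format, the `OPCODES` vocabulary, and the helper names `opcode_counts`, `chi2_scores`, and `select_top_k` are all hypothetical. The score is the standard chi-squared statistic for non-negative count features against a class label. It is written in plain Python here to keep the sketch dependency-free; in practice scikit-learn's `chi2` with `SelectKBest` would be the usual choice.

```python
import re
from collections import Counter

# Hypothetical opcode vocabulary; a real run would derive it from the corpus.
OPCODES = ["mov", "push", "call", "xor", "jmp"]

# Assumed, simplified listing format: ".text:00401000 mov eax, ebx"
LINE_RE = re.compile(r"^\.text:[0-9A-Fa-f]+\s+([a-z]+)\b")


def opcode_counts(asm_text):
    """Return per-opcode counts for one ASM listing, ordered as OPCODES."""
    counts = Counter()
    for line in asm_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and match.group(1) in OPCODES:
            counts[match.group(1)] += 1
    return [counts[op] for op in OPCODES]


def chi2_scores(X, y):
    """Chi-squared statistic of each non-negative count feature vs. the label."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        total = sum(row[j] for row in X)
        score = 0.0
        for c in classes:
            # Observed: feature mass inside class c.
            observed = sum(row[j] for row, label in zip(X, y) if label == c)
            # Expected: feature mass split by class frequency.
            expected = total * (y.count(c) / len(y))
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_top_k(X, y, k):
    """Return the column indices of the k highest-scoring features."""
    scores = chi2_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Toy corpus: two "malicious" (1) and two "benign" (0) listings, made up for illustration.
listings = [
    (".text:00401000 xor eax, eax\n.text:00401002 xor ebx, ebx\n.text:00401004 jmp loc_1", 1),
    (".text:00401000 xor eax, eax\n.text:00401002 xor ecx, ecx\n.text:00401004 mov eax, 1", 1),
    (".text:00401000 mov eax, 1\n.text:00401005 push eax\n.text:00401006 call sub_1", 0),
    (".text:00401000 mov ebx, 2\n.text:00401005 push ebx\n.text:00401006 call sub_2", 0),
]
X = [opcode_counts(text) for text, _ in listings]
y = [label for _, label in listings]

best = select_top_k(X, y, 1)
print("selected opcode features:", [OPCODES[j] for j in best])
```

The reduced feature matrix would then be passed to a classifier such as ExtraTreesClassifier, XGBoost, or LightGBM for evaluation.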
Quick Start & Requirements
Python dependencies: numpy, scipy, scikit-learn, matplotlib, jupyter, pandas, xgboost, cython. For disassembly, binutils with multi-architecture support is needed.
Highlighted Details
Maintenance & Community
The repository appears to be a personal project with no explicit mention of active maintenance, contributors, or community channels.
Licensing & Compatibility
The repository does not explicitly state a license. The included tools (Cuckoo Sandbox, IDA Pro, etc.) carry their own licenses, some of which may restrict commercial use or impose their own compatibility requirements.
Limitations & Caveats
The repository is inactive; its last activity was 8 years ago.