ET-BERT by linwhitehat

Research code for classifying encrypted network traffic with pre-trained transformers

Created 3 years ago
531 stars

Project Summary

ET-BERT classifies encrypted network traffic using a pre-trained transformer model. It identifies traffic types by learning contextual relationships between the datagrams in encrypted traffic, which makes it relevant to researchers and practitioners in network security and traffic analysis.

How It Works

ET-BERT uses a multi-layer attention mechanism to learn inter-datagram contextual relationships and inter-traffic transport relationships from large-scale unlabeled traffic. Because this pre-training needs no labels, the model can learn traffic patterns from far more data than any labeled dataset provides. The pre-trained model is then fine-tuned on a smaller labeled dataset for a specific traffic classification task, so the costly representation learning is done once and reused across tasks.
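To make the idea of learning from unlabeled datagrams more concrete, the sketch below turns a packet's raw bytes into overlapping hex tokens and randomly masks some of them, in the style of a BERT masked-token objective. The tokenization (two-byte hex tokens advancing one byte at a time), the [MASK] symbol, the 128-token cap and the masking rate are all assumptions for illustration; this is not ET-BERT's actual preprocessing code or its exact pre-training tasks.

```python
import random
from typing import Dict, List, Optional, Tuple

MASK = "[MASK]"  # assumed BERT-style mask symbol

def bigram_tokens(payload: bytes, max_tokens: int = 128) -> List[str]:
    """Split raw bytes into overlapping 2-byte hex tokens, advancing one byte at a time.

    bytes.fromhex("0a1b2c") -> ["0a1b", "1b2c"]
    """
    hex_str = payload.hex()
    # Each token spans 4 hex chars (2 bytes); stepping by 2 chars shifts one byte.
    tokens = [hex_str[i:i + 4] for i in range(0, len(hex_str) - 2, 2)]
    return tokens[:max_tokens]

def mask_tokens(tokens: List[str], rate: float = 0.15,
                seed: Optional[int] = None) -> Tuple[List[str], Dict[int, str]]:
    """Replace roughly `rate` of the tokens with MASK.

    Returns the masked sequence and a map from masked position to the original
    token, i.e. what a masked-token objective would be trained to predict.
    """
    rng = random.Random(seed)
    masked = list(tokens)
    targets: Dict[int, str] = {}
    for i, tok in enumerate(tokens):
        if rng.random() < rate:
            masked[i] = MASK
            targets[i] = tok
    return masked, targets

# Usage: six bytes resembling the start of a TLS handshake record (illustrative input).
tokens = bigram_tokens(bytes.fromhex("160301020001"))
print(tokens)  # ['1603', '0301', '0102', '0200', '0001']
masked, targets = mask_tokens(tokens, rate=0.3, seed=0)
```

Overlapping tokens keep each byte visible in two neighbouring tokens, so a masked position can often be partly inferred from its context, which is the kind of signal unlabeled pre-training relies on.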

Quick Start & Requirements

  • Install: Clone the repository and install dependencies.
  • Prerequisites (the environment listed by the authors): Python >= 3.6, CUDA 11.4, a GPU (the authors used a Tesla V100S), PyTorch >= 1.1, scapy == 2.4.4, numpy == 1.19.2, tshark, and SplitCap. Optional extras: apex for mixed-precision training, TensorFlow for pre-trained model conversion, WordPiece for WordPiece tokenization, and pytorch-crf for CRF-based sequence labeling.
  • Data: Use pre-processed datasets, or process raw pcap files yourself with the tools above.
  • Links: Pre-trained Model
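Because the dependency list is long and partly pinned, a quick pre-flight check can save a failed run. The sketch below reports missing or mismatched packages, whether PyTorch sees a CUDA device, and whether the external tools are on PATH. It assumes Python 3.8+ (for importlib.metadata) and that SplitCap is installed under that name on PATH; SplitCap is a Windows executable, so on other systems this particular check is only a rough hint. The helper names are hypothetical and not part of the repository.

```python
import shutil
from importlib import metadata  # Python 3.8+
from typing import List, Optional

# Pins and tools taken from the prerequisites list above.
PINNED = {"scapy": "2.4.4", "numpy": "1.19.2"}
TOOLS = ["tshark", "SplitCap"]  # assumed to be discoverable on PATH under these names

def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None

def check_environment() -> List[str]:
    """Collect human-readable problems; an empty list means every check passed."""
    problems = []
    for dist, wanted in PINNED.items():
        found = installed_version(dist)
        if found != wanted:
            problems.append(f"{dist}=={wanted} expected, found {found or 'not installed'}")
    try:
        import torch  # imported here so the other checks still run without PyTorch
        if not torch.cuda.is_available():
            problems.append("PyTorch is installed but no CUDA device is visible")
    except ImportError:
        problems.append("PyTorch (torch >= 1.1) is not installed")
    for tool in TOOLS:
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems

if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print("-", issue)
    print("environment OK" if not issues else f"{len(issues)} issue(s) found")
```

The check only reports; it deliberately does not try to install anything, since the pinned versions may conflict with packages already in the environment.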

Highlighted Details

  • Built on the UER-py framework.
  • Fine-tuning command: python3 fine-tuning/run_classifier.py ...
  • Inference command: python3 inference/run_classifier_infer.py ...
  • Pre-training command: python3 pre-training/pretrain.py ...
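Fine-tuning needs a labeled dataset prepared beforehand. UER-py's classification scripts conventionally read tab-separated files with a `label` and `text_a` header; assuming ET-BERT keeps that convention (verify against the repository's data-processing scripts before relying on it), tokenized samples could be written as below. The function name `write_classification_tsv` and the sample labels are hypothetical.

```python
import io
from typing import Iterable, List, TextIO, Tuple

def write_classification_tsv(samples: Iterable[Tuple[int, List[str]]], out: TextIO) -> int:
    """Write (label, tokens) pairs as a 'label<TAB>text_a' TSV; return the row count."""
    out.write("label\ttext_a\n")  # assumed UER-py-style header
    rows = 0
    for label, tokens in samples:
        text = " ".join(tokens)  # tokens are space-separated within text_a
        # A stray tab or newline would shift columns or split a row, so reject it.
        if "\t" in text or "\n" in text:
            raise ValueError(f"sample {rows} contains a tab or newline")
        out.write(f"{int(label)}\t{text}\n")
        rows += 1
    return rows

# Usage with two hypothetical samples; a file opened with open(..., "w") works the same way.
buf = io.StringIO()
count = write_classification_tsv([(0, ["1603", "0301"]), (3, ["0102", "0200"])], buf)
print(count)  # 2
```

Writing to any text stream rather than a fixed path keeps the function easy to test and lets the same code produce train, validation and test splits.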

Maintenance & Community

  • The work was accepted at The Web Conference (WWW) 2022.
  • Questions and discussion are handled through GitHub issues.

Licensing & Compatibility

  • The repository does not explicitly state a license. UER-py, on which it is built, is released under Apache 2.0, but that does not settle the license of this project's own code.

Limitations & Caveats

The authors' listed environment is narrow (CUDA 11.4, a Tesla V100S GPU), so other setups may need adjustment. The dependency list is also long, and optional components such as apex and TensorFlow can complicate installation. Because no license is stated, commercial use is uncertain.

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: Inactive
  • Pull requests (last 30 days): 0
  • Issues (last 30 days): 1
  • Star history: 19 stars in the last 30 days
