TrafficLLM by ZGC-LLM-Safety

LLM adaptation framework for network traffic analysis

Created 1 year ago

342 stars

Top 80.7% on SourcePulse

Project Summary

TrafficLLM is a framework for adapting open-source Large Language Models (LLMs) to network traffic analysis tasks, enabling robust traffic representation and generalization across detection and generation scenarios. It targets researchers and practitioners in cybersecurity and network analysis seeking to leverage LLMs for understanding and manipulating network data.

How It Works

TrafficLLM employs a three-pronged approach: traffic-domain tokenization to bridge the gap between natural language and network data, a dual-stage tuning pipeline for instruction understanding and task-specific pattern learning, and Extensible Adaptation with Parameter-Effective Fine-Tuning (EA-PEFT) to efficiently adapt models to new traffic environments with minimal parameter updates.
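The dual-stage idea and the modular adapters can be pictured as a two-step dispatch: first decide which task an instruction refers to, then select the adapter tuned for that task. The sketch below is only an illustration of that flow and is not taken from the TrafficLLM code base. `predict_task` is a keyword-matching stand-in for the instruction-tuned model, and `AdapterRegistry`, `TASK_KEYWORDS`, and the adapter paths are hypothetical names introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class AdapterRegistry:
    """Maps a task code to the location of its task-specific adapter weights."""
    adapters: dict = field(default_factory=dict)

    def register(self, task: str, path: str) -> None:
        # Adding a task only adds an entry; existing adapters are untouched.
        self.adapters[task] = path

    def lookup(self, task: str) -> str:
        if task not in self.adapters:
            raise KeyError(f"no adapter registered for task {task!r}")
        return self.adapters[task]


# Hypothetical keyword table standing in for the instruction-tuned model.
# Task codes: MTD = malware traffic detection, BND = botnet detection,
# EVD = encrypted VPN detection.
TASK_KEYWORDS = {
    "MTD": ("malware",),
    "BND": ("botnet",),
    "EVD": ("vpn",),
}


def predict_task(instruction: str) -> str:
    """Stage 1 stand-in: infer the task code from the instruction text."""
    text = instruction.lower()
    for task, keywords in TASK_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return task
    raise ValueError("instruction does not match a known task")


def route(instruction: str, registry: AdapterRegistry) -> tuple:
    """Stage 2: select the adapter that matches the predicted task."""
    task = predict_task(instruction)
    return task, registry.lookup(task)


registry = AdapterRegistry()
registry.register("MTD", "adapters/mtd")  # hypothetical paths
registry.register("EVD", "adapters/evd")

print(route("Is this encrypted VPN traffic or not?", registry))
# → ('EVD', 'adapters/evd')
```

Keeping one adapter per task means a new traffic environment can be supported by registering one more entry, without touching the base model or the adapters already in place.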

Quick Start & Requirements

Install: Clone the repository, create a conda environment (conda create -n trafficllm python=3.9), activate it (conda activate trafficllm), and install dependencies (pip install -r requirements.txt). Additional packages (rouge_chinese, nltk, jieba, datasets) are needed for training.
Prerequisites: Base LLM checkpoints (e.g., ChatGLM2-6B, Llama2) and raw traffic datasets for preprocessing. The README does not state GPU requirements explicitly, but training and inference on models of this size in practice require GPU acceleration.
Setup: Environment setup is straightforward. Training involves multiple stages, including data preprocessing and fine-tuning, which can be resource-intensive.
Resources: Preprint Paper, Tutorials, Adapt2GLM4.
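Before starting a training run, it can help to confirm that the extra training packages are present in the active environment. The following is a small sketch using only the Python standard library. The helper `missing_packages` is a hypothetical name, and the sketch assumes each package's import name matches the name listed in the install step.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


# Extra training packages named in the install step
# (assumed to be importable under the same names).
TRAINING_EXTRAS = ["rouge_chinese", "nltk", "jieba", "datasets"]

missing = missing_packages(TRAINING_EXTRAS)
if missing:
    print("Missing training packages:", ", ".join(missing))
else:
    print("All training packages are importable.")
```

Run it inside the activated `trafficllm` conda environment so that the check reflects the interpreter that training will use.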

Highlighted Details

Supports adaptation for various traffic analysis tasks including Malware Traffic Detection (MTD), Botnet Detection (BND), and Encrypted VPN Detection (EVD).
Provides over 0.4M traffic data samples and 9K human instructions for LLM fine-tuning.
Includes code for generating pcap files using Scapy for Wireshark compatibility.
Offers EA-PEFT for efficient, modular adaptation to new traffic patterns and tasks.
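To show what a Wireshark-readable capture file contains, the sketch below writes the classic pcap layout by hand: a 24-byte global header followed by a 16-byte record header and the raw bytes of each frame. It deliberately uses only the Python standard library rather than Scapy, so it is not the repository's generation code. With Scapy, the `wrpcap` function handles this serialization step. The function `build_pcap` and the demo frame are hypothetical.

```python
import struct

PCAP_MAGIC = 0xA1B2C3D4      # classic pcap with microsecond timestamps
LINKTYPE_ETHERNET = 1        # frames start with an Ethernet header


def build_pcap(frames, snaplen=65535):
    """Serialize (timestamp_seconds, frame_bytes) pairs into classic pcap bytes."""
    # Global header: magic, version 2.4, timezone offset, sigfigs, snaplen, link type.
    out = [struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, snaplen, LINKTYPE_ETHERNET)]
    for ts, frame in frames:
        # Split the timestamp into whole seconds and microseconds.
        sec, usec = divmod(int(round(ts * 1_000_000)), 1_000_000)
        captured = frame[:snaplen]
        # Record header: seconds, microseconds, captured length, original length.
        out.append(struct.pack("<IIII", sec, usec, len(captured), len(frame)))
        out.append(captured)
    return b"".join(out)


# Demo frame: broadcast destination, locally administered source address,
# EtherType 0x88B5 (reserved for local experimental use), padded payload.
frame = bytes.fromhex("ffffffffffff" "020000000001" "88b5") + b"demo".ljust(46, b"\x00")
data = build_pcap([(1700000000.5, frame)])
print(len(data))  # → 100 (24-byte global header + 16-byte record header + 60-byte frame)
```

Writing `data` to a file opened in binary mode, for example one named `demo.pcap`, produces a capture that Wireshark can open.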

Maintenance & Community

The project is actively developed, with recent updates including support for GLM4 and packet generation capabilities. Links to community resources are not explicitly provided in the README.

Licensing & Compatibility

The repository does not specify a license. The project acknowledges ChatGLM2 and Llama2 as foundational models, so their respective licenses apply to any use of those checkpoints. Whether commercial use or closed-source linking is permitted is not stated.

Limitations & Caveats

The project is based on specific LLM versions (ChatGLM2, Llama2), and adapting other LLMs may require significant modifications. The README mentions optional training of a custom traffic-domain tokenizer, suggesting that default tokenization might not cover all use cases.
