fuzz4all by fuzz4all

Fuzzer using LLMs for universal input generation

created 1 year ago
275 stars

Top 94.9% on sourcepulse

Project Summary

Fuzz4All is a universal fuzzing framework that leverages Large Language Models (LLMs) to generate diverse and realistic inputs for various programming languages. It is designed for researchers and developers seeking to improve software robustness by exploring a wide range of input possibilities, particularly for languages where traditional fuzzing techniques may be less effective.

How It Works

Fuzz4All uses LLMs as its core input-generation and mutation engine. It employs a novel autoprompting technique that condenses user-provided information about the target (such as documentation or example code) into LLM prompts tailored for fuzzing. A key component is its LLM-powered fuzzing loop, which iteratively updates these prompts using previously generated inputs as feedback, enabling the generation of novel and effective test cases across arbitrary input languages.
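A minimal sketch of such an autoprompt-plus-feedback loop is shown below in Python. It is not Fuzz4All's actual API: the llm callable, the crash oracle, and the prompt wording are illustrative assumptions.

    import random
    import subprocess
    import tempfile

    def autoprompt(llm, target_docs: str) -> str:
        """Ask the LLM to condense target documentation into a fuzzing prompt."""
        return llm(
            "Summarize the following documentation into a short prompt for "
            "generating unusual but valid programs:\n" + target_docs
        )

    def run_target(binary: str, program: str) -> int:
        """Run one generated program against the target and return its exit code."""
        with tempfile.NamedTemporaryFile("w", suffix=".src", delete=False) as f:
            f.write(program)
            path = f.name
        return subprocess.run([binary, path], capture_output=True).returncode

    def fuzz_loop(llm, binary: str, target_docs: str, iterations: int = 100):
        prompt = autoprompt(llm, target_docs)
        interesting = []  # inputs that crashed the target
        for _ in range(iterations):
            candidate = llm(prompt)                  # generation step
            if run_target(binary, candidate) != 0:   # crude crash oracle
                interesting.append(candidate)
            if interesting:
                # Feedback step: fold a previously interesting input back into
                # the prompt so the next generation mutates or extends it.
                example = random.choice(interesting)
                prompt = (autoprompt(llm, target_docs)
                          + "\nHere is a previously interesting program:\n"
                          + example
                          + "\nWrite a mutated variant of it.")
        return interesting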

Quick Start & Requirements

  • Installation: The Docker image is the recommended route (https://doi.org/10.5281/zenodo.10456883). Alternatively, create the environment with conda create -n fuzz4all python=3.10 and conda activate fuzz4all, then run pip install -r requirements.txt and pip install -e .
  • Prerequisites: Python 3.10, CUDA for GPU acceleration, and an OpenAI API key for GPT-4 autoprompting. The bigcode/starcoderbase and starcoderbase-1b models are supported.
  • Configuration: Set environment variables like FUZZING_BATCH_SIZE, FUZZING_MODEL, and FUZZING_DEVICE. Fuzzing targets are configured via YAML files in the configs/ directory.
  • Execution: Run with python Fuzz4All/fuzz.py --config {config_file.yaml} main_with_config --folder outputs/fuzzing_outputs --batch_size {batch_size} --model_name {model_name} --target {target_name} (a programmatic wrapper is sketched after this list).
  • Resources: Requires user-provided target binaries.
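As a convenience, the documented command line can also be driven from a short Python wrapper. The sketch below simply sets the environment variables listed above and calls the documented entry point; the specific values, config filename, and target path are placeholders, not settings prescribed by the project.

    import os
    import subprocess

    # Illustrative launcher: env var values, config path, and target name are
    # placeholders rather than values prescribed by the project.
    env = os.environ.copy()
    env.update({
        "FUZZING_BATCH_SIZE": "30",
        "FUZZING_MODEL": "bigcode/starcoderbase",
        "FUZZING_DEVICE": "gpu",
    })

    subprocess.run(
        [
            "python", "Fuzz4All/fuzz.py",
            "--config", "configs/example_target.yaml",  # any YAML under configs/
            "main_with_config",
            "--folder", "outputs/fuzzing_outputs",
            "--batch_size", "30",
            "--model_name", "bigcode/starcoderbase",
            "--target", "/path/to/target_binary",       # user-provided target binary
        ],
        env=env,
        check=True,
    )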

Highlighted Details

  • First fuzzer to universally target many input languages using LLMs.
  • Novel autoprompting technique for generating effective fuzzing prompts.
  • Iterative LLM-powered fuzzing loop for prompt refinement.
  • Supports targeted fuzzing by pointing to specific API/library documentation.

Maintenance & Community

The project is associated with the ICSE'24 paper "Fuzz4All: Universal Fuzzing with Large Language Models." Further details and artifact access are available via a Zenodo link.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is presented as research code accompanying an ICSE'24 paper, so it should be treated as experimental. Because LLMs can emit arbitrary and potentially harmful code, generated inputs should be executed cautiously, ideally in a sandboxed environment. Model support is currently limited to specific StarCoder variants, though extensibility is mentioned.

Health Check

  • Last commit: 9 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 0
  • Star History: 33 stars in the last 90 days

Explore Similar Projects

Starred by Elie Bursztein (Cybersecurity Lead at Google DeepMind), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 1 more.

oss-fuzz-gen by google

0.3% · 1k stars
LLM-powered fuzz target generator for C/C++/Java/Python projects, benchmarked via OSS-Fuzz
created 1 year ago
updated 5 days ago
Starred by Boris Cherny (Creator of Claude Code; MTS at Anthropic), Hiroshi Shibata (Core Contributor to Ruby), and 4 more.

oss-fuzz by google

0.2% · 11k stars
Continuous fuzzing for open source software
created 9 years ago
updated 1 day ago