MarkLLM by THU-BPM

Open-source toolkit for LLM watermarking

Created 1 year ago

712 stars

Top 48.2% on SourcePulse

Project Summary

MarkLLM is an open-source toolkit designed to simplify the implementation, understanding, and evaluation of Large Language Model (LLM) watermarking techniques. It provides a unified framework for researchers and developers to integrate, visualize, and assess various watermarking algorithms, aiming to enhance the authenticity and traceability of AI-generated text.

How It Works

MarkLLM offers a modular architecture with distinct components for watermarking algorithms, visualization, and evaluation. It supports multiple watermarking methods through a unified interface, allowing users to easily switch between and apply different techniques. The toolkit includes visualization tools to illustrate how watermarks are embedded and detection mechanisms to verify their presence, alongside a comprehensive evaluation suite for assessing detectability, robustness, and text quality.
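To make the embed-and-detect idea concrete, here is a minimal, self-contained sketch of a hard green-list watermark in the spirit of KGW. It does not use MarkLLM's API and is not the toolkit's implementation: the toy vocabulary, the uniform sampler standing in for an LLM, the key, and the names green_list, generate, and z_score are all assumptions made for illustration. KGW-style schemes usually add a bias to green-token logits rather than banning non-green tokens outright; the hard variant is used here only because it needs no model.

```python
import hashlib
import math
import random

VOCAB = [f"tok{i}" for i in range(50)]  # toy vocabulary (assumption, not a real tokenizer)
GAMMA = 0.5  # fraction of the vocabulary marked "green" at each step


def green_list(prev_token: str, key: str = "secret") -> set:
    """Pseudo-randomly pick the green tokens, seeded by the previous token and a key."""
    seed = int(hashlib.sha256(f"{key}:{prev_token}".encode()).hexdigest(), 16)
    rng = random.Random(seed)
    return set(rng.sample(VOCAB, int(GAMMA * len(VOCAB))))


def generate(n_tokens: int, watermark: bool, seed: int = 0) -> list:
    """Stand-in for an LLM: sample uniformly, restricted to green tokens if watermarking."""
    rng = random.Random(seed)
    tokens = ["tok0"]  # fixed start token
    for _ in range(n_tokens):
        # sorted() keeps the candidate order deterministic across runs
        candidates = sorted(green_list(tokens[-1])) if watermark else VOCAB
        tokens.append(rng.choice(candidates))
    return tokens


def z_score(tokens: list) -> float:
    """Count green tokens and compare with the GAMMA * T expected by chance."""
    t = len(tokens) - 1  # number of scored transitions
    hits = sum(tok in green_list(prev) for prev, tok in zip(tokens, tokens[1:]))
    return (hits - GAMMA * t) / math.sqrt(t * GAMMA * (1 - GAMMA))


if __name__ == "__main__":
    print("watermarked z:", round(z_score(generate(200, watermark=True)), 2))
    print("plain z:      ", round(z_score(generate(200, watermark=False)), 2))
```

In this sketch every watermarked token is green, so a 200-token sequence scores 100 / sqrt(50), roughly 14.1, while unwatermarked text should score near zero. A detector then flags text whose z-score exceeds a chosen threshold.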

Quick Start & Requirements

Install: pip install markllm
Prerequisites: Python 3.9+ and PyTorch. The EXP-Edit and ITS-Edit algorithms additionally require Cython compilation.
Resources: A CUDA-enabled GPU is recommended for optimal performance.
Demos & Docs: Google Colab Demo, YouTube Introduction, Website Demo, and extensive examples within the repository (test/, evaluation/examples/).
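The prerequisites listed above can be checked before installing with a short script. This is a sketch only: the function name preflight and the report keys are hypothetical, and the check covers just the Python version, whether PyTorch is importable, and whether PyTorch reports a CUDA device. It does not verify Cython or any MarkLLM-specific setup.

```python
import importlib.util
import sys


def preflight() -> dict:
    """Report whether the stated prerequisites appear to be met."""
    report = {
        "python_ok": sys.version_info >= (3, 9),
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        # Import lazily so the script still runs when PyTorch is absent.
        import torch

        report["cuda_available"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{name}: {ok}")
```

If cuda_available is False, the toolkit may still run on CPU, but generation with larger models is likely to be slow.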

Highlighted Details

Supports 15 distinct LLM watermarking algorithms, including KGW, Unigram, SWEET, UPV, EWD, SIR, X-SIR, DiPmark, Unbiased, TS-Watermark, SynthID-Text, PF Watermark, EXP, EXP-Edit, and ITS-Edit.
Features 12 evaluation tools covering detectability, robustness, and text quality, with customizable automated evaluation pipelines.
Includes visualization tools for understanding watermark mechanisms, supporting both discrete and continuous visualization types.
Offers integration examples with vLLM and provides a comprehensive survey paper on LLM text watermarking.
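As an illustration of what a detectability evaluation computes, the sketch below derives true positive rate, false positive rate, and F1 from detector scores at a fixed threshold. It is a generic calculation, not MarkLLM's evaluation pipeline: the function name detection_metrics, the threshold of 4.0, and the sample scores are assumptions chosen for the example.

```python
def detection_metrics(scores_watermarked, scores_plain, threshold=4.0):
    """Compute TPR, FPR, and F1 for a score-threshold watermark detector."""
    # A text is flagged as watermarked when its score exceeds the threshold.
    tp = sum(s > threshold for s in scores_watermarked)
    fn = len(scores_watermarked) - tp
    fp = sum(s > threshold for s in scores_plain)
    tn = len(scores_plain) - fp

    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * tpr / (precision + tpr) if precision + tpr else 0.0
    return {"TPR": tpr, "FPR": fpr, "F1": f1}


if __name__ == "__main__":
    # Made-up detector scores for four watermarked and four plain texts.
    watermarked = [6.1, 5.2, 3.0, 7.4]
    plain = [0.3, -1.1, 4.5, 1.9]
    print(detection_metrics(watermarked, plain))
```

With the made-up scores above, three of four watermarked texts and one of four plain texts exceed the threshold, giving TPR 0.75, FPR 0.25, and F1 0.75. Robustness can be assessed the same way by recomputing these metrics after the watermarked texts have been edited or paraphrased.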

Maintenance & Community

The project is actively maintained, with frequent updates and community contributions, as indicated by numerous pull requests for new methods and features. Contributions are encouraged through pull requests, and dedicated community channels may be added in the future.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Users should verify licensing for commercial use or integration into closed-source projects.

Limitations & Caveats

Some demonstration models may have download limitations due to storage constraints. The Cython-based algorithms require a compilation step, which might introduce environment-specific issues. The absence of a clearly stated license requires careful consideration for commercial applications.
