MarkLLM by THU-BPM

Open-source toolkit for LLM watermarking

created 1 year ago
504 stars

Top 62.6% on sourcepulse

Project Summary

MarkLLM is an open-source toolkit designed to simplify the implementation, understanding, and evaluation of Large Language Model (LLM) watermarking techniques. It provides a unified framework for researchers and developers to integrate, visualize, and assess various watermarking algorithms, aiming to enhance the authenticity and traceability of AI-generated text.

How It Works

MarkLLM offers a modular architecture with distinct components for watermarking algorithms, visualization, and evaluation. It supports multiple watermarking methods through a unified interface, allowing users to easily switch between and apply different techniques. The toolkit includes visualization tools to illustrate how watermarks are embedded and detection mechanisms to verify their presence, alongside a comprehensive evaluation suite for assessing detectability, robustness, and text quality.
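To make the idea of a unified interface concrete, here is a minimal, self-contained sketch. It does not use MarkLLM's API: `ToyWatermark`, `ToyGreenList`, `ToyFixedList`, `REGISTRY`, and `load` are hypothetical names invented for illustration, the "model" is a random sampler over a toy vocabulary, and the two schemes are only loosely inspired by green-list watermarks (a real scheme such as KGW biases logits rather than hard-selecting tokens).

```python
import hashlib
import math
import random
from abc import ABC, abstractmethod

VOCAB = [f"tok{i}" for i in range(1000)]  # toy vocabulary, not a real tokenizer


class ToyWatermark(ABC):
    """Hypothetical common interface: every scheme can generate and detect."""

    @abstractmethod
    def generate(self, length, seed=0): ...

    @abstractmethod
    def detect(self, tokens): ...


class ToyGreenList(ToyWatermark):
    """Green-list scheme: the previous token seeds the green/red split."""

    def __init__(self, key="secret", gamma=0.5, threshold=4.0):
        self.key, self.gamma, self.threshold = key, gamma, threshold

    def _context(self, prev):
        return prev

    def _is_green(self, prev, token):
        # Keyed hash of (context, token) mapped to [0, 1); green if below gamma.
        msg = f"{self.key}|{self._context(prev)}|{token}".encode()
        value = int.from_bytes(hashlib.sha256(msg).digest()[:8], "big")
        return value / 2**64 < self.gamma

    def generate(self, length, seed=0):
        rng = random.Random(seed)
        tokens = [rng.choice(VOCAB)]
        while len(tokens) < length:
            candidates = rng.sample(VOCAB, 8)  # stand-in for a model's top-k
            green = [c for c in candidates if self._is_green(tokens[-1], c)]
            tokens.append(green[0] if green else candidates[0])
        return tokens

    def detect(self, tokens):
        n = len(tokens) - 1  # number of scored (previous, current) pairs
        if n < 1:
            return {"is_watermarked": False, "score": 0.0}
        hits = sum(self._is_green(p, t) for p, t in zip(tokens, tokens[1:]))
        # z-score of the green count against the no-watermark expectation.
        z = (hits - self.gamma * n) / math.sqrt(n * self.gamma * (1 - self.gamma))
        return {"is_watermarked": z > self.threshold, "score": z}


class ToyFixedList(ToyGreenList):
    """Variant with a single global green list (the context is ignored)."""

    def _context(self, prev):
        return ""


REGISTRY = {"green_list": ToyGreenList, "fixed_list": ToyFixedList}


def load(name, **kwargs):
    """Look up a scheme by name so callers can switch algorithms with one string."""
    return REGISTRY[name](**kwargs)


def random_tokens(length, seed=0):
    """Unwatermarked baseline: tokens drawn uniformly from the toy vocabulary."""
    rng = random.Random(seed)
    return [rng.choice(VOCAB) for _ in range(length)]
```

Switching schemes is then a one-word change, for example `load("fixed_list").detect(load("fixed_list").generate(200))`. The detection result is a dictionary holding a boolean decision and the underlying z-score, so calling code does not need to know which scheme produced it.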

Quick Start & Requirements

  • Install: pip install markllm
  • Prerequisites: Python 3.9+ and PyTorch. Two algorithms (EXP-Edit and ITS-Edit) additionally require Cython compilation.
  • Resources: A CUDA-enabled GPU is recommended for optimal performance.
  • Demos & Docs: Google Colab Demo, YouTube Introduction, Website Demo, and extensive examples within the repository (test/, evaluation/examples/).
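Before installing, it can help to confirm the listed prerequisites. The sketch below uses only the standard library plus an optional PyTorch import; `check_environment` is a hypothetical helper written for this summary and is not part of MarkLLM.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 9)):
    """Report whether the listed prerequisites appear to be met."""
    report = {
        "python_ok": sys.version_info >= min_python,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        try:
            import torch  # imported lazily so the check works without PyTorch

            report["cuda_available"] = bool(torch.cuda.is_available())
        except Exception:
            # A broken PyTorch install is treated as "no CUDA" in this sketch.
            report["cuda_available"] = False
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {ok}")
```

A `False` for `cuda_available` does not block installation; it only suggests that generation and detection may run slowly on CPU.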

Highlighted Details

  • Supports 15 distinct LLM watermarking algorithms: KGW, Unigram, SWEET, UPV, EWD, SIR, X-SIR, DiPmark, Unbiased, TS-Watermark, SynthID-Text, PF Watermark, EXP, EXP-Edit, and ITS-Edit.
  • Features 12 evaluation tools covering detectability, robustness, and text quality, with customizable automated evaluation pipelines.
  • Includes visualization tools for understanding watermark mechanisms, supporting both discrete and continuous visualization types.
  • Offers integration examples with vLLM and provides a comprehensive survey paper on LLM text watermarking.
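As a rough illustration of what a detectability and robustness evaluation involves, the sketch below scores watermarked and unwatermarked texts, optionally perturbs the watermarked ones first, and reports the true positive rate, false positive rate, and F1. It is not MarkLLM's evaluation API: `DetectionMetrics`, `word_deletion`, `detection_metrics`, and `run_pipeline` are hypothetical names, and `score_fn` stands in for whatever detector returns a numeric score.

```python
import random
from dataclasses import dataclass


@dataclass
class DetectionMetrics:
    tpr: float  # fraction of watermarked texts flagged
    fpr: float  # fraction of unwatermarked texts flagged
    f1: float


def word_deletion(text, ratio, seed=0):
    """Simple robustness attack: drop each word independently with probability `ratio`."""
    rng = random.Random(seed)
    return " ".join(w for w in text.split() if rng.random() >= ratio)


def detection_metrics(pos_scores, neg_scores, threshold):
    """Treat a score above `threshold` as a 'watermarked' decision."""
    tp = sum(s > threshold for s in pos_scores)
    fp = sum(s > threshold for s in neg_scores)
    tpr = tp / len(pos_scores) if pos_scores else 0.0
    fpr = fp / len(neg_scores) if neg_scores else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * tpr / (precision + tpr) if precision + tpr else 0.0
    return DetectionMetrics(tpr=tpr, fpr=fpr, f1=f1)


def run_pipeline(score_fn, watermarked, unwatermarked, threshold, attack=None):
    """Score both sets; if `attack` is given, apply it to the watermarked texts first."""
    if attack is not None:
        watermarked = [attack(t) for t in watermarked]
    pos = [score_fn(t) for t in watermarked]
    neg = [score_fn(t) for t in unwatermarked]
    return detection_metrics(pos, neg, threshold)
```

Passing `attack=lambda t: word_deletion(t, 0.3)` turns the same call into a robustness measurement: any drop in `tpr` relative to the unattacked run indicates how much the perturbation weakens detection. Text quality metrics are outside the scope of this sketch.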

Maintenance & Community

The project is actively maintained, with frequent updates and community pull requests that add new methods and features. Contributions are encouraged via PRs, and dedicated community channels may be added in the future.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Users should verify licensing for commercial use or integration into closed-source projects.

Limitations & Caveats

Some demonstration models may have download limitations due to storage constraints. The Cython-based algorithms require a compilation step, which might introduce environment-specific issues. The absence of a clearly stated license requires careful consideration for commercial applications.

Health Check

  • Last commit: 5 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 2
  • Issues (30d): 0

Star History
123 stars in the last 90 days
