synthid-text by google-deepmind

Reference implementation for the SynthID Text watermarking and detection research paper

created 9 months ago
522 stars

Top 61.2% on sourcepulse

Project Summary

This repository provides a reference implementation of SynthID Text, a system for watermarking and detecting text generated by large language models (LLMs), as described in a paper published in Nature. It is intended for research and reproducibility, and offers tools to embed and identify watermarks in text generated by models such as Gemma and GPT-2.

How It Works

SynthID Text embeds watermarks by subtly altering the probability distribution of token generation. It uses a configurable hashing function based on keys and sampling tables to influence token selection. Detection involves calculating "G values" for text segments and applying scoring functions (Weighted Mean or Bayesian) to determine the likelihood of a watermark being present. The Bayesian detector requires training on watermarked and unwatermarked data.
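
The idea above can be illustrated with a toy sketch. This is not the repository's code: the names `g_value`, `pick_token`, `generate`, and `mean_g_score`, the SHA-256 hash, the uniform "model" over a 50-token vocabulary, and the pairwise tournament are all assumptions chosen to keep the example small. It only shows how keyed pseudo-random G values can bias token selection and how an unweighted mean of those values can separate watermarked from unwatermarked text.

```python
import hashlib
import random

CONTEXT_LEN = 4   # hypothetical: number of previous tokens fed to the hash
VOCAB_SIZE = 50   # hypothetical toy vocabulary size


def g_value(context, token, key):
    """Return a pseudo-random bit (0 or 1) derived from key, context and token."""
    payload = f"{key}|{','.join(map(str, context))}|{token}".encode()
    return hashlib.sha256(payload).digest()[0] & 1


def pick_token(rng, context, keys, watermark):
    """Sample the next token; with watermarking, run a toy tournament."""
    if not watermark:
        return rng.randrange(VOCAB_SIZE)
    # Draw 2**len(keys) candidates from the (here uniform) model distribution.
    pool = [rng.randrange(VOCAB_SIZE) for _ in range(2 ** len(keys))]
    for key in keys:
        # Pair up candidates; the one with the higher G value advances.
        pool = [a if g_value(context, a, key) >= g_value(context, b, key) else b
                for a, b in zip(pool[0::2], pool[1::2])]
    return pool[0]


def generate(num_tokens, keys, watermark, seed=0):
    """Generate tokens after a dummy prompt of CONTEXT_LEN zeros."""
    rng = random.Random(seed)
    tokens = [0] * CONTEXT_LEN
    for _ in range(num_tokens):
        context = tuple(tokens[-CONTEXT_LEN:])
        tokens.append(pick_token(rng, context, keys, watermark))
    return tokens


def mean_g_score(tokens, keys):
    """Average G value over all generated positions and all keys."""
    bits = [g_value(tuple(tokens[i - CONTEXT_LEN:i]), tokens[i], key)
            for i in range(CONTEXT_LEN, len(tokens))
            for key in keys]
    return sum(bits) / len(bits)


keys = [11, 22, 33]  # hypothetical watermarking keys, one per tournament layer
print("watermarked:  ", mean_g_score(generate(400, keys, watermark=True), keys))
print("unwatermarked:", mean_g_score(generate(400, keys, watermark=False), keys))
```

In this toy setting the watermarked score should sit well above 0.5, while the unwatermarked score should stay near 0.5, because only the watermarked sampler prefers tokens whose G values are 1. A real detector then has to pick a threshold or, as with the Bayesian detector, learn one from data.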

Quick Start & Requirements

  • Installation: pip install '.[notebook-local]' for local notebook use, or pip install '.[test]' for testing.
  • Prerequisites: Python 3.x, PyTorch, Hugging Face Transformers. GPU with 16GB+ memory recommended for Gemma 2B/7B models.
  • Demo: A Colab Notebook is provided for end-to-end demonstration.
  • Docs: For a production-ready version, use the official SynthID Text implementation in Hugging Face Transformers.

Highlighted Details

  • Extends Hugging Face GemmaForCausalLM and GPT2LMHeadModel with watermarking mix-ins.
  • Supports both a simple Weighted Mean detector and a more powerful, trainable Bayesian detector.
  • Includes code for computing G values and various scoring functions.
  • Provides human evaluation data comparing watermarked and unwatermarked text.
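
A weighted mean score over G values can be sketched in a few lines of plain Python. The function name `weighted_mean_score`, its signature, and the default linearly decreasing weights are assumptions made for illustration; the repository's own scoring functions may differ in both interface and weighting. Here `g_values` holds one row of bits per token (one bit per watermarking layer) and `mask` marks which tokens should count.

```python
def weighted_mean_score(g_values, mask, weights=None):
    """Weighted mean of G values over the unmasked tokens.

    g_values: list of rows, one row of 0/1 values per token (one per layer).
    mask:     list of booleans, True for tokens that should be scored.
    weights:  optional per-layer weights; rescaled to sum to the depth.
    """
    depth = len(g_values[0])
    if weights is None:
        # Assumption: earlier layers get linearly larger weights.
        weights = [depth - i for i in range(depth)]
    total = sum(weights)
    weights = [w * depth / total for w in weights]

    kept = [row for row, keep in zip(g_values, mask) if keep]
    if not kept:
        raise ValueError("mask excludes every token; nothing to score")

    # Per-token weighted average across layers, then average across tokens.
    per_token = [sum(w * g for w, g in zip(weights, row)) / depth for row in kept]
    return sum(per_token) / len(per_token)


g_values = [[1, 1, 0], [1, 0, 1], [0, 0, 0]]
mask = [True, True, False]  # ignore the last token
print(round(weighted_mean_score(g_values, mask), 4))                     # 0.75
print(round(weighted_mean_score(g_values, mask, weights=[1, 1, 1]), 4))  # 0.6667
```

With uniform weights this reduces to the plain mean of the unmasked G values; the decreasing weights simply let the first layers contribute more to the score.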

Maintenance & Community

This repository is from Google DeepMind. No specific community channels or roadmap are detailed in the README.

Licensing & Compatibility

  • Software components are licensed under Apache License 2.0.
  • Other materials are licensed under Creative Commons Attribution 4.0 International License (CC-BY).
  • The Apache 2.0 license permits commercial use and linking with closed-source projects.

Limitations & Caveats

This implementation is for reference and research reproducibility only, not for production systems. Minor variations may cause fluctuations in detectability compared to the paper. The accumulate_hash() function does not guarantee cryptographic security.

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: 1 day
  • Pull requests (30d): 3
  • Issues (30d): 0
  • Star history: 59 stars in the last 90 days
