synthid-text by google-deepmind

Reference implementation of the text watermarking and detection method from the SynthID Text research paper

Created 1 year ago

741 stars

Top 46.9% on SourcePulse


1 Expert Loves This Project

Omar Sanseviero

DevRel at Google DeepMind

Project Summary

This repository provides a reference implementation of SynthID Text, a system for watermarking and detecting text generated by large language models (LLMs), as described in a Nature publication. It is intended for research and reproducibility, and offers tools to embed and identify watermarks in text generated by models such as Gemma and GPT-2.

How It Works

SynthID Text embeds watermarks by subtly altering the probability distribution of token generation. It uses a configurable hashing function based on keys and sampling tables to influence token selection. Detection involves calculating "G values" for text segments and applying scoring functions (Weighted Mean or Bayesian) to determine the likelihood of a watermark being present. The Bayesian detector requires training on watermarked and unwatermarked data.
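The idea above can be illustrated with a toy sketch. It is not the repository's code: the function names (g_value, g_values, mean_score, generate), the SHA-256 hash, the uniform toy "model", and the choice of four candidates per step are all assumptions made for illustration. The sketch derives a pseudorandom bit (a g-value) from a key, the preceding tokens, and a candidate token, biases sampling toward candidates with higher g-values, and then scores text with a weighted mean of its g-values.

```python
import hashlib
import random


def g_value(context, token, key):
    """Pseudorandom bit in {0, 1} derived from the key, the context tokens, and the token."""
    payload = f"{key}|{','.join(map(str, context))}|{token}".encode()
    return hashlib.sha256(payload).digest()[0] & 1


def g_values(tokens, keys, ngram_len=3):
    """One row per scored token, one column per key."""
    rows = []
    for i in range(ngram_len - 1, len(tokens)):
        context = tokens[i - ngram_len + 1 : i]
        rows.append([g_value(context, tokens[i], k) for k in keys])
    return rows


def mean_score(rows, weights=None):
    """Weighted mean of g-values; about 0.5 is expected for unwatermarked text."""
    depth = len(rows[0])
    weights = weights or [1.0] * depth
    total = sum(w * g for row in rows for w, g in zip(weights, row))
    return total / (len(rows) * sum(weights))


def generate(length, keys, vocab_size=50, ngram_len=3, watermark=True, seed=0):
    """Toy generator: a uniform 'model' over token ids, optionally watermarked."""
    rng = random.Random(seed)
    tokens = [rng.randrange(vocab_size) for _ in range(ngram_len - 1)]
    while len(tokens) < length:
        context = tokens[len(tokens) - (ngram_len - 1):]
        # Draw a few candidates from the (toy) model distribution.
        candidates = [rng.randrange(vocab_size) for _ in range(4)]
        if watermark:
            # Prefer the candidate whose g-values sum highest across all keys.
            choice = max(candidates, key=lambda t: sum(g_value(context, t, k) for k in keys))
        else:
            choice = candidates[0]
        tokens.append(choice)
    return tokens


if __name__ == "__main__":
    keys = [11, 22, 33, 44]  # hypothetical watermarking keys
    for flag in (True, False):
        text = generate(200, keys, watermark=flag)
        print("watermarked" if flag else "plain", round(mean_score(g_values(text, keys)), 3))
```

In this toy setting, watermarked sequences should score well above 0.5 while plain sequences stay close to it. A trained Bayesian detector would replace mean_score with a learned model of how g-values behave in watermarked and unwatermarked text.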

Quick Start & Requirements

Installation: From the repository root, run pip install '.[notebook-local]' for local notebook use, or pip install '.[test]' for testing.
Prerequisites: Python 3.x, PyTorch, Hugging Face Transformers. GPU with 16GB+ memory recommended for Gemma 2B/7B models.
Demo: A Colab Notebook is provided for end-to-end demonstration.
Docs: For a production-ready version, use the official SynthID Text implementation in Hugging Face Transformers.

Highlighted Details

Extends Hugging Face GemmaForCausalLM and GPT2LMHeadModel with watermarking mix-ins.
Supports both a simple Weighted Mean detector and a more powerful, trainable Bayesian detector.
Includes code for computing G values and various scoring functions.
Provides human evaluation data comparing watermarked and unwatermarked text.
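The mix-in approach mentioned in the first item can be sketched in plain Python. Everything here is hypothetical: ToyCausalLM, WatermarkMixin, adjust_scores, and toy_g are illustrative names and do not mirror the repository's classes or the Hugging Face API. The bias is also deliberately exaggerated so that the effect is easy to see. The point is only the pattern: a mix-in placed ahead of the base model in the method resolution order overrides one hook and leaves the rest of generation untouched.

```python
import hashlib
import random


def toy_g(context_token, token, key):
    """Pseudorandom bit derived from a key, the previous token, and a candidate token."""
    digest = hashlib.sha256(f"{key}|{context_token}|{token}".encode()).digest()
    return digest[0] & 1


class ToyCausalLM:
    """Hypothetical stand-in for a causal LM (not a Hugging Face class)."""

    def __init__(self, vocab_size=20, seed=0):
        self.vocab_size = vocab_size
        self._rng = random.Random(seed)

    def next_token_scores(self, tokens):
        # Random scores in [0, 1) stand in for real model logits.
        return [self._rng.random() for _ in range(self.vocab_size)]

    def adjust_scores(self, tokens, scores):
        # Hook for mix-ins to override; the base model leaves scores unchanged.
        return scores

    def generate(self, prompt, max_new_tokens):
        tokens = list(prompt)
        for _ in range(max_new_tokens):
            scores = self.adjust_scores(tokens, self.next_token_scores(tokens))
            # Greedy pick keeps the toy deterministic for a fixed seed.
            tokens.append(max(range(self.vocab_size), key=scores.__getitem__))
        return tokens


class WatermarkMixin:
    """Hypothetical mix-in that boosts tokens whose keyed hash bit is 1."""

    watermark_key = 1234
    bias = 1.0  # exaggerated for the toy; a real watermark is far subtler

    def adjust_scores(self, tokens, scores):
        scores = super().adjust_scores(tokens, scores)
        context_token = tokens[-1] if tokens else -1
        return [
            s + self.bias * toy_g(context_token, t, self.watermark_key)
            for t, s in enumerate(scores)
        ]


class WatermarkedToyLM(WatermarkMixin, ToyCausalLM):
    """Mix-in listed first so its adjust_scores runs before the base hook."""


if __name__ == "__main__":
    model = WatermarkedToyLM()
    print(model.generate([1, 2], max_new_tokens=10))
```

Listing the mix-in before the base class matters: Python resolves adjust_scores on WatermarkMixin first, and the super() call then reaches the base model's hook.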

Maintenance & Community

This repository is from Google DeepMind. No specific community channels or roadmap are detailed in the README.

Licensing & Compatibility

Software components are licensed under Apache License 2.0.
Other materials are licensed under Creative Commons Attribution 4.0 International License (CC-BY).
Apache 2.0 license permits commercial use and linking with closed-source projects.

Limitations & Caveats

This implementation is intended for reference and research reproducibility only, not for production systems. Minor variations may cause detectability to differ slightly from the results reported in the paper. The accumulate_hash() function does not guarantee cryptographic security.

Health Check

Last Commit: 5 months ago
Responsiveness: 1 day
Star History: 63 stars in the last 30 days