reverse-SynthID by aloshdenny

Reverse-engineering AI image watermarks

Created 6 months ago

4,548 stars

Top 10.7% on SourcePulse

Project Summary

This project reverse-engineers Google's SynthID watermarking system for AI-generated images, offering tools for detection and removal via signal processing and spectral analysis. It targets researchers and security analysts studying provenance marking for AI-generated content, and provides a method to identify and surgically remove invisible watermarks.

How It Works

The core approach leverages signal processing, specifically spectral analysis (FFT) and phase coherence, to identify SynthID watermarks without access to proprietary encoders. A novel "Multi-Resolution SpectralCodebook" captures watermark fingerprints at different resolutions, addressing the key finding that carrier frequencies are resolution-dependent. This codebook enables a V3 bypass that performs surgical, frequency-bin-level removal by auto-selecting the appropriate resolution profile for precise watermark de-embedding.
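The phase-coherence idea can be illustrated with a minimal sketch. It assumes same-sized grayscale images and uses NumPy only; the function names `phase_coherence` and `make_demo_images` are hypothetical and not taken from the repository. A fixed carrier shared by all images keeps the same phase at its frequency bin, so the averaged unit phasor there has magnitude near 1, while bins dominated by unrelated content average toward 0.

```python
import numpy as np

def phase_coherence(images):
    """Per-bin phase coherence across a stack of same-sized grayscale images.

    Returns values in [0, 1]: near 1 where every image has the same phase at
    that frequency bin, lower where phases differ from image to image.
    """
    stack = np.asarray(images, dtype=np.float64)
    spectra = np.fft.fft2(stack, axes=(-2, -1))
    # Keep only the phase by normalising each coefficient to unit magnitude.
    phasors = spectra / np.maximum(np.abs(spectra), 1e-12)
    return np.abs(phasors.mean(axis=0))

def make_demo_images(n=32, size=64, carrier=(5, 9), amplitude=2.0, seed=0):
    """Synthetic stand-in data (an assumption, not real SynthID output):
    random noise plus one sinusoid with a fixed phase shared by all images."""
    rng = np.random.default_rng(seed)
    y, x = np.mgrid[0:size, 0:size]
    pattern = amplitude * np.cos(
        2 * np.pi * (carrier[0] * y + carrier[1] * x) / size + 0.7
    )
    return [rng.normal(0.0, 1.0, (size, size)) + pattern for _ in range(n)]
```

Calling `phase_coherence(make_demo_images())` yields a 64x64 map in which the bin at row 5, column 9 (and its conjugate-symmetric partner) stands out against the background. Real images would first need to be grouped by resolution, since the summary notes that carrier positions are resolution-dependent.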

Quick Start & Requirements

Installation involves cloning the repository, setting up a Python virtual environment, and installing dependencies via pip install -r requirements.txt. Building the Multi-Resolution Codebook is a critical step requiring specific datasets: pure black and pure white images generated by Nano Banana Pro, and diverse watermarked content images. These datasets are essential for discovering carrier frequencies, validating phase information, and improving cross-resolution robustness.
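One way to see why flat black or white images help with carrier discovery: with no picture content, whatever survives a coherent average of the spectra is the embedded signal. The sketch below is an illustration under that assumption, not the repository's build script; `find_carrier_bins` and `top_k` are hypothetical names.

```python
import numpy as np

def find_carrier_bins(flat_images, top_k=4):
    """Return the top_k strongest frequency bins in the coherent average
    spectrum of flat (e.g. pure black) same-sized grayscale images."""
    stack = np.asarray(flat_images, dtype=np.float64)
    # Remove each image's mean so the DC bin does not dominate the ranking.
    stack = stack - stack.mean(axis=(-2, -1), keepdims=True)
    # Average complex spectra: fixed-phase components add up, noise cancels.
    mean_spectrum = np.fft.fft2(stack, axes=(-2, -1)).mean(axis=0)
    magnitude = np.abs(mean_spectrum)
    strongest = np.argsort(magnitude, axis=None)[::-1][:top_k]
    return [
        tuple(int(v) for v in np.unravel_index(i, magnitude.shape))
        for i in strongest
    ]
```

Because the input is real-valued, each carrier appears twice, once at `(ky, kx)` and once at its conjugate-symmetric bin, so `top_k` should be chosen with that doubling in mind.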

Highlighted Details

Detection Accuracy: Achieves 90% accuracy in identifying SynthID watermarks.
V3 Bypass Performance: Reduces carrier energy by 75.8% and phase coherence by 91.4% while keeping the output close to the input image (PSNR of 43.5 dB).
Resolution Dependency: SynthID embeds carrier frequencies at different absolute positions based on image resolution, necessitating resolution-specific watermark profiles.
Phase Consistency: A fixed phase template across images from the same model serves as a key identifier, with >99.5% cross-image phase coherence at carrier frequencies.
SpectralCodebook: Stores per-resolution watermark fingerprints, enabling auto-selection and direct known-signal subtraction for surgical removal.
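The per-resolution lookup and known-signal subtraction described above can be sketched as follows. This is a simplified illustration, not the repository's implementation: the class borrows the SpectralCodebook name from the summary, but its methods, the `carrier_energy` and `psnr` helpers, and the storage layout (a dict from frequency bin to complex coefficient) are all assumptions. It handles 2-D grayscale arrays only.

```python
import numpy as np

class SpectralCodebook:
    """Hypothetical store: (height, width) -> {(ky, kx): complex coefficient}."""

    def __init__(self):
        self.profiles = {}

    def add_profile(self, shape, coefficients):
        # For a real-valued result, include both a bin and its
        # conjugate-symmetric partner with conjugate coefficients.
        self.profiles[tuple(shape)] = dict(coefficients)

    def subtract(self, image):
        """Select the profile matching the image resolution and subtract the
        known watermark coefficients at exactly those frequency bins."""
        profile = self.profiles.get(image.shape)
        if profile is None:
            raise KeyError(f"no profile for resolution {image.shape}")
        spectrum = np.fft.fft2(image.astype(np.float64))
        for (ky, kx), coeff in profile.items():
            spectrum[ky, kx] -= coeff
        return np.fft.ifft2(spectrum).real

def carrier_energy(image, bins):
    """Sum of squared spectral magnitudes at the given frequency bins."""
    spectrum = np.fft.fft2(image.astype(np.float64))
    return float(sum(abs(spectrum[ky, kx]) ** 2 for ky, kx in bins))

def psnr(reference, processed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-sized images."""
    diff = reference.astype(np.float64) - processed.astype(np.float64)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else float(10 * np.log10(peak ** 2 / mse))
```

Comparing `carrier_energy` before and after `subtract`, and computing `psnr` between the input and the result, gives the same kinds of figures the summary reports (carrier energy drop and PSNR). Touching only the listed bins is what keeps the change to the rest of the image small.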

Maintenance & Community

The project actively seeks contributions of specific image data (pure black and pure white images from Nano Banana Pro) to expand its watermark codebook. This data is crucial for improving detection and removal across resolutions. No explicit community channels (e.g., Discord, Slack) are listed.

Licensing & Compatibility

No explicit open-source license is provided. The project is designated for "research and educational purposes only," and SynthID is proprietary Google DeepMind technology. The project discourages using it to misrepresent AI-generated content as human-created. In the absence of a license, commercial integration and use in closed-source applications may be restricted.

Limitations & Caveats

Effective codebook generation and watermark removal are heavily dependent on acquiring specific, curated image datasets (Nano Banana Pro outputs). The project's scope is limited to research and educational use due to the proprietary nature of SynthID. Performance may vary based on the specific Gemini model version and its watermarking implementation.

Health Check

Last Commit

2 months ago

Responsiveness

Inactive

Star History

221 stars in the last 30 days