Tooling for LLM fingerprinting via fine-tuning, enabling model ownership verification
This repository provides tooling for embedding secret fingerprints into Large Language Models (LLMs) via fine-tuning. It lets LLM owners prove ownership of a model and guard against unauthorized use, and it lets users verify a model's authenticity. The primary audience is LLM developers and owners seeking to secure their models.
How It Works
The core approach involves fine-tuning an LLM with specific query-response pairs, creating a unique "fingerprint." This process embeds a secret cryptographic primitive into the model's weights. The advantage is a verifiable, AI-native signature that can identify the model's owner or intended users, offering a method to detect and prove unauthorized usage or distribution.
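As a rough illustration of what a set of secret query-response pairs could look like, the sketch below derives them deterministically from an owner-held secret using HMAC. This is an assumption made for illustration, not the repository's generation method: the function names `make_fingerprints` and `to_training_records`, the HMAC derivation, and the `prompt`/`completion` record layout are all hypothetical.

```python
import hashlib
import hmac


def make_fingerprints(secret: bytes, count: int, length: int = 16) -> list[tuple[str, str]]:
    """Derive `count` (query, response) pairs deterministically from `secret`.

    Hypothetical helper: the real tooling may generate its pairs differently.
    """
    pairs = []
    for i in range(count):
        # HMAC makes each pair unpredictable to anyone who lacks the secret,
        # while the owner can regenerate the same pairs at any time.
        query = hmac.new(secret, f"query-{i}".encode(), hashlib.sha256).hexdigest()[:length]
        response = hmac.new(secret, f"response-{i}".encode(), hashlib.sha256).hexdigest()[:length]
        pairs.append((query, response))
    return pairs


def to_training_records(pairs: list[tuple[str, str]]) -> list[dict[str, str]]:
    """Shape the pairs as prompt/completion records for a fine-tuning job (assumed layout)."""
    return [{"prompt": query, "completion": response} for query, response in pairs]


if __name__ == "__main__":
    records = to_training_records(make_fingerprints(b"owner-secret", count=3))
    for record in records:
        print(record)
```

Because the pairs are a pure function of the secret, the owner only needs to store the secret and the pair count to reproduce the full fingerprint set later.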
Quick Start & Requirements
Install: create and activate a virtual environment (python -m venv env, source env/bin/activate), and install dependencies (pip install -r requirements.txt). Install DeepSpeed from source with DS_BUILD_OPS=1.
Generate fingerprint data: deepspeed generate_finetuning_data.py
Fine-tune: deepspeed --num_gpus=<NUM_GPUS> finetune_multigpu.py --model_path <model_path>
Check fingerprints: deepspeed check_fingerprints.py
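Conceptually, checking fingerprints amounts to querying the model with each secret query and counting how often the expected response comes back. The sketch below shows that idea under stated assumptions: it is not the logic of check_fingerprints.py, the name `fingerprint_match_rate` is hypothetical, and the model is abstracted as any callable that maps a prompt string to generated text.

```python
from typing import Callable, Iterable


def fingerprint_match_rate(
    generate: Callable[[str], str],
    pairs: Iterable[tuple[str, str]],
) -> float:
    """Return the fraction of fingerprint queries whose output starts with the expected response.

    `generate` stands in for a model's text-generation call (an assumption for this sketch).
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one fingerprint pair")
    hits = sum(
        1
        for query, expected in pairs
        # Prefix matching tolerates extra text generated after the response.
        if generate(query).strip().startswith(expected)
    )
    return hits / len(pairs)


if __name__ == "__main__":
    # Toy stand-in for a fingerprinted model: it "remembers" two of three pairs.
    memorized = {"q1": "r1", "q2": "r2"}
    rate = fingerprint_match_rate(
        lambda prompt: memorized.get(prompt, ""),
        [("q1", "r1"), ("q2", "r2"), ("q3", "r3")],
    )
    print(f"match rate: {rate:.2f}")
```

A model that was never fingerprinted should match close to none of the pairs, so a high match rate is evidence that the model carries the owner's fingerprints. The threshold that counts as proof is a policy choice not covered here.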
Highlighted Details
forgetting_regularizer_strength parameter to balance fingerprint embedding against catastrophic forgetting.
Maintenance & Community
The project is associated with the Sentient Foundation and the OML whitepaper. The README does not explicitly link to community channels or indicate ongoing development.
Licensing & Compatibility
The provided README text does not state a license. Confirm the licensing terms before commercial use or closed-source integration.
Limitations & Caveats
DeepSpeed installation can be complex and may require building from source. The effectiveness and robustness of the fingerprinting method against adversarial attacks or model degradation are not detailed. The README mentions potential conflicts when using DeepSpeed with standard installations.