This repository hosts a challenge focused on improving targeted data extraction attacks against large language models. It is aimed at researchers and engineers interested in understanding and mitigating the privacy risks of model memorization, and it provides a benchmark dataset and evaluation framework.
How It Works
The challenge centers on targeted data extraction: participants are given a prefix and must predict the specific continuation (suffix) that appeared in the model's training data. Targeted extraction is favored over untargeted attacks for its security relevance and ease of evaluation. The benchmark uses a subset of 20,000 examples from The Pile dataset, chosen so that each example is likely to be extractable and has a well-defined continuation.
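Concretely, each benchmark example can be thought of as a token sequence split into a 50-token prefix and a 50-token suffix: the attacker sees the prefix and must reproduce the suffix exactly. A minimal sketch of that setup, assuming integer token IDs as produced by a tokenizer (the helper names here are illustrative, not the challenge's actual code):

```python
PREFIX_LEN = 50
SUFFIX_LEN = 50

def split_example(token_ids):
    """Split a 100-token training example into (prefix, suffix).

    `token_ids` is a list of integer token IDs, e.g. from the GPT-2
    tokenizer. Illustrative only -- the benchmark ships pre-split data
    rather than this helper.
    """
    assert len(token_ids) == PREFIX_LEN + SUFFIX_LEN
    return token_ids[:PREFIX_LEN], token_ids[PREFIX_LEN:]

def is_exact_match(guess, true_suffix):
    """An extraction counts only if every suffix token matches."""
    return guess == true_suffix

example = list(range(100))       # stand-in token IDs
prefix, suffix = split_example(example)
print(len(prefix), len(suffix))  # 50 50
print(is_exact_match(suffix, list(range(50, 100))))  # True
```

Exact-match scoring is what makes the task well-defined: a guess that differs in a single token counts as a miss.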
Quick Start & Requirements
- Dataset Generation: The load_dataset.py script can generate the training data from The Pile dataset using the provided CSV pointers.
- Dependencies: Requires Python and the GPT-2 tokenizer (identical to the GPT-Neo 1.3B tokenizer).
- Resources: Access to The Pile dataset (~800 GB) is necessary for full data generation.
- Documentation: A detailed description of the dataset construction is available in detailed_description.pdf.
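The CSV pointers locate each example inside the raw Pile data so that the generation script can reconstruct token sequences without redistributing the dataset itself. A hedged sketch of reading such a pointer file with the standard library (the column names `shard`, `offset`, and `length` are assumptions for illustration, not the repository's actual schema):

```python
import csv
import io

# Hypothetical pointer rows: which Pile shard an example lives in,
# plus its byte offset and length. Real column names may differ.
POINTER_CSV = """shard,offset,length
00.jsonl.zst,1024,512
00.jsonl.zst,4096,768
01.jsonl.zst,2048,640
"""

def read_pointers(text):
    """Parse pointer rows into (shard, offset, length) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["shard"], int(row["offset"]), int(row["length"]))
            for row in reader]

pointers = read_pointers(POINTER_CSV)
print(pointers[0])  # ('00.jsonl.zst', 1024, 512)
```

In practice the script would seek to each offset in the corresponding shard and tokenize the extracted text; this sketch only covers the pointer-parsing step.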
Highlighted Details
- Targeted Extraction: Focuses on recovering specific training examples given a prefix, rather than arbitrary memorized data.
- Benchmark Dataset: Utilizes 20,000 curated examples from The Pile, split into 50-token prefixes and suffixes.
- Evaluation Metric: Recall, measured as the fraction of test suffixes correctly recovered before the 100th incorrect guess, within a 24-hour runtime on specified hardware (comparable to a P100 GPU).
- Submission: Requires a solution CSV, reproducing code, and a short technical description.
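The metric rewards ordered submissions: guesses are scored in the order given, and scoring stops once the error budget of 100 incorrect guesses is exhausted. A sketch of that scoring rule, assuming guesses arrive as (example_id, suffix) pairs (details like tie handling in the official scorer may differ):

```python
def recall_at_error_budget(ordered_guesses, ground_truth, error_budget=100):
    """Score an ordered list of (example_id, guessed_suffix) pairs.

    Scoring stops at the `error_budget`-th incorrect guess; recall is
    the fraction of all examples whose true suffix was recovered
    before that point. Illustrative sketch, not the official scorer.
    """
    errors = 0
    recovered = set()
    for example_id, guess in ordered_guesses:
        if ground_truth.get(example_id) == guess:
            recovered.add(example_id)
        else:
            errors += 1
            if errors >= error_budget:
                break
    return len(recovered) / len(ground_truth)

truth = {0: "aaa", 1: "bbb", 2: "ccc", 3: "ddd"}
guesses = [(0, "aaa"), (1, "xxx"), (2, "ccc"), (3, "yyy")]
print(recall_at_error_budget(guesses, truth, error_budget=2))  # 0.5
```

This is why ordering matters: placing high-confidence guesses first lets more correct answers land before the budget runs out.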
Maintenance & Community
- Organizers: Nicholas Carlini, Christopher A. Choquette-Choo, Daphne Ippolito, Matthew Jagielski, Katherine Lee, Milad Nasr, Florian Tramer, and Chiyuan Zhang.
- Communication: Questions can be raised via the repository's issue tracker.
- Timeline: Key dates include dataset releases, validation round, and final submission deadlines.
Licensing & Compatibility
- License: Not explicitly stated in the README; the project is associated with Google Research. Suitability for commercial use or closed-source linking is not specified.
Limitations & Caveats
- The benchmark is designed for GPT-Neo 1.3B; querying other models trained on The Pile is disallowed. Participants are cautioned against cheating by directly searching The Pile or the internet for solutions.