This repository hosts a challenge focused on improving targeted data extraction attacks against large language models. It is aimed at researchers and engineers interested in understanding and mitigating the privacy risks of model memorization, and it provides a benchmark dataset and evaluation framework.
How It Works
The challenge centers on targeted data extraction: participants are given a prefix and must predict the specific continuation (suffix) that appeared in the model's training data. Targeted extraction is favored over untargeted attacks for its security relevance and ease of evaluation. The benchmark uses a subset of 20,000 examples from The Pile dataset, chosen so that each example is likely to be extractable and has a well-defined continuation.
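Concretely, each benchmark example can be thought of as a token sequence split into a 50-token prefix and a 50-token suffix: the attacker sees the prefix and must reproduce the suffix exactly. A minimal sketch of that setup, assuming integer token IDs as produced by a tokenizer (the helper names here are illustrative, not the challenge's actual code):

```python
PREFIX_LEN = 50
SUFFIX_LEN = 50

def split_example(token_ids):
    """Split a 100-token training example into (prefix, suffix).

    `token_ids` is a list of integer token IDs, e.g. from the GPT-2
    tokenizer. Illustrative only -- the benchmark ships pre-split data
    rather than this helper.
    """
    assert len(token_ids) == PREFIX_LEN + SUFFIX_LEN
    return token_ids[:PREFIX_LEN], token_ids[PREFIX_LEN:]

def is_exact_match(guess, true_suffix):
    """An extraction counts only if every suffix token matches."""
    return guess == true_suffix

example = list(range(100))       # stand-in token IDs
prefix, suffix = split_example(example)
print(len(prefix), len(suffix))  # 50 50
print(is_exact_match(suffix, list(range(50, 100))))  # True
```

Exact-match scoring is what makes the task well-defined: a guess that differs in a single token counts as a miss.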
Quick Start & Requirements
- Dataset Generation: The load_dataset.py script can generate the training data from The Pile dataset using the provided CSV pointers.
- Dependencies: Requires Python and the GPT-2 tokenizer (identical to the GPT-Neo 1.3B tokenizer).
- Resources: Access to The Pile dataset (~800 GB) is necessary for full data generation.
- Documentation: A detailed description of the dataset construction is available in detailed_description.pdf.
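The CSV pointers locate each example inside the raw Pile data so that the generation script can reconstruct token sequences without redistributing the dataset itself. A hedged sketch of reading such a pointer file with the standard library (the column names `shard`, `offset`, and `length` are assumptions for illustration, not the repository's actual schema):

```python
import csv
import io

# Hypothetical pointer rows: which Pile shard an example lives in,
# plus its byte offset and length. Real column names may differ.
POINTER_CSV = """shard,offset,length
00.jsonl.zst,1024,512
00.jsonl.zst,4096,768
01.jsonl.zst,2048,640
"""

def read_pointers(text):
    """Parse pointer rows into (shard, offset, length) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["shard"], int(row["offset"]), int(row["length"]))
            for row in reader]

pointers = read_pointers(POINTER_CSV)
print(pointers[0])  # ('00.jsonl.zst', 1024, 512)
```

In practice the script would seek to each offset in the corresponding shard and tokenize the extracted text; this sketch only covers the pointer-parsing step.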
Highlighted Details
- Targeted Extraction: Focuses on recovering specific training examples given a prefix, rather than arbitrary memorized data.
- Benchmark Dataset: Utilizes 20,000 curated examples from The Pile, split into 50-token prefixes and suffixes.
- Evaluation Metric: Recall, measured as the fraction of test suffixes correctly recovered before the 100th incorrect guess, within a 24-hour runtime on specified hardware (comparable to a P100 GPU).
- Submission: Requires a solution CSV, reproducing code, and a short technical description.
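The metric rewards ordered submissions: guesses are scored in the order given, and scoring stops once the error budget of 100 incorrect guesses is exhausted. A sketch of that scoring rule, assuming guesses arrive as (example_id, suffix) pairs (details like tie handling in the official scorer may differ):

```python
def recall_at_error_budget(ordered_guesses, ground_truth, error_budget=100):
    """Score an ordered list of (example_id, guessed_suffix) pairs.

    Scoring stops at the `error_budget`-th incorrect guess; recall is
    the fraction of all examples whose true suffix was recovered
    before that point. Illustrative sketch, not the official scorer.
    """
    errors = 0
    recovered = set()
    for example_id, guess in ordered_guesses:
        if ground_truth.get(example_id) == guess:
            recovered.add(example_id)
        else:
            errors += 1
            if errors >= error_budget:
                break
    return len(recovered) / len(ground_truth)

truth = {0: "aaa", 1: "bbb", 2: "ccc", 3: "ddd"}
guesses = [(0, "aaa"), (1, "xxx"), (2, "ccc"), (3, "yyy")]
print(recall_at_error_budget(guesses, truth, error_budget=2))  # 0.5
```

This is why ordering matters: placing high-confidence guesses first lets more correct answers land before the budget runs out.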
Maintenance & Community
- Organizers: Nicholas Carlini, Christopher A. Choquette-Choo, Daphne Ippolito, Matthew Jagielski, Katherine Lee, Milad Nasr, Florian Tramer, and Chiyuan Zhang.
- Communication: Questions can be raised via the repository's issue tracker.
- Timeline: Key dates include dataset releases, validation round, and final submission deadlines.
Licensing & Compatibility
- License: Not explicitly stated in the README; the project is associated with Google Research. Suitability for commercial use or closed-source linking is not specified.
Limitations & Caveats
- The benchmark is designed for GPT-Neo 1.3B; querying other models trained on The Pile is disallowed. Participants are cautioned against cheating by directly searching The Pile or the internet for solutions.