lm-extraction-benchmark by google-research

Benchmark for training data extraction attacks on language models

Created 2 years ago · 292 stars · Top 91.4% on sourcepulse

Project Summary

This repository hosts a challenge focused on improving targeted data extraction attacks against large language models. It targets researchers and engineers interested in understanding and mitigating privacy risks associated with model memorization, offering a benchmark dataset and evaluation framework.

How It Works

The challenge centers on targeted data extraction: participants are given a prefix and must predict the specific continuation (suffix) that appeared in the model's training data. This setting is favored over untargeted attacks because it is more security-relevant and easier to evaluate. The benchmark uses a subset of 20,000 examples from The Pile dataset, selected so that each example is extractable and has a well-defined continuation.
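The per-example setup described above can be sketched as follows. This is a minimal illustration with made-up token IDs, not the repository's actual code; the real benchmark distributes prefix and suffix token arrays directly.

```python
# Illustrative sketch of the targeted-extraction task: each training
# sequence splits into a 50-token prefix (given to the attacker) and a
# 50-token suffix (the target to recover).
PREFIX_LEN = 50
SUFFIX_LEN = 50

def split_example(tokens):
    """Split a 100-token training sequence into (prefix, suffix)."""
    assert len(tokens) == PREFIX_LEN + SUFFIX_LEN
    return tokens[:PREFIX_LEN], tokens[PREFIX_LEN:]

def is_extracted(guess, true_suffix):
    """A guess counts only if it reproduces the suffix token-for-token."""
    return list(guess) == list(true_suffix)

# Toy demo with fake token IDs standing in for real tokenizer output.
example = list(range(100))
prefix, suffix = split_example(example)
print(len(prefix), len(suffix), is_extracted(suffix, suffix))
```

The exact-match check reflects why targeted extraction is easy to evaluate: there is one correct suffix per prefix, so scoring needs no human judgment.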

Quick Start & Requirements

  • Dataset Generation: The load_dataset.py script can generate training data from The Pile dataset using provided CSV pointers.
  • Dependencies: Requires Python and the GPT-2 tokenizer (which is identical to the tokenizer used by GPT-Neo 1.3B).
  • Resources: Access to The Pile dataset (800GB) is necessary for full data generation.
  • Documentation: A detailed description of dataset construction is available at detailed_description.pdf.
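The shape of the pointer-driven workflow can be sketched roughly as below. This is a hypothetical illustration: the column layout and the `read_pointers` helper are assumptions for the sketch, and the authoritative logic lives in `load_dataset.py`.

```python
import csv
import io

def read_pointers(csv_text):
    """Parse a pointers CSV into (example_id, offset) pairs.

    Assumed layout: one row per benchmark example, holding an integer
    example ID and an integer offset into a local copy of The Pile.
    The real format is whatever load_dataset.py expects.
    """
    rows = csv.reader(io.StringIO(csv_text))
    return [(int(ex_id), int(offset)) for ex_id, offset in rows]

# Toy demo with two fabricated pointer rows.
pointers = read_pointers("0,1024\n1,2048\n")
print(pointers)
```

In practice each pointer would be dereferenced against the locally downloaded 800GB Pile to materialize the actual training sequence.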

Highlighted Details

  • Targeted Extraction: Focuses on recovering specific training examples given a prefix, rather than arbitrary memorized data.
  • Benchmark Dataset: Utilizes 20,000 curated examples from The Pile, split into 50-token prefixes and suffixes.
  • Evaluation Metric: Measures recall under a budget of 100 incorrect guesses, with a 24-hour runtime limit on specified hardware (comparable to a P100 GPU).
  • Submission: Requires a solution CSV, reproducing code, and a short technical description.
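The error-budgeted recall metric above can be sketched as follows. This is an assumed scoring procedure for illustration (guesses ranked by confidence, scoring stops once the error budget is spent); consult the challenge rules for the exact definition.

```python
ERROR_BUDGET = 100  # maximum number of incorrect guesses allowed

def recall_at_error_budget(ordered_correct_flags, num_examples,
                           budget=ERROR_BUDGET):
    """Fraction of examples recovered before the error budget is spent.

    ordered_correct_flags: per-guess booleans, most-confident first.
    num_examples: total number of benchmark examples (recall denominator).
    """
    correct = errors = 0
    for ok in ordered_correct_flags:
        if ok:
            correct += 1
        else:
            errors += 1
            if errors >= budget:
                break  # budget exhausted; later guesses are not scored
    return correct / num_examples

# Toy demo: 3 correct guesses, then 1 wrong, out of 10 examples.
print(recall_at_error_budget([True, True, True, False], 10))
```

Ordering guesses well matters: a confident wrong guess early on consumes budget that could have been spent on recoverable examples.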

Maintenance & Community

  • Organizers: Nicholas Carlini, Christopher A. Choquette-Choo, Daphne Ippolito, Matthew Jagielski, Katherine Lee, Milad Nasr, Florian Tramer, and Chiyuan Zhang.
  • Communication: Questions can be raised via the repository's issue tracker.
  • Timeline: Key dates include dataset releases, validation round, and final submission deadlines.

Licensing & Compatibility

  • License: Not explicitly stated in the README, but associated with Google Research. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

  • The benchmark is designed for GPT-Neo 1.3B; querying other models trained on The Pile is disallowed.
  • Participants are cautioned against cheating by directly searching The Pile or the internet for solutions.

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 3
  • Issues (30d): 0
  • Star History: 9 stars in the last 90 days
