NLP attack/analysis research paper (EMNLP 2019)
This repository provides the official code for the EMNLP 2019 paper "Universal Adversarial Triggers for Attacking and Analyzing NLP." It enables researchers and practitioners to generate universal adversarial triggers for various NLP tasks, aiding in model analysis and security assessment. The primary benefit is a standardized method for probing model vulnerabilities and understanding their decision boundaries.
How It Works
The project implements a gradient-based attack that discovers short sequences of tokens (triggers) which, when concatenated to any input, cause a target NLP model to misclassify or otherwise misbehave. The core approach iteratively updates the trigger tokens with a HotFlip-style first-order approximation: the gradient of the loss with respect to each trigger token's embedding is used to rank vocabulary tokens whose substitution most increases the model's error (or the probability of a chosen target output), averaged over a batch of examples. Because the trigger is optimized over many inputs at once, a single trigger transfers across examples, unlike traditional per-input adversarial perturbations.
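As a minimal sketch of that update step (assuming a hypothetical hotflip_candidates helper; the repository's actual function names and signatures may differ), the candidate scoring reduces to a dot product between the averaged gradient and the embedding table:

```python
import torch

def hotflip_candidates(avg_grad, embedding_matrix, num_candidates=5):
    """Rank replacement tokens for each trigger position (hypothetical helper).

    avg_grad: (trigger_len, embed_dim) gradient of the loss w.r.t. the
        trigger embeddings, averaged over a batch of inputs.
    embedding_matrix: (vocab_size, embed_dim) model embedding table.
    """
    # First-order estimate of the loss change from swapping a trigger token
    # for vocabulary token w is grad . (e_w - e_old); e_old is constant per
    # position, so ranking by grad . e_w is enough.
    scores = avg_grad @ embedding_matrix.T            # (trigger_len, vocab_size)
    # Negate so topk picks the swaps that most *decrease* the adversarial loss.
    return torch.topk(-scores, num_candidates, dim=1).indices

# Toy usage with random tensors standing in for a real model's state.
trigger_len, embed_dim, vocab_size = 3, 64, 1000
avg_grad = torch.randn(trigger_len, embed_dim)
embeddings = torch.randn(vocab_size, embed_dim)
print(hotflip_candidates(avg_grad, embeddings).shape)  # torch.Size([3, 5])
```

Taking topk of the negated scores selects tokens whose projected first-order effect is to push the loss toward the adversarial target; flipping the sign instead searches for tokens that increase the loss.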
Quick Start & Requirements
Create a conda environment (conda create -n triggers python=3.6), activate it (source activate triggers), and install dependencies (pip install -r requirements.txt). Then start with the snli or sst attack examples, which are well-documented and illustrate the methodology.
Highlighted Details
The core attack algorithms live in attacks.py, with shared helper utilities in utils.py (for AllenNLP models).
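For orientation, here is a compressed, self-contained sketch of how an attack script typically wires these pieces together, using a toy classifier; the helper names (trigger_loss, average_trigger_grad) and the greedy search are illustrative assumptions, not the exact API of attacks.py or utils.py:

```python
import torch
import torch.nn as nn

# Toy model standing in for an AllenNLP classifier: mean-pooled embeddings -> linear.
vocab_size, embed_dim, num_classes = 1000, 64, 2
embedding = nn.Embedding(vocab_size, embed_dim)
classifier = nn.Linear(embed_dim, num_classes)
loss_fn = nn.CrossEntropyLoss()

def trigger_loss(trigger_ids, batch_ids, targets):
    # Prepend the trigger to every input, then score the adversarial target class.
    trig = trigger_ids.unsqueeze(0).expand(batch_ids.size(0), -1)
    tokens = torch.cat([trig, batch_ids], dim=1)
    logits = classifier(embedding(tokens).mean(dim=1))
    return loss_fn(logits, targets)

def average_trigger_grad(trigger_ids, batch_ids, targets):
    # Gradient of the loss w.r.t. the trigger embeddings, averaged over the batch.
    embeds = embedding(trigger_ids).detach().requires_grad_(True)
    trig = embeds.unsqueeze(0).expand(batch_ids.size(0), -1, -1)
    inputs = torch.cat([trig, embedding(batch_ids)], dim=1)
    logits = classifier(inputs.mean(dim=1))
    loss_fn(logits, targets).backward()  # grads on model weights are unused here
    return embeds.grad

trigger_ids = torch.randint(vocab_size, (3,))    # start from arbitrary tokens
batch_ids = torch.randint(vocab_size, (8, 20))   # toy "dataset" batch
targets = torch.zeros(8, dtype=torch.long)       # adversarial target class

for _ in range(10):
    grad = average_trigger_grad(trigger_ids, batch_ids, targets)
    # Gradient-guided candidates per position (same scoring trick as above).
    candidates = torch.topk(-(grad @ embedding.weight.T), k=5, dim=1).indices
    for pos in range(trigger_ids.size(0)):
        for cand in candidates[pos]:
            trial = trigger_ids.clone()
            trial[pos] = cand
            with torch.no_grad():  # keep a swap only if it actually lowers the loss
                if trigger_loss(trial, batch_ids, targets) < trigger_loss(trigger_ids, batch_ids, targets):
                    trigger_ids = trial
print("trigger token ids:", trigger_ids.tolist())
```

In the actual repository the candidate swaps are re-scored with real forward passes over a dev set (the paper uses beam search); the toy model above only stands in for that machinery.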
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The code is built on older versions of PyTorch, HuggingFace Transformers, and AllenNLP, so running it in current environments may require compatibility adjustments. It is intended for research and analysis rather than production deployment.