Research code for targeted audio adversarial examples
This repository provides code for generating targeted adversarial examples against speech-to-text (STT) systems, specifically DeepSpeech. It lets researchers and security professionals probe the robustness of STT models by creating audio inputs whose perturbations are hard for humans to hear but cause the model to mistranscribe them.
How It Works
The project implements optimization-based attacks to find minimal audio perturbations that alter STT output. It leverages a differentiable STT model (DeepSpeech) to compute gradients of the loss function with respect to the input audio, guiding the search for adversarial perturbations. This gradient-based approach allows for targeted attacks, aiming to transform speech into a specific, incorrect transcription.
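The gradient-guided search described above can be illustrated with a minimal sketch. This is not the repository's code and does not use DeepSpeech: the "model" is a hypothetical random linear classifier over a toy waveform, the loss is plain cross-entropy rather than a sequence loss, and the names `targeted_attack`, `EPS`, `STEP`, and `STEPS` are assumptions chosen for illustration. It only shows the general shape of the loop: compute the gradient of the target loss with respect to the input, step against it, and keep the perturbation within a small bound.

```python
import numpy as np

EPS = 0.02    # assumed bound on the per-sample perturbation size
STEP = 0.002  # assumed step size per iteration
STEPS = 200   # assumed number of optimization steps


def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    e = np.exp(z - z.max())
    return e / e.sum()


def targeted_attack(audio, weights, target):
    """Search for a bounded perturbation that makes the toy model predict `target`.

    `weights` is a stand-in for a differentiable model: logits = weights @ input.
    """
    onehot = np.zeros(weights.shape[0])
    onehot[target] = 1.0
    delta = np.zeros_like(audio)
    for _ in range(STEPS):
        probs = softmax(weights @ (audio + delta))
        # Gradient of cross-entropy(target) with respect to the input.
        grad = weights.T @ (probs - onehot)
        # Step against the gradient sign, then project back into the bound.
        delta = np.clip(delta - STEP * np.sign(grad), -EPS, EPS)
    return delta


rng = np.random.default_rng(0)
weights = rng.normal(size=(5, 1024))   # toy model with 5 output classes
audio = 0.1 * rng.normal(size=1024)    # toy "waveform"

original = int(np.argmax(weights @ audio))
target = (original + 1) % 5            # any class other than the original
delta = targeted_attack(audio, weights, target)
adversarial = int(np.argmax(weights @ (audio + delta)))

print("original:", original, "target:", target, "adversarial:", adversarial)
print("max |delta|:", float(np.max(np.abs(delta))))
```

In a real attack on an STT system the linear model would be replaced by the network itself, the gradient would come from automatic differentiation, and the loss would compare the model's output sequence with the target transcription; the bound on `delta` plays the role of keeping the perturbation quiet.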
Quick Start & Requirements
docker build -t aae_deepspeech_093_gpu .
docker run --gpus all -v /absolute/path/to/data:/data -v /absolute/path/to/tmp:/tmp -ti aae_deepspeech_093_gpu
Highlighted Details
a8d5f675ac8659072732d3de2152411f07c7aa3a
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The README explicitly states "THIS IS NOT THE CODE USED IN THE PAPER," suggesting potential discrepancies in results or methodology. Reproducing the paper's exact setup is described as potentially difficult due to dependency management ("dependency hell"). GPU support is mandatory for the provided Docker image, and Windows/Mac GPU usage with Docker is noted as potentially unsupported.