Audio adversarial examples
-- generation and detection of attacks on deep speech recognition systems
Description:
This
page contains
speech adversarial examples generated through attacking deep speech
recognition systems, together with the Python source code for detecting
these adversarial examples. Both white-box and black-box targeted
attacks are included. For details, refer to [1].
Databases of adversarial examples:
Here are the two speech datasets of adversarial examples:
dataset for white-box attacks (198 MB) and
dataset for black-box attacks (88 MB).
Both
normal and adversarial examples are included and each dataset contains
equal number of normal and adversarial examples. The dataset for
white-box
attacks was built upon the Mozilla Common Voice dataset [2] and the dataset for black-box attacks was built upon the Google Speech Command dataset [3]. Both the Mozilla Common Voice dataset and the Google Speech Command dataset are under the Creative Commons license, so are the adversarial datasets.
Source code:
Source code for
adversarial attack detection based on convolutional neural networks is available here: source code in Python.
Examples of adversarial examples:
Here are several adversarial speech examples:
- normal source example "yes"
- adversarial white-box example and adversarial black-box example
that are both recognized by corresponding speech recognizers as the
designed target "right", while a human clearly hears a "yes".
- normal source example "it's called the principle of favorability"
- adversarial white-box example that is recognized by corresponding speech recognizer as the
designed target "switch off wifi connection", while a human clearly hears "it's called the principle of favorability".
- normal source example "it was the pure language of the world"
- adversarial white-box example that is recognized by corresponding speech recognizer as the
designed target "switch off wifi connection", while a human clearly hears "it was the pure language of the world".
Citations:
[1] Saeid Samizade, Zheng-Hua Tan, Chao Shen and Xiaohong Guan, ÒAdversarial Example Detection by Classification for Deep Speech RecognitionÓ, arXiv preprint arXiv:1910.10013 (2019).
[2] Mozilla common voice dataset. [Online]. Available: https://voice.mozilla.org/en/datasets, accessed 2019.
[3] Pete Warden, ÒSpeech commands: A dataset for limited- vocabulary speech recognition,Ó CoRR, vol. abs/1804.03209, 2018. [Online]. Available: http://download.tensorflow.org/data/speech_commands_v0.02.tar.gz, accessed 2019.
Department of
Electronic
Systems, Aalborg University, Denmark
E-mail: zt@es.aau.dk