# Adversarial Machine Learning Against Voice Assistant Systems

## Project Objective

This project studies the security of voice assistant systems against adversarial machine learning. Adversarial learning algorithms can generate adversarial audio samples that, when fed to a voice assistant system, fool its machine learning models. We focus on white-box attacks in the digital domain: adversarial samples are generated with adversarial machine learning algorithms and used to attack an X-Vector-based speaker recognition system. If time allows, we will make the attack more robust by simulating room impulse responses and will conduct over-the-air attacks.

- Weekly plan

## Tutorials

### Week 1

- Generating Adversarial Samples in Keras: https://medium.com/mindboard/generating-adversarial-samples-in-keras-tutorial-f881ac836246
- TensorFlow - Adversarial Example using FGSM: https://www.tensorflow.org/tutorials/generative/adversarial_fgsm
- Implementing Adversarial Attacks and Defenses in Keras and TensorFlow 2.0: https://medium.com/analytics-vidhya/implementing-adversarial-attacks-and-defenses-in-keras-tensorflow-2-0-cab6120c5715

### Week 2

- Python tutorial: https://www.w3schools.com/python/
- How to run Python code: https://www.knowledgehut.com/blog/programming/run-python-scripts
- Jupyter notebook tutorial: https://www.dataquest.io/blog/jupyter-notebook-tutorial/
- Video tutorial (Optional): Neural Networks and Deep Learning: https://www.coursera.org/learn/neural-networks-deep-learning

### Week 3

- Introduction to Keras: https://en.wikipedia.org/wiki/Keras
- Basic Classification: Classify Images of Clothing: https://www.tensorflow.org/tutorials/keras/classification
- Simple Neural Networks in Python: https://towardsdatascience.com/inroduction-to-neural-networks-in-python-7e0b422e6c24
- TensorFlow Neural Network Tutorial (optional): https://stackabuse.com/tensorflow-neural-network-tutorial/

### Week 4

- Mel-Frequency Cepstral Coefficients (MFCC): https://haythamfayek.com/2016/04/21/speech-processing-for-machine-learning.html and http://www.speech.cs.cmu.edu/15-492/slides/03_mfcc.pdf
- Time Delay Neural Network: https://neuron.eng.wayne.edu/tarek/MITbook/chap5/5_4.html
- Paper: Phoneme Recognition Using Time-Delay Neural Networks (Optional): https://www.orbit-lab.org/attachment/wiki/Other/Summer/2020/AdvML/Phoneme%20Recognition%20Using%20Time-Delay%20Neural%20Networks%20.pdf

### Week 5

- 1D Convolutional Layer (implementation method for TDNN): https://missinglink.ai/guides/keras/keras-conv1d-working-1d-convolutional-neural-networks-keras/
- Pooling Layer: https://d2l.ai/chapter_convolutional-neural-networks/pooling.html#maximum-pooling-and-average-pooling
- Statistical Pooling: https://www.tensorflow.org/api_docs/python/tf/nn/moments
- Probabilistic Linear Discriminant Analysis for Inferences About Identity: https://www.orbit-lab.org/attachment/wiki/Other/Summer/2020/AdvML/Probabilistic%20Linear%20Discriminant%20Analysis%20for%20Inferences%20About%20Identity.pdf

### Week 6

- Introduction to the Fast Gradient Sign Method (FGSM): https://towardsdatascience.com/perhaps-the-simplest-introduction-of-adversarial-examples-ever-c0839a759b8d#:~:text=Fast%20Gradient%20Sign%20Method%20(FGSM)&text=In%20essence%2C%20FGSM%20is%20to,small%20number%20via%20max%20norm.
- Adversarial example using FGSM: https://www.tensorflow.org/tutorials/generative/adversarial_fgsm
- Cross-entropy cost function: https://eng.libretexts.org/Bookshelves/Computer_Science/Book%3A_Neural_Networks_and_Deep_Learning_(Nielsen)/03%3A_Improving_the_way_neural_networks_learn/3.01%3A_The_cross-entropy_cost_function

## Reading Material

- Hidden Voice Commands
- CommanderSong: A Systematic Approach for Practical Adversarial Voice Recognition
- Audio Adversarial Examples: Targeted Attacks on Speech-to-Text
- Imperceptible, Robust, and Targeted Adversarial Examples for Automatic Speech Recognition
- Practical Adversarial Attacks Against Speaker Recognition Systems
- X-Vectors: Robust DNN Embeddings For Speaker Recognition

## Week 1 Activities

- Get an ORBIT/COSMOS account and become familiar with the testbed procedures.

## Week 2 Activities

- Get familiar with the Python language.
  - Install a Python environment
  - Use Jupyter Notebook to run Python code samples
- Learn the concepts of deep learning and deep neural networks.
  - Slides: Neural Network Basics of Energy-Efficient Machine Learning System
  - Video tutorial (Optional): Neural Networks and Deep Learning by Andrew Ng (Recommended chapters: Week 2: Logistic Regression as a Neural Network, Week 3: Shallow Neural Network)

## Week 3 Activities

- Set up the TensorFlow and Keras environment using Anaconda.
  - Follow the tutorial “Basic classification: Classify Images of Clothing” to get familiar with TensorFlow and Keras
  - Read the tutorial “Simple Neural Networks in Python” (code implementation not required)
  - Read the “TensorFlow Neural Network Tutorial” and run the code implementation (optional)
- Read the paper “X-Vectors: Robust DNN Embeddings for Speaker Recognition” (IEEE ICASSP 2018).
  - Try to understand the x-vector workflow and learn the background knowledge, such as the applications of x-vectors, the concept of a phoneme, and data augmentation (try to learn TDNN and MFCC if time allows)

## Week 4 Activities

- Learn the MFCC feature extraction process and extract MFCC features using TensorFlow, based on the sample code.
- Understand the speaker recognition system (X-Vector) and the time-delay neural network.
  - Understand the concept of the Time Delay Neural Network (TDNN).
  - (Optional) Learn the concept of the Convolutional Neural Network (CNN) and find the similarities between CNN and TDNN. (Note: the implementation of TDNN will be based on a one-dimensional CNN.)
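
The MFCC pipeline named above (framing, windowing, power spectrum, mel filterbank, log, DCT) can be sketched in plain Python as follows. This is a minimal illustration and not the project's sample code: the function names, the 8 kHz sample rate, the 25 ms / 10 ms framing, and the 20 filters and 13 coefficients are assumptions chosen for readability, and pre-emphasis is omitted. In TensorFlow the same steps are available through `tf.signal.stft`, `tf.signal.linear_to_mel_weight_matrix`, and `tf.signal.mfccs_from_log_mel_spectrograms`.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def frame_signal(signal, frame_len, hop):
    """Split the signal into overlapping frames and apply a Hamming window."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    return [[signal[start + n] * window[n] for n in range(frame_len)]
            for start in range(0, len(signal) - frame_len + 1, hop)]

def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (slow, but dependency-free)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        spec.append((re * re + im * im) / n)
    return spec

def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mels = [low + (high - low) * i / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mels]
    bank = []
    for j in range(1, num_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):      # rising slope
            filt[k] = (k - left) / (center - left)
        for k in range(center, right):     # falling slope
            filt[k] = (right - k) / (right - center)
        bank.append(filt)
    return bank

def dct2(values, num_coeffs):
    """Unnormalized DCT-II, keeping the first num_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
            for k in range(num_coeffs)]

def mfcc(signal, sample_rate=8000, frame_len=200, hop=80, num_filters=20, num_coeffs=13):
    bank = mel_filterbank(num_filters, frame_len, sample_rate)
    features = []
    for frame in frame_signal(signal, frame_len, hop):
        spec = power_spectrum(frame)
        # Floor the filterbank energies so the logarithm is always defined.
        log_mel = [math.log(max(sum(w * p for w, p in zip(filt, spec)), 1e-10))
                   for filt in bank]
        features.append(dct2(log_mel, num_coeffs))
    return features

# A 0.1 s, 440 Hz test tone stands in for real speech.
tone = [math.sin(2 * math.pi * 440.0 * t / 8000) for t in range(800)]
features = mfcc(tone)
print(len(features), "frames x", len(features[0]), "coefficients")
```

Each row of `features` is the MFCC vector of one frame; a sequence of such vectors is the kind of input a frame-level network such as a TDNN consumes.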

## Week 5 Activities

- Learn the steps of using the X-Vector model for speaker recognition.
  - Understand the 1D convolutional layer and use it to implement TDNN
  - Understand the statistical pooling layer
  - Classify speakers using Probabilistic Linear Discriminant Analysis (PLDA), trained on the embeddings from the X-Vector model
- Study the Python code samples for X-Vector and implement X-Vector using TensorFlow.
- Read the paper “Practical Adversarial Attacks Against Speaker Recognition Systems” (HotMobile’20) and get familiar with the untargeted attack.
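
To make the link between TDNN, 1D convolution, and statistical pooling concrete, here is a small dependency-free sketch. It is an illustration under assumptions, not the project's X-Vector code: the helper names, the random weights, and the toy layer sizes (4 input features, 6 units) are hypothetical, and the context offsets only mimic the style of the frame-level layers in the X-Vector paper. In Keras, a layer with context {t-2, t, t+2} corresponds to `Conv1D` with `kernel_size=3` and `dilation_rate=2`, and the pooling statistics can be computed with `tf.nn.moments`.

```python
import math
import random

def tdnn_layer(frames, weights, bias, context):
    """One TDNN layer: the output at time t combines the input frames at t + offset.

    frames:  list of T input vectors, each of length d_in
    weights: weights[unit][context_index][input_dim]
    context: time offsets, e.g. [-2, 0, 2]; assumes min(context) <= 0 <= max(context)
    """
    lo, hi = min(context), max(context)
    outputs = []
    for t in range(-lo, len(frames) - hi):   # keep only positions with full context
        out = []
        for w_unit, b in zip(weights, bias):
            total = b
            for w_off, offset in zip(w_unit, context):
                total += sum(w * x for w, x in zip(w_off, frames[t + offset]))
            out.append(max(0.0, total))      # ReLU activation
        outputs.append(out)
    return outputs

def statistics_pooling(frames):
    """Collapse a variable-length sequence into [per-dim means, per-dim std devs]."""
    t, dim = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / t for d in range(dim)]
    stds = [math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / t)
            for d in range(dim)]
    return means + stds

def random_weights(d_out, n_context, d_in, rng):
    return [[[rng.gauss(0.0, 0.1) for _ in range(d_in)] for _ in range(n_context)]
            for _ in range(d_out)]

rng = random.Random(0)
# 30 frames of 4-dimensional features stand in for MFCCs.
frames = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(30)]

h1 = tdnn_layer(frames, random_weights(6, 5, 4, rng), [0.0] * 6, [-2, -1, 0, 1, 2])
h2 = tdnn_layer(h1, random_weights(6, 3, 6, rng), [0.0] * 6, [-2, 0, 2])
pooled = statistics_pooling(h2)

print(len(h1), len(h2), len(pooled))
```

The pooled vector has a fixed length regardless of how many frames the utterance contains, which is what lets the later segment-level layers produce one embedding per utterance for PLDA scoring.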

## Week 6 Activities

- Develop an untargeted attack that generates adversarial samples, based on the sample code and tutorial.
  - Understand the Fast Gradient Sign Method (FGSM)
  - Understand cross-entropy as a cost function
- Evaluate the performance of the adversarial samples on the voice assistant system (X-Vector).
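
An untargeted FGSM attack perturbs the input in the direction that increases the cross-entropy loss of the true label: x_adv = x + epsilon * sign(grad_x J(x, y)). The sketch below applies this to a toy linear softmax classifier so that the gradient can be written by hand. The two-class weights, the input, and `epsilon` are made-up values for illustration only; attacking the actual X-Vector model would instead obtain the input gradient through automatic differentiation (for example `tf.GradientTape`) and would typically clip the result to the valid audio range.

```python
import math

def softmax(logits):
    m = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(weights, bias, x):
    logits = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
    return softmax(logits)

def cross_entropy(probs, label):
    """Cross-entropy cost for a single example with an integer class label."""
    return -math.log(probs[label])

def input_gradient(weights, bias, x, label):
    """Gradient of the cross-entropy loss with respect to the input x.

    For a linear softmax model this is W^T (p - onehot(label)).
    """
    probs = predict_proba(weights, bias, x)
    delta = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
    return [sum(delta[k] * weights[k][i] for k in range(len(weights)))
            for i in range(len(x))]

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(weights, bias, x, label, epsilon):
    """Untargeted FGSM: take one step of size epsilon that increases the loss."""
    grad = input_gradient(weights, bias, x, label)
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]

def argmax(values):
    return max(range(len(values)), key=lambda k: values[k])

# Toy two-speaker classifier (hypothetical weights).
weights = [[1.0, -1.0], [-1.0, 1.0]]
bias = [0.0, 0.0]
x, true_label, epsilon = [0.3, 0.1], 0, 0.2

x_adv = fgsm(weights, bias, x, true_label, epsilon)
p_clean = predict_proba(weights, bias, x)
p_adv = predict_proba(weights, bias, x_adv)

print("clean prediction:", argmax(p_clean), "loss:", cross_entropy(p_clean, true_label))
print("adversarial prediction:", argmax(p_adv), "loss:", cross_entropy(p_adv, true_label))
```

For evaluation, the same comparison can be repeated over a test set: the fraction of correctly classified samples whose prediction changes after the perturbation gives an attack success rate for a given `epsilon`.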

## Project Website
