Bio-inspired Defense to Adversarial Attacks on Image-Processing Neural Nets


Introduction

The resurgence of image-processing neural networks (e.g., Convolutional Neural Networks, or CNNs, shown in Figure 1) has led to advances in computer vision tasks such as image classification, object detection, and image captioning. However, these networks can be easily fooled by adversarial examples, resulting in misclassified images and reduced classification accuracy. Adversaries are generated by applying small perturbations to the input image, visually imperceptible to humans, that cause the neural network to misclassify with high confidence (Figure 2). Such adversaries are particularly troublesome in applications that require high accuracy and security (e.g., self-driving vehicles or biometric passwords). Motivated by the observation that these adversaries do not fool the human visual system, we explore bio-inspired defenses to adversarial attacks on image-processing neural networks. In particular, we focus on the preprocessing performed by the human visual system: the stochastic nature of cone mosaic images formed on the retina and the saccadic movements of the eyes. We study the effect of cone responses and saccades on the MNIST handwritten digit dataset and generate two new datasets, MNIST_CONE and MNIST_SACCADES. Our results show that saccadic eye movements appear to improve a neural network's resistance to adversarial attacks after adversarial training, whereas stochastic cone responses provide minimal improvement.

Background

Convolutional Neural Networks

Convolutional Neural Networks (CNNs) are used in a variety of image-processing tasks, including image classification and object detection. CNNs have a layered structure, where each layer is followed by a nonlinearity (e.g., a rectified linear unit, or ReLU). Without the nonlinearities, the stack of layers would collapse into a mathematically equivalent single linear layer (a short numerical sketch of this follows Figure 1). In CNNs, the layers are typically convolutional or fully connected. Convolutional layers apply spatial convolution kernels to learn spatially invariant features that generalize across the whole image. Fully connected layers store a weight for every connection between the input and output nodes; mathematically, they are matrix multiplications. These layers are typically used to combine extracted features and map the previous layer's output to the desired dimension (e.g., 10 for the 10 output classes in MNIST). An example of a CNN is shown below in Figure 1.

Fig.1 An example of a Convolutional Neural Network for MNIST digit recognition. [1] http://www.mriaz.me/
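
To make the linearity point concrete, the following small numpy sketch (ours, not from the original report) shows that two stacked linear layers with no nonlinearity in between collapse into a single linear layer, while a ReLU breaks the equivalence:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(784)           # a flattened 28 x 28 input
    W1 = rng.standard_normal((256, 784))   # first linear layer
    W2 = rng.standard_normal((10, 256))    # second linear layer

    # Two linear layers in sequence equal one layer with weights W2 @ W1.
    assert np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x)

    # Inserting a ReLU between the layers breaks this equivalence.
    relu = lambda v: np.maximum(v, 0)
    assert not np.allclose(W2 @ relu(W1 @ x), (W2 @ W1) @ x)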

Adversary for Neural Nets

In this section we answer the question: 'What is an adversarial image?' Adversarial images are formed by adding small but carefully constructed perturbations to the inputs, forcing the model to misclassify them with high confidence. The abundance of successful adversarial inputs raises a pressing question: are neural networks really learning features? This undermines their utility in critical situations and has led to a recent surge of work on generating and defending against such adversaries. Figure 2 summarizes the problem with a famous example from [2]: the network misclassifies a panda as a gibbon with very high confidence after the addition of small, imperceptible noise. Figure 3 shows another adversary for MNIST, where the network misclassifies the 3s and 7s in the right panel 99% of the time after perturbations are added to the original inputs shown in the left panel.

Fig.2 Adversary Generation [2] https://arxiv.org/abs/1412.6572
Fig.3 Adversaries on MNIST [2] https://arxiv.org/abs/1412.6572

Basis for Bio-Inspired Defense

The most puzzling aspect of adversarial examples is that they cause neural networks to misclassify images with very high confidence, even though the human visual system is not fooled (the noise that causes the misclassification is perceptually insignificant). This led us to ask: 'What component of the human visual system is robust to these adversaries?' Is it the preprocessing of images done by the eye and photoreceptors, or the brain's biological neural network (LGN/V1), which may have a different processing architecture than CNNs?

We started with the following hypothesis:

The human visual system (HVS) performs preprocessing that helps it avoid adversaries.

Fig.4 Project Schematic

To test this hypothesis, we designed the experiment summarized in Figure 4. We introduced a bio-inspired preprocessing block before the images pass through the neural network, in order to determine whether the bio-inspired preprocessing is robust to adversaries. To be more concrete, we tried to answer the following two questions about bio-inspired preprocessing:

  1. Are stochastic cone mosaic responses responsible? The rationale was that the stochastic sampling done by the cones in the eye might make the network immune to small perturbations at the input.
  2. Are saccades responsible for higher resilience? The rationale was that [2] showed the linearity of networks to be one of the main reasons adversarial examples exist, and recent defenses have shown that adding a highly non-linear (but trainable) first layer to a neural network improves its robustness against adversaries. The human visual system uses saccades to intelligently subsample the input image space. Since saccades followed by cone responses intelligently subsample the image and then produce a stochastic output, they act as an additional non-linearity in the model.

In the following sections we explain our methodology, experiment design details and results we got while trying to address the above two questions.

Methods

In this section, we discuss the tools and describe our methodology for a) generating stochastic cone responses and saccades, and b) building the neural network models and generating adversarial images.

Dataset and Software Packages

  • We used the MNIST dataset, a collection of grayscale handwritten digit images. Each example is a digit from 0 to 9. The dataset has a training set of 60,000 examples and a test set of 10,000 examples. The digits have been size-normalized and centered in a fixed-size image of 28 × 28 pixels, where each pixel is encoded with 8 bits.
  • We used the Image Systems Engineering Toolbox for Biology (ISETBIO) to simulate the bio-inspired preprocessing. ISETBIO is a Matlab toolbox designed to facilitate, integrate, and document vision science calculations, developed by the VISTA Lab at Stanford. We generated two new datasets using this toolbox:
    • MNIST_CONE: a collection of stochastic cone responses generated from the MNIST images.
    • MNIST_SACCADES: a collection of stochastic cone responses of saccades generated from the MNIST images.
  • We used the CleverHans library to generate the adversaries and ran all experiments in TensorFlow. CleverHans is a Python library for benchmarking machine learning systems' vulnerability to adversarial examples; it reproduces the results of [2], which formed the baseline against which we benchmarked our results. All our models were essentially running the code in this repository, with appropriate parameters.

Generating Cone Mosaic Images: MNIST_CONE DATASET

We used the Cone Mosaic tutorial available on the ISETBIO GitHub to generate the MNIST_CONE dataset. Starting from MNIST, each image was used to create a scene with a 0.19° × 0.19° field of view (FOV), simulated using the ISETBIO scene object. This was done to ensure that the final images in this dataset are the same size as MNIST, so the models can be compared. The scenes were then converted to retinal inputs using the optical image (OI) object. Each OI is passed to a coneMosaic object, whose FOV is set so that the cone mosaic sees the full scene. Since cone responses are stochastic, we simulated 100 isomerization trials for each input and computed the mean cone absorption. This data was then rescaled to the 0-255 range, so that each pixel is again encoded with 8 bits, and serves as a new image in the MNIST_CONE dataset. Figure 5 summarizes the pipeline for generating the MNIST_CONE dataset; a sketch of the final averaging step follows the figure.

Fig.5 MNIST_CONE Dataset Generation pipeline
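
The ISETBIO simulation itself runs in Matlab; as an illustration only, here is a numpy sketch of the final averaging-and-rescaling step, where the isomerizations array (a name and shape we assume for illustration) holds the 100 stochastic coneMosaic trials:

    import numpy as np

    def mean_absorption_image(isomerizations):
        # isomerizations: assumed array of shape (n_trials, height, width)
        # holding stochastic cone absorption counts from ISETBIO.
        mean_abs = isomerizations.mean(axis=0)        # mean over the 100 trials
        lo, hi = mean_abs.min(), mean_abs.max()
        scaled = (mean_abs - lo) / (hi - lo + 1e-12)  # normalize to [0, 1]
        return (scaled * 255).astype(np.uint8)        # re-encode each pixel in 8 bits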

Generating Saccades: MNIST_SACCADES DATASET

To generate the saccades, we used the fact that humans fixate more on the center of the image and on regions with high spatial frequency; this makes sense, since one would want to point the high-density cone region at the high-frequency content of the image. Hence, we always used the center pixel as the first saccade center. To choose the remaining (n - 1) saccade centers (n = 10 in our experiments), we first passed the image through a high-pass filter (the Sobel edge detector) and picked (n - 1) random edge points. Each saccade image was a 15 × 15 crop, zero-padded to 28 × 28; this effectively changes the retinal FOV to sample 15 × 15 blocks instead of the full 28 × 28 used in MNIST_CONE. All n images were then passed through the same pipeline described in the MNIST_CONE section to generate mean cone absorptions (with one minor difference: we used 10 isomerization trials instead of 100, in the interest of compute time). Finally, all n images were concatenated along the channels dimension, converting a 28 × 28 MNIST image into a 28 × 28 × n matrix of saccadic images; we call this dataset MNIST_SACCADES throughout the report. Figure 6 summarizes the pipeline for generating the MNIST_SACCADES dataset; a sketch of the sampling step follows the figure.

Fig.6 MNIST_SACCADES Dataset Generation pipeline
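
As an illustration (not the report's exact code), the sketch below shows the saccade-center selection and stacking steps in Python; the edge threshold and the centered placement of each zero-padded patch are our assumptions, and the cone mosaic simulation is omitted:

    import numpy as np
    from scipy import ndimage

    def make_saccade_stack(img, n=10, patch=15, out=28, seed=0):
        # img: 28 x 28 grayscale MNIST image.
        rng = np.random.default_rng(seed)

        # High-pass filter via Sobel gradients to find high-frequency regions.
        g = img.astype(float)
        edges = np.hypot(ndimage.sobel(g, axis=0), ndimage.sobel(g, axis=1))
        ys, xs = np.nonzero(edges > 0.5 * edges.max())  # edge pixels (threshold assumed)

        # First saccade center is the image center; the rest are random edge points.
        centers = [(img.shape[0] // 2, img.shape[1] // 2)]
        idx = rng.choice(len(ys), size=n - 1, replace=True)
        centers += list(zip(ys[idx], xs[idx]))

        half = patch // 2
        padded = np.pad(g, half)                         # avoid out-of-bounds crops
        stack = np.zeros((out, out, n))
        for k, (cy, cx) in enumerate(centers):
            crop = padded[cy:cy + patch, cx:cx + patch]  # 15 x 15 window around the center
            canvas = np.zeros((out, out))
            off = (out - patch) // 2
            canvas[off:off + patch, off:off + patch] = crop  # zero-pad to 28 x 28
            stack[:, :, k] = canvas
        return stack                                     # 28 x 28 x n saccadic image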

Adversarial Example Generation: Fast Gradient Sign Method

Adversarial inputs in all the experiments were generated using the Fast Gradient Sign Method (FGSM) introduced in [2]. The following is a brief overview of the FGSM method:

  • Let $J(\theta, x, y)$ be the cost function used to train the network, where $\theta$ denotes the model parameters, $x$ the input, and $y$ the label.
  • Then $\mathrm{sign}(\nabla_{x} J(\theta, x, y))$ points in the direction of perturbation of $x$ along which the cost increases the most. Intuitively, perturbing the input in this direction pushes the network toward misclassification.
  • Hence, computing $x_{adv} = x + \epsilon \cdot \mathrm{sign}(\nabla_{x} J(\theta, x, y))$ generates an adversary for the network, where $\epsilon$ is a small perturbation magnitude.
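
All adversaries in the report were generated with CleverHans; purely as an illustration, a minimal TensorFlow sketch of the same one-step update (the function name and $\epsilon$ value are ours) is:

    import tensorflow as tf

    def fgsm(model, x, y, epsilon=0.3):
        # One FGSM step: x_adv = x + epsilon * sign(grad_x J(theta, x, y)).
        x = tf.convert_to_tensor(x)
        with tf.GradientTape() as tape:
            tape.watch(x)
            loss = tf.keras.losses.sparse_categorical_crossentropy(
                y, model(x), from_logits=True)
        grad = tape.gradient(loss, x)              # direction of steepest cost increase
        x_adv = x + epsilon * tf.sign(grad)        # perturb toward misclassification
        return tf.clip_by_value(x_adv, 0.0, 1.0)   # keep pixels in the valid range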

Figure 7 shows an example of the adversary generation process and misclassification by a network, adapted from [3].

Fig.7 Example of FGSM generated adversarial image. [3] https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/11_Adversarial_Examples.ipynb

Experimental Design and Results

Stochastic Cone Responses

The stochastic cone response experiment uses the neural network structure shown in Table 1, derived from the structure used in the CleverHans tutorial [4]. We train this neural network for 8 epochs using the Adam optimizer with a learning rate of 0.001. We generated two datasets, MNIST_CONE_1 and MNIST_CONE_2, using the procedure described in the Methods section; they differ slightly due to the stochastic nature of the cone mosaic response. The results are summarized in Table 2. In all experiments, the neural network is trained on MNIST_CONE_1, while the adversaries used in training and testing can be generated on either MNIST_CONE_1 or MNIST_CONE_2. This models the fact that the adversary does not know the internal state of the cone responses. 'Regular training' means the network is not shown adversaries during training; 'adversarial training' means it is.

Table 1: Neural Network Design for the Stochastic Cone Response Experiment
Input | Weight | Output | Type, Activation | Padding
28 × 28 × 1 | 8 × 8 × 1 × 16 / Stride 2 | 14 × 14 × 16 | Conv, ReLU | SAME
14 × 14 × 16 | 6 × 6 × 16 × 32 / Stride 2 | 9 × 9 × 32 | Conv, ReLU | VALID
9 × 9 × 32 | 5 × 5 × 32 × 32 | 5 × 5 × 32 | Conv, ReLU | VALID
1 × 800 | 800 × 10 | 1 × 10 | FC, SoftMax | N/A
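
As an illustration (not the report's exact CleverHans code), a Keras sketch of this architecture follows; note that we use stride 1 in the second convolution, since that is what produces the 9 × 9 × 32 and 800-unit shapes listed above:

    import tensorflow as tf

    def cone_model():
        # Keras sketch of the Table 1 network for 28 x 28 x 1 inputs.
        return tf.keras.Sequential([
            tf.keras.layers.Input((28, 28, 1)),
            tf.keras.layers.Conv2D(16, 8, strides=2, padding='same',
                                   activation='relu'),    # -> 14 x 14 x 16
            tf.keras.layers.Conv2D(32, 6, strides=1, padding='valid',
                                   activation='relu'),    # -> 9 x 9 x 32
            tf.keras.layers.Conv2D(32, 5, padding='valid',
                                   activation='relu'),    # -> 5 x 5 x 32
            tf.keras.layers.Flatten(),                    # -> 800
            tf.keras.layers.Dense(10, activation='softmax'),
        ])

    model = cone_model()
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    # model.fit(x_train, y_train, epochs=8) on the MNIST_CONE data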
Table 2: Results for the Stochastic Cone Response Experiment
Training Type | Adversaries Generated On | Test Accuracy on Adversaries
Regular | MNIST_CONE_1 | 2%
Regular | MNIST_CONE_2 | 8%
Adversarial | MNIST_CONE_1 | 91%
Adversarial | MNIST_CONE_2 | 66%

Saccade Movements

The neural network used in the saccade movements experiment (Table 3) is similar to that used in the cone mosaic response experiment. The key difference is that the input size is 28 × 28 × n (n = 10), whereas the cone mosaic experiment and traditional MNIST use 28 × 28 × 1 images. This modification accommodates the saccade images: as mentioned previously, we sample 10 locations from the input image, pad each sample to 28 × 28 × 1, and stack the 10 samples along the channels dimension to form a 28 × 28 × 10 saccade image in MNIST_SACCADES. To form the MNIST_REGULAR dataset used in this experiment, we replicate the input image across the channels so that each of the 10 channels holds an identical copy (see the sketch below). This replication ensures that the MNIST_REGULAR baseline and the MNIST_SACCADES experiment use the same network structure, so a fair comparison can be made. As in the cone response experiment, we train this network for 8 epochs using the Adam optimizer with a learning rate of 0.001.

Figure 8 shows the test accuracy on legitimate examples as a function of training epochs, where the network is trained regularly (i.e., no adversaries shown during training). The final test accuracy on MNIST_SACCADES is slightly degraded compared to MNIST_REGULAR, due to the stochastic, noisy nature of the cone mosaic response; this noise is harder to fit during training. Figure 9 shows the test accuracy on adversarial examples as a function of training epochs, where the network is trained adversarially. MNIST_SACCADES shows a significant improvement in adversarial test accuracy compared to MNIST_REGULAR. In addition, the network converges faster during adversarial training than on MNIST_REGULAR.
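
The channel replication used to build MNIST_REGULAR is a one-liner in numpy (the array name and placeholder data are ours):

    import numpy as np

    x_train = np.zeros((60000, 28, 28, 1))      # stands in for the MNIST training images
    # Copy each image into all 10 channels so the baseline uses the same
    # 28 x 28 x 10 network as MNIST_SACCADES.
    x_regular = np.repeat(x_train, 10, axis=-1)
    assert x_regular.shape == (60000, 28, 28, 10)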

Table 3: Neural Network Design for the Saccade Movements Experiment
Input | Weight | Output | Type, Activation | Padding
28 × 28 × 10 | 8 × 8 × 10 × 16 / Stride 2 | 14 × 14 × 16 | Conv, ReLU | SAME
14 × 14 × 16 | 6 × 6 × 16 × 32 / Stride 2 | 9 × 9 × 32 | Conv, ReLU | VALID
9 × 9 × 32 | 5 × 5 × 32 × 32 | 5 × 5 × 32 | Conv, ReLU | VALID
1 × 800 | 800 × 10 | 1 × 10 | FC, SoftMax | N/A
Fig.8 Performance on MNIST_SACCADES without Adversaries
Fig.9 Performance on MNIST_SACCADES with Adversarial Training

Discussion and Conclusions

Stochastic Cone Response

The classification accuracy is degraded when the neural network is trained on MNIST_CONE, as shown in Table 2. This can be attributed to the fact that the input images are very noisy, as they span a very small field of view on the retina (0.19° × 0.19°). In the regularly trained MNIST_CONE models (i.e., no adversaries shown during training), the stochastic nature of the cone response improves the test accuracy on adversarial images. It is important to note that this 4× improvement, while encouraging, is not impressive, as the baseline accuracy (without stochasticity) was only 2%. During adversarial training (i.e., adversaries shown during training) on MNIST_CONE, the stochasticity inherent to the cone responses instead degrades the accuracy of the network, because stochasticity makes training more difficult (e.g., interrupted gradient flow, noisier data to fit). Thus, within the limits of our experiment, cone stochasticity did not work as a feasible defense.

Saccade Movements

During regular training on MNIST_SACCADES, the test accuracy on legitimate examples is slightly degraded (Figure 8). As mentioned before, this can be attributed to the noisy nature of the cone responses in the saccade images. After adversarial training, the test accuracy on adversarial examples improves significantly (Figure 9). We attribute the improved accuracy to several factors. First, humans have multiple opportunities to "look" at an image, versus the "one-shot" approach of a standard neural network: saccades allow humans to sample different parts of the image with the central region of the retina, which has the highest density of cones. To generalize, the human visual system performs inference on a "video" of images rather than a single "still" image. Second, saccade positions are random in nature, which helps the network avoid overfitting to a specific spatial position and makes adversary generation more difficult. Finally, the saccades add non-linearity to the system. Nonlinearities have been shown to improve accuracy on adversarial examples in previous work, and we believe the nonlinearity added by the stochastic saccades helps improve accuracy here as well. In summary, our results show that MNIST_SACCADES seems to have a positive effect on adversarial test accuracy after adversarial training.

Future Work

First, we would like to revisit cone stochasticity in MNIST_CONE. In particular, we would like to see whether it helps against black-box adversarial attacks (i.e., attacks with no knowledge of the network's internal structure, only access to its inputs and outputs). Such an experiment would determine whether MNIST_CONE is more robust to adversarial transfer attacks. This might be useful because the stochastic nature of the cone responses can act as a key that only the model knows and adversaries cannot access. The experiment would involve applying the cone mosaic preprocessing step to the adversarial images themselves (not the input images). Another direction is to explore new datasets. MNIST is a relatively simple image classification task, where accuracies easily saturate in the 95%-100% range; we would like to apply our saccades idea to the CIFAR10 and CIFAR100 datasets to see whether the result generalizes to other, more difficult image classification tasks. Finally, we would like to explore saccade generation based on saliency-map algorithms (versus our current approach of Sobel edge detection). This would involve adding saliency-map algorithms to ISETBIO for saccade generation.

References

  1. Convolutional Neural Networks; Riaz et al.; http://www.mriaz.me/
  2. Explaining and Harnessing Adversarial Examples; Goodfellow et al.; https://arxiv.org/abs/1412.6572
  3. TensorFlow Tutorial; Hvass Labs; https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/11_Adversarial_Examples.ipynb
  4. CleverHans; Papernot et al.; http://www.cleverhans.io/

Appendix

  1. Project presentation can be found here: Presentation
  2. Project Code can be found here: Code
  3. Generated datasets can be downloaded from this link:
    1. MNIST_CONE
    2. MNIST_SACCADES
  4. The data is stored as a numpy array. To load the data into Python, use the numpy.load function. For example: X_train = numpy.load('/path/to/saccade/data/MNIST_SACCADE_TRAIN.npy')
  5. After installing the CleverHans repo, navigate to the cleverhans directory and run "python ./cleverhans_tutorials/mnist_tutorial_tf.py --nb_epochs 8 --nb_filters 16"
  6. If you would like to use L2 regularization during training, add the following between lines 123 and 124 of ./cleverhans/utils_tf.py: loss += tf.add_n([tf.nn.l2_loss(v) for v in tf.trainable_variables()]) * REGULARIZATION_STRENGTH