This is the blogpost for the paper 'Optimized Decision Boundaries as a Defense Against Adversarial Attacks'.

Written by Magdalini Paschali, Sailesh Conjeti and Nassir Navab

Introduction

Deep neural networks (DNNs) are widely used in a variety of applications and have revolutionized the way artificial intelligence is involved in various aspects of everyday life. However, it has been shown that DNNs are vulnerable to adversarial examples [1], i.e., images that have been maliciously crafted to fool machine learning methods.
A variety of defense mechanisms has been proposed, broadly categorized as proactive, if they try to create more robust models during training, or reactive, if they attempt to defend against an attack at test time. Methods like adversarial training [2], ensembles of specialists [3] and defensive distillation [4] have been proposed as proactive defenses, while mechanisms that detect adversarial examples [5] or remove the adversarial perturbation [6] act as reactive defenses.

The linear nature of the boundaries learned by the original Softmax, combined with the small distance between class boundaries, allows adversarial examples to be crafted with minimal distortion and subsequently leaves models vulnerable to both black-box and white-box attacks [7].
Moreover, in most cases the Softmax learns to classify the input into one of the classes with extremely high confidence, leaving no room for doubt or uncertainty during the decision-making process.

Another important drawback of the original Softmax is that models with different architectures, even ones trained for different tasks, have been observed to be vulnerable to the same adversarial examples [8], due to the very small inter-boundary distance of the learned decision boundaries [9].

In this paper we target the decision boundaries learned by deep learning models. By combining the large margin Softmax [10] with a robust version of the center loss [11], we show how decision boundary optimization can increase the robustness against both adversarial attacks and noise, and lower the adversarial crafting capabilities of DNNs.

Methodology

Adversarially robust decision boundaries and embeddings should be characterized by large inter-class separability, robustness against outliers, resistance to overfitting and the ability to learn discriminative features.
Currently the original Softmax cannot guarantee that a CNN will maintain the properties mentioned above. Liu et al. [10] introduced a margin term in the Softmax formulation that imposes an angular distance between the learned decision boundaries, which subsequently maximizes the distance between class boundaries and constructs a more discriminative embedding space. This variation of the original Softmax has been shown to outperform its predecessor and has the potential to be a valuable asset in the defense against adversarial examples.

Large Margin Softmax

The original Softmax is formulated as:

\begin{equation}L = \frac{1}{N}\sum_i -\log\Big(\frac{e^{f_{y_i}}}{\sum_j e^{f_j}}\Big) \end{equation}


where $N$ is the number of training samples, $j \in [1, K]$ with $K$ the number of classes, and $\mathbf{f}$ is the vector of class scores. $f_j$ can be expressed as the inner product between $x_i$ and $W_j$, which can be rewritten as $W_j^T x_i = \|W_j\| \|x_i\|cos(\theta_j)$. Subsequently the original Softmax can be rewritten as:

\begin{equation} L = \frac{1}{N}\sum_i -\log\Big(\frac{e^{\|W_{y_i}\| \|x_i\|cos(\theta_{y_i})}}{\sum_j e^{\|W_j\| \|x_i\|cos(\theta_j)}}\Big) \label{softmax} \end{equation}
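To make the inner product decomposition concrete, the following minimal NumPy sketch (with illustrative shapes, not taken from the paper) verifies that the logits $W_j^T x_i$ coincide with $\|W_j\| \|x_i\| cos(\theta_j)$ and evaluates the resulting loss for a single sample:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 10))   # one weight vector per class (K=5), feature dimension 10
x = rng.normal(size=10)        # feature vector of a single sample
y = 2                          # its ground-truth class

logits = W @ x                 # plain inner products W_j^T x

# The same logits through the angular decomposition ||W_j|| ||x|| cos(theta_j).
cos_theta = (W @ x) / (np.linalg.norm(W, axis=1) * np.linalg.norm(x))
logits_angular = np.linalg.norm(W, axis=1) * np.linalg.norm(x) * cos_theta
assert np.allclose(logits, logits_angular)

# Softmax cross-entropy of the ground-truth class, as in the equation above.
loss = -np.log(np.exp(logits[y]) / np.exp(logits).sum())
print(loss)
```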

In the case of binary classification into classes 1 and 2, in order to classify a sample to class 1 the following expression should hold: $W_1^Tx > W_2^Tx \rightarrow \|W_1\| \|x\|cos(\theta_1) > \|W_2\| \|x\|cos(\theta_2)$. A margin $m$ can be introduced and multiplied with the angle $\theta$ between a sample $x$ and the classifier of the ground truth class, in this case $W_1$, imposing a larger distance between the learned boundaries. After the addition of the margin term the above expression is transformed into: $\|W_1\| \|x\|cos(m\theta_1) > \|W_2\| \|x\|cos(\theta_2), \ 0 \leq \theta_1 \leq \frac{\pi}{m}$. Combining the above expression with the equation for the original Softmax, the Large Margin Softmax (LMSoftmax) is given by:

\begin{equation} L_{LM} = \frac{1}{N}\sum_i -\log\Big(\frac{e^{\|W_{y_i}\| \|x_i\|\psi(\theta_{y_i})}}{e^{\|W_{y_i}\| \|x_i\|\psi(\theta_{y_i})} + \sum_{j\neq y_i} e^{\|W_j\| \|x_i\|cos(\theta_j)}}\Big) \end{equation}


where $\psi(\theta) = (-1)^k cos(m\theta) - 2k, \ \theta \in \Big[\frac{k\pi}{m}, \frac{(k+1)\pi}{m}\Big], \ \text{and } k \in [0, m-1]$.
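A minimal NumPy sketch of the LMSoftmax loss follows, assuming pre-computed features $X$, a classifier weight matrix $W$ and integer labels $y$ (the function and variable names are illustrative, not the paper's implementation):

```python
import numpy as np

def psi(theta, m):
    """psi(theta) = (-1)^k cos(m*theta) - 2k on [k*pi/m, (k+1)*pi/m], k = 0, ..., m-1."""
    k = np.minimum(np.floor(theta * m / np.pi), m - 1)
    return ((-1.0) ** k) * np.cos(m * theta) - 2.0 * k

def lm_softmax_loss(X, W, y, m=2):
    """Large Margin Softmax for features X (N, d), weights W (K, d), labels y (N,)."""
    logits = X @ W.T                                      # (N, K) inner products
    x_norm = np.linalg.norm(X, axis=1, keepdims=True)     # (N, 1)
    w_norm = np.linalg.norm(W, axis=1)                    # (K,)
    cos_theta = np.clip(logits / (x_norm * w_norm), -1.0, 1.0)

    N = X.shape[0]
    theta_y = np.arccos(cos_theta[np.arange(N), y])       # angle to the ground-truth classifier
    margin_logits = logits.copy()
    margin_logits[np.arange(N), y] = x_norm[:, 0] * w_norm[y] * psi(theta_y, m)

    # Numerically stable softmax cross-entropy on the margin-penalized logits.
    shifted = margin_logits - margin_logits.max(axis=1, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(N), y].mean()
```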

Margin Variations

In the original LMSoftmax the margin was kept constant during training and ranged between 1 (equivalent to the original Softmax) and 4. The optimization problem the LMSoftmax tries to solve is harder, since the decision rule is more rigorous; therefore a large margin, for example $m=4$, can prevent a CNN from converging properly. To combat this issue we propose a variable value for $m$ that is scaled gradually as training proceeds.

Initially the margin value is $m=1$ so that convergence is achieved, and afterwards the margin is increased gradually. In one variant, depicted in Fig. 1, the margin is increased in a discrete staircase fashion. However, this approach can lead to instabilities, since every time the margin value is increased the loss jumps as well. Therefore we also propose a continuous increase of $m$ after the 10th epoch, so that the loss keeps decreasing smoothly while the margin gradually grows. The continuously increased margin can also be seen in Fig. 1.


Figure 1: Discrete and continuous variation of the margin value during training
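One possible implementation of the two schedules is sketched below; the epoch thresholds and growth rate are illustrative placeholders, not the exact values used in the paper:

```python
def discrete_margin(epoch, steps=((0, 1), (10, 2), (20, 3), (30, 4))):
    """Staircase schedule: start at m=1 and bump the margin at fixed epochs."""
    m = 1
    for start_epoch, value in steps:
        if epoch >= start_epoch:
            m = value
    return m

def continuous_margin(epoch, start_epoch=10, rate=0.1, m_max=4.0):
    """Smooth schedule: m=1 until start_epoch, then linear growth capped at m_max."""
    if epoch < start_epoch:
        return 1.0
    return min(m_max, 1.0 + rate * (epoch - start_epoch))
```

Note that a continuously growing margin takes non-integer values, so the $\psi$ term has to accommodate a non-integer $m$, for instance by rounding it when evaluating $\psi$; the exact handling is a design choice.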

Robust Center Loss

Even though the addition of the large margin already improves the quality of the learned boundaries, the class compactness could be further improved, along with the discriminative power of the learned embedding. The center loss [11] can complement the LMSoftmax very effectively and increase its capabilities and robustness.
The main idea behind the center loss is to compute the centroid of each class and minimize the distance between the samples that belong to a class and the corresponding centroid:

\begin{equation} \mathcal{L}_C = \sum_{i=1}^{m} || \mathbf{x_i} - \mathbf{c_{y_i}}||^{2}_{2} \end{equation}


where $x_i$ is a sample, $y_i$ the class it belongs to and $c_{y_i}$ the corresponding centroid, with the sum running over the $m$ samples of a mini-batch. In the ideal scenario the centroids would be computed over the whole training dataset, but due to computational limitations we substitute them with centroids computed over each mini-batch.
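A minimal sketch of the batch-wise center loss, assuming a mini-batch of features X and integer labels y (illustrative names, not the paper's code):

```python
import numpy as np

def batch_center_loss(X, y, num_classes):
    """Center loss with centroids estimated on the current mini-batch:
    sum over samples of ||x_i - c_{y_i}||_2^2, where c_k is the mean feature
    of class k within the batch."""
    loss = 0.0
    for k in range(num_classes):
        mask = (y == k)
        if not mask.any():
            continue                        # class not present in this batch
        centroid = X[mask].mean(axis=0)     # batch estimate of the class centroid
        loss += ((X[mask] - centroid) ** 2).sum()
    return loss
```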

A drawback of the center loss is that the computation of the centroids can be heavily influenced by the presence of outliers and noise. Therefore the newly formed class clusters could be misplaced and might not maintain a distance that ensures robustness in case of an adversarial attack.
One reason for this vulnerability to outliers is the use of the squared $L_2$ distance between the class samples and each centroid, computed as $\frac{1}{2}(x_i - c_{y_i})^2$, where $x_i$ is a sample and $c_{y_i}$ the corresponding class centroid.
To combat this weakness of the $L_2$ norm we propose to replace it with the Huber norm [12], which is known to be more robust against outliers and noise and is computed by:

\begin{equation} L_{\delta}(x_i, c_{y_i}) = \begin{cases} \frac{1}{2}(x_i - c_{y_i})^2 & \text{for } |x_i - c_{y_i}| \leq \delta, \\ \delta\,|x_i - c_{y_i}| - \frac{1}{2}\delta^2 & \text{otherwise.} \end{cases} \end{equation}


The proposed robust center loss is given by $\mathcal{L}_{RC} = \sum_{i=1}^{m} \| \mathbf{x_i} - \mathbf{c_{y_i}}\|^{2}_{\text{Huber}}$.

The Huber norm does not allow the extreme values of outliers and noisy samples to contribute significantly to the computation of the centroids; therefore the centroids and the learned clusters are not severely influenced by outliers.
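A sketch of the robust center loss under the assumption that the Huber penalty is applied element-wise to the residual between each sample and its batch centroid (the paper's exact formulation of the Huber norm over feature vectors may differ):

```python
import numpy as np

def huber(residual, delta=1.0):
    """Element-wise Huber penalty: quadratic near zero, linear beyond delta."""
    abs_r = np.abs(residual)
    return np.where(abs_r <= delta,
                    0.5 * residual ** 2,
                    delta * abs_r - 0.5 * delta ** 2)

def robust_center_loss(X, y, num_classes, delta=1.0):
    """Robust center loss: Huber penalty on the deviation of each sample from its
    batch centroid, limiting the influence of outliers and noisy samples."""
    loss = 0.0
    for k in range(num_classes):
        mask = (y == k)
        if not mask.any():
            continue
        centroid = X[mask].mean(axis=0)
        loss += huber(X[mask] - centroid, delta).sum()
    return loss
```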

The proposed loss function, which proves to be much more robust in case of an adversarial attack, consists of the LMSoftmax and the robust center loss:

\begin{equation} J_{\text{comb}} = \lambda_1L_{\text{LM}} + (1-\lambda_1)L_{\text{RC}} \label{combination} \end{equation}

The value of $\lambda_1$ varies between 0.7 and 0.9, depending on the value of the margin.
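Reusing the loss sketches from the previous sections, the combined objective can be written as follows ($\lambda_1 = 0.8$ is simply an example value inside the stated range):

```python
def combined_loss(X, W, y, num_classes, m=2, lambda_1=0.8, delta=1.0):
    """J_comb = lambda_1 * L_LM + (1 - lambda_1) * L_RC, using lm_softmax_loss
    and robust_center_loss as defined in the sketches above."""
    l_lm = lm_softmax_loss(X, W, y, m=m)
    l_rc = robust_center_loss(X, y, num_classes, delta=delta)
    return lambda_1 * l_lm + (1.0 - lambda_1) * l_rc
```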

Results and Discussion

Experimental Setup

The chosen batch size was 128, and all models were trained with a momentum optimizer with decaying learning rate and momentum, initialized at 0.001 and 0.7 respectively. The benchmark datasets used for the evaluation are MNIST [13] and Fashion MNIST [14]. The networks are all trained on clean samples and their robustness is tested with black-box attacks. We report the average, the minimum and the maximum of the computed accuracy over all the attack methods utilized, which consist of the following:

- FGSM with $\epsilon=0.1$ and $\epsilon=0.3$ [15] (a minimal sketch of FGSM is shown after this list)
- DeepFool [16]
- Iterative FGSM [2]
- Saliency Map Attack [17]
- LBFGS [1]
- Addition of Gaussian Noise
- Addition of Uniform Noise
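As an example of the attacks listed above, below is a minimal FGSM [15] sketch. It assumes a helper grad_loss_wrt_input that returns the gradient of the classification loss with respect to the input image; this helper is hypothetical and stands in for the framework-specific gradient computation:

```python
import numpy as np

def fgsm(x, grad_loss_wrt_input, epsilon=0.1):
    """Fast Gradient Sign Method: take one step of size epsilon in the direction
    of the sign of the loss gradient, then clip to the valid pixel range [0, 1]."""
    x_adv = x + epsilon * np.sign(grad_loss_wrt_input(x))
    return np.clip(x_adv, 0.0, 1.0)
```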

Contribution of the margin term

In Table 1 we present the results that showcase the contribution of the margin term in the robustness against adversarial examples.

The addition of the margin alone improves the results on both datasets: by 5% for MNIST and 7% for Fashion MNIST in the average case, by 1-2% in the best case and by 10-11% in the worst case. This means that the model became significantly more robust to adversarial attacks of varying strength.

Table 1: Comparison between normal Softmax and large margin variations 

Contribution of the robust center loss

In Table 2 the contribution of the center loss and the proposed robust center loss can be observed. The combination of the proposed loss functions increases the robustness of our model by 7-8% in the average case for both datasets, by 1-2% in the best case and by 10-11% in the worst case.

Table 2: Comparison between combinations of Softmax, center loss and robust center loss

Detailed results against attacks

In Table 3 we can see the detailed results for each attack method, as well as for the clean test samples. The improvement against the adversarial attacks goes up to 11%, but in the noisy scenario there is a 22% and 24% improvement for Gaussian and Uniform noise respectively. This indicates that the maximized boundary distance and the robust center loss allow the model to handle noisy data almost as if they were clean.

Table 3: MNIST: Detailed results against all the attacks

In Table 4 the same observations hold. The robustness against adversarial attacks is improved by an impressive 17% for FGSM with $\epsilon=0.1$, while for the noisy data the same 21-22% improvement is achieved, highlighting again the contribution of the margin and the robust center loss.


Table 4: Fashion MNIST: Detailed results against all the attacks

ROC Curves

In Fig. 2 the receiver operating characteristic (ROC) curves of the baseline and the proposed model are shown for both datasets. The observations are consistent. The clean data are perfectly classified in both cases, which indicates that the performance of the algorithm in the natural scenario has been maintained. Moreover, in the case of the Uniform noise data the curve shows that the proposed model classifies the noisy data as well as the clean ones. Regarding the adversarial attack shown, which is FGSM with $\epsilon=0.3$, the improvement in the performance of the proposed model is also visible.

Figure 2: Fashion MNIST. Left: ROC for the baseline model with the original Softmax. Right: Proposed model with margin = 2 and robust center loss. Blue: clean data, Pink: Uniform noise, Orange: FGSM with $\epsilon=0.3$.
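For completeness, curves like those in Fig. 2 can be produced from the model's per-class scores with scikit-learn, here in a generic one-vs-rest sketch on synthetic data (this is not the paper's plotting code; labels and scores are placeholders for the actual test labels and model outputs):

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

# Placeholder data: binary ground truth for one class (one-vs-rest) and the
# model's predicted score for that class on clean, noisy or attacked inputs.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)
scores = np.clip(labels + rng.normal(0.0, 0.5, size=200), 0.0, 1.0)

fpr, tpr, _ = roc_curve(labels, scores)
print("AUC:", auc(fpr, tpr))
```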

Conclusion

In this paper we investigated the security of CNNs under adversarial settings. Specifically, we targeted the classification boundaries and utilized modern methods to optimize them and increase their robustness. The introduction of the large margin in the Softmax ensured that the distance between the boundaries of different classes was maximized. The center loss not only made the classes more compact, but also decreased the number of potential adversarial directions, which subsequently led to a lower rate of adversarial crafting.
Our experiments showed a significant improvement in the robustness of the models, not only against attacks, but also against noise. In conclusion, it is important to utilize strategies that improve the classification boundaries, because they ensure not only higher robustness of the model, but also significant tolerance against noise.

References

[1] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks. CoRR, abs/1312.6199, 2013. URL http://arxiv.org/abs/1312.6199.

[2] Alexey Kurakin, Ian J. Goodfellow, and Samy Bengio. Adversarial machine learning at scale. CoRR, abs/1611.01236, 2016. URL http://arxiv.org/abs/1611.01236.

[3] Mahdieh Abbasi and Christian Gagné. Robustness to adversarial examples through an ensemble of specialists. CoRR, abs/1702.06856, 2017. URL http://arxiv.org/abs/1702.06856.

[4] Nicolas Papernot, Patrick D. McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation as a defense to adversarial perturbations against deep neural networks. CoRR, abs/1511.04508, 2015b. URL http://arxiv.org/abs/1511.04508.

[5] Jan Hendrik Metzen, Tim Genewein, Volker Fischer, and Bastian Bischoff. On detecting adversarial perturbations. CoRR, abs/1702.04267, 2017. URL http://arxiv.org/abs/1702.04267.

[6] Shixiang Gu and Luca Rigazio. Towards deep neural network architectures robust to adversarial examples. CoRR, abs/1412.5068, 2014. URL http://arxiv.org/abs/1412.5068.

[7] Nicolas Papernot, Patrick D. McDaniel, Ian J. Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. Practical black-box attacks against deep learning systems using adversarial examples. CoRR, abs/1602.02697, 2016. URL http://arxiv.org/abs/1602.02697.

[8] Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. Delving into transferable adversarial examples and black-box attacks. CoRR, abs/1611.02770, 2016b. URL http://arxiv.org/abs/1611.02770.

[9] Florian Tramer, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. The space of transferable adversarial examples. CoRR, abs/1704.03453v2, 2017.

[10] Weiyang Liu, Yandong Wen, Zhiding Yu, and Meng Yang. Large-margin softmax loss for convolutional neural networks. In Proceedings of The 33rd International Conference on Machine Learning, pp. 507–516, 2016a.

[11] Yandong Wen, Kaipeng Zhang, Zhifeng Li, and Yu Qiao. A discriminative feature learning approach
for deep face recognition. In Computer Vision - ECCV 2016 - 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part VII, pp. 499–515, 2016.

[12] Peter J. Huber. Robust Statistics. John Wiley & Sons, 2005.

[13] Yann LeCun and Corinna Cortes. The MNIST database of handwritten digits. 1998. URL http://yann.lecun.com/exdb/mnist/.

[14] Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms, 2017.

[15] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. CoRR, abs/1412.6572, 2014. URL http://arxiv.org/abs/1412.6572.

[16] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. Deepfool: a simple and accurate method to fool deep neural networks. CoRR, abs/1511.04599, 2015. URL http://arxiv.org/abs/1511.04599.

[17] Nicolas Papernot, Patrick D. McDaniel, Somesh Jha, Matt Fredrikson, Z. Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. CoRR, abs/1511.07528, 2015a. URL http://arxiv.org/abs/1511.07528.




