This is a blog post written by John Zielke about the paper 'Do Adversarially Robust ImageNet Models Transfer Better?' by Hadi Salman, Andrew Ilyas, Logan Engstrom, Ashish Kapoor, and Aleksander Madry.
This paper was presented at NeurIPS 2020.
Introduction
One major hurdle in training neural networks is that the models need a sufficiently large training dataset in order to prevent overfitting and to generalize well. Creating these datasets can be a challenge, especially for medical applications: labeling big datasets requires well-trained (and well-paid) radiologists, which makes it prohibitively expensive, and the high data privacy standards in the medical field often make access to the data difficult. With transfer learning, we can make use of knowledge learned from a preexisting (public) dataset when training on a new (often smaller) dataset in order to achieve better network performance [1]. The paper "Do Adversarially Robust ImageNet Models Transfer Better?" investigates whether using adversarially robust networks instead of "standard" networks improves the performance of transfer learning.
How Does Transfer Learning Work?
Assuming a network architecture similar to ResNet, an example of transfer learning for an image classification task is shown in fig. 1. The network is first trained on a preexisting, large dataset. The learned weights of the convolutional layers are then used as the initialization for training on the target task with a smaller dataset. The fully connected layers, which map the feature representation R(x) (the output of the convolutional layers) to the output classes, are specific to each task; they can't be reused and need to be trained from scratch.
Figure 1: Application example of Transfer Learning
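To make this workflow more concrete, here is a minimal PyTorch sketch of the classification setup (assuming a torchvision ResNet-50 as the pretrained backbone; the number of target classes and the optimizer settings are placeholders, not the authors' exact configuration):

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a ResNet-50 pretrained on the large source dataset (here ImageNet).
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

# The fully connected head maps the feature representation R(x) to the source
# classes, so it is replaced and trained from scratch for the target task.
num_target_classes = 10  # placeholder, e.g. CIFAR-10
model.fc = nn.Linear(model.fc.in_features, num_target_classes)

# All parameters (pretrained convolutional layers + new head) are trained
# on the (smaller) target dataset.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def training_step(images, labels):
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

In the setting studied in the paper, the pretrained weights would instead come from an adversarially robust ImageNet model, for example one of the checkpoints released by the authors, rather than the standard torchvision weights used here.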
What is Adversarial Robustness?
Adversarial robustness first came up in the context of AI security [2, 3, 4]. It refers to the ability of a neural network to produce a stable and correct output even if the input image is minimally changed in a way that is specifically designed to fool the network into predicting something else. This small change is called a perturbation. Fig. 2 provides an example of this concept: the input image of a dog (on the left) is changed, imperceptibly for humans, into the image on the right. This change causes the network to predict a completely different class with a probability of over 99%.
Figure 2: Example of a dog image as an adversarial input [5]
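To make the idea of a perturbation more tangible, the following sketch shows a single gradient step of a generic untargeted attack in PyTorch (this only illustrates the principle, not the exact attack used to create fig. 2; `model`, `image`, and `label` are assumed to be given, and the value of epsilon is an arbitrary example):

```python
import torch
import torch.nn.functional as F

def single_step_perturbation(model, image, label, epsilon=0.05):
    """Craft a small additive perturbation that increases the model's loss
    on (image, label); a one-step illustration of an untargeted attack."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Step along the loss gradient, scaled to an l2 ball of radius epsilon.
    delta = epsilon * image.grad / (image.grad.norm() + 1e-12)
    return (image + delta).detach()
```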
Adversarial examples are not limited to digitally altered images; they can also be specifically engineered real-world objects. This was previously demonstrated using a 3D-printed turtle that was classified as a rifle by the "attacked" network [6] (see fig. 3).
Figure 3: Adversarial input as a 3D object: a 3D-printed turtle designed to be classified as a rifle [6]
Methodology
To compare the transfer performance of adversarially robust networks to that of standard networks, the authors trained networks from scratch on the ImageNet dataset, both with and without adversarial robustness, and then used these networks for transfer learning on different target datasets.
Adversarial Robustness as a Prior for Learned Representations
The authors assume that using adversarial robustness as a prior leads the network to learn more meaningful features, which therefore transfer better. They base this intuition on findings from [7]. One of these findings is that, with standard networks, two images that look completely different to humans can have very similar representations in feature space.
When training for adversarial robustness, images need to be much more visually similar to have similar representations in feature space. This is visualized in fig. 4, where the authors tried to change a source image so that its representation in feature space becomes similar to that of a target image, i.e. R(x_1') \approx R(x_2). This is done by optimizing (using backpropagation through the feature representation) a perturbation (initialized with noise) that is added to the source image until its representation is sufficiently similar to that of the target image. While the source image x_1 stays discernible when using a standard network, the result for a robust network is visually very similar to the target image, with the source image becoming unrecognizable.
Figure 4: Comparison of the corresponding input images for robust and standard networks when adapting an image to have feature representation similar to a target image [7]
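A rough sketch of this inversion procedure is shown below (assuming `backbone` returns the feature representation R(x), e.g. a ResNet without its fully connected head; the step count and learning rate are illustrative guesses):

```python
import torch

def invert_representation(backbone, x_source, x_target, steps=1000, lr=0.1):
    """Optimize a perturbation of x_source so that its feature representation
    approaches R(x_target), as in fig. 4."""
    with torch.no_grad():
        target_rep = backbone(x_target)
    # Start from noise added to the source image and optimize the perturbation.
    delta = 0.1 * torch.randn_like(x_source)
    delta.requires_grad_(True)
    optimizer = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        rep = backbone(x_source + delta)
        loss = (rep - target_rep).pow(2).sum()  # drive R(x_1 + delta) towards R(x_2)
        loss.backward()
        optimizer.step()
    return (x_source + delta).detach()
```

The feature visualizations discussed next (fig. 5) come from essentially the same loop with a different objective: instead of matching a target representation, a single coordinate of R(x) is maximized.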
Robust networks also allow for direct feature visualization [7]. By optimizing an input image to maximize one of the coordinates of its feature representation, we can directly visualize what activates this feature. While this works very well for robust networks, it does not produce anything meaningful for standard networks, as shown in fig. 5. This further corroborates the advantages of robust features and their meaningfulness.
Figure 5: Feature visualization on robust and standard networks [7]
How do you Train Adversarially Robust Networks?
In order to train adversarially robust networks, the authors used the method described in [8]. In very simple terms, training is done by adding a perturbation to each input sample before each training iteration. To determine the best perturbation (the one that maximizes the loss of the network), one has to solve a maximization problem for the perturbation \delta:
\max\limits_{\|\delta\|_2 \le \epsilon} \mathcal{L}(x + \delta, y, \Theta)
For the perturbation to remain unnoticeable, the change can't be arbitrarily large; therefore the \ell_2 norm of \delta must be smaller than a hyperparameter \epsilon. To find the best \delta for each sample and iteration, backpropagation can be used to adjust \delta along the gradient of the loss with respect to \delta, maximizing the loss of the network. The training step then tries to minimize this loss again by adjusting the weights of the network accordingly, no different from standard training.
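A condensed sketch of such an adversarial training step is shown below, roughly following the projected-gradient-descent (PGD) approach of [8] with an \ell_2 constraint (the step size, number of attack steps, and the value of \epsilon are illustrative choices, not the authors' exact settings):

```python
import torch
import torch.nn.functional as F

def pgd_l2(model, x, y, epsilon, step_size, num_steps):
    """Inner maximization: find delta with ||delta||_2 <= epsilon
    that (approximately) maximizes the loss on (x, y)."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(num_steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            # Gradient ascent step with an l2-normalized gradient.
            grad_norm = grad.flatten(1).norm(dim=1).view(-1, 1, 1, 1) + 1e-12
            delta += step_size * grad / grad_norm
            # Project delta back onto the l2 ball of radius epsilon.
            delta_norm = delta.flatten(1).norm(dim=1).view(-1, 1, 1, 1) + 1e-12
            delta *= (epsilon / delta_norm).clamp(max=1.0)
    return delta.detach()

def adversarial_training_step(model, optimizer, x, y, epsilon=3.0):
    # Find the worst-case perturbation for the current weights ...
    delta = pgd_l2(model, x, y, epsilon, step_size=epsilon / 4, num_steps=7)
    # ... then do a standard training step on the perturbed inputs.
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x + delta), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```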
Choosing the hyperparameters
In order to produce comparable results, the hyperparameters that the standard and adversarially robust networks have in common were kept constant and equal across all experiments. The best value for the hyperparameter \epsilon was determined beforehand using a grid search.
Experiments and Results
The transfer performance of robust networks compared to standard networks was evaluated on several well-known network architectures and on downstream tasks covering image classification, instance segmentation, and object detection. The network architectures were all variants of the ResNet architecture, specifically ResNet-18, ResNet-50, WideResNet-50-2, and WideResNet-50-4. The datasets used as classification downstream tasks are shown in fig. 6; for instance segmentation the Microsoft COCO dataset was used, and for object detection the PASCAL Visual Object Classes (VOC) and Microsoft COCO datasets were used.
Figure 6: Datasets used in the transfer performance comparison [9]
Fixed-Feature Transfer
As a first step, the authors evaluated transfer performance when only the fully connected layers at the end are retrained, while the weights of the convolutional layers (often called the features) are kept fixed. This decreases training time, but often leads to worse results than fine-tuning the whole network [10, 11]. As can be seen in fig. 7, the robust networks consistently outperformed the standard networks, in some cases, such as CIFAR-10 and CIFAR-100, very significantly.
Figure 7: Performance Comparison for fixed-feature transfer between standard and robust networks in image classification tasks [9]
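In code, the fixed-feature setting corresponds to freezing the pretrained backbone and optimizing only the new head, as in the following sketch (again assuming a torchvision ResNet-50 and a placeholder number of target classes):

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

# Freeze all pretrained parameters so only the new head is trained.
for param in model.parameters():
    param.requires_grad = False

# Replace the head for the target task; its parameters are trainable by default.
model.fc = nn.Linear(model.fc.in_features, 10)  # 10 target classes as a placeholder

# Pass only the head's parameters to the optimizer.
optimizer = torch.optim.SGD(model.fc.parameters(), lr=0.01, momentum=0.9)
```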
Transfer With Full-Network Fine-Tuning
In the next step, the whole network with all its parameters was fine-tuned, with the following accuracies being achieved.
Figure 8: Performance comparison for full-network fine-tuning transfer between standard and robust networks in image classification tasks [9]
As with fixed-feature transfer, the robust networks consistently outperformed the standard networks, while the overall performance improved for both standard and robust networks (see fig. 8). We can also see that the performance gap on the CIFAR-10 and CIFAR-100 datasets has narrowed. This supports the idea that the features in robust networks are more meaningful and generalizable, which is why performance was already so good when the features couldn't be fine-tuned.
Instance Segmentation
Transfer performance was not only evaluated for classification tasks, but also for segmentation and object detection tasks. In these tasks, the best robust models (those with the best \epsilon from the grid search) again outperformed the standard models, as can be seen in fig. 9:
Figure 9: Transfer Performance Comparison between standard and robust networks for image segmentation and detection tasks (both bounding box and mask) [9]
Correlation between ImageNet Accuracy and Transfer Performance
The authors also note that the transfer performance of robust networks correlates more strongly with the accuracy on the source dataset (ImageNet) than that of standard networks does, as can be seen in fig. 10:
Figure 10: Comparison of transfer performance and accuracy on the source task for both standard and robust networks [9]
This means that the performance of a network trained using this technique could be further improved by applying other methods that increase accuracy on the source task. We can also see that even though a robust network's accuracy on the source task may be worse than that of the corresponding standard network, its transfer performance can still be better.
Conclusion
The authors show that in their experiments robust networks matched or outperformed standard networks in transfer performance on different tasks, and therefore propose using robust networks for transfer learning. They also investigate the connection between the accuracy on the source task, the robustness, and the transfer performance.
Student's Review
In the paper, the authors convincingly show why using robust models for transfer learning can give an edge over standard networks. They look at a wide range of popular datasets as target tasks, making their results plausible and not the result of a lucky (or conscious) preselection of tasks. They also use a state-of-the-art and well-known network architecture in their experiments, which I have already used in my own projects, making their results applicable to many projects. The paper is well-written and contains extensive additional information in the appendix, helping both the understanding of the utilized concepts and the reproducibility of the results. In addition to the appendix, they also provide a repository with the (open-source) code and models used in their experiments. This enables others not only to test the methods proposed in the paper, but also to improve upon them or even use them in real-world applications.

What I am missing is some information regarding the necessary hardware and computing time for pretraining the robust and the standard models, as computing power is often limited (especially in small research projects) and might therefore be a factor when selecting the methods for training a network. In addition, the performance advantages are minimal in some cases (especially for full-network fine-tuning). These two points only matter if the pretraining still needs to be done, as the pretrained models provided by the authors require no more effort to use than standard pretrained models. The authors also don't provide much theoretical explanation of their findings.

All in all, I think this is a very nice paper that was well prepared and will improve the process of training deep neural networks in the future.
References
[1] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, “How transferable are features in deep neural networks?,” Advances in Neural Information Processing Systems 27, pp. 3320–3328, 2014.
[2] B. Biggio et al., “Evasion Attacks against Machine Learning at Test Time,” ECML PKDD 2013, LNCS vol. 8190, pp. 387–402, Springer, 2013, doi: 10.1007/978-3-642-40994-3_25.
[3] B. Biggio and F. Roli, “Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning,” Pattern Recognition, 2018, doi: 10.1016/j.patcog.2018.07.023.
[4] N. Carlini and D. Wagner, “Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods,” May 2017.
[5] S. H. Silva and P. Najafirad, “Opportunities and Challenges in Deep Learning Adversarial Robustness: A Survey,” Jul. 2020.
[6] A. Athalye, L. Engstrom, A. Ilyas, and K. Kwok, “Synthesizing Robust Adversarial Examples,” Jul. 2017.
[7] L. Engstrom, A. Ilyas, S. Santurkar, D. Tsipras, B. Tran, and A. Madry, “Adversarial Robustness as a Prior for Learned Representations,” Jun. 2019.
[8] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards Deep Learning Models Resistant to Adversarial Attacks,” Jun. 2017.
[9] H. Salman, A. Ilyas, L. Engstrom, A. Kapoor, and A. Madry, “Do Adversarially Robust ImageNet Models Transfer Better?,” Jul. 2020.
[10] K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman, “Return of the Devil in the Details: Delving Deep into Convolutional Nets,” May 2014.
[11] P. Agrawal, R. Girshick, and J. Malik, “Analyzing the Performance of Multilayer Neural Networks for Object Recognition,” Jul. 2014.