Introduction

The scarcity of datasets often limits Deep Learning (DL) applications. In the medical field especially, public medical records and images are usually hard to access for privacy and ethical reasons. With smaller datasets, the risk of overfitting and poor generalization increases. Data augmentation can be used to overcome these issues. [27]

Definition

Data augmentation consists of generating new examples that resemble the training data yet still differ from it and represent plausible variations [9]. It is achieved by applying label-preserving transformations to images. Data augmentation counteracts overfitting of neural networks (NNs) and tackles dataset scarcity and/or homogeneity by artificially enlarging the dataset [1]. It is used especially in visual object classification [7] and is recommended for improving the performance of NNs [8].

Categorization

The following classification of data augmentation methods is based on Chlap et al.'s review of medical image data augmentation techniques for DL applications (2021) [27] as well as on Shorten and Khoshgoftaar's survey [29]. The basic traditional methods are briefly described and illustrated, leading the way to the trends and novel approaches (marked in red) which are explored in more detail. Furthermore, the focus here is mainly on medical applications.


From Basic Transformations to Deep AutoAugment

Basic image augmentation methods mainly include geometric transformations, cropping, intensity operations, and filtering. In addition, further techniques that can still be considered basic have been proposed, such as occlusion [11], combination and mix-up of pairs of images and their labels in a convex manner [4], as well as object insertion.

1. Geometric transformations and cropping

Geometric transformations include horizontal reflections, rotations, translations, shear, scaling, and random cropping. The parameters of these transformations can be either chosen manually or randomly sampled [1, 4, 27].
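As an illustration, the following is a minimal sketch of such a pipeline using torchvision; the library choice and the parameter ranges are assumptions for illustration, not values prescribed by the cited works.

```python
from torchvision import transforms

# Randomly sampled geometric augmentations applied on the fly during training.
geometric_augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),                    # horizontal reflection
    transforms.RandomRotation(degrees=10),                     # small random rotation
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1),
                            scale=(0.9, 1.1), shear=5),        # translation, scaling, shear
    transforms.RandomResizedCrop(size=224, scale=(0.8, 1.0)),  # random cropping
])

# augmented = geometric_augment(pil_image)  # pil_image: a PIL.Image training sample
```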

2. Intensity operations and filtering

Similar to transforming the images' geometry, their intensity values can also be manipulated in the augmentation process; this includes changes in brightness, contrast, saturation, and hue, as well as filtering operations.
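A corresponding minimal torchvision sketch for intensity operations and filtering (again, the parameter values are illustrative assumptions):

```python
from torchvision import transforms

# Random photometric (intensity) augmentations plus a mild filtering operation.
intensity_augment = transforms.Compose([
    transforms.ColorJitter(brightness=0.2, contrast=0.2,
                           saturation=0.2, hue=0.05),   # brightness/contrast/saturation/hue
    transforms.GaussianBlur(kernel_size=3),             # example of a filter
])
```

For single-channel medical images, typically only brightness- and contrast-like manipulations and filters apply, since saturation and hue are undefined for grayscale data.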

Advantages

Although these transformations are considered basic, they are still widely used since they offer several advantages:

  • Speed [14]
  • Reproducibility [14]
  • Reliability [14]
  • Easy implementability [14]
  • More effective for training GANs in image segmentation [26]

A disadvantage of these transformations is that the generated samples remain limited by the variability already present in the dataset.

3. AutoAugment


AutoAugment automates the process of selecting and combining the above-mentioned basic transformations [30]. It formulates the search for the best augmentation policy as a discrete search problem.

Many variations of AutoAugment have been explored in recent years to improve its performance, especially its speed and computational cost.
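As a usage illustration, torchvision ships learned AutoAugment policies that can be dropped into a preprocessing pipeline; the following is a minimal sketch, assuming a reasonably recent torchvision version.

```python
from torchvision import transforms
from torchvision.transforms import AutoAugment, AutoAugmentPolicy

# Apply the policy found by AutoAugment on ImageNet, then convert to a tensor.
train_transform = transforms.Compose([
    AutoAugment(policy=AutoAugmentPolicy.IMAGENET),  # learned sequence of basic transforms
    transforms.ToTensor(),
])
```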

4. Deep AutoAugment

Deep AutoAugment [31] fully automates the data augmentation process. It builds upon AutoAugment but dispenses with any hand-picked transformations. To keep this feasible, it applies gradient matching to reduce the computational cost.

From Occlusion and Combination to Transparency

Another subcategory of popular basic transformations includes occlusion and combination.

1. Cutout

This augmentation method consists of randomly masking out square regions of the input during training [11, 16].
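A minimal NumPy sketch of this masking operation (the patch size and the zero-filling are illustrative choices):

```python
import numpy as np

def cutout(image, size: int = 16, rng=np.random):
    """Zero out a random square patch of the image (minimal Cutout sketch)."""
    h, w = image.shape[:2]
    cy, cx = rng.randint(h), rng.randint(w)                 # random patch centre
    r1, r2 = max(0, cy - size // 2), min(h, cy + size // 2)
    c1, c2 = max(0, cx - size // 2), min(w, cx + size // 2)
    out = image.copy()
    out[r1:r2, c1:c2, ...] = 0                              # mask the region
    return out
```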

Advantages
  • Improves object localization  [17][18]
  • Enhances the generalization performance of convolutional neural networks (CNNs) [11] [16]
  • Improves clean accuracy [11]
Drawbacks
  •  Does not improve robustness [12]

2. MixUp

This augmentation approach takes two different images as input and mixes them, outputting a convex interpolation of both the images and their labels. [5]

It produces virtual feature-target vectors \tilde x = λx_i + (1 − λ)x_j and \tilde y = λy_i + (1 − λ)y_j, starting from two feature-target vectors (x_i, y_i) and (x_j, y_j) drawn from the training data, with λ \in [0, 1]. [4]
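A minimal NumPy sketch of this interpolation, assuming one-hot encoded labels and λ drawn from a Beta(α, α) distribution as proposed in [4]:

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha: float = 0.2, rng=np.random):
    """Mix two samples and their one-hot labels (minimal MixUp sketch)."""
    lam = rng.beta(alpha, alpha)            # mixing coefficient in [0, 1]
    x = lam * x1 + (1.0 - lam) * x2         # interpolate the images
    y = lam * y1 + (1.0 - lam) * y2         # interpolate the (one-hot) labels
    return x, y
```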

Advantages: 
  • Easy implementation (few lines of code) [4]
  • Minimal computational overhead [4]
  • Increases the robustness of NNs when learning from corrupt labels or facing adversarial examples [4]
  • Stabilizes GANs training [4]
  • Improves generalization performance for image segmentation tasks [35]
Drawbacks:
  • Unnatural-looking generated samples [5]
  • Class activation maps show that the model does not focus on the correct regions when choosing cues for recognition [5]

3. CutMix

This approach also takes two images as input. A patch is cut from the first image and pasted onto the second one. The labels are mixed proportionally to the area of the patch. [5]

Let x ∈ R^{W×H×C} denote a training image and y its label. CutMix generates a new sample (\tilde x, \tilde y) by combining two training samples (x_A, y_A) and (x_B, y_B); the generated sample (\tilde x, \tilde y) is then used to train the model. The combining operation is defined as:

\tilde x =M \odot x_A + (1−M) \odot x_B \newline \tilde y = λy_A + (1 −λ)y_B,


where M ∈ \{0, 1\}^{W×H} is a binary mask indicating where the cutting and pasting takes place, \odot is element-wise multiplication, and λ is the combination ratio. [5]
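A minimal NumPy sketch of this combining operation, assuming one-hot labels and a pasted box whose area corresponds to 1 − λ with λ drawn from a Beta distribution, in the spirit of [5]:

```python
import numpy as np

def cutmix(x_a, y_a, x_b, y_b, alpha: float = 1.0, rng=np.random):
    """Paste a random box from x_b into x_a and mix labels by area (CutMix sketch)."""
    h, w = x_a.shape[:2]
    lam = rng.beta(alpha, alpha)                      # target area ratio kept from x_a
    cut_h, cut_w = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
    cy, cx = rng.randint(h), rng.randint(w)           # random box centre
    r1, r2 = max(0, cy - cut_h // 2), min(h, cy + cut_h // 2)
    c1, c2 = max(0, cx - cut_w // 2), min(w, cx + cut_w // 2)
    x_new = x_a.copy()
    x_new[r1:r2, c1:c2, ...] = x_b[r1:r2, c1:c2, ...]  # paste the patch (binary mask M)
    lam = 1 - (r2 - r1) * (c2 - c1) / (h * w)          # adjust λ to the actual patch area
    y_new = lam * y_a + (1 - lam) * y_b                # mix the one-hot labels
    return x_new, y_new
```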

Advantages
  • Advantages of regional dropout [5]
  • Training efficiency increase by not having any uninformative pixels [5]
  • Localization ability forces the model to recognize objects while only partially perceiving them [5]
  • No need for extra computational resources [5]
  • Classification accuracy improvement [5]
  • Weakly-supervised object localization performance improvement [5]

4. Transparency algorithm

The main idea of the transparency algorithm is to use the region-of-interest information to preserve the pixels inside the lesion bounding box and to attenuate the background pixel values outside it. The generated image focuses on the lesion areas without losing the global image context, since the original image is only dimmed outside the lesion areas. The new image thus keeps a distribution similar to the original one while strongly emphasizing the lesions.

A new training sample (x', y') is created by transforming the pixel values of the original sample (x, y). The transformation operation can be described as:

x' = M \odot x \newline y' = y


The operator preserves the original label. The mask is created as follows:

M_{ij} = \begin{cases} 1 & \text{if } x_{min} \le i \le x_{max} \text{ and } y_{min} \le j \le y_{max} \\ \alpha & \text{otherwise} \end{cases}

where x is the image and M ∈ \{α, 1\}^{W×H} is the applied mask, with α a random number in the range [0.0, 0.4] and 1 assigned to the pixels inside the lesion bounding box. Box = (x_{min}, x_{max}, y_{min}, y_{max}) indicates the bounding box location in the abnormal image. [25]
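A minimal NumPy sketch of this masking operation; the array layout (H×W or H×W×C) and the mapping of the indices i, j to rows and columns are assumptions:

```python
import numpy as np

def transparency_augment(image, box, alpha_range=(0.0, 0.4), rng=np.random):
    """Attenuate all pixels outside the lesion bounding box (sketch after [25])."""
    x_min, x_max, y_min, y_max = box                 # lesion bounding box
    alpha = rng.uniform(*alpha_range)                # random background factor
    mask = np.full(image.shape[:2], alpha, dtype=np.float32)
    mask[x_min:x_max + 1, y_min:y_max + 1] = 1.0     # keep lesion pixels unchanged
    if image.ndim == 3:                              # broadcast over channels if present
        mask = mask[..., None]
    return image.astype(np.float32) * mask           # the label y stays the same
```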

Advantages
  • Performs better than or as good as CutMix in mammograms classification [25]
  • Easily applicable to any medical dataset whose images have lesions [25]

From Object Insertion to SwapMix

1. Cut-paste-learn


The idea behind the Cut-Paste-Learn approach is to insert object instances into background scenes in order to improve the object detection performance of the model. This is, however, done randomly and without awareness of the visual context of the background into which the objects are inserted. [20]

Advantages
  • Improves object detection [20]
Drawbacks
  • Risks overfitting to blending artifacts if no blending strategies and distractors are used [19]
  • Can hurt object detector accuracy when applied context-free [19]
  • Failure to recognize certain views [19]

2. Object insertion with visual context

 

This approach is aware of the visual context and does not insert the objects randomly. As input, besides the instances to insert, contextual images containing masked bounding boxes are passed to the model. These boxes are generated in a first step to prepare the local context samples. In the next step, a CNN predicts the presence of each object category in these masked bounding boxes. The context CNN then matches the object instances to the boxes with the highest confidence scores. Finally, at most two instances are rescaled and blended into the selected bounding boxes, and the resulting images are used for training. [19]

3. SwapMix

The principle of SwapMix is to perturb the visual context by replacing the features of objects judged contextually irrelevant with features from other objects in the dataset.

It proceeds as follows:

  1. Identify the visual features of the contextually irrelevant objects.
  2. Swap them with the features of another, similar object from the dataset; the swap can target either the class label or an attribute.
  3. Control the swapped objects to ensure object-scene compatibility.
  4. Check whether the model gets confused, i.e., no longer recognizes the content correctly once the irrelevant context has changed. [3]

There are two approaches when it comes to context swapping:

    • Class label swap: swap the object's features with the features of an object from a different class, i.e., place an object of a different class in the position of the irrelevant context object (for example, turning a bus into a car). It proceeds as follows:
      • for each context object, the k nearest classes to its class are found by computing similarities between class names and picking the top k class labels
      • a threshold is set to exclude classes with low similarity
      • a random object of the swapped class is picked from the dataset, and its attributes are used to generate one-hot encodings for the swapped object's attributes [3]
    • Attribute swap: swap the object's features with the features of an object from the same class but with different attributes. The object's attributes are changed while the object itself is preserved (for example, replacing a red bus with a yellow bus). [3]
Advantages:

  • Increase in model robustness and effective accuracy [3]
  • Suppression of the model’s dependency on visual context and more focus on relevant objects [3]
  • Decrease of context reliance [3]
Drawbacks:
  • Training time increases by a factor of 1.4 due to context swapping for every image during data loading [3] 

From Adversarial Training to Fast Adversarial Propagation

1. Adversarial training

Adversarial examples are created by adding imperceptible perturbations to images. They can cause NNs, especially CNNs, to make wrong predictions and thus probe the limits of their generalization; training with them therefore improves robustness and can increase accuracy. [1]

As a reference, vanilla training relies on minimizing the expected loss and can be expressed as \underset{\theta}{\arg\min}\, E_{(x,y)∼D}[L(\theta, x, y)]. In adversarial training based on Madry's framework [21], a small perturbation \epsilon from an allowed set \mathbb{S} is added to the original samples, and the network is trained on the worst case: \underset{\theta}{\arg\min}\, E_{(x,y)∼D}[\underset{\epsilon \in \mathbb{S}}{\max}\, L(\theta, x + \epsilon, y)] [1]
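For illustration, the following is a minimal PyTorch sketch of one adversarial training step; for simplicity, the inner maximization uses a single-step FGSM-style attack instead of the multi-step PGD attack of Madry's framework [21], and the step size is an arbitrary choice.

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, eps=8 / 255):
    """One adversarial training step; the inner maximization is a single-step
    FGSM-style attack (a simplification of the multi-step PGD in [21])."""
    # Inner maximization: find a perturbation that increases the loss.
    x_adv = x.clone().detach().requires_grad_(True)
    grad = torch.autograd.grad(F.cross_entropy(model(x_adv), y), x_adv)[0]
    x_adv = (x + eps * grad.sign()).clamp(0, 1).detach()

    # Outer minimization: update the parameters on the perturbed batch.
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```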

Advantages
  • More realistic network representations, since perturbations like those occurring in real-world situations and perception are included [22]
  • More robustness against noise especially high-frequency like Gaussian noise [23]
  • Less sensitivity to texture distortions and more focus on shape information. [24]
Drawbacks
  • Improves training only on small datasets in the fully supervised setting [37, 38], or on larger datasets only in the semi-supervised setting [39, 40].
  • Degrades performance and decreases accuracy on clean images for large datasets with supervised learning [41, 42, 43, 44, 45], potentially dropping below vanilla training.


2. Adversarial propagation

The main difference between adversarial propagation (AdvProp) and adversarial training is the use of an additional set of batch normalization (BN) layers with separate rescaling parameters, applied only to the adversarial images, since adversarial samples follow a different statistical distribution than clean ones. The two distinct BN branches bridge this distribution mismatch, allowing accurate statistics estimation and better feature learning from both the clean and the adversarial samples. Applying an auxiliary BN to the adversarial samples can therefore improve learning and model performance. [1, 2, 46]

Adversarial propagation treats adversarial images as additional training samples and trains the network on a mixture of adversarial examples and clean images, which translates into: \underset{\theta}{\arg\min}\, E_{(x,y)∼D}[L(\theta, x, y) + \underset{\epsilon \in \mathbb{S}}{\max}\, L(\theta, x + \epsilon, y)] [1]

Procedure
  1. Randomly sample a subset of the training data
  2. AdvProp applies adversarial attacks to the clean images and generates the corresponding adversarial images, using the auxiliary BN layers.
  3. The clean images and their adversarial counterparts are used for the training of the network as a pair. Specifically, the original BN layers are applied exclusively on the clean images, and the auxiliary BN layers are applied exclusively on the adversarial images.
  4. Jointly optimize the losses from the adversarial and clean images and update the network parameters (a minimal sketch of this dual-BN setup follows below). [2]
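The following is a minimal, illustrative PyTorch sketch of the dual-BN idea (not the official AdvProp implementation); the block structure and layer sizes are assumptions.

```python
import copy
import torch
import torch.nn.functional as F

class DualBNBlock(torch.nn.Module):
    """Convolutional block with a main BN for clean images and an auxiliary BN
    for adversarial images (illustrative AdvProp-style sketch)."""

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.conv = torch.nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        self.bn_clean = torch.nn.BatchNorm2d(out_channels)  # main BN: clean statistics
        self.bn_adv = copy.deepcopy(self.bn_clean)          # auxiliary BN: adversarial statistics

    def forward(self, x: torch.Tensor, adversarial: bool = False) -> torch.Tensor:
        bn = self.bn_adv if adversarial else self.bn_clean  # route through the matching BN
        return F.relu(bn(self.conv(x)))
```

In a training step, clean images would be routed through bn_clean and their adversarial counterparts through bn_adv, and the two losses summed before a single parameter update.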
Advantages
  • Increases accuracy without additional data.
  • Compared to vanilla training, adversarial training always degrades model performance while AdvProp improves accuracy on all ResNet models [1]
Weaknesses
  • The attack strength should depend on the network: weak attacks for small networks and strong attacks for large networks, because weaker attackers push the distribution of adversarial examples less far from the distribution of clean images, and larger networks are better at bridging domain differences. [1]
  • Higher computational cost, as it requires 7× more forward and backward passes than the vanilla baseline. [2]

3. Fast adversarial propagation

Fast AdvProp differs from AdvProp by using only a small portion of the sampled batch to generate adversarial examples instead of the whole batch. Moreover, during the generation of adversarial images, the gradient calculation for the input images and the gradient calculation for the network parameters are merged into the same forward and backward pass, which removes the extra passes otherwise needed to generate adversarial samples. [2]

Although only a small portion of the sampled batch is used to generate adversarial samples, so that a reduced number of adversarial training examples accompanies a large number of clean images, empirical results show that this is sufficient to obtain robust feature representations. [2]

To achieve the promised training cost reduction and match the cost of the vanilla baseline, three measures are taken in this approach:

  • Simplify the projected gradient descent attacker [2]
  • Break the pairing between the clean images and adversarial images [2]
  • Reduce the number of training epochs [2]

To stabilize the network training, further measures are taken:

  • Use batch statistics instead of running statistics to generate adversarial examples. [2]
  • Shuffle batch normalization across multiple GPUs to avoid intra-batch information exchange [2]
  • Halve the loss weight of the images used for the adversarial attack: clean images receive one forward and backward pass while distorted images receive two, yet each image in the batch should carry the same importance and the overall importance should remain the same. [2]
  • Synchronize the parameter update speed by re-scaling the gradients, since the parameters of the original batch normalization only receive gradients from clean images, the parameters of the auxiliary batch normalization only receive gradients from images with random noise and adversarial examples, while the parameters of the shared layers receive gradients from all examples; without re-scaling, the network parameters would be updated at inconsistent speeds. [2]
Advantages
  • Same training cost as the vanilla training baseline with better performance [2]

Self-Augmentation Mechanism (SAM)


The Self-Augmentation Mechanism essentially augments the original images with the output features of a CNN; it works as follows:

  1. Extraction of high-level features at the pooling layer of the CNN

  2. Feature selection through the augmentation mechanism and generation of low-dimensional augmented features 

  3. Conversion into a new lower-dimension representation by RICA (Reconstruction Independent Component Analysis) 

  4. Information extraction from the generated features for correct classification of target samples [28]

The representations learned by the augmentation mechanism carry discriminative information about the classes, which enables the network to make accurate predictions.
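A rough sketch of steps 1 and 3 under stated assumptions: a torchvision ResNet-18 stands in for the CNN backbone, random tensors stand in for chest X-rays, and scikit-learn's FastICA is used as a stand-in for RICA, which scikit-learn does not provide.

```python
import torch
import torchvision
from sklearn.decomposition import FastICA  # stand-in for RICA

# 1) Extract high-level features at the (global average) pooling layer of a CNN.
backbone = torchvision.models.resnet18(weights=None)  # assumed backbone, not the paper's exact CNN
backbone.fc = torch.nn.Identity()                     # expose the pooled 512-d features
backbone.eval()

images = torch.rand(32, 3, 224, 224)                  # toy batch standing in for chest X-rays
with torch.no_grad():
    features = backbone(images).numpy()               # shape: (32, 512)

# 3) Convert the (selected) features into a lower-dimensional representation.
ica = FastICA(n_components=16, random_state=0)
augmented = ica.fit_transform(features)               # shape: (32, 16)

# 4) 'augmented' would then be fed to a classifier for the target samples.
```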

Review

Even though basic transformations have been shown to achieve good results in medical applications, care should be taken when applying them: certain transverse images should not be flipped, since organs always lie on a particular side of the body, and large rotation angles should be avoided, since such an output cannot occur in reality.

Each augmentation technique is suitable for a specific task and use case. SwapMix might be valuable for a visual question answering (VQA) model, for example, but would probably be less useful in a segmentation model. Applications of VQA in the medical field have a lot of potential but still seem to be in their infancy. For SwapMix, only two representative models were studied, using two types of visual features on the GQA dataset [3, 36], which can limit generalization.

Even though data augmentation consistently leads to improved generalization [9], it still has limitations: it is dataset-dependent, requires expert knowledge, assumes that examples in the vicinity of a training sample share its class, and does not model the vicinity relation across examples of different classes. [4]

References

[1] Xie, Cihang; Tan, Mingxing; Gong, Boqing; Wang, Jiang; Yuille, Alan L.; Le, Quoc V. (2020): Adversarial Examples Improve Image Recognition. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, WA, USA, 13.06.2020–19.06.2020: IEEE, pp. 816–825.

[2] Mei, Jieru; Han, Yucheng; Bai, Yutong; Zhang, Yixiao; Li, Yingwei; Li, Xianhang et al.: Fast AdvProp. DOI: 10.48550/arXiv.2204.09838.

[3] Gupta, Vipul; Li, Zhuowan; Kortylewski, Adam; Zhang, Chenyu; Li, Yingwei; Yuille, Alan: SwapMix: Diagnosing and Regularizing the Over-Reliance on Visual Context in Visual Question Answering. DOI: 10.48550/arXiv.2204.02285.

[4] Zhang, Hongyi; Cisse, Moustapha; Dauphin, Yann N.; Lopez-Paz, David (2018): mixup: Beyond Empirical Risk Minimization. In: International Conference on Learning Representations. Available online at https://openreview.net/forum?id=r1Ddp1-Rb.

[5] Yun, Sangdoo; Han, Dongyoon; Oh, Seong Joon; Chun, Sanghyuk; Choe, Junsuk; Yoo, Youngjoon (2019): CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features. Available online at https://arxiv.org/pdf/1905.04899.

[6] Nalepa, Jakub; Marcinkiewicz, Michal; Kawulok, Michal (2019): Data Augmentation for Brain-Tumor Segmentation: A Review. In: Frontiers in computational neuroscience 13, p. 83. DOI: 10.3389/fncom.2019.00083.

[7] Cireşan, Dan; Meier, Ueli; Schmidhuber, Juergen: Multi-column Deep Neural Networks for Image Classification. In: CVPR 2012. Available online at http://arxiv.org/pdf/1202.2745v1.

[8] Simard, P. Y.; Steinkraus, D.; Platt, J. C.: Best practices for convolutional neural networks applied to visual document analysis, pp. 958–963. DOI: 10.1109/ICDAR.2003.1227801.

[9] Simard, Patrice Y.; LeCun, Yann A.; Denker, John S.; Victorri, Bernard (2012): Transformation Invariance in Pattern Recognition – Tangent Distance and Tangent Propagation. In: Grégoire Montavon, Geneviève B. Orr and Klaus-Robert Müller (eds.): Neural networks: tricks of the trade, Vol. 7700. 2nd ed. Berlin, Heidelberg: Springer (Lecture Notes in Computer Science, 7700), pp. 235–269.

[10] Chapelle, Olivier; Weston, Jason; Bottou, Léon; Vapnik, Vladimir (2000): Vicinal Risk Minimization. In: T. Leen, T. Dietterich and V. Tresp (eds.): Advances in Neural Information Processing Systems, Vol. 13: MIT Press. Available online at https://proceedings.neurips.cc/paper/2000/file/ba9a56ce0a9bfa26e8ed9e10b2cc8f46-Paper.pdf.

[11] DeVries, Terrance; Taylor, Graham W. (2017): Improved Regularization of Convolutional Neural Networks with Cutout. Available online at https://arxiv.org/pdf/1708.04552.

[12] Lopes, Raphael Gontijo; Yin, Dong; Poole, Ben; Gilmer, Justin; Cubuk, Ekin D. (2019): Improving Robustness Without Sacrificing Accuracy with Patch Gaussian Augmentation. Available online at https://arxiv.org/pdf/1906.02611.

[13] Mikolajczyk, Agnieszka; Grochowski, Michal (2018): Data augmentation for improving deep learning in image classification problem. In: 2018 International Interdisciplinary PhD Workshop (IIPhDW), Swinoujście, 9–12 May 2018. Piscataway, NJ: IEEE, pp. 117–122.

[14] Perez, Luis; Wang, Jason (2017): The Effectiveness of Data Augmentation in Image Classification using Deep Learning. Available online at http://arxiv.org/pdf/1712.04621v1.

[16] Zhong, Zhun; Zheng, Liang; Kang, Guoliang; Li, Shaozi; Yang, Yi (2017): Random Erasing Data Augmentation. Available online at https://arxiv.org/pdf/1708.04896.

[17] Singh, Krishna Kumar; Lee, Yong Jae (2017): Hide-and-Seek: Forcing a Network to be Meticulous for Weakly-supervised Object and Action Localization. Available online at https://arxiv.org/pdf/1704.04232.

[18] Choe, Junsuk; Shim, Hyunjung (2019): Attention-based Dropout Layer for Weakly Supervised Object Localization. Available online at https://arxiv.org/pdf/1908.10028.

[19] Dvornik, Nikita; Mairal, Julien; Schmid, Cordelia: Modeling Visual Context is Key to Augmenting Object Detection Datasets. In: ECCV 2018. Available online at https://arxiv.org/pdf/1807.07428.

[20] Dwibedi, Debidatta; Misra, Ishan; Hebert, Martial (2017): Cut, Paste and Learn: Surprisingly Easy Synthesis for Instance Detection. In: 2017 IEEE International Conference on Computer Vision (ICCV). Venice, 22.10.2017–29.10.2017: IEEE, pp. 1310–1319.

[21] Madry, Aleksander; Makelov, Aleksandar; Schmidt, Ludwig; Tsipras, Dimitris; Vladu, Adrian (2017): Towards Deep Learning Models Resistant to Adversarial Attacks. Available online at http://arxiv.org/pdf/1706.06083v4.

[22] Tsipras, Dimitris; Santurkar, Shibani; Engstrom, Logan; Turner, Alexander; Madry, Aleksander (2018): Robustness May Be at Odds with Accuracy. Available online at https://arxiv.org/pdf/1805.12152.

[23] Yin, Dong; Lopes, Raphael Gontijo; Shlens, Jonathon; Cubuk, Ekin D.; Gilmer, Justin (2019): A Fourier Perspective on Model Robustness in Computer Vision. Available online at https://arxiv.org/pdf/1906.08988.

[24] Zhang, Tianyuan; Zhu, Zhanxing (2019): Interpreting Adversarially Trained Convolutional Neural Networks. Available online at https://arxiv.org/pdf/1905.09797.

[25] Tran, Sam B.; Nguyen, Huyen T. X.; Pham, Hieu H.; Nguyen, Ha Q. (2022): Transparency strategy-based data augmentation for BI-RADS classification of mammograms. Available online at https://arxiv.org/pdf/2203.10609.

[26] Skandarani, Youssef; Jodoin, Pierre-Marc; Lalande, Alain (2021): GANs for Medical Image Synthesis: An Empirical Study. Available online at https://arxiv.org/pdf/2105.05318.

[27] Chlap, Phillip; Min, Hang; Vandenberg, Nym; Dowling, Jason; Holloway, Lois; Haworth, Annette (2021): A review of medical image data augmentation techniques for deep learning applications. In: Journal of medical imaging and radiation oncology 65 (5), pp. 545–563. DOI: 10.1111/1754-9485.13261.

[28] Muhammad, Usman; Hoque, Md Ziaul; Oussalah, Mourad; Keskinarkaus, Anja; Seppänen, Tapio; Sarder, Pinaki (2022): SAM: Self-augmentation mechanism for COVID-19 detection using chest X-ray images. In: Knowledge-based systems 241, p. 108207. DOI: 10.1016/j.knosys.2022.108207.

[29] Shorten, Connor; Khoshgoftaar, Taghi M. (2019): A survey on Image Data Augmentation for Deep Learning. In: J Big Data 6 (1). DOI: 10.1186/s40537-019-0197-0.

[30] Cubuk, Ekin D.; Zoph, Barret; Mane, Dandelion; Vasudevan, Vijay; Le, Quoc V. (2018): AutoAugment: Learning Augmentation Policies from Data. Available online at https://arxiv.org/pdf/1805.09501.

[31] Zheng, Yu; Zhang, Zhi; Yan, Shen; Zhang, Mi (2022): Deep AutoAugment. Available online at https://arxiv.org/pdf/2203.06172.

[32] Zhang, Yan; Liu, Xi; Wa, Shiyun; Liu, Yutong; Kang, Jiali; Lv, Chunli (2021): GenU-Net++: An Automatic Intracranial Brain Tumors Segmentation Algorithm on 3D Image Series with High Performance. In: Symmetry 13 (12), p. 2395. DOI: 10.3390/sym13122395.

[33] Krogue, Justin; Cheng, Kaiyang; Hwang, Kevin; Toogood, Paul; Meinberg, Eric; Geiger, Erik et al. (2019): Automatic hip fracture identification and functional subclassification with deep learning. arXiv:submit/2819095

[34] https://medium.com/@wolframalphav1.0/easy-way-to-improve-image-classifier-performance-part-1-mixup-augmentation-with-codes-33288db92de5

[35] Isaksson, Lars J.; Summers, Paul; Raimondi, Sara; Gandini, Sara; Bhalerao, Abhir; Marvaso, Giulia et al. (2022): Mixup (Sample Pairing) Can Improve the Performance of Deep Segmentation Networks. In: Journal of Artificial Intelligence and Soft Computing Research 12 (1), pp. 29–39. DOI: 10.2478/jaiscr-2022-0003.

[36] Drew A Hudson and Christopher D Manning. Gqa: A new dataset for real-world visual reasoning and compositional question answering. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6700–6709, 2019.

[37] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In ICLR, 2015.

[38] Yan Li, Ethan X Fang, Huan Xu, and Tuo Zhao. Inductive bias of gradient descent based adversarial training on separable data. arXiv preprint arXiv:1906.02931, 2019.

[39] Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, and Shin Ishii. Virtual adversarial training: a regularization method for supervised and semi-supervised learning. TPAMI, 2018.

[40] Siyuan Qiao, Wei Shen, Zhishuai Zhang, Bo Wang, and Alan Yuille. Deep co-training for semi-supervised image recognition. In ECCV, 2018.

[41] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In ICLR, 2018.

[42] Aditi Raghunathan, Sang Michael Xie, Fanny Yang, John C Duchi, and Percy Liang. Adversarial training can hurt generalization. arXiv preprint arXiv:1906.06032, 2019.

[43] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial machine learning at scale. In ICLR, 2017.

[44] Harini Kannan, Alexey Kurakin, and Ian Goodfellow. Adversarial logit pairing. arXiv preprint arXiv:1803.06373, 2018.

[45] Cihang Xie, Yuxin Wu, Laurens van der Maaten, Alan Yuille, and Kaiming He. Feature denoising for improving adversarial robustness. In CVPR, 2019.

[46] Chen, Tianlong; Cheng, Yu; Gan, Zhe; Wang, Jianfeng; Wang, Lijuan; Wang, Zhangyang; Liu, Jingjing (2021): Adversarial Feature Augmentation and Normalization for Visual Recognition. Available online at https://arxiv.org/pdf/2103.12171.

[47] Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross B. Girshick. Momentum contrast for unsupervised visual representation learning. In CVPR, 2020.

