Blog post written by: Halil Ibrahim Canakkaleli

Based on: Wyatt, J., Leach, A., Schmon, S. M., & Willcocks, C. G. (2022). ANODDPM: Anomaly detection with denoising diffusion probabilistic models using Simplex Noise. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). https://doi.org/10.1109/cvprw56347.2022.0008

Introduction
- Generative Models for Anomaly Detection
- Denoising Diffusion Probabilistic Models
  - Forward Diffusion Process
  - Reverse Process
Method
Experiments and Results
Discussion
ChatGPT Prompts
List of Abbreviations
References

Introduction

Tumors are anomalies in our bodies that can be life-threatening if not detected early. Fortunately, these anomalies can be identified. Although radiologists can detect tumors from medical images, this process is time-consuming and prone to errors due to the high volume of images they must review and the subtlety of early-stage anomalies. Fatigue, high workload, and the inherent variability in human interpretation further exacerbate the issue, often leading to missed or incorrect diagnoses. [1] Recent advancements in generative models provide powerful tools to automate anomaly detection and improve detection accuracy.

In this blog post, I will investigate a novel anomaly detection approach, Anomaly Detection with Denoising Diffusion Probabilistic Models (AnoDDPMs) using simplex noise, developed by Julian Wyatt et al. By the end of this post, I hope you will understand the advantages and disadvantages of this approach compared to previous methods.

Generative Models for Anomaly Detection

Generative models are a class of statistical models designed to generate new data samples by learning and approximating the distribution of the training dataset. In medical imaging, these models are particularly advantageous for unsupervised anomaly detection because they can model complex data distributions without requiring labeled data, which is often labor-intensive and time-consuming to obtain.

Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs) are two leading generative models known for capturing the complex features and statistical properties of training data. A recent literature review [2] found that 69.23% of research papers on anomaly detection employ some form or adaptation of VAEs or GANs. These models generate non-anomalous images from anomalous data and identify abnormalities by comparing the reconstructed (VAEs) or generated (GANs) images with the original images.

Another popular choice for anomaly detection is Denoising Diffusion Probabilistic Models (DDPMs). DDPMs [3] are generative models that produce high-quality samples by gradually denoising data. They outperform both GANs and VAEs in mode coverage while offering better sample quality than VAEs and more stable training than GANs. [4] However, the long Markov chains used in DDPMs can cause poor scalability and long processing times. To address this issue, the AnoDDPM authors use a partial diffusion strategy and demonstrate that full Markov chains are not necessary for reconstruction-based anomaly detection.

Denoising Diffusion Probabilistic Models

The main idea is that, rather than creating something from nothing, DDPMs learn to reverse a process that gradually adds noise to data until it is corrupted into a sample from a normal distribution. This two-step approach consists of a forward process and a reverse process. For anomaly detection, these models typically reconstruct non-anomalous images from anomalous data and detect abnormalities by comparing the reconstructed and original images. The following figure can help you visualize this transformation into an image with noise and its subsequent reconstruction.

Forward Diffusion Process

Let's talk about the math. I do not want to overwhelm you with high-level mathematics, so I will only explain the equations needed to understand AnoDDPM. The authors build on the formulation and improvements introduced in [3]. The forward process starts with a clean data sample and incrementally adds Gaussian noise over many steps until the data is entirely random noise. Formally, it gradually transforms initial data $\begin{array}{l}x_0 \sim q(x_0)\end{array}$ by adding Gaussian noise according to a variance schedule $\begin{array}{l}\beta_t\end{array}$ :

$\begin{array}{l}\displaystyle q(x_t | x_{t-1}) = \mathcal{N}(x_t | x_{t-1} \sqrt{1 - \beta_t}, \beta_t \mathbf{I})\end{array}$

for $\begin{array}{l}t = 1, \dots, T\end{array}$ . Defining $\begin{array}{l}\alpha_t = 1 - \beta_t\end{array}$ and $\begin{array}{l}\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s\end{array}$ , the distribution of $\begin{array}{l}x_t\end{array}$ given $\begin{array}{l}x_0\end{array}$ has a closed form:

$\begin{array}{l}\displaystyle q(x_t | x_0) = \mathcal{N}(x_t | x_0 \sqrt{\bar{\alpha}_t}, (1 - \bar{\alpha}_t) \mathbf{I})\end{array}$

Then, we can directly calculate $\begin{array}{l}x_t\end{array}$ without computing the intermediate steps $\begin{array}{l}x_{t-1}, \dots, x_1\end{array}$ :

$\begin{array}{l}\displaystyle x_t = x_0 \sqrt{\bar{\alpha}_t} + \epsilon_t \sqrt{1 - \bar{\alpha}_t}, \quad \epsilon_t \sim \mathcal{N}(0, \mathbf{I})\end{array}$
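
To make the closed form concrete, here is a minimal NumPy sketch of the one-shot forward sampling. The linear schedule from 1e-4 to 0.02 over T = 1000 steps is the common choice from [3] and is only an assumption here, as are the helper names `make_schedule` and `q_sample` and the random stand-in image.

```python
import numpy as np

def make_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear variance schedule (assumed); returns beta_t and the cumulative alpha-bar_t."""
    betas = np.linspace(beta_start, beta_end, T)
    alpha_bars = np.cumprod(1.0 - betas)
    return betas, alpha_bars

def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0) in one shot; t is 1-indexed as in the text."""
    eps = rng.standard_normal(x0.shape)
    a_bar = alpha_bars[t - 1]
    x_t = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps
    return x_t, eps

rng = np.random.default_rng(0)
betas, alpha_bars = make_schedule()
x0 = rng.uniform(-1.0, 1.0, size=(64, 64))  # stand-in for a normalized image
x_250, eps = q_sample(x0, 250, alpha_bars, rng)
```

Because the cumulative product shrinks towards zero, later timesteps keep less and less of the original image and more of the noise term.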

Reverse Process

The reverse process involves training a neural network to predict the Gaussian noise added at each step of the forward process and to subtract it. The model reconstructs the original data $\begin{array}{l}\hat{x}_0\end{array}$ from corrupted data $\begin{array}{l}x_t\end{array}$ by sequentially predicting and removing noise. Each step is again modeled as a normal distribution $\begin{array}{l}\mathcal{N}\end{array}$ with mean $\begin{array}{l}\bm{\mu}_\theta(x_t, t)\end{array}$ and covariance $\begin{array}{l}\bm{\Sigma}_\theta(x_t, t)\end{array}$ , but in reverse order, for $\begin{array}{l}t = T, \dots, 1\end{array}$ .

$\begin{array}{l}\displaystyle p_\theta(x_{t-1} | x_t) := \mathcal{N}(x_{t-1} | \bm{\mu}_\theta(x_t, t), \bm{\Sigma}_\theta(x_t, t))\end{array}$

First, for the covariance, the authors follow the original DDPM paper [3] and fix $\begin{array}{l}\bm{\Sigma}_\theta(x_t, t)\end{array}$ to $\begin{array}{l}\tilde{\beta}_t \mathbf{I}\end{array}$ with $\begin{array}{l}\tilde{\beta}_t = \frac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t} \beta_t\end{array}$ :

$\begin{array}{l}\displaystyle p_\theta(x_{t-1} | x_t) = \mathcal{N}(x_{t-1} | \bm{\mu}_\theta(x_t, t), \tilde{\beta_t} \mathbf{I})\end{array}$

Next, we need the mean $\begin{array}{l}\bm{\mu}_\theta(x_t, t)\end{array}$ . During the reverse process we do not have access to $\begin{array}{l}x_0\end{array}$ , only to $\begin{array}{l}x_t\end{array}$ . So, the authors train a neural network $\begin{array}{l}\epsilon_\theta(x_t,t)\end{array}$ to predict the noise $\begin{array}{l}\epsilon\end{array}$ that was added in the forward process and set $\begin{array}{l}\bm{\mu}_\theta(x_t, t)\end{array}$ to the following:

$\begin{array}{l}\displaystyle \bm{\mu}_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}} \left( x_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}} \epsilon_\theta(x_t, t) \right)\end{array}$
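
A single reverse step can be sketched as follows, using the mean and fixed variance from [3]. The function name `p_sample_step`, the linear schedule, and the use of the true noise as an "oracle" prediction are assumptions made for illustration; in practice `eps_pred` would come from the trained network.

```python
import numpy as np

def p_sample_step(x_t, t, eps_pred, betas, alpha_bars, rng):
    """One reverse step x_t -> x_{t-1}; t is 1-indexed, eps_pred stands in for eps_theta(x_t, t)."""
    beta_t = betas[t - 1]
    alpha_t = 1.0 - beta_t
    a_bar_t = alpha_bars[t - 1]
    a_bar_prev = alpha_bars[t - 2] if t > 1 else 1.0
    # Mean of p_theta(x_{t-1} | x_t), written in terms of the predicted noise.
    mean = (x_t - beta_t / np.sqrt(1.0 - a_bar_t) * eps_pred) / np.sqrt(alpha_t)
    # Fixed variance beta-tilde_t; it is 0 at t = 1, so the last step adds no noise.
    beta_tilde = (1.0 - a_bar_prev) / (1.0 - a_bar_t) * beta_t
    return mean + np.sqrt(beta_tilde) * rng.standard_normal(x_t.shape)

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)       # assumed linear schedule
alpha_bars = np.cumprod(1.0 - betas)
x0 = rng.uniform(-1.0, 1.0, size=(8, 8))
eps = rng.standard_normal(x0.shape)
x1 = np.sqrt(alpha_bars[0]) * x0 + np.sqrt(1.0 - alpha_bars[0]) * eps
x0_rec = p_sample_step(x1, 1, eps, betas, alpha_bars, rng)  # oracle noise at t = 1
```

With a perfect noise prediction at t = 1 the step returns the original sample, which is a quick sanity check that the mean formula and the forward process are consistent.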

Lastly, to train the model $\begin{array}{l}\epsilon_\theta(x_t,t)\end{array}$ , the authors used the simplified objective function from [3]:

$\begin{array}{l}\displaystyle \mathcal{L}_{\text{simple}} = \mathbb{E}_{t \sim [1, T], x_0 \sim q(x_0), \epsilon \sim \mathcal{N}(0, \mathbf{I})} \left[ \| \epsilon - \epsilon_\theta (x_t, t) \|^2 \right].\end{array}$
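
As a rough illustration, the simplified objective can be estimated on a batch as below. The names `simple_loss` and `zero_model` are hypothetical, the schedule is the assumed linear one, and the sketch uses the mean squared error, which differs from the squared norm only by a constant factor.

```python
import numpy as np

def simple_loss(eps_model, x0_batch, alpha_bars, rng):
    """Monte Carlo estimate of the simplified objective for one batch of shape (B, H, W)."""
    T = len(alpha_bars)
    t = rng.integers(1, T + 1, size=x0_batch.shape[0])   # one uniform timestep per sample
    eps = rng.standard_normal(x0_batch.shape)
    a_bar = alpha_bars[t - 1].reshape(-1, 1, 1)           # broadcast over H and W
    x_t = np.sqrt(a_bar) * x0_batch + np.sqrt(1.0 - a_bar) * eps
    return np.mean((eps - eps_model(x_t, t)) ** 2)

rng = np.random.default_rng(0)
alpha_bars = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))
x0_batch = rng.uniform(-1.0, 1.0, size=(16, 32, 32))      # stand-in for healthy images
zero_model = lambda x_t, t: np.zeros_like(x_t)            # hypothetical untrained predictor
loss = simple_loss(zero_model, x0_batch, alpha_bars, rng)
```

A predictor that always outputs zero gives a loss close to 1, the variance of the noise, so a trained network should score well below that.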

Method

Julian Wyatt et al. propose a novel adaptation of DDPMs to detect anomalies in high-quality images. The authors claim full Markov chains are not required for reconstruction-based anomaly detection. Therefore, they employ a partial diffusion strategy and a multi-scale simplex noise forward process to capture larger anomalies and improve detection performance.

Why Do We Need Simplex Noise?

Let's start by understanding what Gaussian noise is. Gaussian noise follows a normal distribution characterized by its mean and standard deviation. It is popular because its mathematical behavior is well understood and because it occurs in many real-world situations, so it is used in many algorithms and applications, including DDPMs. However, the authors make a clever observation about Gaussian noise. Natural images follow a power-law distribution of frequencies, which means that lower frequencies dominate. [5] Gaussian noise has a uniform spectral density, so it does not corrupt the low-frequency components as much as the high-frequency ones. This limits the effectiveness of a DDPM with Gaussian noise for anomaly detection, since the low-frequency structure is less affected and the images stay somewhat recognizable.
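
The "uniform spectral density" of Gaussian noise can be checked numerically. The sketch below compares the average spectral power of white Gaussian noise in a low and a high frequency band; the band limits and the helper name `band_power` are arbitrary choices for illustration.

```python
import numpy as np

def band_power(img, lo, hi):
    """Mean spectral power at radial frequencies in [lo, hi) cycles per pixel."""
    power = np.abs(np.fft.fft2(img)) ** 2
    fy = np.fft.fftfreq(img.shape[0])[:, None]
    fx = np.fft.fftfreq(img.shape[1])[None, :]
    radius = np.sqrt(fy ** 2 + fx ** 2)
    return power[(radius >= lo) & (radius < hi)].mean()

rng = np.random.default_rng(0)
white = rng.standard_normal((256, 256))   # white Gaussian noise image
low = band_power(white, 0.01, 0.05)       # low-frequency band
high = band_power(white, 0.30, 0.50)      # high-frequency band
ratio = low / high                        # close to 1 for a flat spectrum
```

For a natural image the same ratio would typically be far above 1, because most of its energy sits at low frequencies, which is why flat-spectrum noise leaves the coarse structure comparatively intact.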

To address this, the authors suggest modifying the forward process to apply a noise that follows a similar but different power law, which impacts low-frequency components more. Therefore, they choose simplex noise, which allows for precise control over the noise's frequency distribution. Simplex noise generates smooth and structured randomness, leading to more noticeable visual differences in the original image.

Feature	Gaussian Noise	Simplex Noise
Mathematical Properties	Well-understood with straightforward mathematical properties.	More complex mathematical structure, often used in computer graphics for its smoothness.
Generation Complexity	Simple to generate using standard libraries and algorithms.	Slightly more complex to generate due to its non-linear structure and multiple octaves.
Impact on Image Quality	Causes grainy noise, which can be easily noticeable in high-frequency areas.	Produces smoother noise patterns, less grainy and more visually coherent.
Noise Structure	Random and uncorrelated, resulting in a uniform noise distribution.	Structured randomness, often appearing more natural and less artificial.

Table 1: Comparison of Gaussian noise and simplex noise

AnoDDPM

The method begins by corrupting an initial image $\begin{array}{l}x_0\end{array}$ into $\begin{array}{l}x_t\end{array}$ over $\begin{array}{l}t\end{array}$ timesteps, and then denoising it back to $\begin{array}{l}\hat{x}_0\end{array}$ . The noised image $\begin{array}{l}x_t\end{array}$ is reparameterized as $\begin{array}{l}x_\lambda\end{array}$ , where $\begin{array}{l}\lambda\end{array}$ is the timestep up to which the image is noised, so that $\begin{array}{l}\lambda\end{array}$ can be adjusted to remove anomalies of varying sizes. This reparameterization gives more control over the diffusion: increasing $\begin{array}{l}\lambda\end{array}$ removes larger anomalies.
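
The partial diffusion idea can be sketched as a short loop: jump to step λ with the closed-form forward process, then run only λ reverse steps. This is a simplified sketch under stated assumptions, not the authors' implementation: it uses Gaussian noise and a linear schedule, and the `oracle` predictor cheats by using the known image so the loop can be checked without a trained network.

```python
import numpy as np

def reconstruct(x0, lam, eps_model, betas, rng):
    """Noise x0 up to step lam, then run lam reverse steps back to an estimate of x0."""
    alpha_bars = np.cumprod(1.0 - betas)
    # Forward: jump straight to x_lambda using the closed form.
    eps = rng.standard_normal(x0.shape)
    x = np.sqrt(alpha_bars[lam - 1]) * x0 + np.sqrt(1.0 - alpha_bars[lam - 1]) * eps
    # Reverse: only lam steps instead of the full chain of T steps.
    for t in range(lam, 0, -1):
        beta_t, a_bar_t = betas[t - 1], alpha_bars[t - 1]
        a_bar_prev = alpha_bars[t - 2] if t > 1 else 1.0
        mean = (x - beta_t / np.sqrt(1.0 - a_bar_t) * eps_model(x, t)) / np.sqrt(1.0 - beta_t)
        beta_tilde = (1.0 - a_bar_prev) / (1.0 - a_bar_t) * beta_t
        x = mean + np.sqrt(beta_tilde) * rng.standard_normal(x.shape)
    return x

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)        # assumed linear schedule, T = 1000
alpha_bars = np.cumprod(1.0 - betas)
x0 = rng.uniform(-1.0, 1.0, size=(16, 16))   # stand-in for a query image

def oracle(x_t, t):
    # Hypothetical perfect predictor: recovers the noise from the known x0.
    return (x_t - np.sqrt(alpha_bars[t - 1]) * x0) / np.sqrt(1.0 - alpha_bars[t - 1])

x0_hat = reconstruct(x0, 250, oracle, betas, rng)
```

With λ = 250 the loop runs a quarter of the steps a full chain of 1000 would need. A network trained only on healthy data plays the role of `eps_model` in practice, so anomalous regions are pulled towards healthy-looking tissue instead of being reproduced.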

The use of simplex noise is key because its frequency can be adjusted depending on the size of the anomalous area. To enhance this effect, the method combines $\begin{array}{l}N\end{array}$ octaves of noise, where the amplitude of each subsequent octave decreases at a decay rate $\begin{array}{l}\gamma\end{array}$ . Furthermore, the selection of parameters for simplex noise in Figure 3 helps the noise distribution to more closely resemble a Gaussian distribution, which is crucial since the DDPM model assumes the noising function samples from a Gaussian distribution.
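
The octave construction can be sketched as below. Since a full simplex noise implementation is long, the sketch swaps in bilinearly interpolated lattice noise as a smooth stand-in for each octave; only the summation scheme is the point here. Doubling the frequency per octave, the decay value 0.8, and the final rescaling to unit variance are assumptions, as are the function names.

```python
import numpy as np

def smooth_noise(shape, freq, rng):
    """Stand-in for one simplex octave: random lattice values, bilinearly interpolated."""
    h, w = shape
    gh, gw = int(np.ceil(h * freq)) + 2, int(np.ceil(w * freq)) + 2
    grid = rng.standard_normal((gh, gw))
    ys, xs = np.arange(h) * freq, np.arange(w) * freq
    y0, x0 = ys.astype(int), xs.astype(int)
    fy, fx = (ys - y0)[:, None], (xs - x0)[None, :]
    top = grid[y0][:, x0] * (1 - fx) + grid[y0][:, x0 + 1] * fx
    bot = grid[y0 + 1][:, x0] * (1 - fx) + grid[y0 + 1][:, x0 + 1] * fx
    return top * (1 - fy) + bot * fy

def octave_noise(shape, start_freq, octaves, decay, rng):
    """Sum several octaves; frequency doubles and amplitude shrinks by `decay` each time."""
    total = np.zeros(shape)
    freq, amp = start_freq, 1.0
    for _ in range(octaves):
        total += amp * smooth_noise(shape, freq, rng)
        freq *= 2.0
        amp *= decay
    return total / total.std()  # rescale to unit variance (assumed normalization)

rng = np.random.default_rng(0)
noise = octave_noise((64, 64), start_freq=2.0 ** -6, octaves=6, decay=0.8, rng=rng)
# Correlation between horizontally adjacent pixels: high for structured noise, near 0 for white noise.
neighbor_corr = np.corrcoef(noise[:, :-1].ravel(), noise[:, 1:].ravel())[0, 1]
```

Starting at a frequency of 2^-6 and doubling over six octaves covers the range from 2^-6 to 2^-1. The strong correlation between neighboring pixels is what makes this kind of noise corrupt large, low-frequency structures.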

A conditional DDPM is used for anomaly detection at inference time. First, a query image from the anomalous dataset is noised to a specific timestep $\begin{array}{l}\lambda\end{array}$ ( $\begin{array}{l}x_0 \rightarrow x_\lambda\end{array}$ ) and then denoised back ( $\begin{array}{l}x_\lambda \rightarrow \hat{x}_0\end{array}$ ). After reconstructing the image, the squared error between the original and the reconstructed image, $\begin{array}{l}(x_0-\hat{x}_0)^2\end{array}$ , is calculated. A naive threshold of 0.5 is applied to the squared error to segment the tumors. Finally, the method's effectiveness is evaluated by comparing these predictions to the ground truth.
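
The segmentation step itself is very small. A minimal sketch, assuming both images are arrays on the same intensity scale; the function name `segment_anomaly` and the synthetic "lesion" are made up for illustration.

```python
import numpy as np

def segment_anomaly(x0, x0_hat, threshold=0.5):
    """Per-pixel squared error and the binary mask obtained by thresholding it."""
    sq_err = (x0 - x0_hat) ** 2
    return sq_err, sq_err > threshold

x0_hat = np.zeros((32, 32))     # stand-in for a "healthy" reconstruction
x0 = x0_hat.copy()
x0[10:18, 10:18] = 1.0          # synthetic 8x8 bright region in the query image
sq_err, mask = segment_anomaly(x0, x0_hat)
```

Only pixels where the reconstruction differs strongly from the query end up in the mask, so the quality of the segmentation depends entirely on how well the reverse process replaces the anomaly with healthy-looking content.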

The authors' model is trained on a single NVIDIA Titan Xp GPU with 12GB GDDR5. They made their model available at [6].

Datasets

For training, the authors used a healthy dataset from the Neurofeedback Skull-Stripped (NFBS) repository [7], which contains 125 T1-weighted MRI scans. On the other hand, for evaluation, they used an anomalous dataset of brain tumors from the Centre for Clinical Brain Sciences at the University of Edinburgh [8], which contains 22 T1-weighted MRI scans. Furthermore, AnoDDPM samples were evaluated on the leather subset of the MVTec AD dataset [9] with simplex and Gaussian noise to investigate the effectiveness in different domains. The results of the experiments and evaluation will be presented in the next part of the blog post.

Experiments and Results

For evaluation, the authors segmented anomalies across 22 images, and the reconstruction was performed using two different noise functions, Gaussian noise and simplex noise, to compare the impact of the noise function on the effectiveness of anomaly detection.

Gaussian Noise vs Simplex Noise

The choice of noise function plays a critical role in anomaly detection performance, as seen in Figure 5. The figure compares the performance of AnoDDPM using different noise parameters and diffusion processes for anomaly detection in brain MRI scans. It illustrates the original MRI scans $\begin{array}{l}x_0\end{array}$ , scans with applied noise $\begin{array}{l}x_\lambda\end{array}$ , denoised scans $\begin{array}{l}\hat{x}_0\end{array}$ , squared error $\begin{array}{l}E_{sq}\end{array}$ , segmentation map $\begin{array}{l}E_{seg}\end{array}$ , and ground truth anomalies $\begin{array}{l}GT\end{array}$ .

The samples in the middle are edited by the authors for $\begin{array}{l}\lambda\end{array}$ = 250. The bottom samples show the images with Gaussian noise for $\begin{array}{l}\lambda\end{array}$ = 250, 500, 750. These samples show that for lower $\begin{array}{l}\lambda\end{array}$ values, high-quality images are produced. However, for higher $\begin{array}{l}\lambda\end{array}$ values, Gaussian noise generates completely new images. The top samples represent the reconstructed images generated using simplex noise with frequency values $\begin{array}{l}\nu\end{array}$ between $\begin{array}{l}2^{-1}\end{array}$ and $\begin{array}{l}2^{-6}\end{array}$ . This indicates that simplex noise with higher frequencies significantly enhances anomaly detection, evidenced by clearer and more accurate predictions in the segmentation maps compared to those produced using Gaussian noise.

Figure 6 demonstrates the forward-backward diffusion process, comparing simplex and Gaussian noise in anomaly detection for brain MRI scans. Rows 1-2 show results for simplex noise trained on healthy data, and rows 3-4 illustrate Gaussian noise trained on healthy data. Row 5 shows that simplex noise can effectively repair anomalies; in contrast, row 6 shows that Gaussian noise cannot repair anomalies from anomalous test data.

Comparison with Other Methods

The authors then compared AnoDDPM with Gaussian noise, AnoDDPM with simplex noise, the context encoder reconstruction approach, and f-AnoGAN in a ROC curve analysis based on the squared error. The results can be seen in Figure 7: AnoDDPM with simplex noise significantly outperformed the other methods, while AnoDDPM with Gaussian noise showed only a marginal improvement over random performance. Furthermore, in the segmentation metrics (Dice and IoU) and the classification metric (AUC), AnoDDPM with simplex noise outperformed its Gaussian and GAN counterparts. This highlights simplex noise's effectiveness in improving anomaly detection performance.
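
For readers unfamiliar with the segmentation metrics, here is a small sketch of how Dice and IoU are computed from a predicted mask and a ground truth mask. The function name and the toy masks are illustrative, and returning 1.0 when both masks are empty is a convention chosen here, not something taken from the paper.

```python
import numpy as np

def dice_and_iou(pred, gt):
    """Dice coefficient and intersection-over-union for two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    total = pred.sum() + gt.sum()
    dice = 2.0 * inter / total if total else 1.0   # both masks empty: perfect agreement
    iou = inter / union if union else 1.0
    return dice, iou

gt = np.zeros((10, 10), dtype=bool)
gt[2:6, 2:6] = True              # 16 ground truth pixels
pred = np.zeros((10, 10), dtype=bool)
pred[3:7, 3:7] = True            # 16 predicted pixels, 9 of them overlapping
dice, iou = dice_and_iou(pred, gt)
```

In this toy case the masks share 9 pixels, giving a Dice score of 18/32 and an IoU of 9/23. Dice is always at least as large as IoU for the same pair of masks.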

Leather Test

To investigate the effectiveness of AnoDDPM in different domains, the authors evaluated AnoDDPM on the leather subset of the MVTec AD dataset using both simplex and Gaussian noise. The results were impressive, with both methods producing excellent reconstructions. This demonstrated the versatility of AnoDDPM and its potential applicability to different domains beyond the initial dataset. The successful application to the MVTec AD dataset highlights the robustness of the method and its capability to generalize across various types of anomalies and image textures, further solidifying its value in anomaly detection tasks. How powerful AnoDDPM can be in different domains can be seen below:

Discussion

AnoDDPM promises advantages over GAN-based traditional approaches by providing better mode coverage and avoiding common limitations like instability and the need for extensive datasets. Furthermore, the use of simplex noise instead of Gaussian noise resulted in a notable improvement in detecting larger anomaly shapes. Additionally, AnoDDPM with simplex noise outperformed its Gaussian counterpart and f-AnoGAN in both segmentation and classification metrics.

The introduction of simplex noise, compared to Gaussian noise, allows the model to more effectively capture larger anomalies, despite AnoDDPM with Gaussian noise producing higher-quality samples. Furthermore, the results of the leather subset experiment are interesting and underscore the method's versatility, demonstrating its potential to overcome DDPM limitations across diverse fields.

However, the low number of MRI scans used for testing is a significant limitation. While small datasets are common in medical imaging due to data scarcity, a larger, more diverse dataset is necessary to confirm the method's effectiveness. The paper also lacks depth in explaining the experimental results.

Although the paper indicates that the partial diffusion strategy offers more control over the model, I would have liked to see more experiments to determine if this approach also addresses the inherent problem of DDPMs, which is long processing times.

Overall, this method shows promise for improving previous approaches but requires further investigation and validation before clinical application.

ChatGPT Prompts

Teach me about DDPMs.

Explain the math behind DDPMs.

Teach me about simplex noise and Gaussian noise.

Compare and analyze simplex noise vs Gaussian noise.

Write this equation in LaTeX.

Convert this image to LaTeX.

Rewrite/Summarize/Improve/Proofread this text.

You are an experienced ML researcher. Critically review my blog post.

List of Abbreviations

AnoDDPM Anomaly Detection with Denoising Diffusion Probabilistic Model

VAE Variational Autoencoder

GAN Generative Adversarial Network

DDPM Denoising Diffusion Probabilistic Model

References

[1] Onder, O., Yarasir, Y., Azizova, A. et al. Errors, discrepancies and underlying bias in radiology with case examples: a pictorial review. Insights Imaging 12, 51, 2021.

[2] Shi, Y., Anomaly Detection in Medical Imaging - A Mini Review, arXiv:2108.11986, 2021.

[3] Ho, J., Jain, A., & Abbeel, P., Denoising Diffusion Probabilistic Models. arXiv, 2020.

[4] Wyatt, J., Leach, A., Schmon, S. M., & Willcocks, C. G. (2022). ANODDPM: Anomaly detection with denoising diffusion probabilistic models using Simplex Noise. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

[5] Daniel L. Ruderman. Origins of scaling in natural images. Vision Research, 37(23):3385–3398, 1997.

[6] AnoDDPM Model Repository: https://github.com/Julian-Wyatt/AnoDDPM

[7] Benjamin Puccio, James P Pooley, John S Pellman, Elise C Taverna, and R Cameron Craddock. The preprocessed connectomes project repository of manually corrected skull-stripped T1-weighted anatomical MRI data. GigaScience, 5(1), 2016.

[8] Cyril Pernet, Krzysztof Gorgolewski, and Ian Whittle. A neuroimaging dataset of brain tumor patients. 2016.

[9] Paul Bergmann, Michael Fauser, David Sattlegger, and Carsten Steger. MVTec AD: a comprehensive real-world dataset for unsupervised anomaly detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9592–9600, 2019.
