Abstract
The application of Diffusion Models in Medical Imaging has emerged as a promising approach to enhance diagnostic and prognostic capabilities. This blog post presents a concise introduction to the topic and then describes two applications based on it. We close with our conclusions on the results and an overall review.
1. Introduction
Deep generative models have recently produced high-quality samples in various data modalities. Ever since their introduction in the seminal work of Goodfellow et al., Generative Adversarial Networks (GANs) [2] have consistently led the way in various image generation tasks [5].
However, this technique has severe limitations, such as a complex training process [10] and limited diversity in the generated images [6].
In recent research, Ho et al. proposed a diffusion probabilistic model [3] that is used for image synthesis with a performance superior to generative adversarial networks (GANs) [1].
1.2 Diffusion Model
A diffusion model [9], introduced by Sohl-Dickstein et al., belongs to a general class of generative models that learn to model the data distribution by iteratively denoising an input noise signal.
The core idea of a diffusion model is to simulate a diffusion process where noise is progressively removed from the initial noise signal, resulting in a generated sample that approximates the target data distribution.
1.3 Denoising Diffusion Probabilistic Model Algorithm
The algorithm defined in the paper consists of defining a forward (or inference) diffusion process, which converts a sample from the data distribution into pure noise, and then learning a finite-time reversal of this diffusion process, which defines the generative model distribution.
1.3.1 Forward process
The forward diffusion process starts with a sample $\mathbf{x_0}$ from some target data distribution:
Fig 1. Markov chain of forward process and how it generates the noise [9]
It then gradually adds noise to the image over $T$ time steps. Because the method uses a Markov chain, the distribution at a particular time step depends only on the sample from the immediately previous step, so the distribution of the corrupted samples conditioned on the initial data point $\mathbf{x_0}$ can be written as the product of successive single-step conditionals:
\begin{equation} q(x_{1:T} | x_{0}) = \prod_{t=1}^{T} q(x_{t} | x_{t-1}) \end{equation}
Each step $q(x_{t} | x_{t-1})$ is parameterized as a diagonal Gaussian, where $\beta_t$ is the variance at time step $t$. $\beta_t$ increases with $t$ and is restricted to $\beta_t \in (0,1)$:
\begin{equation} q(x_{t} | x_{t-1}) = \mathcal{N}(x_t; \sqrt{1-\beta_t}\, x_{t-1}, \beta_t I) \end{equation}
In the limit as $T$ approaches infinity, $q(x_T | x_0)$ approaches a Gaussian centered at zero with identity covariance, which loses all information about the original sample:
\begin{equation} q(x_{T} | x_{0}) \approx \mathcal{N}(x_T; 0, I) \end{equation}
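Composing the single-step Gaussians above gives the well-known closed form $q(x_t | x_0) = \mathcal{N}(x_t; \sqrt{\bar{\alpha}_t}\, x_0, (1-\bar{\alpha}_t) I)$ with $\bar{\alpha}_t = \prod_{s \le t} (1-\beta_s)$, so $x_t$ can be drawn without simulating every step. The following is a minimal NumPy sketch of that idea. The linear schedule, its end points, and the function names are illustrative assumptions, not values taken from the papers discussed here.

```python
import numpy as np

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Variances beta_1..beta_T, increasing with t (assumed linear schedule)."""
    return np.linspace(beta_start, beta_end, num_steps)

def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t ~ q(x_t | x_0) in one shot; t is a 1-based step index."""
    eps = rng.standard_normal(x0.shape)
    a = alpha_bar[t - 1]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps, eps

betas = linear_beta_schedule(1000)
alpha_bar = np.cumprod(1.0 - betas)  # alpha_bar[t-1] = prod_{s<=t} (1 - beta_s)

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(8, 8))        # stand-in for a normalized image
x_early, _ = q_sample(x0, 10, alpha_bar, rng)    # still close to x0
x_late, _ = q_sample(x0, 1000, alpha_bar, rng)   # essentially pure noise
```

With this schedule the last entry of `alpha_bar` is close to zero, which is the numerical counterpart of the statement that $x_T$ retains almost no information about $x_0$.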
1.3.2 Reverse process
For the reverse process, the noise is gradually removed from the corrupted image through a series of steps.
Each step of this process is also defined as a unimodal diagonal Gaussian. Aside from $x_t$, the model also takes $t$ as input in order to account for the forward variance schedule: different time steps are associated with different noise levels, and the model can learn to undo these individually:
\begin{equation} p_\theta(x_{t-1} | x_{t}) := \mathcal{N}(x_{t-1}; \mu_\theta(x_t,t), \Sigma_\theta(x_t,t)) \end{equation}
Like the forward process, the reverse process is set up as a Markov chain:
\begin{equation} p_\theta(x_{0:T}) := p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} | x_{t}) \end{equation}
where $p(x_{T})$ is the pure noise distribution $\mathcal{N}(0,I)$. At inference time, to generate a sample, the process starts from Gaussian noise and samples from the learned individual steps of the reverse process $p_\theta(x_{t-1} | x_t)$ until it produces $x_0$.
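The sampling procedure described above can be sketched as a simple loop. This sketch assumes the noise-prediction parameterization and the fixed variance $\sigma_t^2 = \beta_t$ used by Ho et al. [3], neither of which is spelled out in this post. `eps_model` is a hypothetical stand-in for a trained network, replaced here by a dummy that predicts zero noise so that the code runs on its own.

```python
import numpy as np

def p_sample_loop(eps_model, shape, betas, rng):
    """Ancestral sampling: start from x_T ~ N(0, I) and apply T reverse steps."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    x = rng.standard_normal(shape)          # x_T
    for t in range(len(betas), 0, -1):      # t = T, ..., 1
        b, a, ab = betas[t - 1], alphas[t - 1], alpha_bar[t - 1]
        eps_hat = eps_model(x, t)
        # Mean of p_theta(x_{t-1} | x_t) under the noise-prediction parameterization
        mean = (x - b / np.sqrt(1.0 - ab) * eps_hat) / np.sqrt(a)
        # Assumed fixed variance sigma_t^2 = beta_t; no noise is added at the last step
        noise = rng.standard_normal(shape) if t > 1 else np.zeros(shape)
        x = mean + np.sqrt(b) * noise
    return x

def dummy_eps_model(x, t):
    """Hypothetical placeholder for a trained network; predicts zero noise."""
    return np.zeros_like(x)

betas = np.linspace(1e-4, 0.02, 50)
sample = p_sample_loop(dummy_eps_model, (4, 4), betas, np.random.default_rng(0))
```

With a real network in place of `dummy_eps_model`, the returned array would be the generated $x_0$.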
1.3.3 Loss function
In a diffusion model, the loss function is used to measure the discrepancy between the predicted output of the model and the ground truth values during the training process. The objective is to minimize this discrepancy, which is essentially the error or the difference between the predicted values and the actual values in the dataset.
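The loss itself is not reproduced in this post. Assuming it is the variational bound from Ho et al. [3], which the three terms discussed next appear to refer to, it can be written as:

\begin{equation} L = \mathbb{E}_q \Big[ \underbrace{D_{KL}\big(q(x_T | x_0) \,\|\, p(x_T)\big)}_{L_T} + \sum_{t>1} \underbrace{D_{KL}\big(q(x_{t-1} | x_t, x_0) \,\|\, p_\theta(x_{t-1} | x_t)\big)}_{L_{t-1}} \underbrace{- \log p_\theta(x_0 | x_1)}_{L_0} \Big] \end{equation}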
The first KL divergence encourages $q(x_T | x_0)$ to be similar to the prior over the latent variable, $p(x_T)$. Since the forward variances are fixed in this model, this term has no learnable parameters and can be ignored.
The second term is a sum of KL divergences, each between a reverse step and the corresponding forward process posterior conditioned on $x_0$. Because each is a KL divergence between two Gaussian distributions, applying some simplification and reparameterization techniques reduces it to a much simpler expression.
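Assuming the simplified objective of Ho et al. [3] is the one meant here, the reduced form reads as follows, where $\bar{\alpha}_t = \prod_{s \le t}(1-\beta_s)$, $\epsilon \sim \mathcal{N}(0, I)$, and $\epsilon_\theta$ is the network that predicts the noise:

\begin{equation} L_{\text{simple}}(\theta) = \mathbb{E}_{t, x_0, \epsilon} \left[ \left\| \epsilon - \epsilon_\theta\left(\sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon,\; t\right) \right\|^2 \right] \end{equation}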
To minimize it, the model learns to predict the noise sample $\epsilon$, drawn from a standard normal distribution, under a mean squared error (MSE) loss.
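As a rough illustration of this training signal, the sketch below computes one Monte Carlo estimate of the noise-prediction MSE for a batch. It assumes the closed-form forward sample $x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon$ and a uniformly sampled time step. `eps_model` is again a hypothetical stand-in for the network, and the schedule values are illustrative.

```python
import numpy as np

def simple_loss(eps_model, x0, alpha_bar, rng):
    """One Monte Carlo estimate of the noise-prediction MSE for a batch x0."""
    batch = x0.shape[0]
    t = rng.integers(1, len(alpha_bar) + 1, size=batch)  # uniform t in {1, ..., T}
    eps = rng.standard_normal(x0.shape)
    # Reshape so that each sample's alpha_bar broadcasts over its pixels
    a = alpha_bar[t - 1].reshape(batch, *([1] * (x0.ndim - 1)))
    x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps
    return np.mean((eps - eps_model(x_t, t)) ** 2)

alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))
rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(16, 8, 8))
# A model that always predicts zero noise scores roughly E[eps^2] = 1
loss = simple_loss(lambda x_t, t: np.zeros_like(x_t), x0, alpha_bar, rng)
```

In practice this value would be backpropagated through a neural network with an automatic differentiation framework; NumPy is used here only to keep the example self-contained.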
The third term needs some clarification. Images have integer pixel values between 0 and 255, which are scaled linearly to [-1, 1] so that they have a similar range to the standard normal prior centered at 0.
The probability of $x_0$ given $x_1$ is then defined as follows:
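The formula is missing from this post. Assuming it is the discretized decoder of Ho et al. [3], it reads as below, where $\mu_\theta^i(x_1, 1)$ is the predicted mean for pixel $i$, $\sigma_1^2$ is the variance of the last reverse step, and $\delta_-$ and $\delta_+$ are the edges of the intensity bin around each pixel value ($x \mp 1/255$, extended to $\mp\infty$ at the boundary values $-1$ and $1$):

\begin{equation} p_\theta(x_0 | x_1) = \prod_{i=1}^{D} \int_{\delta_-(x_0^i)}^{\delta_+(x_0^i)} \mathcal{N}\big(x; \mu_\theta^i(x_1, 1), \sigma_1^2\big)\, dx \end{equation}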
where $D$ is the data dimension, i.e. the total number of pixels. The integral in this formula runs over a small range around the actual value of each pixel in $x_0$. The result is high if the model predicts a mean close to the true pixel value; if this happens for every pixel, the product is high and thus $p(x_{0} | x_{1})$ is high. Otherwise, the probability is low.
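The per-pixel integral can be evaluated as a difference of two Gaussian CDF values. The sketch below does this for a single pixel, assuming bins of half-width 1/255 on the [-1, 1] scale that extend to infinity at the two boundary values, as in Ho et al. [3]. The function names and the example numbers are illustrative.

```python
import math

def normal_cdf(x, mean, std):
    """CDF of a univariate Gaussian, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def pixel_likelihood(x0, mean, std):
    """Probability mass of the bin around pixel value x0 (scaled to [-1, 1])."""
    upper = math.inf if x0 >= 1.0 else x0 + 1.0 / 255.0
    lower = -math.inf if x0 <= -1.0 else x0 - 1.0 / 255.0
    return normal_cdf(upper, mean, std) - normal_cdf(lower, mean, std)

good = pixel_likelihood(0.2, mean=0.2, std=0.01)  # predicted mean on the true value
bad = pixel_likelihood(0.2, mean=0.6, std=0.01)   # predicted mean far away
```

Here `good` is much larger than `bad`, matching the intuition that a mean close to the true pixel value yields a high probability. The full $p(x_0 | x_1)$ is the product of such terms over all $D$ pixels, which is usually accumulated as a sum of logarithms for numerical stability.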
2. Application of Diffusion Models for Medical Imaging
Diffusion models can support medical imaging tasks such as denoising, reconstruction, segmentation, and anomaly detection, which in turn enable better diagnosis, treatment planning, and understanding of various diseases and conditions. However, it's important to note that applying diffusion models requires suitable imaging data, advanced image processing techniques, and expertise in medical image analysis.
This blog will talk about two papers that use diffusion models for medical image denoising and anomaly segmentation.
2.1 Unsupervised Denoising of Retinal OCT with Diffusion Probabilistic Model [11]
2.1.1 Motivation
Denoising retinal Optical Coherence Tomography (OCT) images is a crucial step in enhancing the quality of OCT data and improving the accuracy of subsequent image analysis tasks.
Fig 2. Optical coherence tomography of human retina [13]
Due to the limited spatial-frequency bandwidth of OCT, the image quality can be degraded by speckle. Hence, despeckling becomes an important preprocessing step for clinical diagnoses and further image analysis. The traditional way to reduce speckle, averaging repeated acquisitions, takes a long time, which can be problematic for patient comfort; registration artifacts caused by eye movement can also be an issue. Therefore, a denoising algorithm that does not require repeated acquisitions is desirable.
Fig 3. Example of an OCT image containing high-contrast speckle [14]
2.1.2 Related work
A deep learning approach to denoise optical coherence tomography images of the optic nerve head [15]
The principal methodology adopted in this research involves the application of a deep learning technique, specifically a convolutional neural network (CNN), for the denoising of OCT images pertaining to the optic nerve head. To achieve this, the authors utilize a substantial dataset of OCT images to train the CNN, thereby enabling it to discern and internalize the underlying patterns and characteristics of the noisy images. By leveraging this acquired knowledge, the CNN endeavors to effectively suppress noise while simultaneously preserving the crucial anatomical structures present in the OCT images. The proposed approach is aimed at elevating image quality, mitigating artifacts arising from noise, and ultimately enhancing the visual representation of the optic nerve head to facilitate improved clinical interpretation and accurate diagnosis.
Fig 4. Speckle noise reduction using a custom deep learning network [15]
Speckle noise reduction in optical coherence tomography images based on edge-sensitive cgan [16]
The primary objective of this study is to tackle the problem of speckle noise in OCT images through the utilization of a conditional generative adversarial network (CGAN) endowed with edge sensitivity. The authors introduce an innovative CGAN-based model, which harnesses the generative and discriminative capabilities of CGANs, and integrates edge sensitivity to retain crucial image details while proficiently mitigating speckle noise.
Fig 5. Speckle noise reduction using conditional generative adversarial network [16]
2.1.4 Methodology
The solution of this paper consists of two parts.
Fig 8. General workflow
The first part uses the self-fusion method as a pre-processing step in the training stage. It regards the b-scans in a small vicinity of a given target b-scan as 'atlases' for that b-scan. After registering the neighbors to the target b-scan, a pixel-wise weighted average of these 'atlases' results in an image with a high signal-to-noise ratio (SNR). Since the diffusion probabilistic model aims to learn the speckle pattern instead of the signal, the self-fusion output $\mathbf{x_0}$ can still be used as the clean image for training purposes.
The second part is using the diffusion model described in the previous section to despeckle OCT images.
2.1.5 Dataset
The model is trained on 6 optic nerve head volumes of the human retina and tested on 6 fovea volumes. Each volume contains 500 b-scans of 512 x 500 pixels. The experiments cover 3 different input SNR levels (92 dB, 96 dB, 101 dB). The ground truth is generated by averaging 5 repeated acquisitions of each b-scan.
2.1.6 Result
Table 1. SNR: signal-to-noise ratio, PSNR: peak signal-to-noise ratio, CNR: contrast-to-noise ratio, ENL: equivalent number of looks.
For comparison, they used the pseudo-multimodal fusion network (PMFN) presented by Hu et al. [17] as a baseline model. Numerically, the proposed method outperforms the baseline model.
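For readers unfamiliar with these metrics, the sketch below shows common textbook definitions of PSNR, CNR, and ENL. These are assumptions: the exact definitions and regions of interest used in the paper may differ, and SNR is left out because its definition varies between OCT papers. The synthetic regions are illustrative stand-ins for image patches.

```python
import numpy as np

def psnr(reference, image, max_value=1.0):
    """Peak signal-to-noise ratio in dB with respect to a reference image."""
    mse = np.mean((reference - image) ** 2)
    return 10.0 * np.log10(max_value ** 2 / mse)

def cnr(foreground, background):
    """Contrast-to-noise ratio between two regions of interest."""
    return abs(foreground.mean() - background.mean()) / np.sqrt(
        0.5 * (foreground.var() + background.var()))

def enl(region):
    """Equivalent number of looks of a homogeneous region; higher is smoother."""
    return region.mean() ** 2 / region.var()

rng = np.random.default_rng(0)
speckled = 0.5 + 0.10 * rng.standard_normal((32, 32))  # noisy homogeneous patch
denoised = 0.5 + 0.01 * rng.standard_normal((32, 32))  # smoother version
enl_gain = enl(denoised) / enl(speckled)
```

A denoiser that smooths a homogeneous layer without shifting its mean raises ENL, while PSNR requires a reference image such as the 5-frame average used as ground truth.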
The following image shows the denoising result of their model for a range of t values.
Fig 10. Fovea denoising results for different input SNR levels and for different t values
Increasing t values indicate more denoising steps. The best result (determined visually) for each noise level, highlighted by the red box, coincides with the intuition that noisier images benefit from a larger t. In the third column, where the input noise level is relatively low, the retinal layers gradually become over-smoothed and fine texture features fade away as t increases from 41 to 51. When t is too large (e.g. t = 70 in Fig 10), the added noise becomes excessive and produces poor results.
Fig 11. Result comparison with baseline model. 5-mean refers to the average image of 5 repeated b-scans
Comparing the result with the baseline model, the retinal layers are more homogeneous with the proposed approach than with PMFN for all input SNR levels. Downstream analysis tasks such as layer segmentation would likely benefit from this improvement.
2.1.7 Conclusion
The paper concludes that the proposed approach based on the diffusion probabilistic model is successful in enhancing the quality of retinal OCT images. By incorporating the principles of the diffusion process, the method effectively reduces speckle noise and enhances image clarity, leading to improved visualization of the optic nerve head and other retinal structures.
Pros
One of the major strengths of this paper is the development of an unsupervised denoising technique for retinal OCT images. Because it does not require labeled training data, the approach is versatile and can be applied to a wide range of OCT datasets without manual annotations. The improved image quality can also enhance the visualization of retinal structures, aiding ophthalmologists in accurate diagnoses and treatment planning.
Cons
The paper would benefit from validation results on a diverse set of OCT datasets from various sources to demonstrate the generalizability and robustness of the proposed technique. The authors should also address the time requirements of the proposed approach.
2.2 Fast Unsupervised Brain Anomaly Detection and Segmentation with Diffusion Models [12]
2.2.1 Motivation
Brain anomalies, such as tumors, hemorrhages, and lesions, can be critical indicators of various neurological disorders and diseases. Detecting and segmenting these anomalies in medical images is essential for early diagnosis, treatment planning, and monitoring disease progression.
Deep generative models have also been used to detect arbitrary anomalies in medical imaging, dispensing with the need for manual labeling. However, autoregressive models have limitations. First, the model only uses information from the preceding elements and requires images to be modeled as 1D sequences, which is a problem because the relationships between different regions in brain images can be complex and diverse. Second, prediction errors accumulate: during training the ground truth is provided at each step, whereas inference is performed on previously sampled elements, which can affect the computation of likelihoods.
2.2.2 Related work
Autoencoders for unsupervised anomaly segmentation in brain MR images [18]
In this study, the authors investigate the use of autoencoders for unsupervised anomaly segmentation in brain MR images. They conduct a comprehensive comparative analysis of different autoencoder architectures to assess their performance in detecting and segmenting brain anomalies without the need for manual annotations.
Fig 12. Autoencoders for unsupervised anomaly segmentation in brain MR images [6]
Unsupervised brain anomaly detection and segmentation with transformers [19]
The study explores the capabilities of transformers in automatically identifying and segmenting brain anomalies without the need for manual annotations or labeled training data. The results of the research present promising evidence of the effectiveness of transformers in the context of medical imaging and anomaly detection.
Fig 13. Residual maps on the synthetic dataset from the variational autoencoder and different steps of the paper's approach.
2.2.3 Methodology
The proposed method is composed of two models: a vector quantized variational autoencoder (VQ-VAE) and a DDPM. Both models are first trained on normal data.
Fig 14. The diffusion and reverse processes involved in paper’s anomaly segmentation method, combining a compression model (autoencoder) and a DDPM.
Compression model
The VQ-VAE is used to learn a compact latent representation, which significantly reduces the computational complexity for the diffusion model.
The encoder maps the given image to a latent representation. The codebook is then used to perform an element-wise quantization of each latent variable, creating the quantized latent representation $\mathbf{z}$.
The decoder maps the quantized latent representation back to image space.
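The quantization step can be sketched as a nearest-neighbor lookup in the codebook. This is a minimal NumPy illustration under the usual VQ-VAE assumption of Euclidean distance. The shapes, the toy codebook, and the function name are illustrative, and the encoder and decoder networks are omitted.

```python
import numpy as np

def quantize(latents, codebook):
    """Replace each latent vector by its nearest codebook entry.

    latents:  (H, W, C) encoder output
    codebook: (K, C) learned embedding vectors
    """
    flat = latents.reshape(-1, latents.shape[-1])                      # (H*W, C)
    dists = ((flat[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (H*W, K)
    indices = dists.argmin(axis=1)
    quantized = codebook[indices].reshape(latents.shape)
    return quantized, indices.reshape(latents.shape[:-1])

codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
latents = np.array([[[0.1, -0.1], [0.9, 1.2]]])  # one row with two latent vectors
z, idx = quantize(latents, codebook)
```

In this toy example the first latent vector snaps to the first codebook entry and the second one to the second entry.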
Denoising Diffusion Probabilistic Models
The DDPM is used to learn the distribution of the latent representation of healthy brain imaging, in the same way described previously.
The proposed anomaly segmentation method
After training the VQ-VAE and the DDPM on normal data, they use the VQ-VAE to obtain the latent representation $\mathbf{z}$ of each test image and pass it to the DDPM, whose forward process generates noisy versions of this latent representation. In the reverse process, when computing $L_{t-1}$, they observe that if the input image comes from a healthy subject, the model only removes the Gaussian noise, which results in a low KL divergence. If the image contains an anomaly, the KL divergence is high in the anomalous regions. Thresholding it therefore yields a binary mask indicating where the anomaly is.
The last step uses the decoder of the VQ-VAE to return to pixel space.
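The thresholding idea can be sketched as follows. The sketch assumes both Gaussians share the same fixed diagonal variance, in which case the KL divergence reduces to a scaled squared distance between the means. The threshold value, and the way KL maps from several time steps would be combined, are assumptions not specified in this post.

```python
import numpy as np

def kl_map(mu_q, mu_p, sigma2):
    """Per-location KL between Gaussians that share the diagonal variance sigma2.

    mu_q, mu_p: (H, W, C) posterior mean and predicted mean in latent space.
    """
    return ((mu_q - mu_p) ** 2).sum(axis=-1) / (2.0 * sigma2)

def anomaly_mask(kl, threshold):
    """Binary mask of latent locations whose KL exceeds the threshold."""
    return kl > threshold

mu_q = np.zeros((4, 4, 2))
mu_p = np.zeros((4, 4, 2))
mu_p[1, 2] = [1.0, 1.0]          # one location where the model disagrees
mask = anomaly_mask(kl_map(mu_q, mu_p, sigma2=0.1), threshold=1.0)
```

Only the location where the predicted mean deviates from the posterior mean ends up in the mask, which mirrors the behavior described above for anomalous regions.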
2.2.4 Dataset
| Task | Dataset | Description |
|---|---|---|
| Anomaly Segmentation and Detection on Synthetic Anomalies | MedNIST | Corrupted with sprites; train on 9,000 “HeadCT” images; test on 100 images contaminated with sprites |
| Anomaly Segmentation on MRI Data | UK Biobank (UKB) [20] | Train on 15,000 participants with the lowest lesion volume |
| | UK Biobank (UKB) [20] | Test for white matter hyperintensities |
| | Multimodal Brain Tumor Image Segmentation Benchmark (BRATS) [21] | Test for tumors |
| | Multiple Sclerosis dataset from the University Hospital of Ljubljana (MSLUB) [22] | Test for demyelinating lesions |
| | White Matter Hyperintensities Segmentation Challenge (WMH) [23] | Test for small vessel disease |
| Inference time of anomaly segmentation in Intracerebral hemorrhage (ICH) | CROMIS [24] | Training set: 2D CT axial slices without ICH from 200 participants |
| | CROMIS [24] | Test set: 21 participants |
| | KCH | |
| | CHRONIC [25] | |
| | WMH | |
2.2.5 Result
Anomaly Segmentation and Detection on Synthetic Anomalies
The proposed method performs significantly better than the single transformer. Compared to the ensemble, it achieves a slightly better DICE score but a slightly lower AUPRC.
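For reference, the DICE score compares a predicted binary mask with the ground-truth mask as twice the overlap divided by the total number of foreground pixels. A minimal sketch follows; the small `eps` term is an assumption added so that two empty masks score 1 instead of dividing by zero.

```python
import numpy as np

def dice_score(pred, target, eps=1e-8):
    """DICE overlap between two binary masks of the same shape."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

pred = np.array([[1, 1, 0, 0]])
target = np.array([[0, 1, 1, 0]])
score = dice_score(pred, target)  # one shared pixel out of two per mask
```

AUPRC, in contrast, is computed from the continuous anomaly scores before thresholding, so it does not depend on a single threshold choice.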
Anomaly Segmentation on MRI Data
On the anomaly segmentation task, the proposed method performs as well as an ensemble of transformers on the dataset it was trained on (i.e., UKB). It performs better than the single transformer on all datasets; however, the ensemble generalizes better.
Inference time of anomaly segmentation in Intracerebral hemorrhage (ICH)
The proposed method is much faster than the transformer-based approaches. In addition, using DDIM, a denoising diffusion implicit model proposed to speed up the reverse process, reduces the inference time to less than 1 min.
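The speed-up from DDIM comes from visiting only a subsequence of the time steps. The sketch below shows the deterministic DDIM update (eta = 0) in its generic form. How the authors integrate DDIM into their pipeline is not described in this post, so the schedule, the step stride, and the oracle `eps_model` used for the demonstration are all assumptions.

```python
import numpy as np

def ddim_sample(eps_model, x, alpha_bar, timesteps):
    """Deterministic DDIM (eta = 0) over a decreasing subsequence of 1-based steps."""
    for i, t in enumerate(timesteps):
        ab_t = alpha_bar[t - 1]
        # alpha_bar is taken as 1 at "step 0", i.e. for the clean sample
        ab_prev = alpha_bar[timesteps[i + 1] - 1] if i + 1 < len(timesteps) else 1.0
        eps_hat = eps_model(x, t)
        x0_hat = (x - np.sqrt(1.0 - ab_t) * eps_hat) / np.sqrt(ab_t)  # predicted clean sample
        x = np.sqrt(ab_prev) * x0_hat + np.sqrt(1.0 - ab_prev) * eps_hat
    return x

alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))
timesteps = np.arange(1000, 0, -20)   # 50 steps instead of 1000

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(4, 4))
eps = rng.standard_normal(x0.shape)
x_T = np.sqrt(alpha_bar[-1]) * x0 + np.sqrt(1.0 - alpha_bar[-1]) * eps
# Oracle that returns the true noise, used only to check the update rule
recovered = ddim_sample(lambda x, t: eps, x_T, alpha_bar, timesteps)
```

With the oracle, `recovered` matches `x0` up to floating-point error after only 50 network evaluations, which illustrates why skipping steps reduces inference time.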
2.2.6 Conclusion
In addition to image quality, fast inference is a crucial factor for medical applications. The proposed method successfully reduces the inference time of anomaly segmentation, an important factor that can affect the patient's comfort. The model also performed competitively with transformers on both synthetic and real data, outperforming a single transformer in most cases.
Pros
Brain anomaly detection and segmentation are highly relevant to clinical practice, facilitating early diagnosis and treatment planning, and potentially improving patient outcomes.
Cons
Anomalies in brain MR images can exhibit significant variability in size, shape, and intensity, which might pose challenges for unsupervised methods like diffusion models.
3. Review
Compared to other traditional models, DDPMs can achieve higher quality and efficiency in medical image processing and analysis tasks such as segmentation or reconstruction. Diffusion models also differ from VAEs and flow models in that they learn with a fixed procedure and their latent variable has the same high dimensionality as the original data. Finally, DDPMs have only recently caught the attention of the machine learning community in the medical field, and the number of papers has grown considerably, which suggests that there is still room for further research on diffusion models.
References
[1] Prafulla Dhariwal and Alex Nichol. Diffusion models beat gans on image synthesis. CoRR, abs/2105.05233, 2021.
[2] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 27. Curran Associates, Inc., 2014.
[3] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. CoRR, abs/2006.11239, 2020.
[4] Dewei Hu, Yuankai K. Tao, and Ipek Oguz. Unsupervised denoising of retinal OCT with diffusion probabilistic model. CoRR, abs/2201.11760, 2022.
[5] Gihyun Kwon, Chihye Han, and Dae-shik Kim. Generation of 3d brain mri using auto-encoding generative adversarial networks. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2019: 22nd International Conference, Shenzhen, China, October 13–17, 2019, Proceedings, Part III, page 118–126, Berlin, Heidelberg, 2019. Springer-Verlag.
[6] Xiang Li, Yuchen Jiang, J.J. Rodriguez-Andina, Hao Luo, Shen Yin, and Okyay Kaynak. When medical images meet generative adversarial network: recent development and research opportunities. 1, 09 2021.
[7] Walter H. L. Pinaya, Mark S. Graham, Robert Gray, Pedro F Da Costa, Petru-Daniel Tudosiu, Paul Wright, Yee H. Mah, Andrew D. MacKinnon, James T. Teo, Rolf Jager, David Werring, Geraint Rees, Parashkev Nachev, Sebastien Ourselin, and M. Jorge Cardoso. Fast unsupervised brain anomaly detection and segmentation with diffusion models, 2022.
[8] Vedant Singh, Surgan Jandial, Ayush Chopra, Siddharth Ramesh, Balaji Krishnamurthy, and Vineeth N. Balasubramanian. On conditioning the input noise for controlled image generation with diffusion models, 2022.
[9] Jascha Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. CoRR, abs/1503.03585, 2015.
[10] Hoang Thanh-Tung, Truyen Tran, and Svetha Venkatesh. On catastrophic forgetting and mode collapse in generative adversarial networks. CoRR, abs/1807.04015, 2018.
[11] Dewei Hu, Yuankai K. Tao, and Ipek Oguz. Unsupervised denoising of retinal oct with diffusion probabilistic model, 2022.
[12] Pinaya, W.H.L. et al. (2022). Fast Unsupervised Brain Anomaly Detection and Segmentation with Diffusion Models. In: Wang, L., Dou, Q., Fletcher, P.T., Speidel, S., Li, S. (eds) Medical Image Computing and Computer Assisted Intervention – MICCAI 2022. MICCAI 2022.
[13] Huang, D., Swanson, E. A., Lin, C. P., Schuman, J. S., Stinson, W. G., Chang, W., Hee, M. R., Flotte, T., Gregory, K., Puliafito, C. A., et al., “Optical coherence tomography,” Science 254(5035), 1178–1181 (1991).
[14] Schmitt, J. M., Xiang, S., and Yung, K. M., “Speckle in optical coherence tomography: an overview,” in [Saratov Fall Meeting’98: Light Scattering Technologies for Mechanics, Biomedicine, and Material Science ], 3726, 450–461, International Society for Optics and Photonics (1999).
[15] Devalla, S. K., Subramanian, G., Pham, T. H., Wang, X., Perera, S., Tun, T. A., Aung, T., Schmetterer, L., Thiéry, A. H., and Girard, M. J., “A deep learning approach to denoise optical coherence tomography images of the optic nerve head,” Scientific reports 9(1), 1–13 (2019).
[16] Ma, Y., Chen, X., Zhu, W., Cheng, X., Xiang, D., and Shi, F., “Speckle noise reduction in optical coherence tomography images based on edge-sensitive cgan,” Biomedical optics express 9(11), 5129–5146 (2018).
[17] Hu, D., Malone, J. D., Atay, Y., Tao, Y. K., and Oguz, I., “Retinal oct denoising with pseudo-multimodal fusion network,” in [International Workshop on Ophthalmic Medical Image Analysis ], 125–135, Springer (2020).
[18] Baur, C., Denner, S., Wiestler, B., Navab, N., Albarqouni, S.: Autoencoders for unsupervised anomaly segmentation in brain mr images: a comparative study. Medical Image Analysis 69, 101952 (2021)
[19] Pinaya, W.H.L., Tudosiu, P.D., Gray, R., Rees, G., Nachev, P., Ourselin, S., Cardoso, M.J.: Unsupervised brain anomaly detection and segmentation with transformers. arXiv preprint arXiv:2102.11650 (2021)
[20] Sudlow, C., Gallacher, J., Allen, N., Beral, V., Burton, P., Danesh, J., Downey, P., Elliott, P., Green, J., Landray, M., et al.: Uk biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS medicine 12(3), e1001779 (2015)
[21] Bakas, S., Reyes, M., Jakab, A., Bauer, S., Rempfler, M., Crimi, A., Shinohara, R.T., Berger, C., Ha, S.M., Rozycki, M., et al.: Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the BRATS challenge. arXiv preprint arXiv:1811.02629 (2018)
[22] Lesjak, Ž., Galimzianova, A., Koren, A., Lukin, M., Pernuš, F., Likar, B., Špiclin, Ž.: A novel public mr image dataset of multiple sclerosis patients with lesion segmentations based on multi-rater consensus. Neuroinformatics 16(1), 51–63 (2018)
[23] Kuijf, H.J., Biesbroek, J.M., De Bresser, J., Heinen, R., Andermatt, S., Bento, M., Berseth, M., Belyaev, M., Cardoso, M.J., Casamitjana, A., et al.: Standardized assessment of automatic segmentation of white matter hyperintensities and results of the wmh segmentation challenge. IEEE transactions on medical imaging 38(11), 2556–2568 (2019)
[24] Wilson, D., Ambler, G., Shakeshaft, C., Brown, M.M., Charidimou, A., Salman, R.A.S., Lip, G.Y., Cohen, H., Banerjee, G., Houlden, H., et al.: Cerebral microbleeds and intracranial haemorrhage risk in patients anticoagulated for atrial fibrillation after acute ischaemic stroke or transient ischaemic attack (cromis-2): a multicentre observational cohort study. The Lancet Neurology 17(6), 539–547 (2018)
[25] Mah, Y.H., Nachev, P., MacKinnon, A.D.: Quantifying the impact of chronic ischemic injury on clinical outcomes in acute stroke with machine learning. Frontiers in neurology 11, 15 (2020)