Uncertainty estimation for medical segmentation

This blog post discusses uncertainty estimation for medical segmentation, including its definition, motivation and development. Three recent methods, namely Layer Ensembles (LE), Efficient Bayesian and ContRastive Image Segmentation for uncertainty Prediction (CRISP), are introduced and compared in detail.

Author: Junting Yao

Tutor: Unknown user (ga87jay)

1. Introduction

Since manually segmenting regions of interest in a medical image requires much time and effort, deep learning has attracted considerable attention from researchers for the task of medical segmentation. High-quality medical segmentation demands both high accuracy and high reliability. Accuracy can be improved through error detection, while reliability can be obtained through uncertainty estimation. Both characteristics are indispensable. Highly accurate segmentation without uncertainty estimation may lead to over-parametrization, which degrades results on new cases. This blog post discusses uncertainty estimation and how it improves the quality of medical segmentation.

1.1 Definition of Uncertainty

Uncertainty can be divided into two parts: epistemic uncertainty and aleatoric uncertainty. Epistemic uncertainty, also known as knowledge uncertainty, is caused by limited data and knowledge and can be reduced with appropriate training data and a suitable model. In contrast, aleatoric uncertainty, also called data uncertainty, arises from the natural stochasticity of observations and cannot be reduced even when more data is provided. As illustrated by the linear regression example in Figure 1, epistemic uncertainty is high in regions where training data is missing, while aleatoric uncertainty is high where the training data has large intrinsic scatter.

Figure 1. An exhibit of the different kinds of uncertainty in a linear regression context [4]

A segmentation example is shown in Figure 2. In set (d), it can be seen that high aleatoric uncertainty occurs on object boundaries and on objects far from the camera. Epistemic uncertainty in set (e) accounts for the ignorance about which model generated the collected data. The last row shows a failure case in which the model fails to segment the footpath: the epistemic uncertainty increases, but the aleatoric uncertainty does not change.

Figure 2. Aleatoric and epistemic uncertainty for semantic segmentation [5]

1.2 Development of Uncertainty Estimation Methods

Figure 3 illustrates the structure of different uncertainty estimation methods. When uncertainty estimation was first applied to deep learning models, the uncertainty was usually interpreted directly, e.g. by reading the softmax output as a categorical probability distribution. However, this direct interpretation may result in model miscalibration, which is discussed further under the sub-topic of confidence calibration. Furthermore, the direct interpretation cannot capture aleatoric uncertainty correctly and cannot capture epistemic uncertainty at all.

Since the direct interpretation cannot satisfy the basic requirements of uncertainty estimation, researchers began to look for dedicated methods. The development of uncertainty estimation can be divided into two main branches: non-Bayesian and Bayesian methods. Non-Bayesian methods, e.g. temperature scaling and Learning Confidence Estimates (LCE), mitigate the miscalibration that arises in direct interpretation. However, the required modification of the architecture and the inability to estimate epistemic uncertainty remain open problems of non-Bayesian methods.

Other research focuses on Bayesian models, which represent network weights as probability distributions and produce an output distribution. Compared to non-Bayesian methods, Bayesian methods can estimate both epistemic and aleatoric uncertainty. Epistemic uncertainty is obtained by capturing how the probability distribution over the network weights changes given the dataset, while aleatoric uncertainty is modeled by the output distribution [5]. However, Bayesian models can be unstable and inefficient due to their large number of learnable parameters [9]. Therefore, several improvements have been proposed. Deep Ensembles (DE) was proposed in 2016 to increase robustness: a collection of M networks with different initializations is trained on the same data, and the predictive distributions $\begin{array}{l}p(y|x)\end{array}$ of the individual networks are averaged [8]. To improve efficiency, Monte Carlo Dropout (MC-Dropout) was proposed by Gal et al., which keeps dropout layers active at test time. However, MC-Dropout is usually used to estimate epistemic uncertainty, and the aleatoric uncertainty has to be calculated separately with dedicated methods [5][7]. Since a Bayesian model yields a predictive output distribution, its uncertainty is commonly approximated by computing pixel-wise quantities such as entropy, variance or mutual information as pixel-level uncertainty, or by aggregating them into an image-level uncertainty.
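To make the pixel-wise measures named above more concrete, here is a minimal NumPy sketch. It assumes that M softmax outputs for one image (from ensemble members or MC-Dropout passes) are already stacked in an array `probs` of shape (M, K, H, W). The function names, the natural logarithm and the mean as image-level aggregation are illustrative choices, not taken from any of the cited papers.

```python
import numpy as np

def pixelwise_uncertainty(probs, eps=1e-12):
    """probs: (M, K, H, W) array holding M softmax outputs over K classes."""
    mean_p = probs.mean(axis=0)                                # averaged prediction, (K, H, W)
    # Entropy of the averaged prediction (total uncertainty per pixel).
    entropy = -(mean_p * np.log(mean_p + eps)).sum(axis=0)     # (H, W)
    # Average entropy of the individual predictions.
    expected_entropy = -(probs * np.log(probs + eps)).sum(axis=1).mean(axis=0)
    # Mutual information: disagreement between the M predictions.
    mutual_info = entropy - expected_entropy
    # Variance across the M predictions, summed over classes.
    variance = probs.var(axis=0).sum(axis=0)
    return entropy, variance, mutual_info

def image_level_uncertainty(pixel_map):
    """One possible aggregation (an assumption): the mean over all pixels."""
    return float(pixel_map.mean())
```

If all M predictions are identical, variance and mutual information are zero, while the entropy can still be large when the shared prediction itself is ambiguous.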

Figure 3. Structure of different uncertainty estimation methods

2. Methodology

In this blog post, three recent uncertainty estimation methods are discussed, namely Layer Ensembles (LE), Efficient Bayesian and ContRastive Image Segmentation for uncertainty Prediction (CRISP).

2.1 Layer Ensembles [1]

As shown in Figure 4, Layer Ensembles consists of three types of blocks: encoder blocks, decoder blocks and segmentation heads. In theory, the LE method can be applied to any type of architecture. Here, it is illustrated on a U-Net architecture.

Figure 4. Layer Ensembles built on top of U-Net like architecture [1]

The idea of Layer Ensembles is inspired by Prediction Depth (PD) and Deep Ensembles (DE).

Prediction Depth (PD): PD was originally used to measure example difficulty by training k-NN classifiers on the feature maps after each layer. Given a network with $\begin{array}{l}N\end{array}$ layers, if the k-NN prediction at the $\begin{array}{l}L^{th}\end{array}$ layer differs from that at the $\begin{array}{l}(L-1)^{th}\end{array}$ layer and stays the same for all subsequent layers, then

$\begin{array}{l}PD = L ~ \in [0,N]\end{array}$

In the LE method, PD is extended to segmentation tasks by attaching a segmentation head to the output of each layer.
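The definition of PD can be illustrated with a few lines of Python. The sketch assumes that the per-layer k-NN predictions for one example are already available as a list `layer_preds` with entries for layers 0 to N (a hypothetical input, not an API of any library). It returns the earliest layer from which the prediction no longer changes.

```python
def prediction_depth(layer_preds):
    """Return the earliest layer index from which all predictions equal the final one."""
    final = layer_preds[-1]
    depth = len(layer_preds) - 1
    # Walk backwards from the last layer while the prediction still matches.
    for idx in range(len(layer_preds) - 1, -1, -1):
        if layer_preds[idx] != final:
            break
        depth = idx
    return depth
```

For example, the per-layer predictions `[0, 1, 1, 1]` change between layer 0 and layer 1 and stay fixed afterwards, so the prediction depth is 1. An example that is classified consistently from the first layer on has depth 0.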

Deep Ensembles (DE): as explained in the previous chapter, DE trains a collection of M networks with different initializations on the same data, obtains the predictive output distribution from the softmax layer of each network and averages them. Inspired by DE, LE can be regarded as a compound of M sub-networks with different depths.

Combining the two ideas, each sub-network yields a predictive distribution through the softmax layer of its segmentation head, and the distributions of the M sub-networks are averaged as $\begin{array}{l}p(y|x) = M^{-1} \sum_{m = 1}^{M} p_{\theta_m}(y|x,\theta_m)\end{array}$ . The entropy, variance and mutual information can then be calculated from $\begin{array}{l}p(y_{i,j}|x)\end{array}$ as pixel-wise uncertainty information, and each of them can be plotted as an uncertainty heatmap. LE calculates the uncertainty individually, i.e. the uncertainty of one sample is independent of the other samples.

2.2 Efficient Bayesian [2]

As stated in the previous chapter, Bayesian methods represent both the network weights and the output as probability distributions. Given a training dataset $\begin{array}{l}\mathcal D = \lbrace(x_i,y_i)\rbrace_{i=1}^{N}\end{array}$ , the network weights follow the posterior distribution $\begin{array}{l}p(w| \mathcal D)\end{array}$ . At inference time, the segmentation $\begin{array}{l}y^*\end{array}$ of a test image $\begin{array}{l}x^*\end{array}$ is predicted as

$\begin{array}{l}p(y^* | x^*, \mathcal D) = \int p(y^*| x^*, w)p(w| \mathcal D) dw\end{array}$

where the first term $\begin{array}{l}p(y^*| x^*, w)\end{array}$ can be obtained through a forward pass of the model with weights $\begin{array}{l}w\end{array}$ , and trajectory-based posterior sampling is used to obtain the second term $\begin{array}{l}p(w| \mathcal D)\end{array}$ in Efficient Bayesian. Computing the probabilistic prediction $\begin{array}{l}p(y^* | x^*, \mathcal D)\end{array}$ directly from this equation is expensive because of the integral over the weight posterior. To improve efficiency, checkpoints and an approximation of the integral are used.

Single-modal posterior sampling

In the standard nnU-Net, where network weights are represented by single values, the learning rate approaches zero and the weights converge to a single point at the end of training. To obtain a probability distribution over the weights, a number of weight values near the local optimum are collected as posterior samples. These samples are called checkpoints in the Efficient Bayesian method. Here, the uncertainty is estimated for each image sample. The single-modal posterior sampling procedure of Efficient Bayesian is described in detail below:

Set learning rate to a constant value after $\begin{array}{l}\gamma T\end{array}$ and sample weight checkpoints:

$\begin{array}{l}\displaystyle W = \lbrace w_t|\gamma T < t \leq T \rbrace\end{array}$

Approximation instead of integral

Monte-Carlo approximation:

$\begin{array}{l}\displaystyle p(y^*|x^*,\mathcal D) \approx {1\over n} \sum_{i=1}^n p(y^*|x^*,w_{t_i})\end{array}$

Stochastic weight averaging (SWA):

$\begin{array}{l}\displaystyle p(y^* |x^*, \mathcal D) \approx p(y^* |x^*, \overline w)\end{array}$

where $\begin{array}{l}\overline w = {1\over n}\sum_{i=1}^n w_{t_i}\end{array}$

Calculate the entropy of each pixel based on the predictive output distribution obtained before and use entropy to plot heatmaps:

$\begin{array}{l}\displaystyle H(y_{i,j}^*) = -\sum_{k=1}^C p(y_{i,j}^* = k|x^*, \mathcal D)\log_2p(y_{i,j}^* = k|x^*, \mathcal D)\end{array}$
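The two approximations and the entropy map above can be sketched in NumPy. The per-pixel linear classifier `predict` is only a toy stand-in for a forward pass of the segmentation network, and the randomly perturbed `checkpoints` are hypothetical. In the actual method the checkpoints are the weights saved during the constant-learning-rate phase.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def predict(x, w):
    """Toy forward pass: x is (H, W, F) features, w is (F, C) weights."""
    return softmax(x @ w)                     # (H, W, C) class probabilities

def mc_prediction(x, checkpoints):
    """Monte-Carlo approximation: average the predictions of all checkpoints."""
    return np.mean([predict(x, w) for w in checkpoints], axis=0)

def swa_prediction(x, checkpoints):
    """SWA: average the weights first, then run a single forward pass."""
    w_bar = np.mean(checkpoints, axis=0)
    return predict(x, w_bar)

def entropy_map(p, eps=1e-12):
    """Pixel-wise entropy in bits, computed over the class axis."""
    return -(p * np.log2(p + eps)).sum(axis=-1)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 4, 3))                # toy 4x4 image with 3 features per pixel
w_opt = rng.normal(size=(3, 2))               # pretend local optimum, 2 classes
checkpoints = [w_opt + 0.1 * rng.normal(size=w_opt.shape) for _ in range(5)]

p_mc = mc_prediction(x, checkpoints)
p_swa = swa_prediction(x, checkpoints)
uncertainty = entropy_map(p_mc)               # (4, 4) heatmap
```

The Monte-Carlo variant needs one forward pass per checkpoint, while SWA needs only one in total. The price is that SWA yields a single prediction rather than an average over several.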

Multi-modal posterior sampling

Both the standard nnU-Net and single-modal posterior sampling obtain weight values at only one local optimum (a single mode). If the network is not well initialized in these cases, the results may be far from the groundtruth and the network needs to be re-trained several times. Therefore, multi-modal posterior sampling has been proposed to cover a wider range of the weight posterior. The network in multi-modal sampling first behaves as in single-modal posterior sampling. After a certain time, the learning rate jumps to another value and the network samples weights near another local optimum. Figure 5 shows the weight space in single- and multi-modal sampling.

Figure 5. (a) t-SNE plot of the weight space during SGD training. (b) t-SNE plot of the posterior weights, which bounce in the neighborhood of different modes. [2]

The multi-modal posterior sampling procedure of Efficient Bayesian is described in detail below:

Divide the $\begin{array}{l}T\end{array}$ training epochs into $\begin{array}{l}M\end{array}$ cycles, each of which consists of $\begin{array}{l}T_c = {T\over M}\end{array}$ epochs

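Set a cyclical learning rate, where $\begin{array}{l}t_c = t \bmod T_c\end{array}$ is the epoch index within the current cycle. The rate is reset to $\begin{array}{l}\alpha_r\end{array}$ at the start of each cycle and stays constant after a fraction $\begin{array}{l}\gamma\end{array}$ of the $\begin{array}{l}T_c\end{array}$ epochs: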

$\begin{array}{l}\alpha (t) = \begin{cases} \alpha _r & t_c=0\\ \alpha_0[1 - {\min(t_c,\gamma T_c)\over T}]^\epsilon,&t_c>0\end{cases}\end{array}$

In the $\begin{array}{l}c^{th}\end{array}$ training cycle, the weights at this mode are collected as:

$\begin{array}{l}\displaystyle W_c = \lbrace w_t|\gamma T_c \leq t \,\bmod\, T_c \leq T_c-1 \rbrace\end{array}$

In the end, the global set of posterior weight samples is the union of the weights collected in each cycle:

$\begin{array}{l}\displaystyle W = \bigcup_{c=1}^M W_c\end{array}$
(The following steps are the same as in single-modal posterior sampling.)
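The schedule and the checkpoint selection above can be written down directly. The sketch below implements the learning-rate formula exactly as stated in this post and assumes 0-indexed epochs, with `t_c = t % T_c` as the position within a cycle. All numeric values in the example are hypothetical.

```python
def cyclical_lr(t, T, M, gamma, alpha_0, alpha_r, exponent):
    """Learning rate at epoch t (0-indexed) for M cycles over T epochs."""
    T_c = T // M                      # epochs per cycle
    t_c = t % T_c                     # position inside the current cycle
    if t_c == 0:
        return alpha_r                # value used at the start of every cycle
    # Decay until gamma * T_c, then stay constant for the rest of the cycle.
    return alpha_0 * (1 - min(t_c, gamma * T_c) / T) ** exponent

def checkpoint_epochs(T, M, gamma):
    """Epochs whose weights are collected, grouped per cycle (W_1, ..., W_M)."""
    T_c = T // M
    return [
        [t for t in range(c * T_c, (c + 1) * T_c) if t % T_c >= gamma * T_c]
        for c in range(M)
    ]

# Hypothetical setting: 12 epochs, 3 cycles, sample during the second half of each cycle.
cycles = checkpoint_epochs(T=12, M=3, gamma=0.5)
all_checkpoints = sorted(set().union(*cycles))    # the union W of all W_c
```

With these values, `cycles` is `[[2, 3], [6, 7], [10, 11]]`, so every cycle contributes two checkpoints near its own mode.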

2.3 ContRastive Image Segmentation for uncertainty Prediction (CRISP) [3]

The methodology of CRISP can be divided into two main procedures: training and uncertainty prediction. The method is schematically represented in Figure 6.

Figure 6. Schematic representation of CRISP [3]

Training

Images and groundtruth maps are combined into batches of $\begin{array}{l}B\end{array}$ elements, where the images have $\begin{array}{l}C\end{array}$ channels and the groundtruth maps have $\begin{array}{l}K\end{array}$ segmentation classes

$\begin{array}{l}X = [x_1x_2...x_B]\in \mathfrak R^{B\times C\times H\times W} \\ Y = [y_1y_2...y_B]\in {\lbrace0,1\rbrace}^{B\times K\times H\times W}\end{array}$

Use the image encoder $\begin{array}{l}P_{\theta}\end{array}$ and the segmentation encoder $\begin{array}{l}P_\phi\end{array}$ to encode each image $\begin{array}{l}x_i\end{array}$ and its associated segmentation $\begin{array}{l}y_i\end{array}$ into latent vectors

$\begin{array}{l}Z_X: \overrightarrow{z_{x_i}} \in \mathfrak R^{D_x} \\ Z_Y: \overrightarrow{z_{y_i}} \in \mathfrak R^{D_y}\end{array}$

where $\begin{array}{l}Z_X\end{array}$ is the set of latent vectors from image and $\begin{array}{l}Z_Y\end{array}$ is the set from segmentation groundtruth

Weight matrices $\begin{array}{l}W_x \in \mathfrak R^{D_h \times D_x}\end{array}$ (for image) and $\begin{array}{l}W_y \in \mathfrak R^{D_h \times D_y}\end{array}$ (for segmentation groundtruth) project latent vectors into a joint latent space

$\begin{array}{l}H_X: \overrightarrow{h_{x_i}} = {{W_x \cdot \overrightarrow{z_{x_i}}}\over {|| W_x \cdot \overrightarrow{z_{x_i}}||}} \\ H_Y: \overrightarrow{h_{y_i}} = {{W_y \cdot \overrightarrow{z_{y_i}}}\over {|| W_y \cdot \overrightarrow{z_{y_i}}||}}\end{array}$

Compute the cosine similarity between every latent image vector and every latent groundtruth vector in the batch, scaled by $\begin{array}{l}e^\tau\end{array}$

$\begin{array}{l}\displaystyle S=(H_X\cdot H_Y^T)e^\tau \in \mathfrak R^{B\times B}\end{array}$

Use a cross-entropy loss as the contrastive loss to push $\begin{array}{l}S\end{array}$ towards an identity matrix: the diagonal elements of $\begin{array}{l}S\end{array}$ are computed from a latent image vector and the latent vector of its corresponding segmentation groundtruth, so their cosine similarity should be close to 1

$\begin{array}{l}\displaystyle \mathcal L_{cont} = -{1\over 2}({1\over B}\sum_{i=1}^{B}\sum_{j=1}^{B}I_{ij}\log S_{ij} + {1\over B}\sum_{i=1}^{B}\sum_{j=1}^{B}I_{ji}\log S_{ji})\end{array}$

Reconstruct segmentation latent vectors with segmentation decoder $\begin{array}{l}Q_\psi\end{array}$ , trained with reconstruction loss $\begin{array}{l}\mathcal L_{rec}\end{array}$
In the end, minimize the loss

$\begin{array}{l}\displaystyle \mathcal L = \mathcal L_{cont} + \mathcal L_{rec}\end{array}$
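The projection, similarity and contrastive steps can be sketched in NumPy. The formula above writes the logarithm directly on $S$. The sketch instead applies a softmax over the rows and over the columns of $S$ before taking the logarithm, as in CLIP-style contrastive training, which is an assumption on my side. The encoders are skipped and the latent vectors `z_x`, `z_y` are taken as given inputs. The reconstruction loss is not covered.

```python
import numpy as np

def l2_normalize(v):
    """Scale every row to unit length so that dot products are cosine similarities."""
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def log_softmax(s, axis):
    s = s - s.max(axis=axis, keepdims=True)   # numerical stability
    return s - np.log(np.exp(s).sum(axis=axis, keepdims=True))

def contrastive_loss(z_x, z_y, W_x, W_y, tau):
    """z_x: (B, D_x), z_y: (B, D_y), W_x: (D_h, D_x), W_y: (D_h, D_y)."""
    h_x = l2_normalize(z_x @ W_x.T)           # (B, D_h) joint-space image vectors
    h_y = l2_normalize(z_y @ W_y.T)           # (B, D_h) joint-space groundtruth vectors
    S = (h_x @ h_y.T) * np.exp(tau)           # (B, B) scaled cosine similarities
    diag = np.arange(S.shape[0])
    # Cross-entropy with the identity matrix as target, over rows and over columns.
    loss_rows = -log_softmax(S, axis=1)[diag, diag].mean()
    loss_cols = -log_softmax(S, axis=0)[diag, diag].mean()
    return 0.5 * (loss_rows + loss_cols), S
```

The loss is small when each image vector is most similar to its own groundtruth vector (large diagonal of `S`) and grows when the pairs are mismatched.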

Uncertainty prediction

After training, the groundtruth maps have been projected into latent vectors $\begin{array}{l}Z\end{array}$ and into the joint latent space $\begin{array}{l}H\end{array}$ . In the uncertainty prediction stage, $\begin{array}{l}N\end{array}$ samples are chosen as the latent anatomical prior distribution

$\begin{array}{l}\overline {Z_y} \in \mathfrak R^{N\times D_y} \\ \overline {H_y} \in \mathfrak R^{N\times D_h}\end{array}$

Let $\begin{array}{l}x^*\end{array}$ be an image that was not used for training and $\begin{array}{l}y^*\end{array}$ its associated segmentation map:

Use the image encoder $\begin{array}{l}P_{\theta}\end{array}$ to encode the image $\begin{array}{l}x^*\end{array}$ and use the weight matrix to project its latent vector into the joint latent space, giving $\begin{array}{l}\overrightarrow {h_{x^*}}\end{array}$
Compute the similarity between $\begin{array}{l}\overrightarrow {h_{x^*}}\end{array}$ and each row of $\begin{array}{l}\overline H\end{array}$ to obtain $\begin{array}{l}\overline S \in \mathfrak R^N\end{array}$
Select $\begin{array}{l}M\end{array}$ samples in $\begin{array}{l}\overline Z\end{array}$ with the highest value in $\begin{array}{l}\overline S\end{array}$
Decode the samples to obtain $\begin{array}{l}\overline Y^*\end{array}$
Compare these samples with the initial prediction $\begin{array}{l}y^*\end{array}$
Compute the average of the pixel-wise difference between $\begin{array}{l}y^*\end{array}$ and $\begin{array}{l}\overline y^*\end{array}$ to obtain uncertainty map $\begin{array}{l}U\end{array}$

$\begin{array}{l}\displaystyle U = {1\over M} \sum_{i=1}^M w_i(\overline y_i^* - y^*)\end{array}$

where $\begin{array}{l}w_i\end{array}$ represents how close a groundtruth map $\begin{array}{l}y_i\end{array}$ is to $\begin{array}{l}x^*\end{array}$ : $\begin{array}{l}w_i = e^{{1\over b}{\overrightarrow {h_i}^T}{\overrightarrow {h_{x^*}}}} / e^{{1\over b}{\overrightarrow {h_{x^*}}^T}{\overrightarrow {h_{x^*}}}} = e^{{1\over b}(\overrightarrow {h_i}^T \overrightarrow {h_{x^*}}-1)}\end{array}$
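The prediction steps above can be sketched as follows. The sketch assumes that the joint-space vectors are already L2-normalised, that `decode` is a callable standing in for the segmentation decoder $Q_\psi$, and that the pixel-wise difference is taken as an absolute difference (the formula does not state this explicitly). All argument names are illustrative.

```python
import numpy as np

def crisp_uncertainty(h_x_star, H_bar, Z_bar, decode, y_star, M, b):
    """
    h_x_star: (D_h,)   joint-space vector of the test image (unit norm)
    H_bar:    (N, D_h) joint-space vectors of the prior samples (unit-norm rows)
    Z_bar:    (N, D_y) latent vectors of the prior samples
    decode:   callable mapping one latent vector to an (H, W) segmentation map
    y_star:   (H, W)   initial prediction for the test image
    """
    S_bar = H_bar @ h_x_star                          # (N,) cosine similarities
    top = np.argsort(S_bar)[::-1][:M]                 # indices of the M closest samples
    weights = np.exp((S_bar[top] - 1.0) / b)          # w_i = exp((h_i^T h_x* - 1) / b)
    candidates = np.stack([decode(Z_bar[i]) for i in top])   # (M, H, W)
    diff = np.abs(candidates - y_star)                # assumed absolute difference
    return (weights[:, None, None] * diff).mean(axis=0)      # (H, W) uncertainty map
```

A prior sample that perfectly matches the test image in the joint space gets weight 1. Less similar samples are down-weighted exponentially, and the scale parameter b controls how fast the weight drops.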

3. Implementation

3.1 Datasets

Layer Ensembles is evaluated on both binary and multi-class segmentation and on both 2D and 3D images. For binary segmentation on 2D images, LE uses the Breast Cancer Digital Repository (BCDR) dataset for both training and testing to segment breast masses [10]. The M&Ms challenge (MnM) dataset is selected to test LE on multi-class segmentation of 3D images [11]. As in the binary 2D task, MnM is used for both training and testing for cardiac segmentation. Images in the MnM dataset contain the Left Ventricle (LV), MYOcardium (MYO) and Right Ventricle (RV) heart structures.

While Layer Ensembles uses the same dataset for both training and testing within one task, the Efficient Bayesian method focuses on performance on in-domain versus out-of-domain data. The ACDC dataset is used as the training data in Efficient Bayesian for 2D cardiac segmentation and also as the test data for its in-domain performance [12]. For the out-of-domain performance, Efficient Bayesian uses MnM as the test data.

Two tasks are addressed in CRISP: cardiac segmentation and pulmonary tuberculosis detection. Similar to the task in Layer Ensembles, the potential of the CRISP method is tested on different cardiac structures. CRISP uses the CAMUS dataset as both training and test data for 2D segmentation of the Left Ventricle (LV), while CAMUS and HMC-QU are used as the training and test datasets, respectively, for segmentation of the MYOcardium (MYO) [13][14]. Compared to the in-domain test in cardiac segmentation, the pulmonary tuberculosis detection task can be considered an out-of-domain performance test. The Shenzhen dataset is chosen as the training dataset and JSRT as the test dataset [15][16].

3.2 Evaluation

Generally, uncertainty estimation methods are evaluated in the following aspects: segmentation performance, confidence calibration and uncertainty estimation. This blog post focuses mainly on the assessment of uncertainty estimation. The evaluation can be divided into quantitative and qualitative assessments. As shown in Table 1, all three methods use uncertainty maps for qualitative evaluation. Both Layer Ensembles and CRISP quantitatively evaluate the ability to estimate uncertainty with common uncertainty metrics. The pixel-wise uncertainty metrics are summed up and compared among different methods. It is worth noting that LE also uses a dedicated evaluation method, the Area Under Layer Agreement curve (AULA), which measures the agreement between adjacent layer outputs. The higher the AULA value, the lower the uncertainty of the result. This provides an image-level uncertainty metric.

Method	Quantitative	Qualitative
Layer Ensembles	Common uncertainty metrics (e.g. variance, entropy, MI); AULA	Uncertainty map
Efficient Bayesian	None	Uncertainty map
CRISP	Common uncertainty metrics (correlation, MI)	Uncertainty map

Table 1. Uncertainty estimation evaluation for different methods
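The AULA metric described in the evaluation paragraph above can be sketched as follows. The sketch assumes that the agreement between adjacent layer outputs is measured with the Dice coefficient on binary masks and that the area is obtained with the trapezoidal rule at unit spacing. Both are assumptions made for illustration, and `layer_masks` is a hypothetical list of per-layer segmentation masks.

```python
import numpy as np

def dice(a, b, eps=1e-8):
    """Dice overlap of two binary masks (defined as 1 when both are empty)."""
    a = a.astype(bool)
    b = b.astype(bool)
    intersection = np.logical_and(a, b).sum()
    return (2.0 * intersection + eps) / (a.sum() + b.sum() + eps)

def aula(layer_masks):
    """Area under the curve of agreements between adjacent layer outputs."""
    agreement = [
        dice(layer_masks[i], layer_masks[i + 1])
        for i in range(len(layer_masks) - 1)
    ]
    # Trapezoidal rule with unit spacing between consecutive layer pairs.
    area = sum(
        0.5 * (agreement[i] + agreement[i + 1])
        for i in range(len(agreement) - 1)
    )
    return area, agreement
```

An image whose segmentation is already stable in the early layers yields agreements close to 1 everywhere and therefore a large area. Disagreement between adjacent layers lowers the value, which matches the reading that a higher AULA indicates lower uncertainty.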

4. Results

Since Layer Ensembles is inspired by and improves on Deep Ensembles, LE is compared with DE. Figure 7 shows that LE and DE perform similarly in detecting segmentations with high uncertainty in the 2D application, while LE achieves a better AULA. In the more complicated 3D application, LE detects poor-quality segmentations much faster than DE and also achieves a better AULA. In addition, the overfitting of DE is clearly visible in the uncertainty maps in Figure 8.

Figure 7. Segmentation quality control for DE and LE. The following are averaged indicators for: random flagging (dashed black); remaining 5% of poor segmentations (dotted grey); and ideal line (grey shaded area). [1]

Figure 8. Examples of visual uncertainty heatmaps based on variance for high uncertainty areas (red arrows) using LE (top) and DE (bottom) for breast mass and cardiac structure segmentation. Black and green contours correspond to ground truth. [1]

As mentioned before, Efficient Bayesian only uses uncertainty maps to present results and comparisons. In panel (a) of Figure 9, a successful segmentation case, all methods succeed in highlighting the uncertain regions. However, in the failed cases (b), (c) and (d), only multi-modal Efficient Bayesian detects the regions of high uncertainty relatively comprehensively in all cases. Other methods such as MC-Dropout, Deep Ensembles and single-modal Efficient Bayesian fail in at least one case.

Figure 9. Predictions (Pred.) and estimated uncertainty maps (Uncert.) on a successful case (a) and three partially failed cases (b–d). [2]

From the quantitative results in Table 2, it can be concluded that CRISP outperforms the other methods, including Entropy, Edge and ConfidNet. Furthermore, the uncertainty maps in Figure 10 show that CRISP can detect regions of high uncertainty even when the segmentation contains large errors, while the other SOTA methods fail.

Table 2. Quantitative results in CRISP [3]

Figure 10. From top to bottom: raw images, corresponding error maps, uncertainty estimation of SOTA methods and CRISP uncertainty. White indicates erroneous pixels in the error maps [row 2] and high uncertainty in the uncertainty maps [rows 3 and 4]. [3]

5. Discussion

Layer Ensembles treats sub-networks of different depths within one network as the ensemble members. Therefore, compared with Deep Ensembles, LE contains only one network and requires only a single forward pass. This greatly increases the efficiency of uncertainty estimation and reduces the training complexity at the same time. LE also introduces AULA as an image-level uncertainty metric, which partly compensates for the fact that pixel-wise uncertainty metrics treat pixels independently. However, the network has to be modified in order to apply LE. The method still needs to be tested on out-of-domain data to check its applicability to new clinical cases. It also has to be tested on other networks to confirm that it is architecture-agnostic.

Efficient Bayesian covers a wider range of results thanks to its multi-modal sampling, which makes its results more reliable. Since Efficient Bayesian only changes the training procedure, the network does not need to be modified. Nevertheless, Efficient Bayesian lacks a quantitative evaluation to prove its capability. The qualitative results cannot precisely show the differences between Efficient Bayesian and other SOTA methods. It also lacks tests on 3D segmentation.

CRISP approaches uncertainty estimation at a higher level by combining the latent space of the images with the latent space of the segmentations. In addition, information from other samples is utilised as a latent anatomical prior distribution when calculating the uncertainty of a single image. This makes the uncertainty estimation robust even when there are large errors in the segmentation. However, due to its intrinsic complexity, embedding CRISP into an existing architecture requires effort. Besides, it remains unknown whether CRISP can be applied to 3D segmentation, since its methodology only involves 2D calculations.

These three methods correspond to different directions in current research on uncertainty estimation (Figure 11). LE and Efficient Bayesian belong to the Bayesian methods. LE improves uncertainty estimation mainly through the network structure, while Efficient Bayesian focuses more on improvements to training. CRISP is a completely new method, which is categorized as a non-Bayesian method.

Figure 11. Relationship of the three methods introduced in this blog

References

[1] Kushibar, K., Campello, V., Moras, L., Linardos, A., Radeva, P., & Lekadir, K. (2022). Layer Ensembles: A Single-Pass Uncertainty Estimation in Deep Learning for Segmentation. https://doi.org/10.48550/arXiv.2203.08878
[2] Zhao, Y., Yang, C., Schweidtmann, A., Tao, Q. (2022). Efficient Bayesian Uncertainty Estimation for nnU-Net. In: Wang, L., Dou, Q., Fletcher, P.T., Speidel, S., Li, S. (eds) Medical Image Computing and Computer Assisted Intervention – MICCAI 2022. Lecture Notes in Computer Science, vol 13438. Springer, Cham. https://doi.org/10.1007/978-3-031-16452-1_51
[3] Judge, T., Bernard, O., Porumb, M., Chartsias, A., Beqiri, A., & Jodoin, P.M. (2022). CRISP - Reliable Uncertainty Estimation for Medical Image Segmentation. https://doi.org/10.48550/arXiv.2206.07664
[4] https://towardsdatascience.com/my-deep-learning-model-says-sorry-i-dont-know-the-answer-that-s-absolutely-ok-50ffa562cb0b
[5] Kendall, A., & Gal, Y. (2017). What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? https://doi.org/10.48550/arXiv.1703.04977
[6] Mehrtash, A., Wells, W.M., Tempany, C.M., Abolmaesumi, P., Kapur, T.: Confidence calibration and predictive uncertainty estimation for deep medical image segmentation. IEEE Trans. Med. Imaging 39(12), 3868–3878 (2020)
[7] Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059. PMLR (2016)
[8] Lakshminarayanan, B., Pritzel, A., Blundell, C.: Simple and scalable predictive uncertainty estimation using deep ensembles. arXiv preprint arXiv:1612.01474 (2016)
[9] Abdar, M., Pourpanah, F., Hussain, S., Rezazadegan, D., Liu, L., Ghavamzadeh, M., Fieguth, P., Cao, X., Khosravi, A., Acharya, U.R., Makarenkov, V., & Nahavandi, S. (2021). A review of uncertainty quantification in deep learning: Techniques, applications and challenges. Information Fusion, 76, 243–297. https://doi.org/10.48550/arXiv.2011.06225
[10] Moura, D.C., Lopez, M.A.G., Cunha, P., Posada, N.G.d., Pollan, R.R., Ramos, I., Loureiro, J.P., Moreira, I.C., Araujo, B.M., Fernandes, T.C.: Benchmarking datasets for breast cancer computer-aided diagnosis (CADx). In: Iberoamerican Congress on Pattern Recognition, pp. 326–333. Springer (2013)
[11] Campello, V.M., Gkontra, P., Izquierdo, C., Martín-Isla, C., Sojoudi, A., Full, P.M., Maier-Hein, K., Zhang, Y., He, Z., Ma, J., et al.: Multi-centre, multi-vendor and multi-disease cardiac segmentation: the M&Ms challenge. IEEE Transactions on Medical Imaging 40(12), 3543–3554 (2021)
[12] Bernard, O., et al.: Deep learning techniques for automatic MRI cardiac multi-structures segmentation and diagnosis: is the problem solved? IEEE Trans. Med. Imaging 37(11), 2514–2525 (2018)
[13] Leclerc, S., Smistad, E., Pedrosa, J., Østvik, A., Cervenansky, F., Espinosa, F., Espeland, T., Berg, E.A.R., Jodoin, P.M., Grenier, T., Lartizien, C., D’hooge, J., Lovstakken, L., Bernard, O.: Deep learning for segmentation using an open large-scale dataset in 2D echocardiography. IEEE Transactions on Medical Imaging 38(9), 2198–2210 (2019)
[14] Degerli, A., Zabihi, M., Kiranyaz, S., Hamid, T., Mazhar, R., Hamila, R., Gabbouj, M.: Early detection of myocardial infarction in low-quality echocardiography. IEEE Access 9, 34442–34453 (2021)
[15] Jaeger, S., Candemir, S., Antani, S., Wáng, Y.X.J., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quantitative Imaging in Medicine and Surgery 4, 475 (2014)
[16] Shiraishi, J., et al.: Development of a digital image database for chest radiographs with and without a lung nodule: receiver operating characteristic analysis of radiologists’ detection of pulmonary nodules. American Journal of Roentgenology 174, 71–74 (2000)
