Author: Posada Cárdenas, Andrea
Tutor: Unknown User (ge94wiy)
This blog post introduces and discusses the topic of unsupervised domain adaptation in medical image analysis. It begins with an introduction to the topic along with some key concepts, answering questions such as what unsupervised domain adaptation (UDA) is and why it is relevant in the field of medical image analysis. This is followed by a deep dive into three papers, each addressing a different purpose within UDA. The blog post closes with a review and discussion of the papers, and overall conclusions.
Introduction
In recent years, supervised deep learning methods have shown high performance across different tasks. However, their success relies on two pillars: (i) large volumes of data and (ii) the assumption that these data are independent and identically distributed (i.i.d.) [1,2]. The first pillar is especially challenging in the field of medical imaging, owing to the difficulty of obtaining and labeling data for reasons such as cost and the expertise required [3,4,5]. As for the second pillar, when a high-performing model is deployed or tested on a dataset that is different from, but related to, the training one, a decrease in performance is very likely [1,2]. Such data are known as out-of-distribution (OOD) data, and the performance drop reflects a lack of generalization of the model.
Given this scenario, the question arises as to how to leverage knowledge acquired from solving one problem and transfer it to a different but related problem. This is the goal of transfer learning. The transfer can be done at the domain level, the task level, or both. A domain D is composed of a feature space X and a marginal probability distribution P(X). Given a specific domain, a task consists in turn of a label space Y and a predictive objective function f: X \rightarrow Y [6,7]. In this blog post, we focus on domain-level transfer, which is known as domain adaptation.
To proceed, the domains must be differentiated: the domain from which the transfer is made is called the source domain, and the domain to which the transfer is made is called the target domain. Depending on the knowledge available about the target domain, domain adaptation can be divided into fine-tuning and unsupervised domain adaptation (UDA), as Figure 1 shows. When no information about the target domain is available at all, one is under the paradigm of domain generalization, whose aim is a model that generalizes well enough to perform well on (certain) previously unseen domains [8].
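To make the UDA setting concrete, one common formalization (consistent with [6,7]; the sample-set notation below is ours) assumes a labeled source set and an unlabeled target set,

```latex
\mathcal{D}_S = \{(x_i^S, y_i^S)\}_{i=1}^{n_S} \sim p_s(x, y), \qquad
\mathcal{D}_T = \{x_j^T\}_{j=1}^{n_T} \sim p_t(x),
```

with p_s \neq p_t but a shared label space Y; the goal is to learn f: X \rightarrow Y that performs well under p_t despite having no target labels.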
The field of medical imaging is no stranger to this phenomenon of decreased performance when a model is tested on different but related data, despite common anatomical structures [3]. This is exemplified in Figure 2, which shows magnetic resonance imaging (MRI) scans differing in the contrast employed and the scanner used. These differences are reflected in the intensity distributions. Thus, if a model as simple as thresholding is "trained" on images of type (a) and then tested on images of type (c), poor results will be obtained. This difference between domains is known as domain shift, of which there are four types: covariate shift, label shift, conditional shift, and concept shift (see the Concepts section). Our focus is on covariate shift, which is produced, for example, by differences in imaging modalities, scanners and scanning protocols, demographic properties, or medical centers, among others [3,4,5]. Moreover, UDA becomes even more relevant in medical imaging due to the cost of manual labeling.
Figure 1. Different learning paradigms according to the knowledge of the target domain. Source: [8].
Figure 2. Magnetic resonance imaging (MRI) scans using different scanners and contrast types (row 1), along with their intensity distribution (row 2). The first two images (columns a, b) correspond to MRIs with T1 contrast, while the last two (columns c, d) to MRIs with T2 contrast. Images of the same contrast come from different scanners. Source: [9].
Concepts
Domain shift
According to [2,10], there are four types of domain shift, the first three being the most common:
- Covariate shift: p_s(x) \neq p_t(x). It is the most studied type of domain shift. An example is an image from MNIST versus one from MNIST-M.
- Label or prior shift: p_s(y) \neq p_t(y). It indicates that the sampling proportions of the classes involved are different between domains.
- Conditional shift: p_s(x|y) \neq p_t(x|y). It is a more realistic setting than covariate shift alone, but estimating p_t(x|y) without access to p_t(y) is ill-posed in the UDA case [2,11].
- Concept shift: p_s(y|x) \neq p_t(y|x). It is related to a change in the meaning of labels. For example, a tomato can be considered both a vegetable and a fruit depending on the country.
Methods
Here, the different methodological lines in UDA are introduced, based on [1,2]. Although the two surveys differ slightly in how these lines are classified, they largely agree. The reader is referred to the sources for a more detailed description.
In recent years, different research and methodological lines have emerged in UDA. These include aligning the source and target domain distributions, mapping between domains, separating normalization statistics, designing ensemble-based methods, and making the model discriminative on the target domain by moving the decision boundary to low-density regions. Some authors have even explored combinations of these.
- Domain-invariant feature learning. Most recent domain adaptation methods align source and target domains by creating a domain-invariant feature representation space by means of deep neural networks. Within this group are divergence minimization (or statistical divergence alignment), reconstruction, and adversarial training (a minimal sketch of the adversarial variant is given after this list).
- (Generative) domain mapping. An inter-domain mapping, typically created adversarially and at the pixel level, represents an alternative to domain-invariant feature representations. This mapping can be achieved, for example, with a conditional GAN.
- Normalization statistics. These methods build on normalization layers such as batch normalization (BN). According to [3], the mean and variance statistics of BN inherit domain knowledge. Hence, these approaches typically employ per-domain batch normalization (a minimal sketch follows Figure 3).
- Ensemble-based methods. Several ensemble-based methods have been developed for domain adaptation. The ensemble can be used either to guide learning with its predictions (self-ensembling), or to measure prediction confidence for pseudo-labeling the target data (self-training, or pseudo-labeling). Both self-ensembling and self-training are rooted in semi-supervised learning.
- Target discriminative (or low-density target boundary) methods. These methods rely on the cluster assumption, which has driven the success of many semi-supervised learning algorithms. This assumption states that the data are distributed in separate clusters and that the samples within each cluster share a label. Under this assumption, decision boundaries should lie in low-density regions. Hence, these methods seek to move decision boundaries toward regions of lower density, and are usually trained adversarially.
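As an illustration of the adversarial variant of domain-invariant feature learning, below is a minimal PyTorch sketch of a gradient reversal layer in the spirit of DANN-style training. This is a generic sketch, not code from any of the papers discussed; the module names in the commented usage (extractor, classifier, domain_clf, the loss functions) are hypothetical placeholders.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; scales and flips gradients in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambda_):
        ctx.lambda_ = lambda_
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # The reversed gradient trains the feature extractor to *fool* the
        # domain classifier, pushing features toward domain invariance.
        return -ctx.lambda_ * grad_output, None

def grad_reverse(x, lambda_=1.0):
    return GradReverse.apply(x, lambda_)

# Hypothetical usage inside a training step:
#   feats = extractor(torch.cat([x_source, x_target]))
#   domain_logits = domain_clf(grad_reverse(feats, lambda_=0.5))
#   loss = seg_loss(classifier(feats[:n_src]), y_source) \
#        + domain_loss(domain_logits, domain_labels)
```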
In Figure 3, the classification of methodological lines proposed by [2] is provided.
Figure 3. Taxonomy of unsupervised domain adaptation by methodological line. Own elaboration based on [2].
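To complement the normalization-statistics line above, here is a minimal sketch of per-domain batch normalization: the rest of the network is shared, while each domain keeps its own BN statistics. Again, this is a generic illustration under our own naming, not any specific paper's implementation.

```python
import torch
from torch import nn

class DomainSpecificBN2d(nn.Module):
    """Shared computation elsewhere in the network; only the batch-norm
    statistics (and affine parameters) are kept separately per domain."""

    def __init__(self, num_features: int, num_domains: int = 2):
        super().__init__()
        self.bns = nn.ModuleList(
            [nn.BatchNorm2d(num_features) for _ in range(num_domains)]
        )

    def forward(self, x: torch.Tensor, domain: int) -> torch.Tensor:
        # Route the batch through the BN layer holding this domain's statistics.
        return self.bns[domain](x)

# Usage: out = bn(x_source, domain=0); out = bn(x_target, domain=1)
```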
Methodologies
Three different methodologies are presented below, each addressing a specific scenario. The first focuses on UDA with scarce target domain data. The second addresses UDA in the presence of noisy labels in the source domain. The third and last methodology shows how to exploit the geometric shape of the anatomical structures to be segmented.
Data Efficient Unsupervised Domain Adaptation for Cross-Modality Image Segmentation by Ouyang et al. (2019) [3]
Architecture and concepts
Figure 4. Architecture proposed in [3]. VAE with two domain-specific encoders and decoders, extended from [12], and a segmentation network operating on the VAE’s latent space Z, which will learn to be domain invariant.
The proposed model is based on two concepts: prior matching and domain adversarial learning. Prior matching means matching the priors p(Z^S) and p(Z^T) by enforcing both to approximate N(0, I). This prior regularization of the feature space acts as a constraint on the matching of the data distributions. Assuming that the feature space Z can be projected from both the source and target domains, and exploiting the variational autoencoder (VAE) architecture, the KL divergence can be estimated analytically, which is data-efficient. Domain adversarial learning, in turn, consists of adaptively learning a measure of divergence between the domains; this methodology arose thanks to advances in generative adversarial networks (GANs).
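For a diagonal Gaussian posterior q(z|x) = N(\mu, \operatorname{diag}(\sigma^2)) and the prior N(0, I), the KL term has the standard closed form below (this is the generic VAE expression, not copied from [3]), which is what makes the analytic, data-efficient estimation possible:

```latex
D_{KL}\left(\mathcal{N}(\mu, \operatorname{diag}(\sigma^2)) \,\middle\|\, \mathcal{N}(0, I)\right)
= \frac{1}{2}\sum_{k=1}^{d}\left(\mu_k^2 + \sigma_k^2 - \log\sigma_k^2 - 1\right)
```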
In detail, a single image from either domain is sent to its corresponding domain-specific encoder: E^Z \circ E^S (source) or E^Z \circ E^T (target). The encoder predicts the posterior of the latent feature Z in \mathcal{Z}, which the segmentation network (Seg) takes as input; that is, segmentation operates on the latent space. During training, the feature map is sent simultaneously to the decoders D^S \circ D^Z and D^T \circ D^Z to reconstruct images in both domains. A domain classifier network (Disc) then differentiates whether its input comes from the original source image set or is generated.
Methodology
Figure 5. Loss function proposed in [3].
The methodology consists of two phases: (i) supervised training in the source domain and (ii) domain adaptation. The first phase is modeled by the loss functions highlighted in light blue in Figure 5. Specifically, L_{rec}^S and L_{kl}^S correspond to the VAE loss for the source domain. L_{seg} is a mixture of the Dice coefficient and cross-entropy, which helps overcome the class imbalance between small segment labels and large backgrounds. L_{adv}^S is responsible for pretraining Disc to classify whether its input comes from the source training set or is generated. In this phase, the networks E^S, E^Z, D^Z, D^S, Disc, and Seg are trained.
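The Dice-plus-cross-entropy combination is standard in medical segmentation; a minimal sketch follows. This is a generic formulation, not the authors' exact loss: the equal weighting ce_weight=0.5 and the smoothing eps are our assumptions.

```python
import torch
import torch.nn.functional as F

def dice_ce_loss(logits: torch.Tensor, target: torch.Tensor,
                 ce_weight: float = 0.5, eps: float = 1e-6) -> torch.Tensor:
    """Combined soft-Dice + cross-entropy segmentation loss.

    logits: (B, C, H, W) raw network outputs; target: (B, H, W) integer labels.
    """
    ce = F.cross_entropy(logits, target)
    probs = F.softmax(logits, dim=1)
    onehot = F.one_hot(target, num_classes=logits.shape[1]).permute(0, 3, 1, 2).float()
    intersection = (probs * onehot).sum(dim=(0, 2, 3))
    denominator = probs.sum(dim=(0, 2, 3)) + onehot.sum(dim=(0, 2, 3))
    soft_dice = ((2 * intersection + eps) / (denominator + eps)).mean()  # mean over classes
    # Cross-entropy handles per-pixel accuracy; (1 - Dice) counters the imbalance
    # between small foreground structures and a large background.
    return ce_weight * ce + (1 - ce_weight) * (1 - soft_dice)
```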
In the second phase, the remaining loss functions are added; Figure 5 indicates the contribution of each. To prevent E^Z and D^Z from overfitting on the small \{x^T\}, only E^T and D^T are updated. To strengthen Disc, \{x^{TS}\} (target domain images translated to the source domain) are included in its training. In addition, cycle-consistency is applied in the direction D^T \rightarrow D^S \rightarrow D^T, represented by L_{cyc}^T. Following a similar idea, Seg is fed with latent vectors obtained from source domain images translated into the target domain and then passed through the target encoders; this last step is represented by L_{cyc}^{task}.
Results
Dataset: MM-WHS 2017 for heart segmentation [13]. Source domain: magnetic resonance imaging (MRI); target domain: computed tomography (CT).
To simulate the target-data-scarcity scenario, a single scan is randomly sampled from the target training dataset to train the proposed model. The reported results are averages over the model trained six times on different randomly chosen target scans, thus avoiding biases. Figures 6 and 7 present the quantitative and qualitative results, respectively. The metrics considered are the Dice coefficient and the average symmetric surface distance (ASSD). The proposed model is compared with the unadapted baseline (no adaptation on the target domain), the oracle (upper bound; fine-tuning on the target domain), and the Pnp-AdaNet and SIFA models. Compared to the baseline, the proposed model increases the average Dice score by 52.15 percentage points, from 20.03% to 72.18%; clearly, the supervised model without adaptation failed on the target domain images. Moreover, the proposed model outperforms Pnp-AdaNet and SIFA in the target-data-scarcity scenario. Interestingly, it also provides results close to those of SIFA-16 and Pnp-AdaNet-16 while using only 1/16 of the data. Visually, the results closely resemble the ground truth.
Figure 6. Quantitative results for heart segmentation; mean(std) [3].
Figure 7. Qualitative results for heart segmentation [3].
S-CUDA: Self-cleansing unsupervised domain adaptation for medical image segmentation by Liu et al. (2021) [4]
Architecture and concepts
Figure 8. Framework proposed in [4]. Mainly three parts: (i) identify high-confidence clean and noisy data using loss values; (ii) cross-train the peer networks with clean data; (iii) clean the noisy data using predicted values.
The proposed model (S-CUDA) is mainly based on two concepts: self-training, as part of ensemble-based UDA, and noisy-label supervised learning. The architecture comprises two networks, N1 and N2, which play the role of two experts. The networks have different learning capabilities, realized as different architectures. Each is composed of a segmentation network (Si, i=1,2) and a discriminator (Di, i=1,2). The discriminator is in charge of differentiating whether a segmentation map is associated with an image from the source domain or the target domain, thus guiding the segmentation network to focus on local structure similarity. It is important to mention that only N1 is used at test time. In addition, S-CUDA applies pseudo-labels to both noisy and unlabeled data.
Looking at Figure 8 in detail, the learning flow comprises three parts: (i) identify high-confidence clean and noisy data using loss values, (ii) cross-train the peer networks with clean data, and (iii) clean the noisy data using pseudo-labels. Data with high loss values are associated with noisy labels, while data with low values are treated as clean. The samples considered clean are learned via cross-training, which seeks to avoid the accumulation of errors. Noisy data, in turn, are considered high-confidence if both networks agree in identifying them; after identification, they go through a cleaning process and are recycled in the next learning round.
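The small-loss selection heuristic at the heart of step (i) can be sketched as follows. This is a generic, co-teaching-style illustration: the selection ratio clean_ratio and the exact agreement rule are our simplifications, not S-CUDA's precise criteria.

```python
import torch

def split_clean_noisy(loss_n1: torch.Tensor, loss_n2: torch.Tensor,
                      clean_ratio: float = 0.7):
    """loss_n1, loss_n2: per-sample losses from the two peer networks (same ordering).

    Each network nominates its lowest-loss samples as clean; these are handed to
    the *other* network for training (cross-training). Samples that both networks
    leave in their high-loss remainder are treated as high-confidence noisy.
    """
    n = loss_n1.numel()
    k = int(clean_ratio * n)
    clean_for_n2 = torch.topk(loss_n1, k, largest=False).indices  # N1's clean picks
    clean_for_n1 = torch.topk(loss_n2, k, largest=False).indices  # N2's clean picks
    noisy_n1 = set(range(n)) - set(clean_for_n2.tolist())
    noisy_n2 = set(range(n)) - set(clean_for_n1.tolist())
    high_conf_noisy = sorted(noisy_n1 & noisy_n2)  # agreement of both networks
    return clean_for_n1, clean_for_n2, high_conf_noisy
```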
Methodology
Figure 9. Loss function proposed in [4].
The proposed loss function is formed by the segmentation loss (L_{seg}) and the adversarial loss (L_{adv}). L_{seg} is a mixture of the loss functions for data considered clean (\omega=0) and for data considered noisy (\omega=1). As mentioned before, the models are cross-trained with the high-confidence clean data, meaning that the networks exchange these data to update their parameters; since data identified as clean may still contain noise, this exchange helps avoid accumulating errors. High-confidence noisy data go through a co-cleaning process with a self-training strategy. L_{noise} is a weighting between L_{reweight} and L_{pseudo}, as shown in Figure 9; both losses are considered together to alleviate the learning-trap problem inherent to self-training. The first loss, L_{reweight}, is associated with the original labels, while the second, L_{pseudo}, is associated with the pseudo-labels. Moreover, L_{reweight} is similar to L_{clean}, but instead of using the labels y_i, it uses their mapping B(y_i), where B denotes the boundary distance map. As for L_{adv}, an entropy map F(X) is used to maximize the confidence of the prediction; this map is relevant since the pseudo-labels for target domain data may be very noisy.
Results
Datasets:
- REFUGE & Drishti-GS (DGS) for optic disc and optic cup segmentation [14,15]. Source domain: REFUGE training set; target domains: REFUGE validation set and the full DGS dataset.
- SCGM (multi-vendor, multi-center) dataset for spinal cord gray matter segmentation [16]. Source domain: centers 1-3; target domain: center 4.
Although the authors performed extensive tests, only the main results are presented here. For the experiments, noise was injected into the source domain labels. This noise is characterized by its type (dilation, erosion, and elastic deformation), its level (low or high, measured w.r.t. the Dice coefficient), and its corruption ratio. The results were analyzed with respect to the corruption ratio, and only the Dice coefficient was considered as the performance metric.
Figures 10 and 11 present the quantitative and qualitative results for optic disc (OD) and optic cup (OC) segmentation. With cleanly annotated data, S-CUDA achieves a Dice coefficient of 95.3% for OD and 89.4% for OC, and it generates more precisely defined boundaries. At high noise levels, S-CUDA maintains stable performance as the corruption ratio increases, while the performance of the compared models decreases; this is especially noticeable without pretraining or when using DGS as the target domain, and more drastic in the latter case. At low noise levels, the corruption ratio does not have a large impact on performance when pretraining on ImageNet is applied. As for SCGM segmentation, the quantitative and qualitative results are presented in Figures 12 and 13. At high noise levels, test performance gradually decreases as the corruption ratio increases, similar to what is observed for the previous dataset. Furthermore, using the generated pseudo-labels directly (PL) costs about 1-2% in the Dice coefficient with respect to the full model.
Figure 10. Quantitative results for OD/OC segmentation, with and without pretraining on ImageNet; format (Dice_disc, Dice_cup) [4].
Figure 12. Quantitative results for SCGM segmentation under high-level label noise, with and without pretraining on ImageNet; format (Dice_GM, Dice_WM) [4].
Figure 11. Qualitative results for OD and OC segmentation with different noise levels at label corruption ratio 0.5 [4].
Figure 13. Qualitative results for SCGM segmentation with high-level noise at label corruption ratio of 0.1, 0.5, 0.9 [4].
Unsupervised Domain Adaptation for Medical Image Segmentation by Disentanglement Learning and Self-Training by Xie et al. (2022) [5]
Architecture and concepts
Figure 14. Framework proposed by [5]. It consists of one shared anatomy encoder, two modality-specific encoders, a dual-task module (DM), and one shared generator.
The proposed model (DLaST) is based on two concepts: disentanglement learning and self-training. Medical images are complex mixtures of anatomical and modality factors. Accordingly, disentanglement learning seeks to decouple an image into two components: (1) modality-invariant structural information and (2) a domain-specific texture pattern. Self-training, in turn, aims at finding high-confidence pseudo-labels to fine-tune a trained segmentation model. Its relevance in UDA for semantic segmentation comes from the fact that it provides supervision information in the target domain.
As seen in Figure 14, DLaST can be decomposed into three main parts, represented by (a), (b), and (c). In (a), the input image is disentangled into a domain-specific modality code and a domain-invariant anatomy code, and the combination of the two allows translating x_s in the source domain to x_{s2t} in the target domain. In (b), the anatomy feature is used both for semantic segmentation and for predicting the signed distance function (SDF) map. Finally, (c) shows that, by making use of disentanglement, DLaST further allows transferring labels from the source domain to the target domain.
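The translation in (a) can be summarized compactly. Writing E_a for the shared anatomy encoder, E_m^s and E_m^t for the modality-specific encoders, and G for the shared generator (the superscript notation is ours, following Figure 14), the cross-domain images are obtained by recombining anatomy and modality codes:

```latex
x_{s2t} = G\big(E_a(x_s),\, E_m^{t}(x_t)\big), \qquad
x_{t2s} = G\big(E_a(x_t),\, E_m^{s}(x_s)\big)
```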
Methodology
Figure 15. Loss function proposed in [5].
Unlike the other two approaches, the loss function here is analyzed per training stage, in accordance with the training process proposed by the authors. The training process is divided into three sequential stages, each with a different duration. The first stage focuses on training block (a) shown in Figure 15. Its training loss is represented by L_{IT}, which is composed of the reconstruction loss (L_{rec}), the adversarial loss (L_{adv}), and the zero loss (L_0). The latter ensures that the domain-specific encoders do not capture information from the other domain.
For the second and third stages, L_{DT} is employed. This loss is composed of the segmentation loss and the SDF regression loss, and is used to optimize E_a and DM. Before continuing, it is important to define the SDF: the signed distance function takes positive values outside the boundary of an object, negative values inside it, and is zero exactly on the boundary, with its magnitude shrinking toward zero as the boundary is approached. In DLaST, the SDF regression map encodes the geometric shape of the target object class. In addition, adversarial training is used to enforce consistency between the two modalities in both the segmentation output space and the SDF output space; the adversarial losses associated with these tasks are present in both stages. Only in the third stage is an extra loss L_{SSL} included, related to the pseudo-labels in the target domain.
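The SDF of a binary mask is straightforward to compute with distance transforms; below is a minimal SciPy-based sketch following the sign convention defined above (positive outside, negative inside). This is a generic illustration, not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def signed_distance_map(mask: np.ndarray) -> np.ndarray:
    """mask: boolean array, True inside the object.

    Returns positive distances outside the object, negative distances inside,
    and (approximately) zero on the boundary."""
    dist_inside = distance_transform_edt(mask)    # > 0 for pixels inside the object
    dist_outside = distance_transform_edt(~mask)  # > 0 for pixels outside the object
    return dist_outside - dist_inside
```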
Results
Datasets:
- MM-WHS 2017 for heart segmentation [13]. Adaptation in both directions between imaging modalities (MRI ↔ CT).
- CHAOS dataset for abdominal organ segmentation [17]. Adaptation in both directions between imaging modalities (MRI ↔ CT).
- BraTS 2018 challenge for brain tumor segmentation [18]. Adaptation in both directions between MRI contrasts (FLAIR ↔ T2).
DLaST is tested on three different datasets, with the Dice and Jaccard coefficients as performance metrics. Figures 16 and 17 present the quantitative and qualitative results, respectively, for the heart segmentation dataset. The results show that DLaST is the best performing model: the Dice coefficient gap to the supervised model is reduced to 7.73% in the CT-to-MRI direction and to 7.27% in the MRI-to-CT direction. Furthermore, when the authors statistically tested whether DLaST improves over SymDA (the second best performing model), the improvement was found to be significant at the 5% level in both adaptation directions.
Regarding the abdominal organ segmentation dataset, the quantitative and qualitative results are presented in Figures 18 and 19, respectively. Again, DLaST performs best, with results close to those of supervised learning. An interesting observation is that all models obtained better results on abdominal organ segmentation than on cardiac segmentation. This may be because the abdominal organs are well separated and because there are larger variations between CT and MRI in the cardiac data. Lastly, for brain tumor segmentation, Figures 20 and 21 present the quantitative and qualitative results. Both qualitatively and quantitatively, DLaST showed its superiority in the two adaptation directions; as Figure 21 corroborates, the segmentation maps obtained by DLaST better preserve the shape and boundary of the tumor.
Figure 16. Quantitative results CT to MRI and vice-versa for heart segmentation [5].
Figure 18. Quantitative results CT to MRI and vice-versa for abdominal organ segmentation [5].
Figure 20. Quantitative results for brain tumor segmentation [5].
Figure 17. Qualitative results for cardiac segmentation in MRI (1,2) and CT (3,4) [5].
Figure 19. Qualitative results for abdominal organ segmentation in MRI (1,2) and CT (3,4) [5].
Figure 21. Qualitative results for brain tumor segmentation in FLAIR (1,2) and T2 (3,4) [5].
Review and discussion
Since the three methodologies have different application scenarios and, thus, different assumptions, a direct comparison between them is not possible. Therefore, the following review analyzes each methodology individually and then compares general aspects.
VAE-based UDA. This method appears easy to implement and is described in enough detail to be easy to understand. For the proposed use case it shows quite promising results, which matters given the difficulty of obtaining data in medical imaging. Nevertheless, this type of model assumes the existence of a common latent feature space for both domains, which may not hold. Furthermore, looking at the defined posterior distribution in detail, the use of a diagonal covariance matrix comes at a cost in the expressiveness of the model.
S-CUDA. The proposed methodology is well detailed, making it easy to understand. It makes use of the ensemble principle in what the authors call peer-review, which is a natural and suitable approach, and it shows robust, high-performing results for this use case. Also noteworthy are the adaptation of other models to handle both domain adaptation and noisy labels, and the completeness of the experiments. However, implementing and deploying the model could be challenging, especially because of the number and type of hyperparameters. Furthermore, the semantic segmentation was in all cases limited to two classes (excluding background), and the model performs better on unambiguous anatomical structures; it would therefore be interesting to see how it performs on harder, more ambiguous structures, such as retinal vessels. Focusing on the use case itself, the aim is a model robust to noise in the source domain labels. For testing purposes, the authors generated noise at the boundaries of the classes/organs to be segmented, which is in line with the defined loss function. While this is probably the most common type of label noise in segmentation, it would be interesting to see how the model performs on, for example, segmentation labels generated by deep learning models. Finally, although it was not the goal of the study, extending the experiments to poisoning attacks on the labels could yield promising insights.
DLaST. The proposed methodology uses disentanglement learning, which takes advantage of the shape, i.e., the geometric representation, of the anatomical structures involved. This is a great advantage of the method and is reflected in the results. However, some parts of the methodology lack detail, which can make implementation and replication difficult; this is especially true for the self-training part. On the other hand, gradients are not backpropagated to the modality encoders during stages 2 and 3. Since the main objective in these stages is to make the shared encoder correctly capture the anatomical shape, not letting gradients flow into the modality encoders could prevent them from further tuning themselves. Hence, it would be interesting to know whether unfreezing the modality encoders and letting them learn with very low learning rates improves results or, on the contrary, deteriorates performance. As a final comment, given the model's ability to learn the geometric shape of anatomical structures, it would be interesting to extend the experiments to label noise in the source domain. Similarly, since the authors note that better results are observed when segmenting well-separated anatomical structures, it would be important to quantify the influence of this factor on the model's segmentation capacity.
General aspects
The following table evaluates several general aspects. "X" represents the presence of the item, "?" indicates that there is not enough information to decide, and "-" the absence of the item. Some cells contain a comment instead, meaning that the item is partially present; the comment provides details.
| Comparison Item | VAE-based UDA [3] | S-CUDA [4] | DLaST [5] |
|---|---|---|---|
| Ablation study | X | X | X |
| Multi-dataset testing | - | X | X |
| Real-life testing | - | - | - |
| Multi-type covariate shift addressed in testing | - | X | ? |
| Multiple performance metrics | X | - | Yes, but related |
| CE + Dice as segment. loss | X | X | X |
| Source code | - | X | Not yet |
Conclusions
Unsupervised domain adaptation is important in medical image analysis due to the difficulty of obtaining data and labels. When data in the target domain are scarce, the model needs to be data-efficient. This efficiency can be obtained, for example, by restricting the prior distributions over the latent vector space, which is the core idea of the VAE-based UDA approach. The model proves superior to simple encoder-decoder networks, since imposing constraints on the prior distributions of the domains allows them to be matched better. In addition, sampling from the posteriors with noise augments the data, which improves the robustness of the model.
On the other hand, due to the difficulty of obtaining labels, these may be generated by labelers who are not necessarily medical specialists or highly experienced in the field; the labels may even come from deep learning models. Especially in semantic segmentation, most label noise occurs at the boundaries of the classes to be segmented. Therefore, learning from noisy labels is a very desirable capability. Ensemble strategies, such as peer networks, can help identify noisy data and prevent the accumulation of errors. Moreover, the identified high-confidence noisy data can be used for training after relabeling; the importance of not discarding these data lies in the diversity and useful information they can contribute to learning a robust model.
Medical images, in turn, can be seen as complex mixtures of anatomical and modality factors. Disentanglement learning allows decoupling an image into (1) modality-invariant structural information and (2) a domain-specific texture pattern. In addition, a self-training strategy that leverages pseudo-labels, together with adversarial learning in the output space, helps improve segmentation performance in the target domain by fine-tuning the anatomy encoder.
Finally, when deploying models trained under domain adaptation in a real scenario, the model is quite likely to encounter OOD data, and not just once but repeatedly. This implies that retraining or re-adapting for each unseen domain is not a viable option in the long term. Ideally, one would have a domain-agnostic model that can be applied to (certain) unseen domains; this ideal eventually leads to the domain generalization paradigm.
References
[1] Wilson G, Cook D. A Survey of Unsupervised Deep Domain Adaptation. ACM Transactions on Intelligent Systems and Technology. 2020 07;11:1-46.
[2] Liu X, Yoo C, Xing F, Oh H, Fakhri G, Kang JW, et al. Deep Unsupervised Domain Adaptation: A Review of Recent Advances and Perspectives. APSIPA Transactions on Signal and Information Processing. 2022 05.
[3] Ouyang C, Kamnitsas K, Biffi C, Duan J, Rueckert D. Data Efficient Unsupervised Domain Adaptation For Cross-modality Image Segmentation. In: Shen D, Liu T, Peters TM, Staib LH, Essert C, Zhou S, et al., editors. Medical Image Computing and Computer Assisted Intervention – MICCAI 2019. Cham: Springer International Publishing; 2019. p. 669-77.
[4] Liu L, Zhang Z, Li S, Ma K, Zheng Y. S-CUDA: Self-cleansing unsupervised domain adaptation for medical image segmentation. Medical Image Analysis. 2021;74:102214. Available from: https://www.sciencedirect.com/science/article/pii/S1361841521002590.
[5] Xie Q, Li Y, He N, Ning M, Ma K, Wang G, et al. Unsupervised Domain Adaptation for Medical Image Segmentation by Disentanglement Learning and Self-Training. IEEE Transactions on Medical Imaging. 2022:1-1.
[6] Pan SJ, Yang Q. A Survey on Transfer Learning. IEEE Transactions on Knowledge and Data Engineering. 2010;22(10):1345-59.
[7] Wikipedia contributors. Transfer learning — Wikipedia, The Free Encyclopedia; 2023. Accessed: 2023-01-18. https://en.wikipedia.org/wiki/Transfer_learning.
[8] Lee H. Domain Adaptation. Accessed: 2023-01-18. https://speech.ee.ntu.edu.tw/~hylee/ml/ml2021-course-data/da_v6.pdf.
[9] Chen C. Towards Robust AI: Advanced Data Augmentation [unpublished lecture notes]. AI in Medicine I, lecture given 2022-11-14.
[10] Kouw WM, Loog M. An introduction to domain adaptation and transfer learning. arXiv; 2018. Available from: https://arxiv.org/abs/1812.11806.
[11] Zhang K, Schölkopf B, Muandet K, Wang Z. Domain Adaptation under Target and Conditional Shift. In: Dasgupta S, McAllester D, editors. Proceedings of the 30th International Conference on Machine Learning. vol. 28 of Proceedings of Machine Learning Research. Atlanta, Georgia, USA: PMLR; 2013. p. 819-27. Available from: https://proceedings.mlr.press/v28/zhang13d.htm.
[12] Benaim S, Wolf L. One-Shot Unsupervised Cross Domain Translation. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems. NIPS’18. Red Hook, NY, USA: Curran Associates Inc.; 2018. p. 2108–2118.
Datasets
[13] Zhuang X, Shen J. Multi-scale patch and multi-modality atlases for whole heart segmentation of MRI. Medical Image Analysis. 2016;31:77-87. Available from: https://www.sciencedirect.com/science/article/pii/S1361841516000219.
[14] Orlando JI, Fu H, Barbosa Breda J, van Keer K, Bathula DR, Diaz-Pinto A, et al. REFUGE Challenge: A unified framework for evaluating automated methods for glaucoma assessment from fundus photographs. Medical Image Analysis. 2020;59:101570. Available from: https://www.sciencedirect.com/science/article/pii/S1361841519301100.
[15] Sivaswamy J, Krishnadas SR, Datt Joshi G, Jain M, Syed Tabish AU. Drishti-GS: Retinal image dataset for optic nerve head(ONH) segmentation. In: 2014 IEEE 11th International Symposium on Biomedical Imaging (ISBI); 2014. p. 53-6.
[16] Prados F, Ashburner J, Blaiotta C, Brosch T, Carballido-Gamio J, Cardoso MJ, et al. Spinal cord grey matter segmentation challenge. NeuroImage. 2017;152:312-29. Available from: https://www.sciencedirect.com/science/article/pii/S1053811917302185.
[17] Kavur AE, Gezer NS, Barış M, Aslan S, Conze PH, Groza V, et al. CHAOS Challenge - combined (CT-MR) healthy abdominal organ segmentation. Medical Image Analysis. 2021 Apr;69:101950. Available from: https://doi.org/10.1016%2Fj.media.2020.101950.
[18] Menze BH, Jakab A, Bauer S, Kalpathy-Cramer J, Farahani K, Kirby J, et al. The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS). IEEE Transactions on Medical Imaging. 2015;34(10):1993-2024.
Repository with a collection of papers, code, and others, about domain adaptation: https://github.com/zhaoxin94/awesome-domain-adaptation