This blog post focuses on recent trends in medical image segmentation. First, I will introduce the motivation for the topic. Secondly, I will give an introduction to medical image segmentation and a closely related technique, domain adaptation. Then, I will present three domain adaptation methodologies for medical image segmentation: Attention-Enhanced Disentangled Representation Learning (ADR), Domain Specific Convolution and High Frequency Reconstruction (DoCR), and SMC-UDA: Structure-Modal Constraint for Unsupervised Cross-Domain Renal Segmentation. Finally, I will conclude and give my perspective on this topic.

1.   Introduction

1.1  Motivation  

In recent years, with the development of machine learning, deep neural networks have achieved outstanding performance in diagnosis based on medical image segmentation. Despite being powerful tools, they face unique challenges, as medical images often exhibit high variability. When deploying models on multiple datasets from different institutions and hospitals, researchers have to deal with domain shift. Domain shift arises from differences in medical imaging devices and from diverse imaging modalities, such as MRI, CT, or X-ray. For instance, a model trained on CT images from one hospital may not perform optimally on CT images from a different hospital with different scanners, since the different devices lead to different data distributions. Domain shift can therefore seriously compromise the effectiveness of segmentation models. Domain adaptation is introduced to tackle domain shift, so that a model trained on one domain can also perform well on another domain. Moreover, because of high collection costs and patient privacy, medical data are difficult to gather and are usually unlabeled. Hence, unsupervised domain adaptation (UDA) is applied in the medical field to remove this obstacle. For these reasons, I will introduce UDA and present recent methodologies.

1.2  Medical Image Segmentation

Image segmentation is an important computer vision technique that divides complex images into quantifiable units at the pixel level, reducing complexity and making analysis easier. Medical image segmentation is image segmentation applied in the medical field. This technique plays a crucial role in practice, as it can significantly improve diagnosis and treatment. For example, by segmenting the chest X-ray image in Figure 1, a doctor can accurately identify which regions are infected and characterize the chest, enabling precise diagnosis and effective treatment planning. Current trends in medical image segmentation include uncertainty estimation, domain adaptation, multi-modal segmentation, etc. This blog post concentrates on domain adaptation to solve the domain shift problem for medical image segmentation.

Figure.1. Example of one modality: X-Ray [18]

1.3  Domain Shift

Domain shift, also known as dataset shift, is one of the most significant challenges for medical image segmentation. It is caused by differences between the source and target domains that arise when data are collected from different imaging devices, and it leads to decreased model performance and less reliable predictions. Machine learning models are trained to recognize patterns in the source domain; when applied to a target domain with a different distribution, these patterns may no longer be valid, causing a degradation in accuracy. Furthermore, retraining on new domain data is impractical, since manual annotation is very expensive and time-consuming. Therefore, domain shift is a crucial problem for medical image segmentation.

Below are four types of domain shifts based on [5]:

  1. Covariate shift: the input features change between the source and target domain, but the relationship between features and labels remains unchanged. For example, a model trained on MRI images from one hospital might not perform as well when applied to MRI images from a different hospital that uses different imaging devices.
  2. Concept shift (data drift): the input features remain the same between the source and target domain, but the relationship between features and labels changes, i.e., the meaning of the predicted labels changes. For instance, consider a system that decides which emails are spam: sending emails very frequently used to be a strong hint of spamming behavior, but this is no longer the case today.
    Covariate Shift: p_s(x) \neq p_t(x), \quad p_s(y|x) = p_t(y|x) \\ Concept Shift: p_s(y|x) \neq p_t(y|x), \quad p_s(x) = p_t(x)
  3. Conditional shift [1]: a change in the conditional distribution of the input features given the labels when moving from the source domain to the target domain. For example, take a heart disease model trained on patient features in one population; when it is applied in a different context with significantly different lifestyle and dietary habits, the same factors may no longer contribute to heart disease risk in the way the model learned.
  4. Prior shift (label shift): literally, a shift in the prior label distribution of the target domain compared to the source domain. Take a classifier trained to predict diagnoses given symptoms as an example: the relative prevalence of diseases changes over time, so the label distribution shifts.
Conditional Shift: p_s(x|y) \neq p_t(x|y) \\ Prior Shift: p_s(y) \neq p_t(y)


In this blog post, covariate shift is the primary type of domain shift to be addressed, since it results from differences in imaging modalities and medical institutions. For example, the three methods for medical image segmentation [6][7][8] work across scanners or modalities; [6], for instance, uses CT images as the source domain and MRI images as the target domain, and vice versa. More specifically, the covariate shift arises because the source and target images come from different modalities or devices. Therefore, this blog post concentrates on approaches that tackle covariate shift.

1.4  Domain Adaptation - Unsupervised Domain adaptation

Domain adaptation is a technique to tackle domain shift. It can also be regarded as a form of transfer learning. Figure 3 [4] shows a taxonomy of different transfer learning settings:

  1. Unsupervised domain adaptation (UDA) [2]: labeled source domain and unlabeled target domain
  2. Supervised domain adaptation: labeled source and target domain
  3. Semi-supervised domain adaptation: labeled source domain and only a few labeled samples in the target domain
  4. Self-taught learning: unlabeled source and labeled target domain
  5. Unsupervised learning: unlabeled source and target domain

This blog post will focus on UDA, which assumes labeled samples in the source domain and unlabeled samples in the target domain.


Figure. 3. A taxonomy of transfer learning approach

1.5  On-Trend Methods of Unsupervised Domain Adaptation on medical image analysis

Based on the survey [4] of recent advances and perspectives on UDA for medical image analysis, there are seven different strategies to achieve UDA:

  1. Feature alignment: align domain-invariant features across domains using CNN models. A large portion of these models use an architecture similar to the Domain Adversarial Neural Network (DANN) structure.
  2. Image alignment: align images to identify and isolate features that appear in more than one image, e.g., with a Generative Adversarial Network (GAN).
  3. Feature alignment + image alignment: integrate the two techniques above, for example, UNet-GAN.
  4. Disentangled representation: detect and disentangle the underlying factors hidden in the visible data and represent them explicitly; ADR, for instance, belongs to this group.
  5. Ensemble learning: use multiple models or multiple views of the data to improve domain adaptation, such as self-ensembling.
  6. Soft label: use probabilistic or continuous labels instead of discrete labels to provide more information for domain adaptation, for instance, CH2-UNET.
  7. Feature learning: learn more generalizable and transferable features, for example, UDA-TFL.


As shown in Figure 4, the dark blue text boxes mark the three methodologies that I will present. I classify ADR into the disentangled representation group because disentanglement is its core method. Furthermore, both DoCR and SMC-UDA belong to the feature alignment + image alignment group, since they align features and images to learn domain-invariant structural information.


Figure. 4. A taxonomy of unsupervised domain adaptation (UDA) [4]


2.  Methodology

2.1    Attention-Enhanced Disentangled Representation Learning for domain adaptation (ADR) [6]

Figure 5. An overview of  ADR [6]

Methodology & Architecture

According to Figure 5, the essential concepts of the ADR model are a channel-wise disentanglement approach, the Hilbert-Schmidt independence criterion (HSIC), and an attention-bias module. The ADR model is mainly composed of three parts: alignment of imaging characteristics, channel-wise disentanglement, and attention bias for adversarial learning. During inference, an information fusion calibration (IFC) strategy is additionally used to achieve more accurate predictions.


First, the authors apply coarse alignment before disentanglement, so that the model can eliminate apparent domain shifts between MRI and CT, such as brightness; a GAN discriminator is trained to distinguish real from aligned images. Secondly, channel-wise disentanglement with single-path encoding promotes mutual guidance between domain-invariant and domain-specific features, so the interference of domain shift on the domain-invariant features can be removed. The independence and complementarity of the disentangled features are constrained by the Hilbert-Schmidt independence criterion (HSIC) [19]. A classifier c_{spf} is adopted to classify the domain-specific features f_{spf}, ensuring that f_{spf} is highly correlated with domain-specific information. Thirdly, the most critical part is the attention-bias module, which makes the model pay more attention to the alignment of task-relevant regions; channel max pooling (CMP) is employed to extract essential features, and a discriminator distinguishes source from target images to achieve cross-domain alignment at the output level. During inference, the authors additionally apply an information fusion calibration (IFC) strategy to exploit inter-slice information as much as possible and reduce prediction error: a cosine similarity matrix is obtained by computing the pixel-wise cosine similarity between two neighboring slices.
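To make the HSIC constraint more concrete, below is a minimal sketch (not the authors' implementation) of a biased HSIC estimate between the two disentangled feature sets. The RBF kernel, its bandwidth sigma, and the flattening of feature maps into vectors are my own assumptions for illustration.

```python
import torch

def rbf_kernel(x, sigma=1.0):
    # Pairwise squared distances, then a Gaussian (RBF) kernel matrix.
    sq = torch.cdist(x, x) ** 2
    return torch.exp(-sq / (2 * sigma ** 2))

def hsic(f_inv, f_spf, sigma=1.0):
    """Biased HSIC estimate between two batches of features.

    A small value indicates that the domain-invariant features f_inv and the
    domain-specific features f_spf are close to statistically independent.
    """
    n = f_inv.size(0)
    K = rbf_kernel(f_inv.flatten(1), sigma)
    L = rbf_kernel(f_spf.flatten(1), sigma)
    H = torch.eye(n, device=f_inv.device) - 1.0 / n   # centering matrix
    return torch.trace(K @ H @ L @ H) / (n - 1) ** 2
```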


Loss Function


The equation above is the total optimization loss of the ADR method, and it can be split into five terms. L^t_{adv} is a GAN loss that aims to recognize whether an image is a real source image. The HSIC loss, adopted for disentangled representation learning, measures whether f_{spf} and f_{inv} are independent. L_{spf} is a cross-entropy loss for the classifier that distinguishes f_{spf}. L_{seg} combines cross-entropy loss with Dice loss and is used to train the segmentation head. Finally, L^p_{adv} is a GAN loss that distinguishes whether predictions come from the source or the target domain, enforcing cross-domain alignment at the output level. Together, these terms form the total optimization loss proposed for the ADR method.
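Since L_{seg} combines cross-entropy with Dice loss, here is a minimal PyTorch-style sketch of such a segmentation loss, assuming 2D logits of shape (N, C, H, W) and integer label maps of shape (N, H, W). The equal weighting of the two terms is an assumption, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def soft_dice_loss(logits, target, eps=1e-6):
    # Soft Dice computed over per-class probability maps.
    probs = torch.softmax(logits, dim=1)
    target_1hot = F.one_hot(target, num_classes=logits.size(1)).permute(0, 3, 1, 2).float()
    inter = (probs * target_1hot).sum(dim=(2, 3))
    union = probs.sum(dim=(2, 3)) + target_1hot.sum(dim=(2, 3))
    return 1.0 - ((2 * inter + eps) / (union + eps)).mean()

def seg_loss(logits, target):
    # Cross entropy plus Dice; the 1:1 weighting here is an assumption.
    return F.cross_entropy(logits, target) + soft_dice_loss(logits, target)
```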

Dataset & Metrics

The authors use the Multi-Modality Whole Heart Segmentation (MMWHS) challenge 2017 [10] dataset, which includes unpaired CT and MR volumes with ground-truth masks. They investigate two tasks: (1) CT as source domain and MR as target domain, and (2) MR as source domain and CT as target domain.

Figure 9. MMWHS dataset  [10]

They use two metrics to measure performance. The first is the Dice score [20], which measures the pixel-wise similarity between a predicted segmentation and its corresponding ground truth. The Dice score is 1 for perfect overlap and 0 when there is no overlap; in other words, the higher the Dice score, the better the segmentation performance.


Dice(P, G) = \frac{2\,|P \cap G|}{|P| + |G|}, where P is the predicted segmentation and G is the corresponding ground truth.
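A direct NumPy implementation of this definition could look like the following sketch for binary masks (the convention of returning 1.0 when both masks are empty is my own choice):

```python
import numpy as np

def dice_score(pred, gt):
    """Dice coefficient between two binary masks P and G (1 = perfect overlap)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(pred, gt).sum() / denom
```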

The average symmetric surface distance (ASSD) [20] is computed as the mean of all distances from each point on the surface of the predicted segmentation to the surface of the ground truth, and vice versa. ASSD is particularly helpful for assessing boundary localization, since it measures the overall difference between the segmented boundaries. An ASSD of 0 indicates a perfect match; that is, the smaller the ASSD, the better the predicted segmentation matches the ground truth.

ASSD(M, G) = \frac{\sum_{m \in S(M)} d(m, S(G)) + \sum_{g \in S(G)} d(g, S(M))}{|S(M)| + |S(G)|}, where d(x, y) is the Euclidean distance, S(\cdot) denotes the set of surface points, G is the ground truth, and M is the predicted segmentation.
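Below is a minimal sketch of ASSD for binary masks using SciPy distance transforms. Distances are in voxel units (passing a physical voxel spacing is left out for brevity), the surface extraction via erosion is a common convention rather than the papers' exact code, and both surfaces are assumed to be non-empty.

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def surface(mask):
    # Surface voxels: the mask minus its erosion.
    return mask & ~binary_erosion(mask)

def assd(pred, gt):
    """Average symmetric surface distance between two binary masks (in voxels)."""
    pred_s, gt_s = surface(pred.astype(bool)), surface(gt.astype(bool))
    # Distance of every voxel to the nearest surface voxel of the other mask.
    dist_to_gt = distance_transform_edt(~gt_s)
    dist_to_pred = distance_transform_edt(~pred_s)
    d_pred = dist_to_gt[pred_s]   # predicted surface -> ground-truth surface
    d_gt = dist_to_pred[gt_s]     # ground-truth surface -> predicted surface
    return (d_pred.sum() + d_gt.sum()) / (len(d_pred) + len(d_gt))
```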

Result

Quantitative Analysis

They perform bidirectional domain adaptation experiments between CT and MRI images for quantitative analysis. From the quantitative results in Table 3 below, comparing supervised training with the model trained without adaptation makes the significant domain shift between the two domains very clear. The ADR method improves the Dice score to 80.0% and reduces the ASD to 5.5 for MRI to CT; from CT to MRI, it improves the Dice score by 0.2% and reduces the ASD by 0.4. On top of that, IFC has a positive impact on every result; in other words, the ADR+IFC method demonstrates superior effectiveness in almost all cases.

Table. 3. Quantitative comparison with different methods [6]

Qualitative Analysis

The researchers present a visualization of the segmentation results for qualitative analysis, as shown in Figure 12 below. The results align closely with the ground truth for slice images in both directions, but IFC does not significantly influence the visual results.

Figure. 12. Visualization of Cardiac segmentation results [6]

Moreover, the authors provide a t-SNE visualization to evaluate the feature distributions with and without domain adaptation. Without adaptation, there is a significant domain shift between the CT and MRI features, while the proposed method aligns them well. In conclusion, the ADR method clearly succeeds in aligning the two domains for cardiac segmentation.

Figure. 13. t-SNE visualization [6]

Ablation Study

Aside from the complete architecture, they conduct ablation experiments to evaluate how effective each part is. They split the ADR method into four configurations, where the baseline network uses multi-adversarial learning; the baseline is then progressively equipped with coarse alignment, channel-wise disentanglement, and finally the complete ADR method. For the qualitative results in Figure 14 below, the attention maps make it easy to judge which configuration yields the best outcome.


Figure. 14. Visualization of ablation study for attention map [6]

For the quantitative results in Figure 15 below, there are two plots for the two metrics. In terms of the Dice score, ADR attains the highest score compared to the other configurations, and the score improves as the model is gradually completed. For the ASD, the picture is slightly different from the Dice score. Overall, almost every component has a positive impact on the ADR model.

Figure. 15. Visualization of ablation study for quantitative analysis [6]

2.2   Domain Specific Convolution and High Frequency Reconstruction for domain adaptation [7]

 

Figure. 6. An overview of DoCR  [7]

Methodology & Architecture

The researchers present the DoCR method to tackle domain shift. DoCR builds on a U-Net architecture and is composed of three main parts: a Domain Specific Convolution (DSC) module, an encoder-decoder backbone with a High Frequency Reconstruction (HFR) head, and a segmentation head. DSC and HFR are the two core concepts, so this blog post focuses on them here.


The DSC module is used as the first convolutional layer to eliminate as much domain-specific information as possible from the input images x. It extracts a domain-insensitive feature map f_{DSC}, which is fed into the encoder backbone. The DSC module contains a DSC head and a domain-specific controller: the former consists of a convolutional layer, a ReLU layer, and a batch normalization layer, while the latter is a convolutional layer whose parameters are conditioned on the domain code of x. Furthermore, the authors employ low-frequency component replacement to increase the diversity of source-domain images and reduce the interference caused by the domain code. More precisely, the low-frequency component of a source-domain image can be replaced by the corresponding low-frequency component of either a source- or a target-domain image, and a ResNet is used to predict the domain code of the augmented images. The HFR head comprises a convolutional layer and a tanh layer; it reconstructs the high-frequency image \tilde{h} from the decoder's feature map. The ground truth of the high-frequency image h is obtained by setting the low-frequency components to zero.
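Below is a minimal sketch of how such frequency-domain operations can be implemented with NumPy FFTs. The square low-frequency mask and the ratio beta are my own assumptions, and the amplitude swap follows the common FDA-style recipe; the paper's exact masking and replacement scheme may differ.

```python
import numpy as np

def low_freq_mask(shape, beta=0.1):
    # Square mask around the spectrum centre covering the lowest frequencies.
    h, w = shape
    bh, bw = int(h * beta), int(w * beta)
    mask = np.zeros(shape, dtype=bool)
    ch, cw = h // 2, w // 2
    mask[ch - bh:ch + bh, cw - bw:cw + bw] = True
    return mask

def replace_low_freq(src, ref, beta=0.1):
    """Replace the low-frequency amplitude of `src` with that of `ref`."""
    m = low_freq_mask(src.shape, beta)
    fs = np.fft.fftshift(np.fft.fft2(src))
    fr = np.fft.fftshift(np.fft.fft2(ref))
    amp, pha = np.abs(fs), np.angle(fs)
    amp[m] = np.abs(fr)[m]                       # swap only the low-frequency amplitude
    out = np.fft.ifft2(np.fft.ifftshift(amp * np.exp(1j * pha)))
    return np.real(out)

def high_freq_target(img, beta=0.1):
    """Zero out the low frequencies to obtain the high-frequency target h."""
    f = np.fft.fftshift(np.fft.fft2(img))
    f[low_freq_mask(img.shape, beta)] = 0
    return np.real(np.fft.ifft2(np.fft.ifftshift(f)))
```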

Loss Function 

The total optimization loss of DoCR can be split into three loss functions. First, L_{cls} is a cross-entropy loss for the domain-code classifier, where d_k is the domain label and d^p_k is the softmax probability for the k-th domain. Secondly, L_{rec} is a reconstruction loss, an L1 loss over the reconstructed high-frequency images of the three domain variants. The final loss is the segmentation loss, which employs a cross-entropy loss for the labeled source-domain data and an entropy loss on the target domain for UDA. Together, these terms form the total loss for training the DoCR method.

Dataset & Metrics

The authors conduct experiments on the RIGA+ [11] dataset, an optic cup/optic disc (OC/OD) segmentation dataset. It comprises labeled data from BinRushed and Magrabia and unlabeled data from the MESSIDOR database. The authors use BinRushed and Magrabia as source domains and MESSIDOR as the target domain, and they evaluate performance with the Dice score.

Figure. 10. RIGA+ dataset (OC/OD) [11]

Result

Quantitative Analysis

For quantitative analysis, they present the table below containing current SOTA methods. The DoCR method achieves outstanding performance and obtains the best Dice score in almost every setting. The authors also compare models with and without domain adaptation: the gap between them shows that a domain shift clearly exists between the two domains, and at the same time that domain adaptation can effectively address it. As a result, DoCR achieves the strongest performance among these methods.


Table. 4. Quantitative analysis between selected UDA methods [7]

Qualitative Analysis

The authors also use visualizations for qualitative analysis. As shown in Figure 16, the DoCR method produces predictions that match the corresponding ground truth more accurately than the other methods they implemented.

Figure. 16. Visualization of OC/OD segmentation for qualitative analysis [7]

Ablation Study

They conduct an ablation experiment to gauge the effectiveness of the two core components, DSC and HFR. Comparing HFR with HFR + Multi-Input-Head makes the negative influence of the Multi-Input-Head very apparent: with it, the model struggles to produce good segmentations, because the target domain-specific features are mainly optimized under the supervision of the reconstruction task. Furthermore, when analyzing HFR, the table indicates that full image reconstruction does not bring a major improvement and is even significantly worse for BASE2, because image reconstruction compels the network to restore image details that are not helpful for the segmentation task. On top of that, the effectiveness of HFR is higher than that of DSC, thanks to the flexibly generated parameters and diverse domain codes.


Table. 5. Ablation Study for DoCR [7]

2.3   SMC-UDA: Structure-Modal Constraint for Unsupervised Cross-Domain Renal Segmentation [8]

Figure. 7. An overview of Structure-Modal Constraint for Unsupervised Cross-Domain Renal Segmentation  (SMC-UDA) [8]

Methodology & Architecture

The authors propose SMC-UDA to deal with domain shift and improve on current UDA methods. Their core concepts are threefold. First, they use edge structure as a bridge between domains and perform both voxel and point cloud segmentation. Secondly, they deploy cross-modal knowledge distillation to improve performance. Lastly, they deal with the limited number of points in the point clouds using a progressive ROI. Based on the architecture shown above, SMC-UDA can be divided into two main parts: multi-modal learning and self-training.


Multi-modal learning comprises a multi-scale feature encoder, cross-modal knowledge distillation, and supervised learning on the source domain. First, the multi-scale feature encoder converts the 3D images and edge point clouds into features. The researchers then use point-to-voxel mapping to align the upscaled \hat{F}^{pt}_l with \hat{F}^{img}_l, and two linear classifiers classify the points to produce the predictions P^{pt} and P^{img}. Next, cross-modal knowledge distillation (see Figure 7) is performed in a fusion-then-distillation manner to enhance the structural representation of the point cloud: a multi-modal fusion module fuses the two features and performs unidirectional alignment from the image to the edge branch. Through skip connections they obtain the enhanced features \hat{F}^{pt2E}_l and \hat{F}^{fuse2E}_l, which help the classifiers produce the semantic segmentation scores S^{fuse}_l and S^{pt}_l. Furthermore, K-L divergence [21] strengthens performance and improves the generalization of the point cloud branch so that it can handle a broader variety of image textures. Finally, the authors apply supervised learning on the source domain to stabilize the segmentation.
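A minimal sketch of such a cross-modal distillation term is shown below, with the fused branch acting as a detached teacher and the point cloud branch as the student. The temperature T and the exact direction of the KL term are my assumptions for illustration, not the paper's definition.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=1.0):
    """KL divergence from the fused (teacher) predictions to the point cloud
    (student) predictions; the teacher is detached so knowledge flows one way,
    mirroring the unidirectional alignment described above."""
    p_teacher = F.softmax(teacher_logits.detach() / T, dim=1)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)
```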



For self-training, they present a sampler with a progressive ROI, shown in Figure 8, which incrementally focuses on the kidney region and enhances its intensity to strengthen the structural representation.


Figure. 8. A diagram of sampling ROI with epochs [8]


Loss Function

L = L_s + L_t = 1 \times L_{SEG_s} + 0.05 \times L_{xM_s} + \lambda_t (1 \times L_{SEG_t} + 0.05 \times L_{xM_t})

L_{SEG} = L_{lvsz} + L_{wce}

Unlike the two methods above, the loss function of SMC-UDA is more complicated, but it can still be split into two parts: a source-domain loss and a target-domain loss. That is, the total loss L combines the loss of the source domain and the loss of the target domain. L_{xM} is the K-L divergence used as a distillation loss to improve the generalization of the point cloud branch. The segmentation score is optimized by L_{lvsz} [22] and L_{wce}: L_{lvsz} optimizes the segmentation with respect to the IoU measure and considers the entire segmentation mask to maximize the agreement of predictions with the ground truth labels, while L_{wce} is a weighted cross-entropy loss that handles class imbalance by weighting according to class frequency. The segmentation loss L_{SEG} is the sum of L_{lvsz} and L_{wce}. L_s is the source-domain loss that stabilizes the segmentation map, and the target-domain loss is weighted by the hyperparameter \lambda_t for slow distillation.
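As a worked instance of this weighting, here is a trivial sketch assembling the stated terms. The weights 1 and 0.05 come directly from the formula above; the individual loss values (Lovász, weighted cross-entropy, and K-L distillation) are assumed to be computed elsewhere.

```python
def smc_uda_loss(l_seg_src, l_xm_src, l_seg_tgt, l_xm_tgt, lambda_t):
    """Total SMC-UDA objective as stated in the formula above.

    lambda_t schedules the target-domain (self-training) term; the individual
    segmentation and distillation losses are placeholders computed elsewhere.
    """
    loss_source = 1.0 * l_seg_src + 0.05 * l_xm_src
    loss_target = 1.0 * l_seg_tgt + 0.05 * l_xm_tgt
    return loss_source + lambda_t * loss_target
```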


Dataset & Metrics

The authors study cross-site and cross-modality kidney segmentation based on three datasets: KiTS 2021 [12], AMOS 2022 [13], and CHAOS 2019 [14]. The KiTS dataset, consisting of annotated CT images, is used as the source domain. The AMOS dataset, containing CT images, is used as the target domain for the cross-site experiments, while the CHAOS dataset provides MRI data as the target domain for the cross-modality investigation.

Figure.11. KiTS dataset [12]

They also use the Dice score and ASSD. In addition, they introduce the Intersection over Union (IoU) measure to evaluate model performance. Like the Dice score, IoU estimates the amount of overlap between the predicted segmentation and the ground truth and is commonly used to compare predictions against human annotations; in this case [8], the authors employ the IoU score to assess different annotations of the renal pelvis. An IoU of 0 means no overlap, whereas an IoU of 1 means perfect overlap.
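For completeness, a minimal IoU implementation for binary masks could look as follows (the convention of returning 1.0 when both masks are empty is my own choice):

```python
import numpy as np

def iou_score(pred, gt):
    """Intersection over Union (Jaccard index): 0 = no overlap, 1 = perfect overlap."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return np.logical_and(pred, gt).sum() / union
```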


Result

Quantitative Analysis

The authors of SMC-UDA perform a quantitative analysis based on the Dice score, ASSD, and IoU, comparing against SOTA methods, which are mainly 2D. Like the two methods above, they also include a no-domain-adaptation setting, nnU-Net, as well as the point cloud (PC) branch and the multi-modal backbone. As shown in Table 5, the edge-based model with the PC branch exhibits results similar to nnU-Net without domain adaptation, and in the lower part of Table 5 the PC branch and the multi-modal model achieve better performance than nnU-Net. Furthermore, the full SMC-UDA clearly outperforms these baselines. Generation-based UDA methods, in contrast, are significantly constrained by the quality of the generated images when no manual image alignment is applied. Through its structure constraint, the proposed SMC-UDA exhibits remarkable cross-domain efficiency, achieving an overall Dice score of 88.1% and an ASSD of 1.8 mm. In conclusion, SMC-UDA shows considerable success in the cross-modality setup and is competitive with SOTA 2D-based UDA methods.


Table. 5. Quantitative analysis between selected UDA methods for SMC-UDA [8]


Qualitative Analysis

To evaluate the model more thoroughly, the researchers conduct a qualitative analysis by visualizing kidney segmentation results. The upper row shows results for "KiTS21-CT → AMOS-CT" and the lower row for "KiTS21-CT → CHAOS-MRI". The colored lines (blue and green) are the kidney predictions on the edge maps. As Figure 17 shows, SMC-UDA is superior to generative UDA models for kidney detection and produces accurate kidney predictions on the edge maps.

Figure. 17. Qualitative analysis between selected UDA methods for SMC-UDA [8]


3.   Review

3.1 Comparison of the models

Below are two tables. The upper table lists properties of the three methods, for example whether they analyze the results quantitatively and qualitatively; "+" means the method has the property, "-" means it does not. The lower table lists advantages and disadvantages from my perspective.

| Methods | Task | Source dataset | Target dataset | Datatype | Ablation analysis | t-SNE visualization | Qualitative analysis | Quantitative analysis | Multiple metrics | Source code |
|---|---|---|---|---|---|---|---|---|---|---|
| ADR | Heart | MMWHS | MMWHS | 2D | + | + | + | + | + | + |
| DoCR | OC/OD | RIGA+ | RIGA+ | 2D | + | - | + | + | - | + |
| SMC-UDA | Kidney | KiTS | AMOS, CHAOS | 3D | - | - | + | + | + | + |

Table. 6. Comparison of three methods

From my perspective, ADR's disentangled representation learning and attention-bias module preserve task-relevant features well. The concept is straightforward and well documented, so the method is easy to follow. However, attention and HSIC are not novel ideas, the many hyperparameters are time-consuming to tune, and the method is relatively hard to implement because of the GAN networks and the disentanglement. Secondly, DoCR introduces the DSC module and high-frequency reconstruction to improve performance, and the authors conduct extensive experiments against eight different methods. Nevertheless, they evaluate performance only with the Dice score, the high-frequency reconstruction makes DoCR sensitive to noise, and it is questionable whether it can handle 3D data. Lastly, SMC-UDA uses knowledge distillation and a progressive ROI to enhance the structural information. Unlike the two methods above, SMC-UDA can handle 3D data, which is very beneficial for further research. However, it also has drawbacks: its loss functions and architecture are complicated and relatively hard to understand, and the knowledge distillation and complex losses make the computational cost high.


Methods, strengths, and weaknesses:

ADR
  Strengths:
  1. Improves feature extraction
  2. Promotes mutual guidance
  3. Simple concept, easy to understand
  4. Ablation study
  5. Source code available
  Weaknesses:
  1. Attention and HSIC are not novel ideas
  2. Questionable whether it can handle 3D data
  3. Many hyperparameters
  4. Hard to implement

DoCR
  Strengths:
  1. Avoids interference from artifacts
  2. Novel strategies
  3. Ablation study
  4. Source code available
  Weaknesses:
  1. Only the Dice score is used for evaluation
  2. Questionable whether it can handle 3D data
  3. Sensitive to data quality

SMC-UDA
  Strengths:
  1. Handles 3D data
  2. Potential for further research
  3. Multi-dataset experiments
  4. Novel strategies
  Weaknesses:
  1. No detailed ablation analysis
  2. Complicated architecture
  3. High computational cost

Table. 7. Pros and cons of three methods

3.2 Summary

UDA is a beneficial technique for medical image segmentation. By definition, data in the source domain are labeled while data in the target domain are unlabeled. It is difficult and expensive for medical institutions and research centers to collect data from individuals, because medical information, such as the status of diseases and organs, is very private; in addition, medical data often lack labels and annotations. Hence, UDA methods can significantly improve the performance of medical image segmentation. Secondly, in medical practice many different sensors can analyze organs and capture images, and doctors often combine two devices to diagnose a disease more accurately. In such cases, the domain shift between the two distinct domains can be tackled by UDA methods. Therefore, UDA approaches considerably boost the effectiveness of medical image segmentation.

The three methodologies in this blog post illustrate different ways of achieving this. Compared to current SOTA methods, the authors report remarkable improvements.

(1) ADR: The authors present several ideas to ensure the model's efficiency: a channel-wise disentanglement approach, the HSIC criterion, an attention-bias module, and information fusion calibration. The most important is the channel-wise disentanglement approach, which replaces the dual-path disentanglement of earlier papers to promote mutual guidance between domain-invariant and domain-specific features more efficiently. HSIC is used to enforce independence between the disentangled features, the attention-bias module helps the model focus on task-relevant regions, and information fusion calibration increases the accuracy of the model.

(2) DoCR: The authors propose high-frequency component (HFC) reconstruction and Domain Specific Convolution (DSC) to address the shortcomings of Fourier-transform-based UDA and domain-specific batch normalization (DSBN). First, the HFC captures structural information: the low-frequency components are filtered out so that only the HFC is kept. Secondly, DSC achieves excellent performance in extracting domain-invariant features.

(3) SMC-UDA: The authors provide a UDA segmentation method based on multi-modal learning. More precisely, the model consists of a multi-modal learning backbone with a cross-modal unidirectional distillation loss between the image and the edge structure. In self-training, they apply knowledge transfer between the predictions of the two domains to ensure reliable renal segmentation and improve it with the progressive ROI.

In the future, from my perspective, several ideas will be on trend. The first is 3D-based methods. They are beneficial for medical image segmentation because MRI plays an important role in the medical industry and can be captured in either 2D or 3D; for example, when athletes tear their ligaments and suffer ACL injuries, MRI is the device of choice to check the tissue status. However, the current SOTA methods for UDA are almost all 2D-based, so future models should be able to handle 3D data. Also, self-supervised learning can be applied in UDA because it concentrates on unlabeled data, a concept similar to UDA; self-distillation, for example, is one technique with potential to be deployed in UDA. Besides, UDA should be able to deal with other realistic shift problems: as mentioned in the introduction, most current UDA approaches address covariate shift, and researchers could also take other domain shift types into consideration, such as label shift. In conclusion, UDA is currently very important and will remain a critical factor in the future.

4.   Reference

[1] Zhu, Q., Jiao, Y., Ponomareva, N., Han, J., & Perozzi, B. (2023). Explaining and Adapting Graph Conditional Shift. arXiv preprint arXiv:2306.03256.

[2] Ajith, A., Gopakumar, G. (2023). Domain Adaptation: A Survey. In: Tistarelli, M., Dubey, S.R., Singh, S.K., Jiang, X. (eds) Computer Vision and Machine Intelligence. Lecture Notes in Networks and Systems, vol 586. Springer, Singapore.

[3] Wang, X., Chen, H., Tang, S. A., Wu, Z., & Zhu, W. (2022). Disentangled representation learning. arXiv preprint arXiv:2211.11695.

[4] Guan, H., & Liu, M. (2021). Domain adaptation for medical image analysis: a survey. IEEE Transactions on Biomedical Engineering, 69(3), 1173-1185.

[5] Liu X, Yoo C, Xing F, Oh H, Fakhri G, Kang JW, et al. Deep Unsupervised Domain Adaptation: A Review of Recent Advances and Perspectives. APSIPA Transactions on Signal and Information Processing. 2022 05.

[6] Sun, X., Liu, Z., Zheng, S., Lin, C., Zhu, Z., Zhao, Y. (2022). Attention-Enhanced Disentangled Representation Learning for Unsupervised Domain Adaptation in Cardiac Segmentation. In: Wang, L., Dou, Q., Fletcher, P.T., Speidel, S., Li, S. (eds) Medical Image Computing and Computer Assisted Intervention – MICCAI 2022. MICCAI 2022. Lecture Notes in Computer Science, vol 13437. Springer, Cham. https://doi.org/10.1007/978-3-031-16449-1_71

[7] Hu, S., Liao, Z., Xia, Y. (2022). Domain Specific Convolution and High Frequency Reconstruction Based Unsupervised Domain Adaptation for Medical Image Segmentation. In: Wang, L., Dou, Q., Fletcher, P.T., Speidel, S., Li, S. (eds) Medical Image Computing and Computer Assisted Intervention – MICCAI 2022. MICCAI 2022. Lecture Notes in Computer Science, vol 13437. Springer, Cham. https://doi.org/10.1007/978-3-031-16449-1_62

[8] Zhong, Z., Li, J., Bi, L., Yang, L., Kamel, I., Chellappa, R., ... & Jiao, Z. (2023). SMC-UDA: Structure-Modal Constraint for Unsupervised Cross-Domain Renal Segmentation. arXiv preprint arXiv:2306.08213.

[9] Kouw, W. M., & Loog, M. (2018). An introduction to domain adaptation and transfer learning. arXiv preprint arXiv:1812.11806.

[10] Zhuang, X., Shen, J.: Multi-scale patch and multi-modality atlases for whole heart segmentation of MRI. MedIA 31, 77–87 (2016)

[11] Almazroa, A., Alodhayb, S., Osman, E., Ramadan, E., Hummadi, M., Dlaim, M., ... & Lakshminarayanan, V. (2018, March). Retinal fundus images for glaucoma analysis: the RIGA dataset. In Medical Imaging 2018: Imaging Informatics for Healthcare, Research, and Applications (Vol. 10579, pp. 55-62). SPIE.

[12] Heller, N., Isensee, F., Maier-Hein, K.H., Hou, X., Xie, C., Li, F., Nan, Y., Mu, G., Lin, Z., Han, M., et al.: The state of the art in kidney and kidney tumor segmentation in contrast-enhanced ct imaging: Results of the kits19 challenge. Med. Image Anal. 67, 101821 (2020)

[13] Ji, Y., Bai, H., Yang, J., Ge, C., Zhu, Y., Zhang, R., Li, Z., Zhang, L., Ma, W., Wan, X., et al.: Amos: A large-scale abdominal multi-organ benchmark for versatile medical image segmentation. arXiv preprint arXiv:2206.08023 (2022)

[14] Kavur, A.E., Gezer, N.S., Barış, M., Aslan, S., Conze, P.H., Groza, V., Pham, D.D., Chatterjee, S., Ernst, P., Özkan, S., et al.: Chaos challenge-combined (ctmr) healthy abdominal organ segmentation. Med. Image Anal. 69, 101950 (2021)

[15] Hoffman, J., et al.: Cycada: cycle-consistent adversarial domain adaptation. In: International Conference on Machine Learning (ICML), pp. 1989–1998. PMLR(2018)

[16]  Liu, Q., Chen, C., Qin, J., Dou, Q., Heng, P.A.: FedDG: federated domain generalization on medical image segmentation via episodic learning in continuous frequency space. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1013–1023, June 2021

[17] Chang, W.G., You, T., Seo, S., Kwak, S., Han, B.: Domain-specific batch normalization for unsupervised domain adaptation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7354–7362 (2019)

[18] Liu, W., Luo, J., Yang, Y., Wang, W., Deng, J., & Yu, L. (2022). Automatic lung segmentation in chest X-ray images using improved U-Net. Scientific Reports, 12(1), 8649.

[19] Liu, X., Thermos, S., O’Neil, A., Tsaftaris, S.A.: Semi-supervised meta-learning with disentanglement for domain-generalised medical image segmentation. In: de Bruijne, M., et al. (eds.) MICCAI 2021. LNCS, vol. 12902, pp. 307–317. Springer, Cham (2021).
[20] Yeghiazaryan V, Voiculescu I. Family of boundary overlap metrics for the evaluation of medical image segmentation. J Med Imaging (Bellingham). 2018 Jan;5(1):015006. doi: 10.1117/1.JMI.5.1.015006. Epub 2018 Feb 19. PMID: 29487883; PMCID: PMC5817231.

[21] Hershey, J. R., & Olsen, P. A. (2007, April). Approximating the Kullback Leibler divergence between Gaussian mixture models. In 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07 (Vol. 4, pp. IV-317). IEEE.

[22] Berman, M., Triki, A.R., Blaschko, M.B.: The Lovász-softmax loss: a tractable surrogate for the optimization of the intersection-over-union measure in neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4413–4421 (2018)
