This blog post focuses on recent trends in medical image segmentation. First, I will introduce the motivation for the topic. Secondly, I will give an introduction to medical image segmentation and a closely related technique, domain adaptation. Then, I will present three domain adaptation methodologies for medical image segmentation: Attention-Enhanced Disentangled Representation Learning (ADR), Domain Specific Convolution and High Frequency Reconstruction (DoCR), and SMC-UDA: Structure-Modal Constraint for Unsupervised Cross-Domain Renal Segmentation. Finally, I will conclude and give my perspective on this topic.

1.   Introduction

1.1  Motivation  

In recent years, with the development of machine learning, deep neural networks have achieved outstanding performance in diagnosis based on medical image segmentation. Despite being powerful tools, they face unique challenges, as medical images often exhibit high variability. When deploying models on multiple datasets from different institutions and hospitals, researchers have to deal with domain shift. Domain shift arises from differences in medical imaging devices and from diverse imaging modalities, such as MRI, CT, or X-ray. For instance, a model trained on CT images from one hospital may not perform optimally on CT images from a different hospital with different scanners, since the different devices lead to different data distributions. Domain shift can therefore seriously compromise the effectiveness of segmentation models. Domain adaptation is introduced to tackle domain shift, so that a model trained on one domain can also perform well on another domain. Moreover, because of high collection costs and patient privacy, medical data are difficult to gather and are usually unlabeled. Hence, unsupervised domain adaptation (UDA) is applied in the medical field to remove this obstacle. For these reasons, I will introduce UDA and present recent methodologies.

1.2  Medical Image Segmentation

Image segmentation is an important computer vision technique that divides complex images into quantifiable units at the pixel level, reducing complexity and making analysis easier. Medical image segmentation is image segmentation applied in the medical field. This technique plays a crucial role in practice, as it can significantly improve diagnosis and treatment. For example, by segmenting the chest X-ray image in Figure 1, a doctor can accurately identify which regions are infected and characterize the chest, enabling precise diagnosis and effective treatment planning. Current trends in medical image segmentation include uncertainty estimation, domain adaptation, multi-modal segmentation, etc. This blog post concentrates on domain adaptation to solve the domain shift problem for medical image segmentation.

Figure.1. Example of one modality: X-Ray [18]

1.3  Domain Shift

Domain shift, also known as dataset shift, is one of the most significant challenges for medical image segmentation. It is caused by differences between the source and target domains that arise when data are collected from different imaging devices, and it leads to decreased model performance and less reliable predictions. Machine learning models are trained to recognize patterns in the source domain; when applied to a target domain with a different distribution, these patterns may no longer be valid, causing a degradation in accuracy. Furthermore, retraining on new domain data is impractical, since manual annotation is very expensive and time-consuming. Therefore, domain shift is a crucial problem for medical image segmentation.

Below are four types of domain shifts based on [5]:

  1. Covariate shift: the input features change between the source and target domain, but the relationship between features and labels remains unchanged. For example, a model trained on MRI images from one hospital might not perform as well when applied to MRI images from a different hospital that uses different imaging devices.
  2. Concept shift (data drift): the input features remain the same between the source and target domain, but the relationship between features and labels changes, i.e., the meaning of the predicted labels changes. For instance, consider a system that decides which emails are spam: sending emails very frequently used to be a strong hint of spamming behavior, but this is no longer the case today.
    Covariate Shift: p_s(x) \neq p_t(x), \quad p_s(y|x) = p_t(y|x) \\ Concept Shift: p_s(y|x) \neq p_t(y|x), \quad p_s(x) = p_t(x)
  3. Conditional shift [1]: a change in the conditional distribution of the input features given the labels when moving from the source domain to the target domain. For example, take a heart disease model trained on patient features in one population; when it is applied in a different context with significantly different lifestyle and dietary habits, the same factors may no longer contribute to heart disease risk in the way the model learned.
  4. Prior shift (label shift): literally, a shift in the prior label distribution of the target domain compared to the source domain. Take a classifier trained to predict diagnoses given symptoms as an example: the relative prevalence of diseases changes over time, so the label distribution shifts.
Conditional Shift: p_s(x|y) \neq p_t(x|y) \\ Prior Shift: p_s(y) \neq p_t(y)


In this blog post, covariate shift is the primary type of domain shift to be addressed, since it results from differences in imaging modalities and medical institutions. For example, the three methods for medical image segmentation [6][7][8] work across scanners or modalities; [6], for instance, uses CT images as the source domain and MRI images as the target domain, and vice versa. More specifically, the covariate shift arises because the source and target images come from different modalities or devices. Therefore, this blog post concentrates on approaches that tackle covariate shift.

1.4  Domain Adaptation - Unsupervised Domain adaptation

Domain adaptation is a technique to tackle domain shift. It can also be regarded as a form of transfer learning. Figure 3 [4] shows a taxonomy of different transfer learning settings:

  1. Unsupervised domain adaptation (UDA) [2]: labeled source domain and unlabeled target domain
  2. Supervised domain adaptation: labeled source and target domain
  3. Semi-supervised domain adaptation: labeled source domain and only a few labeled samples in the target domain
  4. Self-taught learning: unlabeled source and labeled target domain
  5. Unsupervised learning: unlabeled source and target domain

This blog post will focus on UDA, which assumes labeled samples in the source domain and unlabeled samples in the target domain.


Figure. 3. A taxonomy of transfer learning approach

1.5  On-Trend Methods of Unsupervised Domain Adaptation on medical image analysis

Based on the survey [4] of recent advances and perspectives on UDA for medical image analysis, there are seven different strategies to achieve UDA:

  1. Feature alignment: align domain-invariant features across domains using CNN models. A large portion of these models use an architecture similar to the Domain Adversarial Neural Network (DANN) structure.
  2. Image alignment: align images to identify and isolate features that appear in more than one image, e.g., with a Generative Adversarial Network (GAN).
  3. Feature alignment + image alignment: integrate the two techniques above, for example, UNet-GAN.
  4. Disentangled representation: detect and disentangle the underlying factors hidden in the visible data and represent them explicitly; ADR, for instance, belongs to this group.
  5. Ensemble learning: use multiple models or multiple views of the data to improve domain adaptation, such as self-ensembling.
  6. Soft label: use probabilistic or continuous labels instead of discrete labels to provide more information for domain adaptation, for instance, CH2-UNET.
  7. Feature learning: learn more generalizable and transferable features, for example, UDA-TFL.


As shown in Figure 4, the dark blue text boxes mark the three methodologies that I will present. I classify ADR into the disentangled representation group because disentanglement is its core method. Furthermore, both DoCR and SMC-UDA belong to the feature alignment + image alignment group, since they align features and images to learn domain-invariant structural information.


Figure. 4. A taxonomy of unsupervised domain adaptation (UDA) [4]


2.  Methodology

2.1    Attention-Enhanced Disentangled Representation Learning for domain adaptation (ADR) [6]

Figure 5. An overview of  ADR [6]

Methodology & Architecture

According to Figure 5, the essential concepts of the ADR model are a channel-wise disentanglement approach, the Hilbert-Schmidt independence criterion (HSIC), and an attention-bias module. The ADR model is mainly composed of three parts: alignment of imaging characteristics, channel-wise disentanglement, and attention bias for adversarial learning. During inference, an information fusion calibration (IFC) strategy is additionally used to achieve more accurate predictions.


First, the authors apply coarse alignment before disentanglement, so that the model can eliminate apparent domain shifts between MRI and CT, such as brightness; a GAN discriminator is trained to distinguish real from aligned images. Secondly, channel-wise disentanglement with single-path encoding promotes mutual guidance between domain-invariant and domain-specific features, so the interference of domain shift on the domain-invariant features can be removed. The independence and complementarity of the disentangled features are constrained by the Hilbert-Schmidt independence criterion (HSIC) [19]. A classifier c_{spf} is adopted to classify the domain-specific features f_{spf}, ensuring that f_{spf} is highly correlated with domain-specific information. Thirdly, the most critical part is the attention-bias module, which makes the model pay more attention to the alignment of task-relevant regions; channel max pooling (CMP) is employed to extract essential features, and a discriminator distinguishes source from target images to achieve cross-domain alignment at the output level. During inference, the authors additionally apply an information fusion calibration (IFC) strategy to exploit inter-slice information as much as possible and reduce prediction error: a cosine similarity matrix is obtained by computing the pixel-wise cosine similarity between two neighboring slices.
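To make the HSIC constraint more concrete, below is a minimal sketch (not the authors' implementation) of a biased HSIC estimate between the two disentangled feature sets. The RBF kernel, its bandwidth sigma, and the flattening of feature maps into vectors are my own assumptions for illustration.

```python
import torch

def rbf_kernel(x, sigma=1.0):
    # Pairwise squared distances, then a Gaussian (RBF) kernel matrix.
    sq = torch.cdist(x, x) ** 2
    return torch.exp(-sq / (2 * sigma ** 2))

def hsic(f_inv, f_spf, sigma=1.0):
    """Biased HSIC estimate between two batches of features.

    A small value indicates that the domain-invariant features f_inv and the
    domain-specific features f_spf are close to statistically independent.
    """
    n = f_inv.size(0)
    K = rbf_kernel(f_inv.flatten(1), sigma)
    L = rbf_kernel(f_spf.flatten(1), sigma)
    H = torch.eye(n, device=f_inv.device) - 1.0 / n   # centering matrix
    return torch.trace(K @ H @ L @ H) / (n - 1) ** 2
```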


Loss Function


The equation above is the total optimization loss of the ADR method, and it can be split into five terms. L^t_{adv} is a GAN loss that aims to recognize whether an image is a real source image. The HSIC loss, adopted for disentangled representation learning, measures whether f_{spf} and f_{inv} are independent. L_{spf} is a cross-entropy loss for the classifier that distinguishes f_{spf}. L_{seg} combines cross-entropy loss with Dice loss and is used to train the segmentation head. Finally, L^p_{adv} is a GAN loss that distinguishes whether predictions come from the source or the target domain, enforcing cross-domain alignment at the output level. Together, these terms form the total optimization loss proposed for the ADR method.
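Since L_{seg} combines cross-entropy with Dice loss, here is a minimal PyTorch-style sketch of such a segmentation loss, assuming 2D logits of shape (N, C, H, W) and integer label maps of shape (N, H, W). The equal weighting of the two terms is an assumption, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def soft_dice_loss(logits, target, eps=1e-6):
    # Soft Dice computed over per-class probability maps.
    probs = torch.softmax(logits, dim=1)
    target_1hot = F.one_hot(target, num_classes=logits.size(1)).permute(0, 3, 1, 2).float()
    inter = (probs * target_1hot).sum(dim=(2, 3))
    union = probs.sum(dim=(2, 3)) + target_1hot.sum(dim=(2, 3))
    return 1.0 - ((2 * inter + eps) / (union + eps)).mean()

def seg_loss(logits, target):
    # Cross entropy plus Dice; the 1:1 weighting here is an assumption.
    return F.cross_entropy(logits, target) + soft_dice_loss(logits, target)
```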

Dataset & Metrics

The authors use the Multi-Modality Whole Heart Segmentation (MMWHS) challenge 2017 [10] dataset, which includes unpaired CT and MR volumes with ground-truth masks. They investigate two tasks: (1) CT as source domain and MR as target domain, and (2) MR as source domain and CT as target domain.

Figure 9. MMWHS dataset  [10]

They use two metrics to measure performance. The first is the Dice score [20], which measures the pixel-wise similarity between a predicted segmentation and its corresponding ground truth. The Dice score is 1 for perfect overlap and 0 when there is no overlap; in other words, the higher the Dice score, the better the segmentation performance.


Dice(P, G) = \frac{2\,|P \cap G|}{|P| + |G|}, where P is the predicted segmentation and G is the corresponding ground truth.
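A direct NumPy implementation of this definition could look like the following sketch for binary masks (the convention of returning 1.0 when both masks are empty is my own choice):

```python
import numpy as np

def dice_score(pred, gt):
    """Dice coefficient between two binary masks P and G (1 = perfect overlap)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(pred, gt).sum() / denom
```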

The average symmetric surface distance (ASSD) [20] is computed as the mean of all distances from each point on the surface of the predicted segmentation to the surface of the ground truth, and vice versa. ASSD is particularly helpful for assessing boundary localization, since it measures the overall difference between the segmented boundaries. An ASSD of 0 indicates a perfect match; that is, the smaller the ASSD, the better the predicted segmentation matches the ground truth.

ASSD(M, G) = \frac{\sum_{m \in S(M)} d(m, S(G)) + \sum_{g \in S(G)} d(g, S(M))}{|S(M)| + |S(G)|}, where d(x, y) is the Euclidean distance, S(\cdot) denotes the set of surface points, G is the ground truth, and M is the predicted segmentation.
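Below is a minimal sketch of ASSD for binary masks using SciPy distance transforms. Distances are in voxel units (passing a physical voxel spacing is left out for brevity), the surface extraction via erosion is a common convention rather than the papers' exact code, and both surfaces are assumed to be non-empty.

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def surface(mask):
    # Surface voxels: the mask minus its erosion.
    return mask & ~binary_erosion(mask)

def assd(pred, gt):
    """Average symmetric surface distance between two binary masks (in voxels)."""
    pred_s, gt_s = surface(pred.astype(bool)), surface(gt.astype(bool))
    # Distance of every voxel to the nearest surface voxel of the other mask.
    dist_to_gt = distance_transform_edt(~gt_s)
    dist_to_pred = distance_transform_edt(~pred_s)
    d_pred = dist_to_gt[pred_s]   # predicted surface -> ground-truth surface
    d_gt = dist_to_pred[gt_s]     # ground-truth surface -> predicted surface
    return (d_pred.sum() + d_gt.sum()) / (len(d_pred) + len(d_gt))
```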

Result

Quantitative Analysis

They perform bidirectional domain adaptation experiments between CT and MRI images for quantitative analysis. From the quantitative results in Table 3 below, comparing supervised training with the model trained without adaptation makes the significant domain shift between the two domains very clear. The ADR method improves the Dice score to 80.0% and reduces the ASD to 5.5 for MRI to CT; from CT to MRI, it improves the Dice score by 0.2% and reduces the ASD by 0.4. On top of that, IFC has a positive impact on every result; in other words, the ADR+IFC method demonstrates superior effectiveness in almost all cases.

Table. 3. Quantitative comparison with different methods [6]

Qualitative Analysis

The researchers present a visualization of the segmentation results for qualitative analysis, as shown in Figure 12 below. The results align closely with the ground truth for slice images in both directions, but IFC does not significantly influence the visual results.

Figure. 12. Visualization of Cardiac segmentation results [6]

Moreover, the authors provide a t-SNE visualization to evaluate the feature distributions with and without domain adaptation. Without adaptation, there is a significant domain shift between the CT and MRI features, while the proposed method aligns them well. In conclusion, the ADR method clearly succeeds in aligning the two domains for cardiac segmentation.

Figure. 13. t-SNE visualization [6]

Ablation Study

Aside from the complete architecture, they conduct ablation experiments to evaluate how effective each part is. They split the ADR method into four configurations, where the baseline network uses multi-adversarial learning; the baseline is then progressively equipped with coarse alignment, channel-wise disentanglement, and finally the complete ADR method. For the qualitative results in Figure 14 below, the attention maps make it easy to judge which configuration yields the best outcome.


Figure. 14. Visualization of ablation study for attention map [6]

For the quantitative results in Figure 15 below, there are two plots for the two metrics. In terms of the Dice score, ADR attains the highest score compared to the other configurations, and the score improves as the model is gradually completed. For the ASD, the picture is slightly different from the Dice score. Overall, almost every component has a positive impact on the ADR model.

Figure. 15. Visualization of ablation study for quantitative analysis [6]

2.2   Domain Specific Convolution and High Frequency Reconstruction for domain adaptation [7]

 

Figure. 6. An overview of DoCR  [7]

Methodology & Architecture

The researchers present the DoCR method to tackle domain shift. DoCR builds on a U-Net architecture and is composed of three main parts: a Domain Specific Convolution (DSC) module, an encoder-decoder backbone with a High Frequency Reconstruction (HFR) head, and a segmentation head. DSC and HFR are the two core concepts, so this blog post focuses on them here.


The DSC module is used as the first convolutional layer to eliminate as much domain-specific information as possible from the input images x. It extracts a domain-insensitive feature map f_{DSC}, which is fed into the encoder backbone. The DSC module contains a DSC head and a domain-specific controller: the former consists of a convolutional layer, a ReLU layer, and a batch normalization layer, while the latter is a convolutional layer whose parameters are conditioned on the domain code of x. Furthermore, the authors employ low-frequency component replacement to increase the diversity of source-domain images and reduce the interference caused by the domain code. More precisely, the low-frequency component of a source-domain image can be replaced by the corresponding low-frequency component of either a source- or a target-domain image, and a ResNet is used to predict the domain code of the augmented images. The HFR head comprises a convolutional layer and a tanh layer; it reconstructs the high-frequency image \tilde{h} from the decoder's feature map. The ground truth of the high-frequency image h is obtained by setting the low-frequency components to zero.
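Below is a minimal sketch of how such frequency-domain operations can be implemented with NumPy FFTs. The square low-frequency mask and the ratio beta are my own assumptions, and the amplitude swap follows the common FDA-style recipe; the paper's exact masking and replacement scheme may differ.

```python
import numpy as np

def low_freq_mask(shape, beta=0.1):
    # Square mask around the spectrum centre covering the lowest frequencies.
    h, w = shape
    bh, bw = int(h * beta), int(w * beta)
    mask = np.zeros(shape, dtype=bool)
    ch, cw = h // 2, w // 2
    mask[ch - bh:ch + bh, cw - bw:cw + bw] = True
    return mask

def replace_low_freq(src, ref, beta=0.1):
    """Replace the low-frequency amplitude of `src` with that of `ref`."""
    m = low_freq_mask(src.shape, beta)
    fs = np.fft.fftshift(np.fft.fft2(src))
    fr = np.fft.fftshift(np.fft.fft2(ref))
    amp, pha = np.abs(fs), np.angle(fs)
    amp[m] = np.abs(fr)[m]                       # swap only the low-frequency amplitude
    out = np.fft.ifft2(np.fft.ifftshift(amp * np.exp(1j * pha)))
    return np.real(out)

def high_freq_target(img, beta=0.1):
    """Zero out the low frequencies to obtain the high-frequency target h."""
    f = np.fft.fftshift(np.fft.fft2(img))
    f[low_freq_mask(img.shape, beta)] = 0
    return np.real(np.fft.ifft2(np.fft.ifftshift(f)))
```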

Loss Function 

The total optimization loss of DoCR can be split into three loss functions. First, L_{cls} is a cross-entropy loss for the domain-code classifier, where d_k is the domain label and d^p_k is the softmax probability for the k-th domain. Secondly, L_{rec} is a reconstruction loss, an L1 loss over the reconstructed high-frequency images of the three domain variants. The final loss is the segmentation loss, which employs a cross-entropy loss for the labeled source-domain data and an entropy loss on the target domain for UDA. Together, these terms form the total loss for training the DoCR method.

Dataset & Metrics

The authors conduct experiments on the RIGA+ [11] dataset, an optic cup/optic disc (OC/OD) segmentation dataset. It comprises labeled data from BinRushed and Magrabia and unlabeled data from the MESSIDOR database. The authors use BinRushed and Magrabia as source domains and MESSIDOR as the target domain, and they evaluate performance with the Dice score.

Figure. 10. RIGA+ dataset (OC/OD) [11]

Result

Quantitative Analysis

For quantitative analysis, they present the table below containing current SOTA methods. The DoCR method achieves outstanding performance and obtains the best Dice score in almost every setting. The authors also compare models with and without domain adaptation: the gap between them shows that a domain shift clearly exists between the two domains, and at the same time that domain adaptation can effectively address it. As a result, DoCR achieves the strongest performance among these methods.


Table. 4. Quantitative analysis between selected UDA methods [7]

Qualitative Analysis

The authors also use visualizations for qualitative analysis. As shown in Figure 16, the DoCR method produces predictions that match the corresponding ground truth more accurately than the other methods they implemented.

Figure. 16. Visualization of OC/OD segmentation for qualitative analysis [7]

Ablation Study

They conduct an ablation experiment to gauge the effectiveness of the two core components, DSC and HFR. Comparing HFR with HFR + Multi-Input-Head makes the negative influence of the Multi-Input-Head very apparent: with it, the model struggles to produce good segmentations, because the target domain-specific features are mainly optimized under the supervision of the reconstruction task. Furthermore, when analyzing HFR, the table indicates that full image reconstruction does not bring a major improvement and is even significantly worse for BASE2, because image reconstruction compels the network to restore image details that are not helpful for the segmentation task. On top of that, the effectiveness of HFR is higher than that of DSC, thanks to the flexibly generated parameters and diverse domain codes.


Table. 5. Ablation Study for DoCR [7]

2.3   SMC-UDA: Structure-Modal Constraint for Unsupervised Cross-Domain Renal Segmentation [8]

Figure. 7. An overview of Structure-Modal Constraint for Unsupervised Cross-Domain Renal Segmentation  (SMC-UDA) [8]

Methodology & Architecture

The authors propose SMC-UDA to deal with domain shift and improve on current UDA methods. Their core concepts are threefold. First, they use edge structure as a bridge between domains and perform both voxel and point cloud segmentation. Secondly, they deploy cross-modal knowledge distillation to improve performance. Lastly, they deal with the limited number of points in the point clouds using a progressive ROI. Based on the architecture shown above, SMC-UDA can be divided into two main parts: multi-modal learning and self-training.


Multi-modal learning comprises a multi-scale feature encoder, cross-modal knowledge distillation, and supervised learning on the source domain. First, the multi-scale feature encoder converts the 3D images and edge point clouds into features. The researchers then use point-to-voxel mapping to align the upscaled \hat{F}^{pt}_l with \hat{F}^{img}_l, and two linear classifiers classify the points to produce the predictions P^{pt} and P^{img}. Next, cross-modal knowledge distillation (see Figure 7) is performed in a fusion-then-distillation manner to enhance the structural representation of the point cloud: a multi-modal fusion module fuses the two features and performs unidirectional alignment from the image to the edge branch. Through skip connections they obtain the enhanced features \hat{F}^{pt2E}_l and \hat{F}^{fuse2E}_l, which help the classifiers produce the semantic segmentation scores S^{fuse}_l and S^{pt}_l. Furthermore, K-L divergence [21] strengthens performance and improves the generalization of the point cloud branch so that it can handle a broader variety of image textures. Finally, the authors apply supervised learning on the source domain to stabilize the segmentation.
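A minimal sketch of such a cross-modal distillation term is shown below, with the fused branch acting as a detached teacher and the point cloud branch as the student. The temperature T and the exact direction of the KL term are my assumptions for illustration, not the paper's definition.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=1.0):
    """KL divergence from the fused (teacher) predictions to the point cloud
    (student) predictions; the teacher is detached so knowledge flows one way,
    mirroring the unidirectional alignment described above."""
    p_teacher = F.softmax(teacher_logits.detach() / T, dim=1)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)
```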



For self-training, they present a sampler with a progressive ROI, shown in Figure 8, which incrementally focuses on the kidney region and enhances its intensity to strengthen the structural representation.


Figure. 8. A diagram of sampling ROI with epochs [8]


Loss Function

L = L_s + L_t = 1 \times L_{SEG_s} + 0.05 \times L_{xM_s} + \lambda_t (1 \times L_{SEG_t} + 0.05 \times L_{xM_t})

L_{SEG} = L_{lvsz} + L_{wce}

Unlike the two methods above, the loss function of SMC-UDA is more complicated, but it can still be split into two parts: a source-domain loss and a target-domain loss. That is, the total loss L combines the loss of the source domain and the loss of the target domain. L_{xM} is the K-L divergence used as a distillation loss to improve the generalization of the point cloud branch. The segmentation score is optimized by L_{lvsz} [22] and L_{wce}: L_{lvsz} optimizes the segmentation with respect to the IoU measure and considers the entire segmentation mask to maximize the agreement of predictions with the ground truth labels, while L_{wce} is a weighted cross-entropy loss that handles class imbalance by weighting according to class frequency. The segmentation loss L_{SEG} is the sum of L_{lvsz} and L_{wce}. L_s is the source-domain loss that stabilizes the segmentation map, and the target-domain loss is weighted by the hyperparameter \lambda_t for slow distillation.
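As a worked instance of this weighting, here is a trivial sketch assembling the stated terms. The weights 1 and 0.05 come directly from the formula above; the individual loss values (Lovász, weighted cross-entropy, and K-L distillation) are assumed to be computed elsewhere.

```python
def smc_uda_loss(l_seg_src, l_xm_src, l_seg_tgt, l_xm_tgt, lambda_t):
    """Total SMC-UDA objective as stated in the formula above.

    lambda_t schedules the target-domain (self-training) term; the individual
    segmentation and distillation losses are placeholders computed elsewhere.
    """
    loss_source = 1.0 * l_seg_src + 0.05 * l_xm_src
    loss_target = 1.0 * l_seg_tgt + 0.05 * l_xm_tgt
    return loss_source + lambda_t * loss_target
```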


Dataset & Metrics

The authors study cross-site and cross-modality kidney segmentation based on three datasets: KiTS 2021 [12], AMOS 2022 [13], and CHAOS 2019 [14]. The KiTS dataset, consisting of annotated CT images, is used as the source domain. The AMOS dataset, containing CT images, is used as the target domain for the cross-site experiments, while the CHAOS dataset provides MRI data as the target domain for the cross-modality investigation.

Figure.11. KiTS dataset [12]

They also use the Dice score and ASSD. In addition, they introduce the Intersection over Union (IoU) measure to evaluate model performance. Like the Dice score, IoU estimates the amount of overlap between the predicted segmentation and the ground truth and is commonly used to compare predictions against human annotations; in this case [8], the authors employ the IoU score to assess different annotations of the renal pelvis. An IoU of 0 means no overlap, whereas an IoU of 1 means perfect overlap.
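For completeness, a minimal IoU implementation for binary masks could look as follows (the convention of returning 1.0 when both masks are empty is my own choice):

```python
import numpy as np

def iou_score(pred, gt):
    """Intersection over Union (Jaccard index): 0 = no overlap, 1 = perfect overlap."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return np.logical_and(pred, gt).sum() / union
```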


Result

Quantitative Analysis

The authors of SMC-UDA perform a quantitative analysis based on the Dice score, ASSD, and IoU, comparing against SOTA methods, which are mainly 2D. Like the two methods above, they also include a no-domain-adaptation setting, nnU-Net, as well as the point cloud (PC) branch and the multi-modal backbone. As shown in Table 5, the edge-based model with the PC branch exhibits results similar to nnU-Net without domain adaptation, and in the lower part of Table 5 the PC branch and the multi-modal model achieve better performance than nnU-Net. Furthermore, the full SMC-UDA clearly outperforms these baselines. Generation-based UDA methods, in contrast, are significantly constrained by the quality of the generated images when no manual image alignment is applied. Through its structure constraint, the proposed SMC-UDA exhibits remarkable cross-domain efficiency, achieving an overall Dice score of 88.1% and an ASSD of 1.8 mm. In conclusion, SMC-UDA shows considerable success in the cross-modality setup and is competitive with SOTA 2D-based UDA methods.


Table. 5. Quantitative analysis between selected UDA methods for SMC-UDA [8]


Qualitative Analysis

To evaluate the model more thoroughly, the researchers conduct a qualitative analysis by visualizing kidney segmentation results. The upper row shows results for "KiTS21-CT → AMOS-CT" and the lower row for "KiTS21-CT → CHAOS-MRI". The colored lines (blue and green) are the kidney predictions on the edge maps. As Figure 17 shows, SMC-UDA is superior to generative UDA models for kidney detection and produces accurate kidney predictions on the edge maps.

Figure. 17. Qualitative analysis between selected UDA methods for SMC-UDA [8]


3.   Review

3.1 Comparison of the models

Below are two tables. The upper table lists properties of the three methods, for example whether they analyze the results quantitatively and qualitatively; "+" means the method has the property, "-" means it does not. The lower table lists advantages and disadvantages from my perspective.

| Methods | Task | Source dataset | Target dataset | Datatype | Ablation analysis | t-SNE visualization | Qualitative analysis | Quantitative analysis | Multiple metrics | Source code |
|---|---|---|---|---|---|---|---|---|---|---|
| ADR | Heart | MMWHS | MMWHS | 2D | + | + | + | + | + | + |
| DoCR | OC/OD | RIGA+ | RIGA+ | 2D | + | - | + | + | - | + |
| SMC-UDA | Kidney | KiTS | AMOS, CHAOS | 3D | - | - | + | + | + | + |

Table. 6. Comparison of three methods

From my perspective, ADR's disentangled representation learning and attention-bias module preserve task-relevant features well. The concept is straightforward and well documented, so the method is easy to follow. However, attention and HSIC are not novel ideas, the many hyperparameters are time-consuming to tune, and the method is relatively hard to implement because of the GAN networks and the disentanglement. Secondly, DoCR introduces the DSC module and high-frequency reconstruction to improve performance, and the authors conduct extensive experiments against eight different methods. Nevertheless, they evaluate performance only with the Dice score, the high-frequency reconstruction makes DoCR sensitive to noise, and it is questionable whether it can handle 3D data. Lastly, SMC-UDA uses knowledge distillation and a progressive ROI to enhance the structural information. Unlike the two methods above, SMC-UDA can handle 3D data, which is very beneficial for further research. However, it also has drawbacks: its loss functions and architecture are complicated and relatively hard to understand, and the knowledge distillation and complex losses make the computational cost high.


Methods, strengths, and weaknesses:

ADR
  Strengths:
  1. Improves feature extraction
  2. Promotes mutual guidance
  3. Simple concept, easy to understand
  4. Ablation study
  5. Source code available
  Weaknesses:
  1. Attention and HSIC are not novel ideas
  2. Questionable whether it can handle 3D data
  3. Many hyperparameters
  4. Hard to implement

DoCR
  Strengths:
  1. Avoids interference from artifacts
  2. Novel strategies
  3. Ablation study
  4. Source code available
  Weaknesses:
  1. Only the Dice score is used for evaluation
  2. Questionable whether it can handle 3D data
  3. Sensitive to data quality

SMC-UDA
  Strengths:
  1. Handles 3D data
  2. Potential for further research
  3. Multi-dataset experiments
  4. Novel strategies
  Weaknesses:
  1. No detailed ablation analysis
  2. Complicated architecture
  3. High computational cost

Table. 7. Pros and cons of three methods

3.2 Summary

UDA is a beneficial technique for medical image segmentation. By definition, data in the source domain are labeled while data in the target domain are unlabeled. It is difficult and expensive for medical institutions and research centers to collect data from individuals, because medical information, such as the status of diseases and organs, is very private; in addition, medical data often lack labels and annotations. Hence, UDA methods can significantly improve the performance of medical image segmentation. Secondly, in medical practice many different sensors can analyze organs and capture images, and doctors often combine two devices to diagnose a disease more accurately. In such cases, the domain shift between the two distinct domains can be tackled by UDA methods. Therefore, UDA approaches considerably boost the effectiveness of medical image segmentation.

The three methodologies in this blog post illustrate different ways of achieving this. Compared to current SOTA methods, the authors report remarkable improvements.

(1) ADR: The authors present several ideas to ensure the model's efficiency: a channel-wise disentanglement approach, the HSIC criterion, an attention-bias module, and information fusion calibration. The most important is the channel-wise disentanglement approach, which replaces the dual-path disentanglement of earlier papers to promote mutual guidance between domain-invariant and domain-specific features more efficiently. HSIC is used to enforce independence between the disentangled features, the attention-bias module helps the model focus on task-relevant regions, and information fusion calibration increases the accuracy of the model.

(2) DoCR: The authors propose high-frequency component (HFC) reconstruction and Domain Specific Convolution (DSC) to address the shortcomings of Fourier-transform-based UDA and domain-specific batch normalization (DSBN). First, the HFC captures structural information: the low-frequency components are filtered out so that only the HFC is kept. Secondly, DSC achieves excellent performance in extracting domain-invariant features.

(3) SMC-UDA: The authors provide a UDA segmentation method based on multi-modal learning. More precisely, the model consists of a multi-modal learning backbone with a cross-modal unidirectional distillation loss between the image and the edge structure. In self-training, they apply knowledge transfer between the predictions of the two domains to ensure reliable renal segmentation and improve it with the progressive ROI.

In the future, from my perspective, several ideas will be on trend. The first is 3D-based methods. They are beneficial for medical image segmentation because MRI plays an important role in the medical industry and can be captured in either 2D or 3D; for example, when athletes tear their ligaments and suffer ACL injuries, MRI is the device of choice to check the tissue status. However, the current SOTA methods for UDA are almost all 2D-based, so future models should be able to handle 3D data. Also, self-supervised learning can be applied in UDA because it concentrates on unlabeled data, a concept similar to UDA; self-distillation, for example, is one technique with potential to be deployed in UDA. Besides, UDA should be able to deal with other realistic shift problems: as mentioned in the introduction, most current UDA approaches address covariate shift, and researchers could also take other domain shift types into consideration, such as label shift. In conclusion, UDA is currently very important and will remain a critical factor in the future.

4.   Reference

[1] Zhu, Q., Jiao, Y., Ponomareva, N., Han, J., & Perozzi, B. (2023). Explaining and Adapting Graph Conditional Shift. arXiv preprint arXiv:2306.03256.

[2] Ajith, A., Gopakumar, G. (2023). Domain Adaptation: A Survey. In: Tistarelli, M., Dubey, S.R., Singh, S.K., Jiang, X. (eds) Computer Vision and Machine Intelligence. Lecture Notes in Networks and Systems, vol 586. Springer, Singapore.

[3] Wang, X., Chen, H., Tang, S. A., Wu, Z., & Zhu, W. (2022). Disentangled representation learning. arXiv preprint arXiv:2211.11695.

[4] Guan, H., & Liu, M. (2021). Domain adaptation for medical image analysis: a survey. IEEE Transactions on Biomedical Engineering, 69(3), 1173-1185.

[5] Liu X, Yoo C, Xing F, Oh H, Fakhri G, Kang JW, et al. Deep Unsupervised Domain Adaptation: A Review of Recent Advances and Perspectives. APSIPA Transactions on Signal and Information Processing. 2022 05.

[6] Sun, X., Liu, Z., Zheng, S., Lin, C., Zhu, Z., Zhao, Y. (2022). Attention-Enhanced Disentangled Representation Learning for Unsupervised Domain Adaptation in Cardiac Segmentation. In: Wang, L., Dou, Q., Fletcher, P.T., Speidel, S., Li, S. (eds) Medical Image Computing and Computer Assisted Intervention – MICCAI 2022. MICCAI 2022. Lecture Notes in Computer Science, vol 13437. Springer, Cham. https://doi.org/10.1007/978-3-031-16449-1_71

[7] Hu, S., Liao, Z., Xia, Y. (2022). Domain Specific Convolution and High Frequency Reconstruction Based Unsupervised Domain Adaptation for Medical Image Segmentation. In: Wang, L., Dou, Q., Fletcher, P.T., Speidel, S., Li, S. (eds) Medical Image Computing and Computer Assisted Intervention – MICCAI 2022. MICCAI 2022. Lecture Notes in Computer Science, vol 13437. Springer, Cham. https://doi.org/10.1007/978-3-031-16449-1_62

[8] Zhong, Z., Li, J., Bi, L., Yang, L., Kamel, I., Chellappa, R., ... & Jiao, Z. (2023). SMC-UDA: Structure-Modal Constraint for Unsupervised Cross-Domain Renal Segmentation. arXiv preprint arXiv:2306.08213.

[9] Kouw, W. M., & Loog, M. (2018). An introduction to domain adaptation and transfer learning. arXiv preprint arXiv:1812.11806.

[10] Zhuang, X., Shen, J.: Multi-scale patch and multi-modality atlases for whole heart segmentation of MRI. MedIA 31, 77–87 (2016)

[11] Almazroa, A., Alodhayb, S., Osman, E., Ramadan, E., Hummadi, M., Dlaim, M., ... & Lakshminarayanan, V. (2018, March). Retinal fundus images for glaucoma analysis: the RIGA dataset. In Medical Imaging 2018: Imaging Informatics for Healthcare, Research, and Applications (Vol. 10579, pp. 55-62). SPIE.

[12] Heller, N., Isensee, F., Maier-Hein, K.H., Hou, X., Xie, C., Li, F., Nan, Y., Mu, G., Lin, Z., Han, M., et al.: The state of the art in kidney and kidney tumor segmentation in contrast-enhanced ct imaging: Results of the kits19 challenge. Med. Image Anal. 67, 101821 (2020)

[13] Ji, Y., Bai, H., Yang, J., Ge, C., Zhu, Y., Zhang, R., Li, Z., Zhang, L., Ma, W., Wan, X., et al.: Amos: A large-scale abdominal multi-organ benchmark for versatile medical image segmentation. arXiv preprint arXiv:2206.08023 (2022)

[14] Kavur, A.E., Gezer, N.S., Barış, M., Aslan, S., Conze, P.H., Groza, V., Pham, D.D., Chatterjee, S., Ernst, P., Özkan, S., et al.: Chaos challenge-combined (ctmr) healthy abdominal organ segmentation. Med. Image Anal. 69, 101950 (2021)

[15] Hoffman, J., et al.: Cycada: cycle-consistent adversarial domain adaptation. In: International Conference on Machine Learning (ICML), pp. 1989–1998. PMLR(2018)

[16]  Liu, Q., Chen, C., Qin, J., Dou, Q., Heng, P.A.: FedDG: federated domain generalization on medical image segmentation via episodic learning in continuous frequency space. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1013–1023, June 2021

[17] Chang, W.G., You, T., Seo, S., Kwak, S., Han, B.: Domain-specific batch normalization for unsupervised domain adaptation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7354–7362 (2019)

[18] Liu, W., Luo, J., Yang, Y., Wang, W., Deng, J., & Yu, L. (2022). Automatic lung segmentation in chest X-ray images using improved U-Net. Scientific Reports, 12(1), 8649.

[19] Liu, X., Thermos, S., O’Neil, A., Tsaftaris, S.A.: Semi-supervised meta-learning with disentanglement for domain-generalised medical image segmentation. In: de Bruijne, M., et al. (eds.) MICCAI 2021. LNCS, vol. 12902, pp. 307–317. Springer, Cham (2021).
[20] Yeghiazaryan V, Voiculescu I. Family of boundary overlap metrics for the evaluation of medical image segmentation. J Med Imaging (Bellingham). 2018 Jan;5(1):015006. doi: 10.1117/1.JMI.5.1.015006. Epub 2018 Feb 19. PMID: 29487883; PMCID: PMC5817231.

[21] Hershey, J. R., & Olsen, P. A. (2007, April). Approximating the Kullback Leibler divergence between Gaussian mixture models. In 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07 (Vol. 4, pp. IV-317). IEEE.

[22] Berman, M., Triki, A.R., Blaschko, M.B.: The Lovász-softmax loss: a tractable surrogate for the optimization of the intersection-over-union measure in neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4413–4421 (2018)
