Authors of the paper: Benjamin Billot, Douglas N. Greve, Oula Puonti, Axel Thielscher, Koen Van Leemput, Bruce Fischl, Adrian V. Dalca, Juan Eugenio Iglesias
Blog post written by: Milan Cupac
Introduction
Brain MRI segmentation plays a crucial role in a variety of clinical applications such as diagnosis, therapy planning, and disease monitoring. Accurate and robust segmentation of brain regions is a prerequisite for precise downstream analysis and quantitative measurements [5]. This blog post presents a recently developed method for improving segmentation performance, called SynthSeg.
MRI images exhibit great variability in several parameters, the most important being contrast and resolution. Networks trained on images of a specific contrast and resolution perform much worse when these parameters change, and therefore have to be retrained in that case. Many techniques have been proposed to prepare a segmentation network for a wide distribution of input data. One worth mentioning is the domain adaptation strategy, where a transformation is learned from the labeled source domain to the unlabeled target domain, thereby obtaining labeled instances in the target domain as well [7].
SynthSeg, on the other hand, uses a domain randomization strategy, whose core is a generative model that simulates different effects by artificially modifying the training data. Its parameters are constantly randomized, so there is no specific target domain; instead, as the name suggests, the domain itself is randomized [6]. This makes the network adapt to a much wider data distribution and improves performance in the general case. Remarkably, the network does not have to be retrained or fine-tuned if the input images have, for example, a different resolution or contrast; it can be used directly.
The steps of the method are explained in detail in the next section, followed by two important experiments, a discussion of the whole approach, and the conclusion.
Method
Fig 1. Schematic overview of SynthSeg methodology.
Figure 1 shows a schematic overview of the SynthSeg methodology. Its core is a generative model, which produces the data given to the network and is explained in detail in the following paragraphs. The entire pipeline can be summarized as follows:
- First, real, labeled MRI scans of the brain are given to the generative model.
- Next, the generative model performs different steps to alter the images in the initial training set, producing thereby new images together with corresponding labels (ground truth).
- Finally, the altered images are passed to a 3D U-Net, the standard network architecture for segmentation, which predicts a segmentation for each of them. The predicted labels are compared with the target labels using the average soft Dice loss, and the weights of the network are updated accordingly, as shown in the figure.
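To make the training objective concrete, here is a minimal NumPy sketch of an average soft Dice loss, as commonly used for segmentation networks. The tensor layout and the epsilon smoothing term are my assumptions, not details from the paper:

```python
import numpy as np

def soft_dice_loss(pred, target, eps=1e-6):
    """Average soft Dice loss over label classes.

    pred:   (K, D, H, W) softmax probabilities from the U-Net
    target: (K, D, H, W) one-hot ground-truth labels
    """
    axes = tuple(range(1, pred.ndim))            # sum over spatial dimensions
    intersection = np.sum(pred * target, axis=axes)
    denom = np.sum(pred, axis=axes) + np.sum(target, axis=axes)
    dice_per_class = (2.0 * intersection + eps) / (denom + eps)
    return 1.0 - np.mean(dice_per_class)         # 0 = perfect overlap
```

A perfect prediction yields a loss of 0; the loss grows toward 1 as the overlap between prediction and ground truth shrinks.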
The generative model consists of four steps, and its main purpose is to simulate different effects that arise during MRI scans of the brain. It thereby enlarges the distribution of the data given to the segmentation U-Net and makes it robust to a wider range of images. Two example brain MRI images after each step of the generative model are shown in Figure 2.
Fig 2. Different steps of the generative model, applied to two example images. Image source: [1]
Step 1. Spatial Augmentation
The first transformation is spatial augmentation, whose purpose is to make the segmentation network adaptable to cases where the input image is, for example, rotated or elongated in one direction. Spatial augmentation consists of a linear (affine) transformation applied first and a non-linear transformation applied afterward. The linear transformation includes rotation, scaling, shearing, and translation. Each of them has three parameters, which are sampled randomly from a predefined range for every new training image; even if the same image comes again, it will be transformed differently.
The non-linear transformation starts by randomly generating a small vector field, which is then upsampled to the resolution of the image and integrated. The purpose of the integration is to make the field diffeomorphic; in other words, the transformation it produces is invertible.
Note that this step also yields the target labels, which are obtained by applying the same transformation to the labeled image from the training set.
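The affine part of this step can be sketched as follows with NumPy and SciPy. The sampling ranges are illustrative placeholders, not the values used in the paper; note that the labels are warped with nearest-neighbor interpolation so they stay discrete:

```python
import numpy as np
from scipy.ndimage import affine_transform

rng = np.random.default_rng(0)

def random_affine_3d(rot=15.0, scale=0.15, shear=0.01, shift=5.0):
    """Sample a random 3D affine; all ranges are illustrative, not the paper's."""
    a, b, c = np.deg2rad(rng.uniform(-rot, rot, size=3))       # rotation angles
    Rx = np.array([[1, 0, 0], [0, np.cos(a), -np.sin(a)], [0, np.sin(a), np.cos(a)]])
    Ry = np.array([[np.cos(b), 0, np.sin(b)], [0, 1, 0], [-np.sin(b), 0, np.cos(b)]])
    Rz = np.array([[np.cos(c), -np.sin(c), 0], [np.sin(c), np.cos(c), 0], [0, 0, 1]])
    S = np.diag(1.0 + rng.uniform(-scale, scale, size=3))      # per-axis scaling
    Sh = np.eye(3) + rng.uniform(-shear, shear, size=(3, 3)) * (1 - np.eye(3))
    M = Rz @ Ry @ Rx @ S @ Sh
    t = rng.uniform(-shift, shift, size=3)                     # translation (voxels)
    return M, t

def augment(image, labels):
    """Apply the same random affine to the image (linear) and labels (nearest)."""
    M, t = random_affine_3d()
    warped_img = affine_transform(image, M, offset=t, order=1)
    warped_lab = affine_transform(labels, M, offset=t, order=0)  # labels stay discrete
    return warped_img, warped_lab
```

The diffeomorphic non-linear field would then be applied on top of this, typically via scaling-and-squaring integration of the sampled vector field.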
Step 2. GMM Sampling
Recall that the output image of the first step comes together with a label image, which denotes for each voxel the part of the brain it belongs to. Each class is assigned a normal distribution from which the new image is sampled. Step 2 first samples a mean and a standard deviation for each of these distributions. Next, a new image is sampled, so that each voxel is drawn from the distribution of the class it belongs to.
Note that this step is the reason the whole procedure is a generative model: it does not alter the old image, but samples an entirely new one.
The purpose of this step is to model tissue heterogeneities, as well as the thermal noise of the scanner.
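A minimal sketch of this GMM sampling might look like the following; the prior ranges for the means and standard deviations are illustrative placeholders, whereas SynthSeg draws them from its own predefined priors:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_gmm_image(label_map, mean_range=(0, 255), std_range=(0, 35)):
    """Draw one synthetic image from a label map: one Gaussian per label class."""
    image = np.zeros(label_map.shape, dtype=float)
    for k in np.unique(label_map):
        mu = rng.uniform(*mean_range)      # random mean for class k
        sigma = rng.uniform(*std_range)    # random std for class k
        mask = label_map == k
        image[mask] = rng.normal(mu, sigma, size=mask.sum())
    return image
```

Because the class statistics are resampled on every call, the same label map yields a different synthetic image each time, which is exactly what widens the training distribution.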
Step 3. Bias field simulation
A bias field is a low-frequency signal that arises because the magnetic field inside an MRI scanner is not fully homogeneous; there are small inhomogeneities, especially near the boundary of the scanner. If an image is not preprocessed to reduce the effects of the bias field, the performance of segmentation (and thereby of all later analysis steps) suffers. Figure 3 shows the segmentation of an image with and without bias field removal.
To make the network robust to the bias field effect, SynthSeg artificially modifies the training images so that they look as if the effect were present. The modification is done as follows: first, a low-resolution 4×4×4 volume is sampled from a zero-mean normal distribution. It is then upsampled to the resolution of the training image and exponentiated (which ensures that all its values are positive). Finally, the image is multiplied voxel-wise by this field.
Fig 3. The comparison of the segmentation with and without the presence of bias field. Image source: [5]
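The three sub-steps of the bias field simulation translate almost directly into code. This sketch uses an assumed standard deviation for the low-resolution coefficients; only the 4×4×4 grid size comes from the description above:

```python
import numpy as np
from scipy.ndimage import zoom

rng = np.random.default_rng(0)

def apply_random_bias_field(image, grid=4, std=0.3):
    """Simulate a smooth multiplicative bias field (std is illustrative)."""
    low_res = rng.normal(0.0, std, size=(grid, grid, grid))  # zero-mean coefficients
    factors = [s / grid for s in image.shape]
    field = zoom(low_res, factors, order=3)                  # upsample to image size
    field = np.exp(field)                                    # strictly positive field
    return image * field
```

The smooth spline upsampling is what gives the field its characteristic low-frequency appearance, and the exponentiation guarantees a purely multiplicative, sign-preserving corruption.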
Step 4. Resolution variability simulation
The last step of the generative model is the modeling of different values for slice thickness and slice spacing.
With increasing slice thickness, several tissue layers contribute to the value of a single voxel. Slice thickness is therefore modeled by applying a Gaussian blur to the current image.
Increased slice spacing reduces the resolution of the obtained scans. It is therefore modeled by downsampling the blurred image to a lower resolution and then upsampling it again.
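Both effects can be sketched in a few lines of SciPy. The blur sigma and spacing factor here are fixed illustrative values, whereas SynthSeg randomizes them per training example:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def simulate_low_resolution(image, thickness_sigma=1.0, spacing_factor=2):
    """Blur to mimic slice thickness, then down/upsample to mimic slice spacing."""
    blurred = gaussian_filter(image, sigma=thickness_sigma)  # slice thickness
    low = zoom(blurred, 1.0 / spacing_factor, order=1)       # coarser slice spacing
    return zoom(low, spacing_factor, order=1)                # back to original grid
```

Upsampling back to the original grid keeps the image size constant for the U-Net while discarding the high-frequency detail that a thick-slice, widely spaced acquisition would never have captured.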
Importance of each of the steps in the generative model
Several experiments were conducted to assess the performance of SynthSeg. Probably the most interesting and important among them is the one where the authors compared the original SynthSeg with several modified variants, each lacking or changing one of the steps of the generative model. Comparing their performance reveals how much each step contributes.
The result of the comparison can be seen in Figure 4. There, SynthSeg-R denotes a SynthSeg trained for one specific resolution, and SynthSeg-RC a SynthSeg retrained for every specific combination of contrast and resolution. The "tight" and "tighter" GMM priors modify step 2 of the generative model: the parameters of the normal distributions are drawn from a tighter range (5% and 25% tighter, respectively). The comparison was performed on eight different datasets.
There are a couple of interesting conclusions from this experiment. First, the deformation and bias steps (steps 1 and 3) clearly contribute to a large extent to the improved performance of SynthSeg: excluding them from the generative model during training lowers the Dice score on every dataset. Second, the "tight GMM" network performs comparably to the original SynthSeg, but the Dice score decreases significantly when the range of the GMM parameters is tightened further. This highlights the importance of the variability in the input images created by step 2. Finally, the original SynthSeg performs, on average, better than both the resolution-specific and the resolution-and-contrast-specific variants.
Fig 4. Comparison of the performances of original and modified SynthSeg (generative model without some steps for example). Image source: [1]
The impact of the size of the input training set
As mentioned earlier, the training inputs are real MRI scans labeled either manually or automatically (by dedicated labeling software). The number of manually labeled training images required for efficient training of the segmentation network is one of the most important parameters, because it directly influences the cost of training.
Therefore, the authors investigated the performance (Dice score) for varying sizes of the training set, as shown in Figure 5. The colored lines represent the Dice score of SynthSeg on different datasets (six lines), whereas the performance of a standard supervised network trained on real T1 scans is marked with reversed triangles (one line).
Interestingly, on most datasets, the SynthSeg Dice score remained almost constant once there were more than five images in the training set. This was not the case for the standard supervised approach (reversed triangles), whose performance kept increasing with the size of the training set while constantly staying below SynthSeg's.
The dashed lines show the Dice score after adding a large number of automatically labeled training images. Note that, in most cases, enlarging the training set in this way increased performance.
Fig 5. Dice score performed by SynthSeg (circles) and T1 baseline method (reversed triangles) for different numbers of input training images. Image source: [1]
Discussion
Advantages of SynthSeg compared to other methods
The SynthSeg approach has many advantages over standard supervised approaches for the segmentation of MRI scans. The most important one is that, already during training, the model learns a much wider data distribution than the one covered by the input images. As a consequence, the model performs similarly on images of different contrast and resolution, which is of great importance in this field of application. It does not have to be retrained or fine-tuned, which significantly lowers the cost of training and makes the model directly applicable to datasets with different data distributions.
Since the results of the paper are strongly in favor of the method, it will motivate further exploration of the domain randomization strategy in the field of MRI scan segmentation. There are still many other effects that could be simulated in the generative model, and even the effects that are simulated could be simulated differently.
Compared to other approaches, SynthSeg requires a much smaller training set to achieve comparable performance. This is highly important because it minimizes the cost of training by minimizing the labeling labor.
Disadvantages of SynthSeg compared to other methods
Although presented very positively in the paper, certain drawbacks of SynthSeg have to be mentioned and discussed. First, the segmentation model is only exposed to artificially generated data and never sees real data, so one can never fully verify which patterns in the data the model has actually learned. This can be a big issue for proving the robustness of the model, especially in the medical field, where robustness is of great importance.
Second, all the effects in the generative model are simulated artificially. The presented results do show that the way these simulations are conducted improves the Dice score, but artificial modeling can never completely mimic the real effects, so there will always be room for improvement.
Finally, the generative model includes many parameters, each sampled from a specific predefined range. Each range has to be carefully fine-tuned, and there is always the risk that the resulting data distribution becomes too wide. In other words, even though exposure to a wider data distribution benefits the model, making the distribution too wide can hurt performance.
The debatable statement mentioned in the paper and the conclusion
To finish, one statement from the paper deserves further attention: "Augmenting the data beyond realism often leads to better generalization". The authors describe it as converging evidence, meaning that it has been increasingly confirmed in recent years, citing several papers. However, as discussed earlier, especially in the medical field, models should be exposed to real data as much as possible, because only that can ensure sufficient robustness in general cases.
Overall, the paper highlights the value of a domain randomization strategy for segmentation in the medical field. The method does have some disadvantages, as discussed in this blog post, but it is certainly a great encouragement for further work and improvement in this direction.
References
[1] Billot, B., Greve, D. N., Puonti, O., Thielscher, A., Van Leemput, K., Fischl, B., ... & Iglesias, J. E. (2023). SynthSeg: Segmentation of brain MRI scans of any contrast and resolution without retraining. Medical image analysis, 86, 102789.
[2] Jindal, S. K., Banerjee, S., Patra, R., & Paul, A. (2022). Deep learning-based brain malignant neoplasm classification using MRI image segmentation assisted by bias field correction and histogram equalization. In Brain Tumor MRI Image Segmentation Using Deep Learning Techniques (pp. 135-161). Academic Press.
[3] Ronneberger, O., Fischer, P., & Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18 (pp. 234-241). Springer International Publishing.
[4] Çiçek, Ö., Abdulkadir, A., Lienkamp, S. S., Brox, T., & Ronneberger, O. (2016). 3D U-Net: learning dense volumetric segmentation from sparse annotation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2016: 19th International Conference, Athens, Greece, October 17-21, 2016, Proceedings, Part II 19 (pp. 424-432). Springer International Publishing.
[5] Despotović, I., Goossens, B., & Philips, W. (2015). MRI segmentation of the human brain: challenges, methods, and applications. Computational and mathematical methods in medicine, 2015.
[6] Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W., & Abbeel, P. (2017, September). Domain randomization for transferring deep neural networks from simulation to the real world. In 2017 IEEE/RSJ international conference on intelligent robots and systems (IROS) (pp. 23-30). IEEE.
[7] Tsai, Y. H. H., Yeh, Y. R., & Wang, Y. C. F. (2016). Learning cross-domain landmarks for heterogeneous domain adaptation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5081-5090).
[8] https://www.mathworks.com/help/medical-imaging/ug/Brain-MRI-Segmentation-Using-Trained-3-D-U-Net.html (last accessed: 27.6.2023)
Key chatGPT prompts:
- "I am writing a literature review on state-of-the-art methods for segmentation of brain MRI scans. Can you describe the best methods to tackle this problem. The style of the text should be academic, you should act as an expert in this field. Write around one whole page"





