Outline

  • Introduction
  • Methodology
  • Results
  • Conclusion
  • Review
  • References

Introduction

Computer-aided medical procedure tasks such as tracking, detection, and navigation rely on organ segmentation. Although organ segmentation is an important task in the medical field, it is typically done manually by experts. Manual segmentation is time consuming, and the amount of medical data is huge. With emerging research in deep learning, automated segmentation has become viable. However, problems remain when this approach is applied to medical data: high precision is required, and the similarity between organs and the misclassification of boundaries leave little room for tolerance.

Different models are used for segmentation, including aggregations of 2D CNNs, 3D CNNs, and models that rely on geometrical information. To further improve results, refinement strategies are introduced, either in the middle of the segmentation process or at the end as a post-processing step. Two examples of refinement approaches are Conditional Random Fields (CRF) and uncertainty analysis.

In this approach, refinement is done as a post-processing step. The pipeline consists of four main components: a base CNN, uncertainty analysis using the Monte Carlo Dropout (MCDO) method, construction of a semi-labelled graph, and a Graph Convolutional Network (GCN).

Methodology

The target is to add a post-processing step that refines the segmentation of the organs. The approach trains a GCN for the refinement step, assuming no ground truth is available.

A CNN trained on the input images provides segmentation predictions. The uncertainty of these predictions is then measured to find which elements the model is uncertain about. The uncertainty is approximated with the Monte Carlo Dropout (MCDO) method: because the network already contains dropout layers, MCDO integrates easily as a post-processing step and approximates the output of a Bayesian neural network. The MCDO analysis yields a binary volume that marks each voxel as high confidence or high uncertainty.
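As a rough illustration, MCDO amounts to running several stochastic forward passes with dropout kept active at inference time. The sketch below uses a toy sigmoid function standing in for the trained CNN; the names `mc_dropout_predict`, `t`, and `p` are illustrative assumptions, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_dropout_predict(forward, x, t=20, p=0.5):
    """Run t stochastic forward passes with dropout active at inference.
    Returns the stacked foreground-probability maps, shape (t, *x.shape)."""
    samples = []
    for _ in range(t):
        mask = (rng.random(x.shape) >= p).astype(float)  # Bernoulli dropout mask
        samples.append(forward(x * mask / (1.0 - p)))    # inverted-dropout scaling
    return np.stack(samples)

# Toy stand-in for the trained CNN: a sigmoid over the (masked) input.
toy_cnn = lambda z: 1.0 / (1.0 + np.exp(-z))
mc_probs = mc_dropout_predict(toy_cnn, rng.standard_normal((8, 8)), t=20)
```

The spread across the `t` samples is what the subsequent uncertainty analysis operates on.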

From the MCDO samples we compute the model expectation, which approximates the predictive probability. The probability is used to compute the entropy, which quantifies the model's uncertainty and identifies potentially incorrect elements. Applying a threshold to the entropy yields the binary mask that marks voxels as high uncertainty or high confidence.
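For binary segmentation the expectation is simply the mean over the MCDO samples, and the entropy follows directly from it. A minimal numpy sketch of this step; the threshold name `tau` and the helper name are assumptions, not from the paper:

```python
import numpy as np

def uncertainty_mask(mc_probs, tau=0.5):
    """mc_probs: (T, ...) foreground probabilities from T MCDO passes."""
    expectation = mc_probs.mean(axis=0)            # approximates p(y = 1 | x)
    eps = 1e-12                                    # numerical stability
    entropy = -(expectation * np.log(expectation + eps)
                + (1 - expectation) * np.log(1 - expectation + eps))
    entropy /= np.log(2.0)                         # normalise: binary max entropy is log 2
    uncertain = entropy > tau                      # True = high uncertainty
    return expectation, entropy, uncertain

# Confident voxels (p ~ 0.99) stay confident; ambivalent ones (p = 0.5) are flagged.
samples = np.stack([np.array([0.99, 0.5]), np.array([0.99, 0.5])])
exp_, ent_, unc_ = uncertainty_mask(samples)
```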

The high-confidence voxels in this map are used to generate a semi-labelled graph. The graph does not include all voxels, since some are irrelevant for the refinement process; the relevant ones are selected using the entropy and the expectation. The resulting region is called the region of interest (ROI), which is defined to eliminate irrelevant voxels from the process and thereby reduce memory requirements. The ROI is chosen according to the targeted anatomy or geometry of the organ. Each node carries three features: intensity, expectation, and entropy.
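Assembling the node features for the ROI voxels can be sketched as follows; this is a hypothetical helper, and the actual ROI selection in the paper is anatomy-dependent:

```python
import numpy as np

def roi_node_features(volume, expectation, entropy, roi_mask):
    """Collect per-voxel node features (intensity, expectation, entropy)
    for every voxel inside the boolean ROI mask."""
    coords = np.argwhere(roi_mask)                 # (N, 3) voxel coordinates
    feats = np.stack([volume[roi_mask],
                      expectation[roi_mask],
                      entropy[roi_mask]], axis=1)  # (N, 3) features per node
    return coords, feats

# Tiny example: a 2x2x2 volume with two ROI voxels.
vol = np.zeros((2, 2, 2)); exp_ = np.ones((2, 2, 2)); ent_ = np.full((2, 2, 2), 0.5)
roi = np.zeros((2, 2, 2), dtype=bool); roi[0, 0, 0] = roi[1, 1, 1] = True
coords, feats = roi_node_features(vol, exp_, ent_, roi)
```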

One way of connecting nodes is a nearest-neighbourhood scheme, but it introduces two problems. First, it lacks global information between nodes. Second, connectivity between high-confidence and high-uncertainty nodes is low, because only the nodes on the boundary of the high-uncertainty region connect to high-confidence voxels; this hinders the transmission of information. A fully connected graph, on the other hand, would consume a huge amount of memory. The proposed solution offers a middle ground: each node is connected to its 6 perpendicular neighbours, plus 16 random nodes. This increases the chance of a connection between uncertain and high-confidence regions, i.e. between unlabelled and labelled nodes. Additive weights on the edges let the learning process exploit both similar and dissimilar connections.
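The connectivity scheme above can be sketched as follows. This is a simplified illustration, assuming ROI voxel coordinates as input; the function name and edge representation are assumptions, and the paper's edge weighting is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def build_edges(coords, n_random=16):
    """coords: list of (x, y, z) integer voxel coordinates of the graph nodes.
    Each node is linked to its 6 axis-aligned neighbours (when they are in
    the graph) plus n_random randomly chosen nodes (long-range links)."""
    index = {tuple(c): i for i, c in enumerate(coords)}
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    edges = set()
    n = len(coords)
    for i, c in enumerate(coords):
        for dx, dy, dz in offsets:                 # local 6-connectivity
            j = index.get((c[0] + dx, c[1] + dy, c[2] + dz))
            if j is not None and j != i:
                edges.add((min(i, j), max(i, j)))
        for j in rng.choice(n, size=min(n_random, n - 1), replace=False):
            if j != i:                             # random long-range links
                edges.add((min(i, int(j)), max(i, int(j))))
    return sorted(edges)

# A 2x2x2 cube of voxels with only local connectivity has the 12 grid edges.
cube = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
edges = build_edges(cube, n_random=0)
```

The random links are what give distant high-confidence nodes a direct path into the uncertain region.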

After applying these steps on the input, we get a semi-labelled graph which is used to train a semi-supervised GCN to refine the segmentation.
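In this semi-supervised setting, the loss is computed only on the confidently pseudo-labelled nodes; the uncertain nodes receive no direct supervision and are refined through message passing over the graph. A minimal sketch of such a masked binary cross-entropy (all names are illustrative assumptions):

```python
import numpy as np

def masked_bce(pred, pseudo_labels, labelled_mask):
    """Binary cross-entropy over the confidently (pseudo-)labelled nodes only;
    high-uncertainty nodes contribute no loss and learn through the graph."""
    eps = 1e-12                                    # avoid log(0)
    p = pred[labelled_mask]
    y = pseudo_labels[labelled_mask]
    return float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))

pred = np.array([0.9, 0.1, 0.5])                   # third node is unlabelled
labels = np.array([1.0, 0.0, 0.0])                 # its pseudo-label is ignored
mask = np.array([True, True, False])
loss = masked_bce(pred, labels, mask)
```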



Figure 1: a) Refinement steps. b) Voxels connectivity.

Results

The method was validated on two datasets with different anatomical structures. The effect of the number of training samples was tested, as well as that of the uncertainty threshold.

The testing was performed on two CT datasets, NIH pancreas and MSD-spleen. 65 random samples were selected from the NIH dataset: 45 for training and 20 for testing. 35 samples were selected from MSD: 26 for training and 9 for testing. The MSD labels were merged into two classes, background and foreground, since only binary segmentation is considered.

The approach was tested with a 2D U-net CNN with dropout layers at the end of each convolution block; the dropout is needed for the MCDO analysis. Prediction was performed on axial slices and the outputs were stacked. To reduce false positives, only the largest connected component of the prediction is kept. The GCN is a two-layer network with 32 features in the hidden layer and a single output node, trained for 200 epochs with a learning rate of 1e-2, binary cross-entropy loss, and the Adam optimizer. The same base CNN is used for both CRF and GCN refinement, and both refinement methods use the same ROI.
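The two-layer GCN itself can be sketched in plain numpy using the standard symmetrically normalised adjacency. This is an illustrative forward pass only, with random weights and no training loop; the class and helper names are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def normalized_adjacency(edges, n):
    """Symmetrically normalised adjacency with self-loops:
    D^{-1/2} (A + I) D^{-1/2}."""
    a = np.eye(n)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

class TwoLayerGCN:
    """Two graph-convolution layers: 32 hidden features and one sigmoid
    output per node, matching the architecture described above."""
    def __init__(self, in_features, hidden=32):
        self.w1 = rng.standard_normal((in_features, hidden)) * 0.1
        self.w2 = rng.standard_normal((hidden, 1)) * 0.1

    def forward(self, a_hat, x):
        h = np.maximum(a_hat @ x @ self.w1, 0.0)             # ReLU
        return 1.0 / (1.0 + np.exp(-(a_hat @ h @ self.w2)))  # sigmoid

# Three nodes with 3 features each (intensity, expectation, entropy).
a_hat = normalized_adjacency([(0, 1), (1, 2)], 3)
out = TwoLayerGCN(in_features=3).forward(a_hat, rng.standard_normal((3, 3)))
```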

GCN refinement produced better results than conditional random field (CRF) refinement and the standard CNN without refinement, as shown in Table 1, Figure 2 and Figure 3. The GCN predictions replace the CNN predictions entirely, as this yielded better results than replacing only the uncertain predictions.


Figure 2: CNN predictions vs GCN refinement, for pancreas segmentation. Green color represents true positives. Red represents false positives. White represents false negatives.


Figure 3: CNN predictions vs GCN refinement, for spleen segmentation. Green color represents true positives. Red represents false positives. White represents false negatives.


Table 1: Average DICE score (%) of the baseline CNN, with CRF refinement, and with GCN refinement, for the two datasets.


Task     | CNN        | CRF refinement | GCN refinement
Pancreas | 76.9 ± 6.6 | 77.2 ± 6.5     | 77.8 ± 6.3
Spleen   | 93.2 ± 2.5 | 93.4 ± 2.6     | 95.1 ± 1.3



The approach was also tested with only 10 training samples for NIH and 9 for MSD-spleen. This reduction in training samples lowered overall performance, but GCN refinement still outperformed the baseline CNN and CRF refinement, as shown in Table 2.


Table 2: Average DICE score performance percentage, for smaller datasets.


Task        | CNN           | CRF refinement | GCN refinement
Pancreas-10 | 52.10 ± 22.61 | 52.20 ± 22.62  | 54.50 ± 22.15
Spleen-9    | 78.80 ± 28.40 | 78.80 ± 28.40  | 81.15 ± 28.90


The optimal value of the uncertainty threshold depends only slightly on the organ anatomy, as well as on the dataset and its number of samples. For the spleen, larger values gave better results; for the pancreas this was not the case. The threshold also determines the number of selected nodes, so intermediate values are preferred to avoid high memory consumption. The performance differences are shown in Table 3.


Table 3: Average DICE score (%) for different threshold values.

Task        | GCN τ = 1e-3 | GCN τ = 0.3  | GCN τ = 0.5  | GCN τ = 0.8  | GCN τ = 0.999
Pancreas    | 77.71 ± 6.3  | 77.79 ± 6.4  | 77.77 ± 6.3  | 77.81 ± 6.3  | 77.79 ± 6.3
Pancreas-10 | 54.55 ± 22.1 | 54.32 ± 22.1 | 54.15 ± 22.2 | 53.91 ± 22.4 | 53.14 ± 22.9
Spleen      | 95.01 ± 1.5  | 94.92 ± 1.4  | 94.98 ± 1.4  | 94.97 ± 1.4  | 95.07 ± 1.3
Spleen-9    | 80.91 ± 28.8 | 80.94 ± 28.9 | 80.94 ± 28.8 | 80.98 ± 28.9 | 81.15 ± 28.9


The refinement output resembles the expectation, as shown in Figure 4. As mentioned before, the expectation is part of each node's features and contributes to the edge weights. However, if the expectation contains artifacts, false positives may be generated.


Figure 4: Graph constructing features: predictions, expectation, entropy. Green color represents true positives. Red represents false positives. White represents false negatives. Brighter intensities represent higher values.

Conclusion

The proposed approach trains a GCN on a semi-labelled graph to provide a post-processing step for segmentation refinement. The method outperformed its counterparts: the base CNN without refinement and the CNN with CRF refinement. The approach uses a 2D model, the MCDO method for uncertainty analysis, and an ROI to limit the number of nodes.

Further research could be directed at using 3D models instead, which would introduce memory constraints in addition to the effort of adapting to the different architecture. The proposed approach, however, can be combined with any CNN that uses MCDO to obtain uncertainty. Additionally, the expectation could be used for uncertainty analysis instead of MCDO, and inter/intra-observer variability uncertainty measures could be investigated.

Review

As discussed, there exist different segmentation models that can be used for anatomical segmentation. The background and introduction were nicely established, showing different approaches for segmentation and their existing problems. The paper also showed the importance of adding a refinement step, especially in the medical field.

The choice of the number of random connections (k = 16) was not tested against other values, and the performance with random connections was not compared against a fully connected graph. The approach aggregates a 2D U-net model instead of using a 3D U-net, although it was mentioned that further research could address this. Additionally, the results show that the number of training samples affects the results, which suggests testing larger datasets.

The region-of-interest design could use more fine tuning and testing: the reasons some values were chosen were mentioned, but more validation with other values would strengthen the work. Overall, the paper introduced the idea and its importance for current research well.

References

Alex Kendall and Yarin Gal. What uncertainties do we need in bayesian deep learning for computer vision? In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS), 2017.

Philip Krähenbühl and Vladlen Koltun. Efficient inference in fully connected crfs with gaussian edge potentials. In Proceedings of the 24th International Conference on Neural Information Processing Systems (NIPS), 2011.

Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer Assisted Intervention (MICCAI), 2015.

Markus A Degel, Nassir Navab, and Shadi Albarqouni. Domain and geometry agnostic cnns for left atrium segmentation in 3d ultrasound. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 630–637. Springer, 2018.

Holger R. Roth, Amal Farag, Evrim B. Turkbey, Le Lu, Jiamin Liu, and  Ronald M. Summers. Data from pancreas-ct. the cancer imaging archive. http://doi.org/10.7937/K9/TCIA.2016.tNB1kqBU, 2016.

Kenneth Clark, Bruce Vendt, Kirk Smith, John Freymann, Justin Kirby, Paul Koppel, Stephen Moore, Stanley Phillips, David Maffitt, Michael Pringle, Lawrence Tarbox, and Fred Prior. The cancer imaging archive (tcia): Maintaining and operating a public information repository. Journal of Digital Imaging, 2013.

Amber L. Simpson, Michela Antonelli, Spyridon Bakas, Michel Bilello, Keyvan Farahani, Bram van Ginneken, Annette Kopp-Schneider, Bennett A. Landman, Geert Litjens, Bjoern Menze, Olaf Ronneberger, Ronald M. Summers, Patrick Bilic, Patrick F. Christ, Richard K. G. Do, Marc Gollub, Jennifer Golia-Pernicka, Stephan H. Heckers, William R. Jarnagin, Maureen K. McHugo, Sandy Napel, Eugene Vorontsov, Lena Maier-Hein, and M. Jorge Cardoso. A large annotated medical image dataset for the development and evaluation of segmentation algorithms. arXiv:1902.09063, 2019.
