Introduction

The fundamental principle behind the segmentation task is the classification of pixels to generate semantic masks that separate foreground from background in the image domain. This general idea has been extended and enhanced in the context of deep learning. To improve the quality of segmentation masks, many different ideas, such as skip connections[1], atrous convolution[2], and conditional random fields[3], have been introduced, and a great variety of neural network architectures have been developed. Although quite satisfactory results can be achieved with these approaches, they tend to lack topological understanding of image context and thereby fail to preserve continuity in segmentation masks, which is a vital limitation, observed mostly in medical imaging.


The Role of Continuity in Segmentation

In the evaluation of segmentation models, good performance in terms of standard pixel-wise metrics such as accuracy, sensitivity, and precision is commonly reported, but such metrics consider pixels independently or only through overlap, and therefore do not guarantee that different ligaments and sections of target objects are segmented in a connected form. For example, the number of correctly classified pixels in the foreground region can be high, yet a few misclassified pixels may be enough to disconnect a vessel branch from its tiny arterioles. This situation is not specific to vessels or arteries; it is also observable in the segmentation of neurons and of closely located gland cells in histopathological images. While differentiating distinct neuron membranes in electron microscopy images by segmentation is common practice for neural circuit reconstruction, categorizing gland forms by their morphological size in segmentation masks has also become widespread for cancer diagnosis. Failure to preserve continuity in any of these cases, shown in Figure 1, violates the anatomical consistency required by downstream clinical decision-making processes. Hence, continuity and topology play an important role in the segmentation of curvilinear, adjacent, and small-scale anatomical structures.
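The vessel example can be made concrete with a minimal NumPy sketch. The `count_components` helper below is a hypothetical stand-in for a Betti-0 computation; it shows how a single misclassified pixel barely moves pixel accuracy yet doubles the number of connected components.

```python
import numpy as np

def count_components(mask):
    """Count 4-connected foreground components (Betti-0) via flood fill."""
    mask = mask.astype(bool).copy()
    n = 0
    for i in range(mask.shape[0]):
        for j in range(mask.shape[1]):
            if mask[i, j]:
                n += 1
                stack = [(i, j)]
                while stack:
                    y, x = stack.pop()
                    if 0 <= y < mask.shape[0] and 0 <= x < mask.shape[1] and mask[y, x]:
                        mask[y, x] = False
                        stack += [(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)]
    return n

# A thin "vessel": one horizontal line of foreground pixels.
gt = np.zeros((5, 20), dtype=int)
gt[2, :] = 1
pred = gt.copy()
pred[2, 10] = 0                        # one misclassified pixel breaks the branch

acc = (pred == gt).mean()              # pixel accuracy stays high
print(acc, count_components(gt), count_components(pred))  # 0.99 1 2
```

The overlap-based view sees a 1% error; the topological view sees a broken vessel.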


Related Papers

Although neural networks are quite capable of predicting well-structured and fine-grained segmentation masks, they exhibit a limited perception of whether different segmented parts should be connected or disconnected. This incapacity is generally attributed to 3 different factors:

  1. Feature representations extracted by the network are not continuity-aware.
  2. General architecture of deep learning models is not topology-oriented.
  3. Loss functions are unable to evaluate continuity and therefore cannot penalize the network for violating it.

To compensate for these deficiencies, 3 different solutions are generally proposed in the literature, and categorizing methodologies under these solutions provides a more comprehensive perspective for the assessment and examination of the latest continuity-based segmentation techniques in the deep learning field, as illustrated in Figure 2.


A. DConnNet

Main Idea and Approach

DConnNet[7] is an encoder-decoder architecture that aims to utilize directional context in a disentangled feature space for connectivity-based segmentation. To achieve this, it extracts directional (continuity-aware) features with the help of a topological input prior generated under the supervision of a connectivity mask used as a label, and then disentangles them from the latent space to be fused into the decoder. Disentangling the directional and categorical subspaces coupled in the latent representation enhances the utilization of the extracted features in the decoder for the connectivity-based prediction task, which is why this process is explicitly carried out between the encoder and decoder modules. To enforce continuity and thereby enrich continuity-based features, 3 main components are introduced with this architecture:

  1. Directional input prior
  2. Connectivity mask instead of segmentation mask
  3. Connectivity-modeling loss function

Methodology

General methodology of DConnNet[7] relies on the union of 3 main modules, as illustrated in Figure 3.

  1. ResNet Encoder 
  2. Sub-path Direction Excitation Unit 
  3. Interactive Feature-space Decoder

The ResNet[8] encoder first generates feature maps in which directional and categorical features are coupled. This feature representation is then passed into the sub-path direction excitation unit for the disentanglement process. Here, it is upsampled and supervised to learn the connectivity masks used as labels in the loss calculation, which extracts a directional input prior. This prior map, together with its vectoral component generated by global average pooling as a scaling factor, goes through channel-wise slicing to be split into 8 segments. In the final step, these prior segments are applied to pixel-wise and channel-wise attention modules[9], which yields a disentangled feature context for the decoder module. The interactive feature-space decoder works like top-down interactive flows of feature blocks and space blocks: in each flow, categorical features are refined while directional features are progressively highlighted and improved.
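The scaling and channel-wise slicing step can be sketched in a few NumPy lines. The shapes and names below are illustrative assumptions for exposition, not values taken from [7]:

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W = 64, 32, 32                            # assumed prior-map shape
prior = rng.random((C, H, W))                   # directional prior from the SDE unit

# Global average pooling yields a per-channel scaling vector; the scaled
# prior is then sliced channel-wise into 8 directional segments (one per
# connectivity direction) before entering the attention modules [9].
scale = prior.mean(axis=(1, 2))                 # shape (C,)
segments = np.split(prior * scale[:, None, None], 8, axis=0)
print(len(segments), segments[0].shape)         # 8 (8, 32, 32)
```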

Experiments and Datasets

The experimental process for evaluating the DConnNet[7] architecture is split into 2 different parts: large-scale medical segmentation quality, and the preservation of continuity in the generated segmentation maps. Since the network is not designed for a particular anatomical structure but follows a more general concept, it is evaluated on 3 different medical benchmark datasets, which are illustrated in Figure 4:

  1. Retouch[10] - Retinal Fluid 
  2. ISIC 2018[11] - Skin-Lesion 
  3. CHASEDB1[12] - Retinal Vessel 

In terms of their usage purposes, standard pixel-wise analysis is carried out on Retouch[10] and ISIC 2018[11], for which no continuity-based or topological measurements are reported. On the other hand, CHASEDB1[12] is specifically dedicated to the topological assessment of the DConnNet[7] architecture. As an evaluation technique, 3-fold and 5-fold cross-validation are used due to the inaccessibility of the test sets.

Results

To highlight the continuity-preserving ability of DConnNet[7] in segmentation, the network structure of retinal vessels is targeted, and 6 different baselines, including 2 loss functions designed for continuity-enforcing penalization, are compared under 5 different evaluation metrics:

  1. Centerline Dice (clDice)
  2. Standard Dice (DSC)
  3. Intersection over Union (IoU)
  4. Betti-0 (β0)
  5. Betti-1 (β1)

While the Dice similarity coefficient and the Jaccard index emphasize how much the predicted segmentation masks overlap with their respective ground truths, the other three metrics capture topological characteristics of those predictions. To be more specific, Betti-0 counts the number of connected segmentation regions, and as its complement, clDice inspects how compatible the centerlines of connected regions in predictions and ground truths are with each other. Even if segmentation maps preserve continuity, they can still contain circular empty holes, which Betti-1 quantifies for topological consistency. DConnNet[7] tends to outperform all the baselines under these aspects, as presented in Table 1.
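For reference, the two overlap metrics reduce to a few lines on flat binary masks; note that nothing in these formulas inspects connectivity, which is exactly why the Betti numbers and clDice are reported alongside them:

```python
import numpy as np

def dsc(pred, gt):
    """Dice similarity coefficient: 2|P ∩ G| / (|P| + |G|)."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())

def iou(pred, gt):
    """Intersection over Union (Jaccard index): |P ∩ G| / |P ∪ G|."""
    inter = np.logical_and(pred, gt).sum()
    return inter / np.logical_or(pred, gt).sum()

gt = np.array([1, 1, 1, 1, 0, 0])
pred = np.array([1, 1, 1, 0, 1, 0])
print(dsc(pred, gt), iou(pred, gt))  # 0.75 0.6
```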

B. TA-Net

Main Idea and Approach

The general concept of continuity in segmentation is mostly regarded as whether mutually correlated regions are segmented in a connected way; nonetheless, it also covers the isolation and discrimination of irrelevant forms in segmentation maps without sacrificing their geometry. TA-Net[6] builds on this notion to provide topology-aware segmentation of gland cells in histopathological images. Cancerogenesis in the epithelial surface tends to make glands relatively large and deformed in shape; as a result, overlapped and densely clustered structures emerge. To segment these gland cells and thereby identify the stage of cancer, TA-Net[6] aims to understand the connected and disconnected parts of these malfunctioning tissues through a representation of gland topology. In accordance with this idea, it encodes the geometry and deformation of gland cells in the feature maps of its encoder-decoder architecture by using the medial axis transformation[13].

In the medial-axis transformation[13] (also see Figure 5), ground-truth segmentation maps are first skeletonized by morphological thinning, and then the area from the segmentation contours to the center of the skeletons is filled with a density field, which results in medial-axis distance maps. The TA-Net architecture is explicitly trained to predict this medial-axis map in addition to the segmentation masks. In that way, the encoder features are shaped to capture topological context.
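The distance-field half of this transform can be sketched with SciPy's Euclidean distance transform; the thinning/skeletonization step is omitted here, and the toy rectangle mask is purely illustrative:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

mask = np.zeros((7, 11), dtype=int)   # hypothetical binary gland mask
mask[1:6, 1:10] = 1                   # a filled rectangle

# Distance from each foreground pixel to the nearest background pixel.
# The ridge of this field is the medial axis; masking the field with the
# thinned skeleton would give the medial-axis distance map that TA-Net's
# second decoder is trained to predict.
dist = distance_transform_edt(mask)
print(dist.max())  # 3.0 -> the deepest pixels lie on the medial axis
```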


Methodology

TA-Net[6] is an encoder-decoder architecture, as seen in Figure 6. Its SegNet[14]-style encoder and 2 parallel decoders are connected by skip connections to ensure good gradient flow and to cover low-level details in the representation space. The decoders share almost the same architecture except for the last layer, which is used for predicting the instance and medial-axis maps, respectively. The medial-axis map predicted by the second decoder is also used to generate a marker map together with the semantic mask, all of which are aggregated to calculate the topological and instance-based loss function. During the penalization of the medial-axis predictions, the density of gland cells is highlighted and the mutual interaction of their overlapping borders is clarified in the encoder features. In that way, the model becomes capable of differentiating between closely located malignant gland cells in segmentation maps.

Experiments and Results

As noted in its main motivation and approach, TA-Net[6] is specialized for histopathological image segmentation. Even though its architecture is built on top of more general principles in deep learning, the medial-axis transformation[13] is specifically devised and integrated into the framework to better understand closely located, adjacent solid anatomical structures like gland cells. Hence, the experimental process of this architecture was carried out only on H&E-stained histological images from 2 different colorectal adenocarcinoma datasets, which are shown in Figure 7:

  1. GlaS: Gland Segmentation in Colon Histology Images Challenge[15]
  2. CRAG: Colorectal Adenocarcinoma Gland[16]

The CRAG[16] dataset is composed of 213 images, which are split into 2 subsets for training and testing. While the training set contains 173 images, only 40 images are allocated for testing. GlaS[15] is a smaller dataset with 165 samples, and its train-test split is approximately balanced (85 and 80 images, respectively). The images in both datasets were acquired at 20x magnification using the whole slide imaging (WSI) technique.

In the results in Table 2, several baselines and the latest state-of-the-art methodologies are presented to provide a comprehensive and fair comparison. In total, 3 different evaluation metrics are computed: F1 score, object-level Dice coefficient, and object-level Hausdorff distance. According to these metrics, TA-Net[6] outperforms its counterparts on the CRAG[16] dataset, and only Yan et al.[17] surpasses TA-Net in terms of F1 score on the GlaS[15] dataset.

C. Projected Pooling Loss

Main Idea and Approach

Projected pooling loss[18] is designed for a soft analysis of the topological similarity between predicted and ground-truth segmentation maps. To achieve this, it makes use of max-pooling operations in both the spatial and volumetric domains. In measuring this topological similarity, it aims to constrain topology in 3D volume segmentation along the axial, sagittal, and coronal axes; hence, it differs from classical pixel-wise or overlap-based loss functions, which do not consider whether individual pixels or voxels contribute to the overall structure and topology of the anatomical parts needed for continuous segmentation.

Methodology

Projected pooling loss[18] consists of 3 steps, depicted in Figure 8. First, the ground-truth and predicted 3D segmentation maps are projected onto 2D axial, coronal, and sagittal planes by using 3D max-pooling. For this, a point-wise kernel is used that preserves the 2 dimensions of the segmentation volume other than the projection axis, along which the kernel size is set equal to the volume size. In the second step, these 2D planes go through 2D max-pooling with 5 different square kernels, which generates a connected-component map for each kernel as a different characterization of the topology of the projected views. Finally, the total projected loss is computed as the difference between the aggregated pixel intensities of the connected-component maps of the prediction and the ground truth, reflecting the topological similarity in the spatial extent as one numerical value. In that way, the continuity of the segmented region in volumetric space is investigated along 3 different projection axes.
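The 3 steps above can be sketched in NumPy. This is a simplified sketch, not the paper's implementation: the kernel sizes, the edge handling in `maxpool2d`, and the absolute-difference aggregation are all assumptions for illustration.

```python
import numpy as np

def maxpool2d(x, k):
    """Non-overlapping k x k max pooling (edges cropped for simplicity)."""
    h, w = (x.shape[0] // k) * k, (x.shape[1] // k) * k
    return x[:h, :w].reshape(h // k, k, w // k, k).max(axis=(1, 3))

def projected_pooling_loss(pred, gt, kernels=(2, 3, 4, 5, 6)):
    """Project both volumes onto the three planes with a max over one axis
    (the point-wise-kernel 3D max-pool), pool each projection at several
    scales, and sum the absolute differences of the pooled maps."""
    loss = 0.0
    for axis in range(3):                       # axial, coronal, sagittal
        p2d, g2d = pred.max(axis=axis), gt.max(axis=axis)
        for k in kernels:
            loss += np.abs(maxpool2d(p2d, k) - maxpool2d(g2d, k)).sum()
    return loss

gt = np.zeros((8, 8, 8))
gt[2:6, 2:6, 2:6] = 1.0                         # a solid cube
pred = gt.copy()
pred[0, 0, 0] = 1.0                             # one spurious far-away voxel
print(projected_pooling_loss(gt, gt), projected_pooling_loss(pred, gt) > 0)
```

A perfect prediction yields zero loss, while the spurious voxel survives every max projection and is penalized at all pooling scales.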

Experiments and Results

This paper does not propose a particular neural network architecture; instead, it focuses on the development of a new differentiable continuity-enforcing loss function to understand structural topology. Hence, it directly adopts the classical 3D U-Net[19] in its experimental process and makes a comparison with soft Dice loss[21] to illustrate how the proposed loss function positively affects the topology and continuity of segmentations. In total, 1 local and 3 public datasets from the Medical Segmentation Decathlon[20] (MSD) are used. While the local dataset targets red-nucleus segmentation in brain images, the Decathlon datasets concern spleen, heart, and hippocampus segmentation. The data samples were split into 3 subsets for the training, validation, and testing stages, and to avoid any possible information leak, the splitting procedure was carried out at the participant level.

The general evaluation scheme followed in the experiments is the investigation of Dice loss[21] and projected pooling loss[18] under both voxel-level and topological metrics. It is observed in Table 3 that both loss functions have a comparably similar effect on standard segmentation quality, even if they vary slightly in some metrics. On the other hand, projected pooling loss[18] significantly reduces both the 2D and 3D connected-component errors, referring to β0, compared to Dice loss.

D. Discrete Morse Theory Loss

Main Idea and Methodology

To enforce continuity preservation in segmented structures, neural architecture design and prior information about the continuous structure of anatomical shapes play an important role, but for extracting continuity-aware features and leading the network to concentrate specifically on connectivity, continuity-enforcing loss functions are vital. The discrete Morse theory loss[22], called DMT-Loss, is proposed for this purpose, just like projected pooling loss[18], and also aims to enhance the critical points that cause possible discontinuities in segmentation maps.

The common methodology followed by segmentation networks is to predict a pixel-wise likelihood map by using softmax activation, which is converted into final predictions by thresholding. A common problem in this approach is that the predicted likelihood maps contain small pixel errors and noise, which weaken the response in some regions and make them blurry. In that case, these regions, called Morse structures, are segmented as background, and discontinuities occur. DMT-Loss[22] can identify the critical Morse areas, extract their potential skeletons, and then enforce a higher penalty on them to correct these errors, as shown in Figure 9. To achieve this, it treats the discontinuities in the likelihood map, caused by missing pixels, as saddle regions of a possible terrain function, and then tries to capture these missing pixels by using stable manifolds under the guidance of gradient vector fields.
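The failure mode that motivates the loss can be illustrated on a 1D slice; this toy example is not the discrete Morse machinery itself (the `critical` detector and the weight of 4.0 are invented for illustration), it only shows why a dip in the likelihood map breaks thresholded predictions and why up-weighting those pixels helps:

```python
import numpy as np

# A 1D slice across a vessel-like likelihood ridge with a small dip
# ("saddle") caused by noisy pixels.  Plain thresholding breaks the
# structure there; DMT-Loss [22] instead locates such critical regions
# via discrete Morse theory and raises the penalty on them.
likelihood = np.array([0.9, 0.9, 0.8, 0.4, 0.8, 0.9, 0.9])
binary = likelihood > 0.5
print(binary)          # [ True  True  True False  True  True  True] -> broken

# Hypothetical per-pixel weight map: extra penalty on the captured dip.
critical = ~binary & (likelihood > 0.2)   # toy stand-in for a Morse detector
weights = 1.0 + 4.0 * critical            # ordinary pixels keep weight 1.0
```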

Experiments and Results

The experimental process of DMT-Loss[22] is designed to be quite comprehensive, in that it is tested on 6 different datasets in total, both medical and natural, all specifically targeting the segmentation of curvilinear structures. The evaluation is carried out in both the volumetric and spatial domains: for the 3D case, the ISBI13[23] and CREMI[24] datasets are used, whereas the remaining four[27, 28, 29, 30] are dedicated to 2D segmentation. In the training step, 3-fold cross-validation is adopted, and the overall mean over all folds is reported.

In the results table, 4 different baselines are compared with DMT-Loss[22] under 5 different metrics. DIVE[25] and U-Net[1], two important neural network architectures that laid the foundation of today's neural architecture designs together with many fundamental ideas, are trained with cross-entropy loss for segmentation. While Mosinska et al.[26] opt for a topology-aware loss function for training U-Net[1], TopoLoss[5] introduces a new topological loss function and conducts its training with DIVE[25]. Among all these configurations, DMT-Loss[22], tested with the 2D and 3D U-Net[1, 19] variants, tends to produce the most promising results, presented in Table 4.

Comparison and Discussion

General evaluation and mutual comparison of these methodologies are provided in Table 5:

Subjective Review

The awareness of whether segmented structures should be connected or disconnected is not a self-acquired property of neural network architectures; instead, it has to be enforced for segmentation models to gain a topological understanding of anatomical structures and thereby preserve continuity. To achieve this, many different approaches and ideas, presented in Figure 2, have been introduced in the literature; nevertheless, the latest continuity-oriented segmentation methodologies mostly focus on the usage of topological priors and continuity-enforcing loss functions to generate continuity-aware feature representations. The main reason why topology-oriented architectures lag behind the other 2 options is that they only work for particular anatomical structures. For example, graph neural networks produce quite satisfactory results for the segmentation of retinal vessels and ventral nerve cord boundaries by imitating their network structure based on a sophisticated node sampling strategy. However, their applicability to more solid and circular structures like gland cells or organs is questionable. Similarly, TEDS-Net[31] and TPSN[32] are topology-oriented architectures that capture persistent homology and then deform a basic imitation (topological prior) into the correct segmented structure, which is only suitable for circular forms. This leads the literature toward more general solutions instead of structure-oriented designs.

References

  1. Ronneberger, O., Fischer, P., & Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18 (pp. 234-241). Springer International Publishing.
  2. Chen, L. C., Papandreou, G., Schroff, F., & Adam, H. (2017). Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587.
  3. Chen, L. C., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. L. (2017). Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE transactions on pattern analysis and machine intelligence, 40(4), 834-848.
  4. Shit, S., Paetzold, J. C., Sekuboyina, A., Ezhov, I., Unger, A., Zhylka, A., ... & Menze, B. H. (2021). clDice-a novel topology-preserving loss function for tubular structure segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 16560-16569).
  5. Hu, X., Li, F., Samaras, D., & Chen, C. (2019). Topology-preserving deep image segmentation. Advances in neural information processing systems, 32.
  6. Wang H, Xian M, & Vakanski A (2022). Ta-net: Topology-aware network for gland segmentation. In Proceedings of the IEEE/CVF winter conference on applications of computer vision (pp. 1556-1564).
  7. Yang, Z., & Farsiu, S. (2023). Directional Connectivity-based Segmentation of Medical Images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 11525-11535).
  8. He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778).
  9. Fu, J., Liu, J., Tian, H., Li, Y., Bao, Y., Fang, Z., & Lu, H. (2019). Dual attention network for scene segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 3146-3154).
  10. Bogunović, H., Venhuizen, F., Klimscha, S., Apostolopoulos, S., Bab-Hadiashar, A., Bagci, U., ... & Schmidt-Erfurth, U. (2019). RETOUCH: The retinal OCT fluid detection and segmentation benchmark and challenge. IEEE transactions on medical imaging, 38(8), 1858-1874.
  11. Codella, N., Rotemberg, V., Tschandl, P., Celebi, M. E., Dusza, S., Gutman, D., ... & Halpern, A. (2019). Skin lesion analysis toward melanoma detection 2018: A challenge hosted by the international skin imaging collaboration (isic). arXiv preprint arXiv:1902.03368.
  12. Fraz, M. M., Remagnino, P., Hoppe, A., Uyyanonvara, B., Rudnicka, A. R., Owen, C. G., & Barman, S. A. (2012). An ensemble classification-based approach applied to retinal blood vessel segmentation. IEEE Transactions on Biomedical Engineering, 59(9), 2538-2548.
  13. Blum, H. (1967). A transformation for extracting new descriptions of shape. Models for the perception of speech and visual form, 362-380.
  14. Badrinarayanan, V., Kendall, A., & Cipolla, R. (2017). Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE transactions on pattern analysis and machine intelligence, 39(12), 2481-2495.
  15. Sirinukunwattana, K., Pluim, J. P., Chen, H., Qi, X., Heng, P. A., Guo, Y. B., ... & Rajpoot, N. M. (2017). Gland segmentation in colon histology images: The glas challenge contest. Medical image analysis, 35, 489-502.
  16. Graham, S., Chen, H., Gamper, J., Dou, Q., Heng, P. A., Snead, D., ... & Rajpoot, N. (2019). MILD-Net: Minimal information loss dilated network for gland instance segmentation in colon histology images. Medical image analysis, 52, 199-211.
  17. Yan, Z., Yang, X., & Cheng, K. T. (2020). Enabling a single deep learning model for accurate gland instance segmentation: A shape-aware adversarial learning framework. IEEE transactions on medical imaging, 39(6), 2176-2189.
  18. Fu, G., El Jurdi, R., Chougar, L., Dormont, D., Valabregue, R., Lehéricy, S., & Colliot, O. (2023, April). Introducing soft topology constraints in deep learning-based segmentation using projected pooling loss. In Medical Imaging 2023: Image Processing (Vol. 12464, pp. 351-356). SPIE.
  19. Çiçek, Ö., Abdulkadir, A., Lienkamp, S. S., Brox, T., & Ronneberger, O. (2016). 3D U-Net: learning dense volumetric segmentation from sparse annotation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2016: 19th International Conference, Athens, Greece, October 17-21, 2016, Proceedings, Part II 19 (pp. 424-432). Springer International Publishing.
  20. Antonelli, M., Reinke, A., Bakas, S., Farahani, K., Kopp-Schneider, A., Landman, B. A., ... & Cardoso, M. J. (2022). The medical segmentation decathlon. Nature communications, 13(1), 4128.
  21. Milletari, F., Navab, N., & Ahmadi, S. A. (2016, October). V-net: Fully convolutional neural networks for volumetric medical image segmentation. In 2016 fourth international conference on 3D vision (3DV) (pp. 565-571). Ieee.
  22. Hu, X., Wang, Y., Fuxin, L., Samaras, D., & Chen, C. (2021). Topology-aware segmentation using discrete Morse theory. arXiv preprint arXiv:2103.09992.
  23. Arganda-Carreras, I., Seung, S. H., Vishwanathan, A., & Berger, D. (2013). SNEMI 3D: 3D Segmentation of Neurites in EM Images.
  24. Funke, J., Perlman, E., Turaga, S., Bock, D., & Saalfeld, S. (2016). CREMI Challenge Leaderboard, as of 2017/22/09. [Online]. Available: https://cremi.org
  25. Fakhry, A., Peng, H., & Ji, S. (2016). Deep models for brain EM image segmentation: novel insights and improved performance. Bioinformatics, 32(15), 2352-2358.
  26. Mosinska, A., Marquez-Neila, P., Koziński, M., & Fua, P. (2018). Beyond the pixel-wise loss for topology-aware delineation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3136-3145).
  27. Zou, Q., Cao, Y., Li, Q., Mao, Q., & Wang, S. (2012). CrackTree: Automatic crack detection from pavement images. Pattern Recognition Letters, 33(3), 227-238.
  28. Mnih, V. (2013). Machine learning for aerial image labeling. University of Toronto (Canada).
  29. Staal, J., Abràmoff, M. D., Niemeijer, M., Viergever, M. A., & Van Ginneken, B. (2004). Ridge-based vessel segmentation in color images of the retina. IEEE transactions on medical imaging, 23(4), 501-509.
  30. Arganda-Carreras, I., Turaga, S. C., Berger, D. R., Cireşan, D., Giusti, A., Gambardella, L. M., ... & Seung, H. S. (2015). Crowdsourcing the creation of image segmentation algorithms for connectomics. Frontiers in neuroanatomy, 9, 142.
  31. Wyburd, M. K., Dinsdale, N. K., Namburete, A. I., & Jenkinson, M. (2021, September). TEDS-Net: enforcing diffeomorphisms in spatial transformers to guarantee topology preservation in segmentations. In International Conference on Medical Image Computing and Computer-Assisted Intervention (pp. 250-260). Cham: Springer International Publishing.
  32. Zhang, H., & Lui, L. M. (2022). Topology-preserving segmentation network: A deep learning segmentation framework for connected component. arXiv preprint arXiv:2202.13331.