Semi/self-supervised Methods for Vessel Segmentation

Introduction

Vessel segmentation supports disease diagnosis and helps make surgeries more efficient and safer [8, 9]. It can be applied to retinal, brain, liver, or coronary images and benefits all of those branches of medicine. Segmentation in general is a widely studied problem with many applications and subtasks, but vessel segmentation requires vessel-specific techniques to work correctly, so popular state-of-the-art algorithms are hard to transfer to it. One of the most important issues is that many state-of-the-art segmentation algorithms are built on supervised learning, which requires a lot of labeled data. Acquiring pixel-wise manual vessel segmentations is very time-consuming and requires an expert radiologist as an annotator. As a result, very little labeled vessel segmentation data is available, which makes purely supervised learning a poor fit for the problem. Additional difficulties include the curvilinear structure of the vessels, vessel discontinuities, low contrast between capillaries and the background, and vessel bifurcations. All of those attributes make vessel segmentation a challenging problem. This blog post discusses how it can be approached through semi-supervised and self-supervised learning algorithms.

Vessel segmentation is applicable to many imaging modalities [5].

Semi-Supervised Methods

One of the machine learning paradigms capable of extracting information from unlabeled data is semi-supervised learning. It combines supervised and unsupervised learning by using a small amount of labeled data and a large amount of unlabeled data for training. Semi-supervised learning can be split into the following categories:

generative methods,
consistency regularization methods,
graph-based methods,
pseudo-labeling methods,
hybrid methods [16].

My survey of semi-supervised methods for vessel segmentation showed that three of the categories above are widely used for this task: generative models, consistency regularization methods, and pseudo-labeling methods. Generative models utilize Generative Adversarial Networks (GANs) [18], whose architecture is slightly modified to take advantage of semi-supervised learning. One such modification is to simultaneously train an unsupervised discriminator, a supervised discriminator, and a generator model [21]. Pseudo-labeling and consistency regularization methods are discussed in more detail later. The figure below shows all the papers I included in my survey, with the ones in bold being explained more thoroughly in this blog post.

Pseudo-Labeling Method: Hierarchical Deep Network with Uncertainty-aware Semi-supervised Learning for Vessel Segmentation

Pseudo-labeling methods first train the network on the small set of labeled data available for training. The trained network is then used to generate labels for the unlabeled data, and these generated labels, called pseudo-labels, are added to the ground truth set. The model is then retrained with the newly generated labels to improve its performance. The weak point of pseudo-labeling methods is the quality of the generated pseudo-labels. Because the network was only trained on a small number of samples, the generated labels might be of poor quality and can worsen the performance of the network. There are methods that prevent poor-quality labels from being used in retraining, one of which is presented by C. Li et al. [3].
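The train, label, retrain loop described above can be sketched in a few lines. This is a minimal illustration and not the method of [3]: the "network" is replaced by a hypothetical one-dimensional threshold classifier on pixel intensity, vessels are assumed to be brighter than the background, and all function names are made up for this example.

```python
from statistics import mean

def fit_threshold(intensities, labels):
    """Toy stand-in for network training: threshold halfway between class means."""
    vessel = [x for x, y in zip(intensities, labels) if y == 1]
    background = [x for x, y in zip(intensities, labels) if y == 0]
    return (mean(vessel) + mean(background)) / 2

def predict(threshold, intensities):
    """Label a pixel as vessel (1) if its intensity exceeds the threshold."""
    return [1 if x > threshold else 0 for x in intensities]

def pseudo_label_round(labeled_x, labeled_y, unlabeled_x):
    # Step 1: train on the small labeled set.
    threshold = fit_threshold(labeled_x, labeled_y)
    # Step 2: generate pseudo-labels for the unlabeled data.
    pseudo_y = predict(threshold, unlabeled_x)
    # Step 3: retrain on the labeled and pseudo-labeled data together.
    retrained = fit_threshold(labeled_x + unlabeled_x, labeled_y + pseudo_y)
    return retrained, pseudo_y

labeled_x = [0.1, 0.2, 0.8, 0.9]
labeled_y = [0, 0, 1, 1]
unlabeled_x = [0.15, 0.3, 0.7, 0.85]
threshold, pseudo_y = pseudo_label_round(labeled_x, labeled_y, unlabeled_x)
```

Nothing in this loop checks the quality of the pseudo-labels, which is exactly the weakness that uncertainty-aware variants try to address.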

Hierarchical Deep Network with Uncertainty-aware Semi-supervised Learning for Vessel Segmentation [3] tackles both whole-vessel and subtype vessel segmentation using pseudo-labeling. Subtype vessel segmentation separates, for example, arteries and veins in retinal images, or portal and hepatic vessels in liver images. A U-Net-like [17] architecture is used as the backbone of the network. The overall architecture is shown in the image below.

Spatial Activation

A part of this network that is especially interesting for vessel segmentation is the spatial activation module. As mentioned before, vessel segmentation techniques struggle with low contrast between the vessels and the background. That problem is especially severe for capillary vessels, which are crucial for the subtype vessel segmentation task. One way to improve the contrast is the spatial activation module introduced by C. Li et al. [3]. It uses an activation function to create an activation map that enhances the contrast.

The activation map, shown in the image above, is centered at 0.5 because capillary vessels have pixel values of around 0.5. As a result, the values corresponding to capillary vessels are enhanced, while the other values, corresponding to thick vessels or background, remain unchanged. Spatial activation makes it easier for the network to recognize capillary vessels, which helps with subtype vessel segmentation.
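To make the idea of an activation map centered at 0.5 concrete, here is a small sketch. The exact activation function of [3] is not reproduced in this post, so the Gaussian-shaped bump below, its `sharpness` parameter, and the function names are assumptions chosen only to show the behavior described above: a weight close to 1 for very dark or very bright pixels and a larger weight around 0.5.

```python
import math

def spatial_activation(pixel, center=0.5, sharpness=50.0):
    # Hypothetical Gaussian bump: about 2.0 at `center`, about 1.0 far from it.
    return 1.0 + math.exp(-sharpness * (pixel - center) ** 2)

def activation_map(image):
    """Per-pixel weights for a 2D image with intensities in [0, 1]."""
    return [[spatial_activation(p) for p in row] for row in image]

# One background pixel, one capillary-like pixel, one thick-vessel pixel.
image = [[0.0, 0.5, 1.0]]
weights = activation_map(image)
```

How the map is combined with the features inside the network is left open here; the sketch only computes the weights.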

Uncertainty-aware Pseudo-labeling

Pseudo-labeling methods are a popular approach in semi-supervised learning. However, simply generating new labels from the network trained on a small sample of labeled data can result in poor quality pseudo-labels. To deal with that issue, the authors of [3] introduce the idea of uncertainty-aware pseudo-labeling based on Bayesian networks [22].

During training, the prior network is trained together with the posterior network. Prior and posterior networks are typical elements of a Bayesian network architecture [22]. A Kullback-Leibler divergence loss pulls the distribution produced by the prior network toward that of the posterior network, so that after training the prior network approximates the distribution of the posterior network.
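For reference, the Kullback-Leibler divergence between two distributions Q and P with densities q and p is defined as follows. Reading Q as the distribution of the posterior network and P as that of the prior network is an assumption of this note; the direction used in [3] is not stated in this post.

```latex
D_{\mathrm{KL}}(Q \,\|\, P) = \int q(z) \, \log \frac{q(z)}{p(z)} \, dz
```

The divergence is non-negative and equals zero only when the two distributions coincide, which is why minimizing it brings the prior close to the posterior.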

The pseudo-labels used for retraining the network are generated during the sampling process. The final pseudo-label is the mean of the generated samples and is output together with an uncertainty measure. Only pseudo-labels whose uncertainty is below a set threshold are used for retraining. With this approach, the network learns only from good-quality pseudo-labels and avoids misleading information from low-quality ones.
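The sampling and filtering step can be sketched as follows, with the goal of keeping only good-quality pseudo-labels. This is an illustration under stated assumptions, not the procedure of [3]: uncertainty is measured as the per-pixel variance across samples, filtering is done per pixel, and the threshold value and function names are hypothetical.

```python
from statistics import mean, pvariance

def pseudo_label_with_uncertainty(samples):
    """samples: one list of per-pixel vessel probabilities per sampled prediction."""
    per_pixel = list(zip(*samples))
    label = [mean(values) for values in per_pixel]            # mean of the samples
    uncertainty = [pvariance(values) for values in per_pixel]  # spread of the samples
    return label, uncertainty

def keep_confident(label, uncertainty, max_uncertainty=0.01):
    """Return (pixel index, pseudo-label) pairs whose uncertainty is low enough."""
    return [
        (i, value)
        for i, (value, u) in enumerate(zip(label, uncertainty))
        if u <= max_uncertainty
    ]

# Three sampled predictions for two pixels: the samples agree on pixel 0
# and disagree strongly on pixel 1.
samples = [[0.9, 0.2], [0.9, 0.8], [0.9, 0.5]]
label, uncertainty = pseudo_label_with_uncertainty(samples)
kept = keep_confident(label, uncertainty)
```

Only pixel 0 survives the filter here, because the samples for pixel 1 vary too much to be trusted for retraining.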

Self-Supervised Methods

Self-supervised learning, like unsupervised learning, uses only unlabeled data. Compared to unsupervised learning, the power of self-supervised learning comes from its ability to extract and use information that is inherent in the data itself, such as another modality or another form of the inputs. The main idea of self-supervised learning is to use a pretext task to extract information from unlabeled data. Pretext tasks can include, for example, color or geometric transformations. Self-supervised methods can be split into three main categories:

generative,
contrastive,
generative-contrastive [6].

Generative methods aim to reconstruct the input through an encoder-decoder architecture and are trained with a reconstruction loss. Contrastive methods train an encoder to measure the similarity between two inputs in the representation space. Generative-contrastive methods utilize GANs for their learning. All of these methods have been applied to vessel segmentation, with generative methods being used the most.

Generative Method: LIFE: A Generalizable Autodidactic Pipeline for 3D OCT-A Vessel Segmentation

LIFE: A Generalizable Autodidactic Pipeline for 3D OCT-A Vessel Segmentation [2], LIFE for short, is an example of a generative self-supervised method. The network was tested on OCT-A data and focuses on vessel segmentation for that modality. One of the issues of OCT-A is speckle noise: small capillaries have low intensity and are therefore hard to distinguish from the noise. Inspired by Joint Label Fusion [19], LIFE implements Local Intensity Fusion (LIF) of OCT-A slices. LIF is a fusion between the 2D slices of a 3D OCT-A volume that enhances the intensity of the vessels. The issue with LIF, and with its contrast-enhanced version CE-LIF, is the introduction of phantom vessels, that is, vessels that are not present in the ground truth but appear in the LIF and CE-LIF images. Because of that, LIF and CE-LIF cannot be used directly in practice, as they can produce misleading information.

Cross-Modality Feature Extraction: LIFE

LIFE was inspired by a concept introduced by Y. Liu et al. [20]. That approach used two OCT-A images of the same retina, acquired with two different devices, to obtain a full-resolution latent space representing the structures (the vasculature) common to the two images. Using two OCT-A machines to image one retina is not practical in real-world settings. Instead of using two OCT-A images, the authors of LIFE used the OCT-A image of the retina together with its LIF.


The overall architecture is composed of a U-Net-like [17] denoising network (Dn-Net), a segmentation network (Seg-Net), and a synthesis network (Syn-Net). The input to the network is a 2D en-face OCT-A image, which we call M₁. The segmentation network can be regarded as an encoder that generates a full-resolution latent space L₁₂.

L₁₂ = f_e(M₁)

The latent space is then fed into the synthesis network, which acts as a decoder and generates M₂', a reconstruction of M₂, the LIF of M₁.

M₂' = f_d(L₁₂)

The network is trained to produce a good-quality reconstruction of M₂. If M₂ is well reconstructed, we can consider the latent space L₁₂ to capture the structure common to M₁ and M₂, which gives us a high-contrast image without the phantom vessels. The latent space of the trained network is binarized to obtain the final vessel segmentation. As we can see, LIFE is a generative self-supervised method whose training goal is a good reconstruction of the LIF of the 2D en-face OCT-A image.
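The data flow of the two equations above, followed by the binarization step, can be sketched as below. The networks are replaced by placeholder functions, the denoising network is left out, and both the mean squared error loss and the 0.5 binarization threshold are assumptions made for this illustration rather than details taken from [2].

```python
def mse(a, b):
    """Mean squared error between two equally sized 2D images."""
    diffs = [(x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)

def binarize(latent, threshold=0.5):
    """Turn the latent image into a vessel mask (threshold is an assumption)."""
    return [[1 if v >= threshold else 0 for v in row] for row in latent]

def life_forward(m1, m2, f_e, f_d):
    l12 = f_e(m1)                  # L12 = f_e(M1), the full-resolution latent image
    m2_reconstructed = f_d(l12)    # M2' = f_d(L12)
    loss = mse(m2_reconstructed, m2)  # reconstruction loss, assumed to be MSE here
    return l12, loss

def identity(image):
    # Placeholder standing in for the trained Seg-Net and Syn-Net.
    return image

m1 = [[0.1, 0.7], [0.6, 0.2]]   # toy en-face OCT-A image
m2 = [[0.0, 0.9], [0.8, 0.1]]   # toy LIF of the same image
l12, loss = life_forward(m1, m2, identity, identity)
segmentation = binarize(l12)
```

In a real setup `f_e` and `f_d` would be trained jointly to minimize the loss, and only `f_e` followed by the binarization would be needed at inference time.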

Can we combine self- and semi-supervised techniques? Dual-consistency semi-supervision combined with self-supervision for vessel segmentation in retinal OCTA images

One of the papers that particularly drew my attention was Dual-consistency semi-supervision combined with self-supervision for vessel segmentation in retinal OCTA images (DCSS) [1]. DCSS combines the consistency regularization method of semi-supervised learning with a Jigsaw puzzle task used in self-supervised learning.

Jigsaw Puzzle: Multi-scale puzzle subtask

Pretext tasks in self-supervised learning can extract meaningful information from unlabeled data. Building on that idea, DCSS includes a Jigsaw-like multi-scale puzzle subtask. The puzzle subtask consists of rotations and flips. The image is divided into equal pieces, arranged either in a 2x2 or in a 3x3 grid, and a classification network is trained to predict which permutation was applied. The subtask is implemented at two scales to take the continuity of the vessels into account. With a single-scale puzzle, the vessels would always be cut at the same locations, and the information about the continuity of the blood vessels across those cuts would be lost. Using both the 2x2 and the 3x3 scale ensures that the vessels are cut at different positions, which helps the network understand the continuity of the vessels at the edges of the tiles.
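The tile-permutation part of such a puzzle can be sketched as follows. This is a simplified illustration, not the DCSS implementation: it only shuffles tiles (no rotations or flips), uses plain nested lists as images, and takes the candidate permutations as an argument. Using all 24 permutations for the 2x2 case is an assumption of the example; for a 3x3 grid one would typically pass a fixed subset.

```python
import itertools
import random

def split_tiles(image, n):
    """Split a square 2D image (list of rows) into n*n equal tiles, row-major."""
    size = len(image) // n
    return [
        [row[c * size:(c + 1) * size] for row in image[r * size:(r + 1) * size]]
        for r in range(n)
        for c in range(n)
    ]

def join_tiles(tiles, n):
    """Inverse of split_tiles: stitch n*n tiles back into one image."""
    rows = []
    for r in range(n):
        band = tiles[r * n:(r + 1) * n]
        for i in range(len(band[0])):
            rows.append([value for tile in band for value in tile[i]])
    return rows

def make_puzzle(image, n, permutations, rng):
    """Shuffle the tiles with a randomly chosen permutation.

    Returns the shuffled image and the permutation index, which serves as the
    classification target for the pretext task.
    """
    label = rng.randrange(len(permutations))
    tiles = split_tiles(image, n)
    shuffled = [tiles[i] for i in permutations[label]]
    return join_tiles(shuffled, n), label

image = [[4 * r + c for c in range(4)] for r in range(4)]
perms_2x2 = list(itertools.permutations(range(4)))
puzzle, label = make_puzzle(image, 2, perms_2x2, random.Random(0))
```

Calling `make_puzzle` with `n=2` and `n=3` on the same image cuts the vessels at different positions, which is the point of the two-scale design.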

Dual-Consistency Regularization

The goal of consistency regularization methods is to encourage the network to produce similar outputs for the same input under different perturbations. DCSS applies two forms of consistency: feature-based and data-based. The perturbations of the input are obtained through the multi-scale puzzle. The perturbation is inverted before the auxiliary decoder for feature consistency and after the auxiliary decoder for data consistency. The consistency regularization is applied only to unlabeled data. Dual consistency is used, among other reasons, to avoid overfitting and to reduce the influence of speckle noise. After the network is trained, the results are obtained from the main decoder. The full architecture of the network is shown below.
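A data-level consistency term can be sketched as below. This is a simplified illustration, not the DCSS loss: a single placeholder `model` stands in for the encoder with its main and auxiliary decoders, a horizontal flip replaces the puzzle perturbation because it is its own inverse, and mean squared error is an assumed choice of distance.

```python
def hflip(image):
    """Horizontal flip; applying it twice restores the original image."""
    return [row[::-1] for row in image]

def mse(a, b):
    """Mean squared error between two equally sized 2D images."""
    diffs = [(x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)

def consistency_loss(model, image, perturb, invert):
    """Compare the prediction on the clean input with the prediction on the
    perturbed input after the perturbation has been undone on the output."""
    reference = model(image)
    restored = invert(model(perturb(image)))
    return mse(reference, restored)

def pixelwise_model(image):
    # Acts on each pixel independently, so flipping the input flips the output.
    return [[2 * v for v in row] for row in image]

def position_dependent_model(image):
    # Copies the first column everywhere, so it is sensitive to flips.
    return [[row[0]] * len(row) for row in image]

image = [[0, 1], [1, 0]]
loss_consistent = consistency_loss(pixelwise_model, image, hflip, hflip)
loss_inconsistent = consistency_loss(position_dependent_model, image, hflip, hflip)
```

No labels appear anywhere in the loss, which is why such a term can be computed on unlabeled images.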

Results

All the discussed networks are evaluated on different datasets, which makes it difficult to compare their performance directly. For that reason, I am going to discuss the performance of each network separately. Below, I summarize the overall evaluation process of each of the discussed models, based on the information provided by the authors.

	Dual-consistency semi-supervision combined with self-supervision for vessel segmentation in retinal OCTA images [1]	LIFE: A Generalizable Autodidactic Pipeline for 3D OCT-A Vessel Segmentation [2]	Hierarchical Deep Network with Uncertainty-aware Semi-supervised Learning for Vessel Segmentation [3]
Evaluation Metrics		True Positive Rate, False Positive Rate, Accuracy, Dice
Datasets	ROSE-1, private dataset of human OCT-A	Private datasets of human and zebrafish retinal OCT-A	AV-DRIVE, HRF, INSPIRE-AVR, IRCAD
Ablation Experiments?	Yes	No	Yes
Comparison with other methods?	Yes	No	Yes
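The four metrics named in the table follow directly from the confusion counts of a binary segmentation. The sketch below uses the standard definitions on flattened masks; the function names are chosen for this example, and it assumes that both vessel and background pixels occur in the ground truth, since otherwise a denominator would be zero.

```python
def confusion_counts(pred, truth):
    """Count true/false positives and negatives for flattened binary masks."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    return tp, fp, tn, fn

def segmentation_metrics(pred, truth):
    tp, fp, tn, fn = confusion_counts(pred, truth)
    return {
        "tpr": tp / (tp + fn),                    # true positive rate (sensitivity)
        "fpr": fp / (fp + tn),                    # false positive rate
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "dice": 2 * tp / (2 * tp + fp + fn),      # overlap of prediction and truth
    }

pred = [1, 1, 0, 0, 1, 0, 0, 0]
truth = [1, 0, 0, 0, 1, 1, 0, 0]
metrics = segmentation_metrics(pred, truth)
```

Because vessels usually cover only a small fraction of an image, accuracy can look high even for poor segmentations, so Dice and the true positive rate tend to be more informative.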

Results: Hierarchical Deep Network with Uncertainty-aware Semi-supervised Learning for Vessel Segmentation

The model introduced in [3] was evaluated on both retinal and liver images. One of the interesting experiments in terms of semi-supervision and pseudo-labeling was a comparative study evaluating the uncertainty estimation. In this study, the network was trained on labeled AV-DRIVE, pseudo-labels for retraining were generated on the HRF dataset, and testing was conducted on the INSPIRE-AVR dataset. The co-training baseline used annotated AV-DRIVE and HRF for training. In the table below, "vanilla annotation" stands for pseudo-labels generated without uncertainty estimation, "MC Dropout" means that the uncertainty estimation was performed using Monte Carlo Dropout, and "proposed" refers to the model introduced in the paper.

One can see that co-training performed the best. It performed better than supervised due to an augmented data size (it was trained on two datasets, while supervised was only trained on one). The proposed network performs better than supervised in terms of accuracy and sensitivity. We can see that generally, it performs better than vanilla annotation and Monte Carlo dropout. However, it performs worse than co-training, meaning the generated labels are not as good as a manual annotation.

The network was evaluated not only on retinal images, but also on 3D liver Computed Tomography (CT) images. The performance of the proposed model was compared with other state-of-the-art networks. Only some of them were capable of performing subtype hepatic/portal vessel segmentation, hence the gaps in the table.

The results show that the proposed method achieved the best performance overall. Even though it did not achieve the highest score for every metric, it performs best on average compared to the other methods.

Results: LIFE: A Generalizable Autodidactic Pipeline for 3D OCT-A Vessel Segmentation

The evaluation of LIFE is one of the weak parts of the paper. The method was not compared with any other state-of-the-art network and no ablation studies were performed. That makes it very hard to evaluate its performance. Human and zebrafish OCT-A datasets were used to test the network. The performance was compared with traditional methods, like Frangi’s multiscale vesselness filter, optimally oriented flux (OOF), k-means clustering, and Otsu thresholding.

In the figure above, we can see that LIFE performed much better than the other methods when it comes to smaller branches of the vessels. Through a qualitative evaluation, we can also see that generally LIFE performed better than the traditional methods.

The lack of evaluation in comparison with other models makes it really hard to evaluate LIFE. It is an interesting method for vessel segmentation, but without more extensive experiments it is very hard to rate its performance.

Results: Dual-consistency semi-supervision combined with self-supervision for vessel segmentation in retinal OCTA images

The DCSS paper [1] provides an extensive evaluation section. Below, we can see an ablation study. F stands for dual consistency, J2 for the Jigsaw puzzle of scale 2x2, and J3 for scale 3x3. In nearly all the metrics, the combination of all three techniques results in the best performance, confirming the architectural choices made in DCSS.

The model was also compared with a number of state-of-the-art networks. The evaluation was based on the ROSE-1 dataset. The table below shows that DCSS performed very well in comparison with the other models. The method that performed slightly better was OCTA-Net, which is a supervised method. DCSS outperformed the other semi-supervised networks.

My Review

Vessel segmentation is a challenging task for a number of reasons. Because it suffers from a lack of annotated data, improvements in self- and semi-supervised methods can significantly increase the quality of network-based segmentations. The papers I discussed show a number of strengths and weaknesses. I find that DCSS performs best and addresses the problems of vessel segmentation better than the other papers. The multi-scale puzzle helps preserve the continuity of the vessels, while the dual consistency helps lower the influence of noise. Unlike [3], DCSS does not require extensive data pre-processing. It also shows improved performance compared to other state-of-the-art semi-supervised methods. I am curious to see whether that network could be transferred to other organs and modalities. LIFE was shown to be applicable to zebrafish data, and I find this between-species transferability to be a big benefit of that model. The image enhancement it produces in the form of the latent space could be used as input to vessel segmentation networks. In the paper, that latent space was simply binarized; I wonder whether the performance could be improved by applying more sophisticated segmentation techniques to it. The spatial activation technique presented in [3] could be a great addition for modalities where it is hard to distinguish the vessels, especially capillaries, from the background.

Overcoming the challenges of vessel segmentation is not easy. With tasks and techniques specifically designed for vessels and their curvilinear structure, models can achieve great performance on this challenging task. As all of the models discussed in this post use a U-Net-like architecture, I think one direction for future work could be to build vessel-specific changes into the U-Net architecture itself. One option, already used in supervised learning methods, is to change the kernel sizes to account for the curvilinear structure: the authors of [5] used 3x1 and 1x3 kernels. It is also important that the networks transfer easily to other imaging modalities. Vessels have a similar structure in any modality, so transferability should be a design goal for vessel segmentation networks.

References

[1] Z. Chen, Y. Xiong, H. Wei, R. Zhao, X. Duan, and H. Shen, “Dual-consistency semi-supervision combined with self-supervision for vessel segmentation in retinal OCTA images,” Biomed. Opt. Express, vol. 13, no. 5, pp. 2824–2834, 2022.
[2] D. Hu, C. Cui, H. Li, K. E. Larson, Y. K. Tao, and I. Oguz, “LIFE: A generalizable autodidactic pipeline for 3D OCT-A vessel segmentation,” Med. Image Comput. Comput. Assist. Interv., vol. 12901, pp. 514–524, 2021.
[3] C. Li et al., “Hierarchical deep network with uncertainty-aware semi-supervised learning for vessel segmentation,” Neural Comput. Appl., vol. 34, no. 4, pp. 3151–3164, 2022.
[4] X. Xu, M. C. Nguyen, Y. Yazici, K. Lu, H. Min, and C.-S. Foo, “SemiCurv: Semi-Supervised Curvilinear Structure Segmentation,” arXiv [cs.CV], 2022.
[5] L. Mou et al., “CS2-Net: Deep learning segmentation of curvilinear structures in medical imaging,” Med. Image Anal., vol. 67, p. 101874, 2021.
[6] X. Liu et al., “Self-supervised learning: Generative or contrastive,” IEEE Trans. Knowl. Data Eng., pp. 1–1, 2021.
[7] Y. Ma et al., “Self-supervised vessel segmentation via adversarial learning,” in 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2021.
[8] F. Zhao et al., “Semi-supervised cerebrovascular segmentation by hierarchical convolutional neural network,” IEEE Access, vol. 6, pp. 67841–67852, 2018.
[9] M.-C. Xu et al., “Learning morphological feature perturbations for calibrated semi-supervised segmentation,” arXiv [cs.CV], 2022.
[10] J. Hou, X. Ding, and J. D. Deng, “Semi-supervised semantic segmentation of vessel images using leaking perturbations,” in 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2022.
[11] Y. Zhou, W. Chang, J. Song, H. Guo, J. Wang, and Y. Chen, “Semi-supervised deep learning of vessel segmentation in coronary angiography,” in 2021 IEEE 6th International Conference on Signal and Image Processing (ICSIP), 2021.
[12] S. Chatterjee et al., “DS6, deformation-aware semi-supervised learning: Application to small vessel segmentation with noisy training data,” arXiv [eess.IV], 2020.
[13] J. Kukačka, A. Zenz, M. Kollovieh, D. Jüstel, and V. Ntziachristos, “Self-supervised learning from unlabeled fundus photographs improves segmentation of the retina,” arXiv [eess.IV], 2021.
[14] M. Kraft, D. Pieczyński, and K. ‘kris’ Siemionow, “Overcoming data scarcity for coronary vessel segmentation through self-supervised pre-training,” in Neural Information Processing, Cham: Springer International Publishing, 2021, pp. 369–378.
[15] A. S. Hervella, J. Rouco, J. Novo, and M. Ortega, “Self-supervised deep learning for retinal vessel segmentation using automatically generated labels from multimodal data,” in 2019 International Joint Conference on Neural Networks (IJCNN), 2019.
[16] X. Yang, Z. Song, I. King, and Z. Xu, “A survey on deep semi-supervised learning,” arXiv [cs.LG], 2021.
[17] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation,” in Lecture Notes in Computer Science, Cham: Springer International Publishing, 2015, pp. 234–241.
[18] I. Goodfellow et al., “Generative adversarial networks,” Commun. ACM, vol. 63, no. 11, pp. 139–144, 2020.
[19] H. Wang, J. W. Suh, S. R. Das, J. B. Pluta, C. Craige, and P. A. Yushkevich, “Multi-atlas segmentation with joint label fusion,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 3, pp. 611–623, 2013.
[20] Y. Liu et al., “Variational intensity cross channel encoder for unsupervised vessel segmentation on OCT angiography,” in Medical Imaging 2020: Image Processing, 2020.
[21] J. Brownlee, “How to Implement a Semi-Supervised GAN (SGAN) From Scratch in Keras,” Machine Learning Mastery.
[22] A. Kendall and Y. Gal, “What uncertainties do we need in Bayesian deep learning for computer vision?,” arXiv [cs.CV], 2017.
