Author: Bjarne Sauer

Supervisor: Unknown user (ga87jay)

This blog post presents deep learning (DL) techniques applied to the task of 3D vessel segmentation. 3D vessel segmentation is a computer vision task in the field of medical imaging that aims to identify and extract vessel structures from volumetric data. It involves analyzing 3D medical scans, such as computed tomography (CT) and magnetic resonance imaging (MRI), by grouping voxels into related segments and precisely delineating the boundaries of vessels within the scanned volumes. In the following, we explore medical applications, highlight the challenges of 3D vessel segmentation, present the underlying methodologies, and compare the latest advancements using deep neural network architectures and novel loss functions.

Medical Applications

The significance of 3D vessel segmentation lies in its ability to assist medical professionals in diagnosing and treating various vascular conditions and diseases. This includes detecting vascular abnormalities, simulating blood flow dynamics, monitoring the progression of diseases and planning surgical interventions [7]. Vessel structures of interest are mainly blood vessels, such as coronary (see Figure 2), hepatic or cerebrovascular vessels (see Figure 1), but can also be lymphatic vessels or neural pathways.

One application area for 3D vessel segmentation is the extraction and modelling of hepatic vessel structures for the diagnosis of liver tumors, one of the deadliest cancer types in the world, as well as for the preoperative planning of liver resection and transplantation, two very promising treatment techniques [6]. In addition, 3D vessel segmentation can support the diagnosis and neurosurgery planning for strokes and brain aneurysms (see Figure 1) by locating blockages or bulges within cerebrovascular vessels so that they can be treated [18]. Both diagnosis and treatment require detailed vascular maps or, even better, 3D models of the morphology and connectivity of the vessels. Applying segmentation techniques allows vessels to be differentiated from surrounding tissue and background noise. Modern approaches use machine learning (ML) algorithms and neural network architectures specially developed for 3D image tasks, which must still be improved before they can be applied in clinical practice.


Figure 1: CT scan of a human head, cerebrovascular blood vessels highlighted through supply of contrast-enhancing substances; use cases: diagnosis and neurosurgery planning for strokes and brain aneurysms (taken from https://www.hopkinsmedicine.org/health/conditions-and-diseases/brain-aneurysm-4-things-you-need-to-know, May 30, 2023)

Figure 2: CT scans and extracted models of a human heart, coronary vessels highlighted; use case: diagnosis of coronary artery anomalies (here hypertrophic cardiomyopathy) (taken from Goo et al. [23, Fig 2])

Challenges

3D vessel segmentation poses several challenges. First of all, vessel structures are highly complex, form a network of many thick and thin branches and can, depending on the anatomic region under consideration, vary considerably from patient to patient [6][19]. The task is therefore challenging, as the vessel structures must be accurately traced through numerous branches, convoluted paths and closely packed vessels. Thinness and intricate connectivity increase the risk that some voxels containing a vessel structure are overlooked. This is compounded by the fact that medical images such as CT and MRI scans often provide low contrast and can have high noise [6]. Contrast-enhancing substances are not administered to all patients during medical imaging, and while they offer a partial solution, they do not completely eliminate the low-contrast problem. Likewise, high noise can hinder accurate delineation and can interrupt the smoothness of vessel structures, leading to compromised segmentation results. Additionally, the voxel distribution between vessel structures and background is inherently imbalanced, and in particular there is a distribution disparity between edge and non-edge voxels [18]. This is especially important where vessel boundaries are faint and difficult to distinguish due to noise, limited spatial resolution and partial volume effects. The spatial resolution of the imaging modality may not be high enough to fully capture the details of small vessels. The partial volume effect occurs when a voxel contains a mixture of both vessels and other tissues [22]. Struggling to classify edge voxels correctly leads to inaccurate segmentation results and to missing some tiny vessels completely [18]. Lastly, there is a lack of larger data sets with high-quality labels [7][19]. Labelling medical images, especially in 3D, requires medical expertise and is time-consuming. Therefore, data sets usually contain fewer than 100 samples each, and some contain incorrect labels as well.

Deep Learning Approach

DL approaches play a crucial role in advancing 3D vessel segmentation techniques. DL models are trained to automatically identify and segment vessel structures within volumetric medical images. Convolutional Neural Networks (CNNs) are commonly used in deep learning-based vessel segmentation approaches. These networks are designed to process and analyze images by hierarchically learning features at different levels of abstraction. Continued research in deep learning for 3D vessel segmentation focuses on enhancing the performance and efficiency of models and on exploring novel network architectures and loss functions to address the aforementioned challenges.

Fundamental Neural Network Architecture: U-Net

To accurately segment medical images, a suitable neural network (NN) architecture must be chosen. Ronneberger et al. [12] introduced the U-Net architecture. This CNN consists of a contracting path (encoder) and a symmetric expanding path (decoder). The contracting path consists of convolutional layers that extract various features from the input image and pooling layers that reduce the spatial dimensions through downsampling operations to capture larger or global context. The expanding path performs upsampling operations to restore the spatial resolution of the feature map to the original input image size, which allows for precise localization of the pixel-wise predictions in the original input image. Furthermore, skip connections combine low-level with high-level features, resulting overall in a fine-grained segmentation. The U-Net architecture has been extended to segment 3D images, for example with 3D U-Net [2] and V-Net [9]. Adjusted versions of the (3D) U-Net still serve as the backbone architecture for most recent approaches to (medical) image segmentation.
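The encoder-decoder layout can be made concrete by tracing feature-map shapes through a toy 3D U-Net. The following is a minimal sketch in plain Python, not the configuration from [12] or [2]: the function name `unet3d_shapes`, size-preserving convolutions, 2x pooling and channel doubling per level are all assumptions chosen for illustration.

```python
def unet3d_shapes(input_shape, base_channels=16, depth=3):
    """Trace (channels, D, H, W) through a toy 3D U-Net.

    Assumes size-preserving convolutions, 2x pooling/upsampling and
    channel doubling per level (illustrative choices only).
    """
    d, h, w = input_shape
    channels = base_channels
    encoder = []
    # Contracting path: convolve, keep the feature map for the skip
    # connection, then halve every spatial dimension.
    for _ in range(depth):
        encoder.append((channels, d, h, w))
        d, h, w = d // 2, h // 2, w // 2
        channels *= 2
    bottleneck = (channels, d, h, w)

    decoder = []
    # Expanding path: upsample (halving the channels), concatenate the
    # matching encoder feature map, then convolve back down.
    for skip_channels, sd, sh, sw in reversed(encoder):
        d, h, w = d * 2, h * 2, w * 2
        if (d, h, w) != (sd, sh, sw):
            raise ValueError("input size must be divisible by 2**depth")
        concat_channels = channels // 2 + skip_channels
        channels = skip_channels
        decoder.append({"concat_channels": concat_channels,
                        "out": (channels, d, h, w)})
    return encoder, bottleneck, decoder


encoder, bottleneck, decoder = unet3d_shapes((64, 64, 32))
```

The last decoder entry has the same spatial size as the input, which is what allows a voxel-wise prediction; the `concat_channels` value shows how each skip connection doubles the channel count before the following convolution.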

Loss and Evaluation Functions

There is a multitude of approaches to measure the performance of a DL architecture on an image segmentation task. Commonly used are (variants of) cross-entropy loss, focal loss, Hausdorff distance and Dice score [3]. Cross-entropy calculates the dissimilarity between the predicted probability distribution of each voxel belonging to the different classes and the ground truth distribution, represented as a binary mask. Focal loss is a modified version of the cross-entropy loss which introduces a modulating factor to down-weight the contribution of well-classified voxels (usually from the majority class), thereby giving more emphasis to hard examples [8]. Hausdorff distance measures the dissimilarity between two sets of points, namely the predicted segmentation and the ground truth, by quantifying the maximum distance from a point in one set to its nearest neighbour in the other set [3]. Dice loss is based on the Sørensen-Dice coefficient and computes the similarity of the prediction and the ground truth as twice the size of their intersection divided by the sum of the sizes of both sets, which generally copes better with imbalanced distributions [9][16].
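The measures described above can be sketched in a few lines of plain Python. This is an illustrative sketch on flat lists and small point sets, not an implementation from [3], [8] or [9]; the function names are hypothetical, and the focal loss omits the optional class-balancing weight from [8].

```python
import math


def dice_score(pred, truth):
    """Sørensen-Dice coefficient of two binary masks given as flat 0/1 lists."""
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Two empty masks are treated as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def focal_loss(probs, truth, gamma=2.0, eps=1e-7):
    """Mean binary focal loss; gamma=0 reduces to plain cross-entropy."""
    loss = 0.0
    for p, t in zip(probs, truth):
        p_t = p if t == 1 else 1.0 - p      # probability of the true class
        p_t = min(max(p_t, eps), 1.0)       # avoid log(0)
        # (1 - p_t)**gamma down-weights voxels that are already well classified.
        loss += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return loss / len(probs)


def hausdorff_distance(points_a, points_b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(src, dst):
        # Largest distance from a point in src to its nearest neighbour in dst.
        return max(min(math.dist(s, d) for d in dst) for s in src)
    return max(directed(points_a, points_b), directed(points_b, points_a))
```

For example, `hausdorff_distance([(0, 0, 0)], [(0, 0, 0), (3, 4, 0)])` is driven entirely by the single outlier voxel, which illustrates why this metric is sensitive to isolated false positives, while the Dice score of the same two sets would still be fairly high.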

Figure 3: Shortcomings of the traditional Dice score as motivation for clDice: both segmentation results are identical with regard to the traditional Dice score. The purple segmentation does not capture the small vessels while segmenting the large vessel very accurately, whereas the red segmentation captures all vessels while being less accurate on the radius of the large vessel. For topology preservation, red is evidently preferred.

Given the task of 3D vessel segmentation, one important goal is to preserve the connectivity of the global network topology. Figure 3 shows that the traditional Dice score does not reward preserving the connectedness of a network topology. CenterLine Dice (clDice for short) as presented by Shit et al. [15] guarantees topology preservation up to homotopy equivalence for binary segmentation by calculating the similarity on the intersection of the segmentation masks and their skeleta (see equations 1-3). Using clDice either as the loss function of the segmentation network (in its fully differentiable version, soft-clDice) or during evaluation encourages connectivity of vessels, preservation of the original graph structure and better segmentation of smaller vessel branches.

(1)	$\begin{array}{l}\displaystyle Tprec(S_P,V_L) = \frac{{\left\| S_P \cap V_L \right\|}}{{\left\| S_P \right\|}} \label{eq:Tprec}\end{array}$

(2)	$\begin{array}{l}\displaystyle Tsens(S_L,V_P) = \frac{{\left\| S_L \cap V_P \right\|}}{{\left\| S_L \right\|}} \label{eq:Tsens}\end{array}$

(3)	$\begin{array}{l}\displaystyle clDice(V_P,V_L) = 2 \times \frac{{ Tprec(S_P,V_L) \times Tsens(S_L,V_P)}}{{Tprec(S_P,V_L) + Tsens(S_L,V_P)}} \label{eq:clDice}\end{array}$

where V_L is the ground truth mask, V_P the predicted segmentation mask, S_L the skeleton extracted from the ground truth mask and S_P the skeleton extracted from the predicted segmentation mask.
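Equations (1)-(3) translate directly into set operations once masks and skeletons are available as sets of voxel coordinates. The sketch below assumes the skeletons have already been extracted by some external method (skeletonization itself is not shown); the function name `cl_dice` and the toy vessel are hypothetical.

```python
def cl_dice(pred_mask, true_mask, pred_skeleton, true_skeleton):
    """clDice from sets of voxel coordinates, following equations (1)-(3)."""
    if not pred_skeleton or not true_skeleton:
        return 0.0
    # Equation (1): fraction of the predicted skeleton inside the true mask.
    skeleton_pred_in_truth = len(pred_skeleton & true_mask) / len(pred_skeleton)
    # Equation (2): fraction of the true skeleton inside the predicted mask.
    skeleton_truth_in_pred = len(true_skeleton & pred_mask) / len(true_skeleton)
    if skeleton_pred_in_truth + skeleton_truth_in_pred == 0:
        return 0.0
    # Equation (3): harmonic mean of the two fractions.
    return (2 * skeleton_pred_in_truth * skeleton_truth_in_pred
            / (skeleton_pred_in_truth + skeleton_truth_in_pred))


# Toy vessel: a straight tube of width 3 with its centerline at y = 1.
true_mask = {(x, y, 0) for x in range(10) for y in range(3)}
true_skeleton = {(x, 1, 0) for x in range(10)}
# A prediction that stops half way along the vessel.
half_mask = {(x, y, 0) for x in range(5) for y in range(3)}
half_skeleton = {(x, 1, 0) for x in range(5)}

score = cl_dice(half_mask, true_mask, half_skeleton, true_skeleton)
```

Because only centerline voxels are counted, a missed thin branch costs as much per unit length as a missed thick vessel, which is the property that plain Dice lacks.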

Recent Research

Although research and breakthroughs emerge quickly in the field of computer vision and image segmentation, 3D vessel segmentation remains challenging. In this section, we delve into the DL approaches and findings of four recent research papers that have significantly advanced the field of 3D vessel segmentation.

Graph Connectivity Constrained Network

A widely mentioned challenge is the complexity and tininess of vessel structures combined with low contrast in medical images, such as CT and MRI scans, which often leads to discontinuous and therefore incorrect segmentation results [6][7][11]. Li et al. [6] make use of the connectivity prior by combining the backbone U-Net architecture with a graph neural network. The overall loss is therefore a combination of the segmentation loss, calculated on the output of the convolution operations of the CNN, and an additional connectivity constraints loss (see equation 6). On the one hand, Dice loss is applied as the segmentation loss (see equation 4); on the other hand, the connectivity constraints loss is based on the cross-entropy loss over the set of nodes (see equation 5).

(4)	$\begin{array}{l}\displaystyle Loss_{seg} = 1 - \frac{2 \sum_{i=1}^{N} p_i \, g_i + \epsilon}{\sum_{i=1}^{N}p_i + \sum_{i=1}^{N}g_i + \epsilon} \label{eq:Loss_seg}\end{array}$

(5)	$\begin{array}{l}\displaystyle Loss_{cc} = -\frac{1}{N_n} \sum_{i=1}^{N_n} (\hat{y_i} \, log \, y_i + (1 - \hat{y_i}) \, log \, (1 - y_i)) \label{eq:Loss_cc}\end{array}$

(6)	$\begin{array}{l}\displaystyle Loss_{total} = \alpha \, Loss_{seg} + \beta \, Loss_{cc} + \lambda \, \lVert w \rVert^2_2 \label{eq:combinedLoss}\end{array}$

where N is the total number of voxels, p_i the predicted probability of voxel i, g_i the ground truth of voxel i, N_n the total number of nodes, y_i the predicted probability of node v_i, ŷ_i the ground truth of node v_i, w the trainable parameters of the entire network, and α, β, λ weight parameters.
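As a rough illustration of how equations (4)-(6) fit together, the following plain-Python sketch evaluates them on flat lists. It is not the implementation from [6]: the function names, the default values of `alpha`, `beta` and `lam`, and the clamping constant `eps` are assumptions made for this example.

```python
import math


def dice_loss(p, g, eps=1e-5):
    """Equation (4): soft Dice loss over voxel probabilities p and labels g."""
    intersection = sum(pi * gi for pi, gi in zip(p, g))
    return 1.0 - (2.0 * intersection + eps) / (sum(p) + sum(g) + eps)


def node_cross_entropy(y, y_hat, eps=1e-7):
    """Equation (5): mean binary cross-entropy over the graph nodes.

    y holds predicted node probabilities, y_hat the node ground truth.
    """
    total = 0.0
    for yi, ti in zip(y, y_hat):
        yi = min(max(yi, eps), 1.0 - eps)   # avoid log(0)
        total += ti * math.log(yi) + (1.0 - ti) * math.log(1.0 - yi)
    return -total / len(y)


def total_loss(p, g, y, y_hat, weights, alpha=1.0, beta=1.0, lam=1e-4):
    """Equation (6): weighted sum of both losses plus an L2 weight penalty."""
    l2_penalty = sum(w * w for w in weights)
    return (alpha * dice_loss(p, g)
            + beta * node_cross_entropy(y, y_hat)
            + lam * l2_penalty)
```

Setting `beta=0` recovers a purely voxel-based Dice objective, which makes it easy to see what the connectivity term adds during training.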

To compute this loss, a graph connectivity constraint module (GCCM) is integrated into the CNN as a multi-task branch that supervises the learning in a way that enforces graph connectivity of the predicted segmentation. During the training phase, the GCCM constructs a 3D graph that models the vessel structure directly from the ground truth. To express connectivity sparsely rather than over dense voxels, the 3D ground truth is divided into smaller, non-overlapping sub-regions. Nodes are sampled as the average position of the voxels belonging to vessels in these sub-regions, or as the center voxel if the sub-region does not contain any vessel voxels [6]. Adapting the geodesic distance to the problem and re-using ideas from the vessel graph network (VGN) in [14], an edge is constructed between a pair of nodes whenever both are located in a vessel region and there exists a straight vascular access between them. The evaluation of the straight vascular access is transformed into an evaluation of the travel time between the nodes (as in Sethian and Popovici [13]), using greater speed values for voxels belonging to vessels and calculating the travel time with the fast marching method. The resulting adjacency matrix and a set of node features from the CNN then serve as inputs to a graph attention network. Its attention mechanism enforces the connectivity prior and results in less discontinuous segmentation results when applied to hepatic vessel data [6].
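The node sampling step described above can be sketched as follows. This plain-Python example only covers node placement; the edge construction with fast-marching travel times from [6] and [13] is not reproduced. The function name `sample_graph_nodes`, the coordinate order (z, y, x) and the requirement that the volume size is a multiple of the sampling interval are assumptions of this sketch.

```python
def sample_graph_nodes(vessel_voxels, volume_shape, interval):
    """Place one node per non-overlapping cubic sub-region of side `interval`.

    vessel_voxels: set of (z, y, x) coordinates labelled as vessel.
    Returns {region_index: (node_position, is_vessel_node)}.
    """
    if any(size % interval for size in volume_shape):
        raise ValueError("volume_shape must be a multiple of interval in this sketch")

    # Group the vessel voxels by the sub-region they fall into.
    by_region = {}
    for voxel in vessel_voxels:
        region = tuple(c // interval for c in voxel)
        by_region.setdefault(region, []).append(voxel)

    nodes = {}
    counts = [size // interval for size in volume_shape]
    for rz in range(counts[0]):
        for ry in range(counts[1]):
            for rx in range(counts[2]):
                region = (rz, ry, rx)
                voxels = by_region.get(region)
                if voxels:
                    # Vessel region: node at the mean vessel-voxel position.
                    position = tuple(sum(v[a] for v in voxels) / len(voxels)
                                     for a in range(3))
                else:
                    # Background region: node at the center voxel.
                    position = tuple(r * interval + interval // 2 for r in region)
                nodes[region] = (position, bool(voxels))
    return nodes


nodes = sample_graph_nodes({(0, 0, 0), (0, 0, 1)}, (4, 4, 4), interval=2)
```

A larger `interval` yields fewer nodes and therefore a sparser graph, at the cost of a coarser description of thin branches.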

Figure 4 shows qualitative and quantitative results from [6]. The visualizations showcase that the method correctly reconstructs the connectivity of a rather thick vessel, while other methods either lose part of the vessel or incorrectly predict non-existing branches. In the quantitative evaluation, the proposed approach (using sampling intervals of size 6 and 8) achieves the best 95% Hausdorff distance (lower is better) and maintains a high Dice score on both datasets while keeping GPU memory consumption and inference time low [6].

Figure 4: Qualitative and quantitative results for the graph connectivity constrained network; left: visualization of vessels segmented by different methods on 3D-ircadb-01 dataset [6, Fig.4], right: metrics comparison of different methods on datasets 3D-ircadb-01 and MSD08 [6, Table 3]

Segmentation using Cross Transformer Network

As another approach to prevent disconnected segmentation results, Pan et al. [11] enhance the U-Net architecture by constructing a transformer model in parallel to learn long-distance dependencies between anatomical regions. With their attention mechanism, transformers are inherently able to add global awareness to the segmentation process. A 3D Swin transformer acts as a feature extractor, and its feature map is then fused with those from the U-Net. To learn long-range dependencies, 3D shifted windows are fed into a multi-head self-attention module of the transformer, followed by a feed-forward network. Thus, the cross transformer network is designed to take global topological information into account and to segment vessel structures, in their case coronary vessels, in a continuous manner (see Figure 5) [11]. On the two datasets used, the cross transformer network outperforms the other models on the evaluation metrics Dice (overall as well as aorta and coronary separately), average symmetric surface distance (ASSD), skeleton recall (SR) and skeleton precision (SP) (see Figure 5) [11].
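To illustrate why self-attention adds global awareness, the sketch below computes single-head scaled dot-product self-attention over a handful of feature tokens in plain Python. It is a deliberately reduced illustration and not the model from [11]: learned query/key/value projections, multiple heads, window partitioning and window shifting are all omitted, and identity projections are assumed instead.

```python
import math


def softmax(row):
    """Numerically stable softmax of a list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention with identity projections.

    tokens: list of equally long feature vectors (one per voxel patch).
    Returns (outputs, weights); weights[i][j] is how much token i attends to j.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    weights = [
        softmax([scale * sum(q * k for q, k in zip(query, key)) for key in tokens])
        for query in tokens
    ]
    # Each output is a weighted average of all tokens, however far apart they are.
    outputs = [
        [sum(w * value[c] for w, value in zip(row, tokens)) for c in range(dim)]
        for row in weights
    ]
    return outputs, weights


outputs, weights = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
```

In this toy input the first and third token are identical, so they attend to each other more strongly than to the second token regardless of their position in the sequence, which is the mechanism that lets distant but similar vessel segments inform each other.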

Comparing the graph connectivity constrained network [6] with the cross transformer approach [11], the transformer on the one hand does not require a challenging graph construction, as it incorporates global topology information implicitly through its self-attention mechanism. On the other hand, the prior knowledge incorporated through a GCCM may be more substantial, and GCCMs are generally less computationally demanding and need less memory than a transformer. To mitigate these drawbacks of transformers, one can exploit transfer learning to train networks meaningfully when labeled medical datasets are limited [5].

Figure 5: Qualitative and quantitative results for the cross-transformer network; left: visualization of vessels segmented on ASACA dataset, red and green areas mark the aorta and the coronary vessels [11, Fig.3], right: performance comparison of vessel segmentation among different models using ASACA500 and ASACA100 datasets [11, Table 1]

Edge-reinforced Network

To segment crisp edges precisely and cope with the distribution imbalance between edge and non-edge voxels, Xia et al. [18] include a reverse edge attention mechanism in their network. It uses Reverse Edge Attention Modules (REAMs) as skip connections of the encoder-decoder architecture to compensate for information lost during the downsampling operations, in this case edge information. REAMs have also been used in Zhang et al. [20], and these modules are inspired by the reverse attention blocks in Chen et al. [1]. From an intermediate, upsampled prediction, a reverse attention weight matrix is calculated. When this matrix is multiplied element-wise with a convolutional feature, the weighted convolutional feature is better able to capture edge information (see Figure 4). Applied to the deep layers of the network, where semantic confidence is high but resolution is low, the block sequentially discovers additional regions and finer details of the vessel structure [1].

Figure 4: Illustration of the reverse attention block [1, Fig. 4]
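The weighting step of such a reverse attention block can be sketched in plain Python. The sketch assumes that the weight is one minus the sigmoid of the upsampled prediction, as is common for reverse attention blocks; the exact formulation in [1] and [18] may differ, and the function names are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def reverse_attention(prediction_logits, feature):
    """Weight a flattened feature map by 1 - sigmoid(prediction), element-wise.

    Voxels that the coarse prediction already marks confidently as vessel get
    a weight near 0, so the following layers focus on what is still missing.
    """
    weighted = []
    for logit, value in zip(prediction_logits, feature):
        weight = 1.0 - sigmoid(logit)
        weighted.append(weight * value)
    return weighted


# Confident vessel, undecided and confident background voxel with equal features.
weighted = reverse_attention([4.0, 0.0, -4.0], [1.0, 1.0, 1.0])
```

In the toy call the confidently predicted voxel is almost suppressed while the undecided one keeps half of its feature response, which is how attention is steered towards uncertain regions such as vessel boundaries.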

Apart from the REAM, an edge reinforced loss is calculated to constrain both the edge and non-edge voxels (see equation 7). It is based on the Dice similarity coefficient (see equation 8) but an additional edge loss (see equation 9) measures the dissimilarity between the prediction and the ground truth as a combination of binary cross entropy (see equation 10) and Dice loss [18].

(7)	$\begin{array}{l}L_{ER} = \left\{ \begin{array}{ll} L_{Dice}(p,l) & DSC < \lambda \\ L_{Dice}(p,l) + L_{edge}(p_{edge} , l_{edge} ) & DSC \geq \lambda \\ \end{array} \right. \label{eq:ERloss}\end{array}$

(8)	$\begin{array}{l}\displaystyle L_{Dice} = 1 - \frac{2 \sum_{i=1}^{N} p_i \, l_i + \epsilon}{\sum_{i=1}^{N}p_i^2 + \sum_{i=1}^{N}l_i^2 + \epsilon} \label{eq:Loss_Dice}\end{array}$

(9)	$\begin{array}{l}\displaystyle L_{edge}(p_{edge}, l_{edge}) = \frac{\zeta}{\kappa^2} L_{Dice}(p_{edge}, l_{edge}) + \frac{1}{\tau^2} L_{bce}(p_{edge}, l_{edge}) + log(1 + \kappa \tau) \label{eq:Loss_Edge}\end{array}$

(10)	$\begin{array}{l}\displaystyle L_{bce}(p, l) = - \sum_{i=1}^{N} \left( l_i \, log \, p_i + (1 - l_i) \, log (1 - p_i) \right) \label{eq:Loss_BCE}\end{array}$

where p is the predicted probability map, l the ground truth label, p_edge and l_edge the soft edge maps, N the total number of voxels, p_i the i-th voxel of the predicted probability map p, l_i the i-th voxel of the ground truth l, ζ the weighted balance parameter between Dice and BCE loss, and κ, τ trainable parameters to balance the loss terms.
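Equations (7)-(10) can be sketched in plain Python as follows. This is an illustration and not the implementation from [18]: κ and τ are trainable in the paper but are fixed numbers here, the default values are arbitrary, the BCE term sums the complete expression over all voxels, and the DSC used for the case distinction is assumed to be the Dice coefficient of the current prediction.

```python
import math


def dice_coefficient(p, l, eps=1e-5):
    """Dice similarity coefficient with squared terms in the denominator."""
    intersection = sum(pi * li for pi, li in zip(p, l))
    denominator = sum(pi * pi for pi in p) + sum(li * li for li in l)
    return (2.0 * intersection + eps) / (denominator + eps)


def dice_loss(p, l):
    """Equation (8)."""
    return 1.0 - dice_coefficient(p, l)


def bce_loss(p, l, eps=1e-7):
    """Equation (10): binary cross-entropy summed over all voxels."""
    total = 0.0
    for pi, li in zip(p, l):
        pi = min(max(pi, eps), 1.0 - eps)   # avoid log(0)
        total += li * math.log(pi) + (1.0 - li) * math.log(1.0 - pi)
    return -total


def edge_loss(p_edge, l_edge, zeta=1.0, kappa=1.0, tau=1.0):
    """Equation (9): balanced Dice and BCE loss on the soft edge maps."""
    return (zeta / kappa ** 2 * dice_loss(p_edge, l_edge)
            + 1.0 / tau ** 2 * bce_loss(p_edge, l_edge)
            + math.log(1.0 + kappa * tau))


def edge_reinforced_loss(p, l, p_edge, l_edge, lam, zeta=1.0, kappa=1.0, tau=1.0):
    """Equation (7): add the edge term once the Dice coefficient reaches lam."""
    loss = dice_loss(p, l)
    if dice_coefficient(p, l) >= lam:
        loss += edge_loss(p_edge, l_edge, zeta, kappa, tau)
    return loss
```

The threshold `lam` acts as a curriculum: the edge term is only added once the overall segmentation is reasonably good, so early training is not dominated by the harder edge voxels.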

The chosen applications are data sets of cerebrovascular vessels and nerves, which have many small branches. Figure 6 visualizes the ability of the edge-reinforced network to segment cerebrovascular vessels in a way that preserves the thin vessels and maintains vessel continuity, highlighted by the green arrows with close-ups shown in the green rectangles. Measured by sensitivity (Sen), specificity (Spe), Dice score (DSC) and average Hausdorff distance (AHD), the edge-reinforced network achieves better results than the comparison models, especially in sensitivity and AHD, indicating a better ability to identify vessel voxels [18].

Figure 6: Qualitative and quantitative results for edge-reinforced network; left: visualization of segmentation results on cerebrovascular dataset by different methods [18, Fig.5], right: segmentation results obtained by different methods on MIDAS-I and II datasets [18, Table 1]
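Sensitivity and specificity, two of the metrics reported above, behave very differently under the class imbalance typical of vessel data. The following plain-Python sketch uses a hypothetical helper `sensitivity_specificity` and made-up toy masks to show the effect; the numbers are not taken from [18].

```python
def sensitivity_specificity(pred, truth):
    """Sensitivity and specificity of two binary masks given as flat 0/1 lists."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


# Toy volume with 100 voxels, only 4 of which belong to a vessel.
truth = [1, 1, 1, 1] + [0] * 96
# The prediction misses half of the vessel voxels and has no false positives.
pred = [1, 1, 0, 0] + [0] * 96

sensitivity, specificity = sensitivity_specificity(pred, truth)
```

Here the specificity is perfect although half of the vessel voxels are missed, which is why sensitivity is the more informative of the two when thin vessels are at stake.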

Mean-Teacher Assistant Confident Learning

The lack of large data sets with high annotation quality in the medical domain impedes sufficient training of models for segmentation and other computer vision tasks. Comparing different data sets reveals differences in annotation quality: given the two data sets 3DIRCADb and MSD8 for hepatic vessels in Figure 7, one can easily spot mislabeled and unlabeled pixels in the latter [19]. The same data sets are used for the graph connectivity constrained network by Li et al. [6] as well, but many samples of the second are sorted out due to insufficient annotation quality and slice thickness. Still, these samples hold valuable information, and using them for training enlarges the training data set and makes it more realistic, as the network will probably be confronted with noisy data frequently. Nevertheless, “directly introducing additional data with low-quality annotations may confuse the network, leading to undesirable performance degradation” [19].

Figure 7: 2D and 3D visualization of samples from (a) data set with high-quality annotations, and (b) data set with numerous mislabeled and unlabeled pixels. Red represents the labeled vessels, while the yellow arrows at (b) point at some unlabeled pixels [19, Fig. 1].

To overcome this issue and reasonably make use of high-quality (HQ) and low-quality (LQ) labeled data, Xu et al. [19] come up with an approach which combines U-Net architecture with mean-teacher models and confident learning. Their loss is a combination of three parts:

(11)	$\begin{array}{l}\displaystyle L = L_s + \lambda_c \, L_c + \lambda_{cl} \, L_{cl} \label{eq:lossMeanTeacher}\end{array}$

where L_s is the supervised loss on the HQ labeled dataset, L_c the perturbation consistency loss on both datasets, L_cl the confident learning loss on the LQ labeled dataset, and λ_c, λ_cl the trade-off weights for L_c and L_cl.

The first part is a supervised loss L_s for segmentation, based on a combination of cross-entropy loss, Dice loss, focal loss [8] and boundary loss [4], to better deal with class imbalance and small vessel segmentation. The supervised loss is calculated using the backbone U-Net structure, but only on the set of HQ labeled data, as one can be quite certain that the labels are correct there. The second part is a perturbation consistency loss L_c on both the HQ and LQ labeled data sets. To strengthen consistency under the influence of noise, the network incorporates a mean-teacher assistant as proposed by Tarvainen et al. [17]. The mean-teacher model comprises two roles, a teacher and a student. While the student model learns as before on both data sets, the teacher is updated as an exponential moving average over the weights of consecutive student models, which regularizes the learning procedure [17]. This makes the model less susceptible to noisy data. The consistency loss is calculated as the voxel-wise mean squared error [19]. The mean-teacher model can afterwards be used to calculate the self-denoised confident learning loss L_cl, only on the LQ set. Confident learning is “a data-centric approach which focuses [...] on label quality by characterizing and identifying label errors in datasets, based on the principles of pruning noisy data, counting with probabilistic thresholds to estimate noise, and ranking examples to train with confidence” [10]. After estimating voxel-wise label errors with the joint probability between the (noisy) observed labels and the (true) latent labels, the method smoothly refines noisy labels [10][19]. Smooth refinement reduces the influence of misclassifications on the loss where the certainty about the correctness of the label is low anyway. In summary, the combined optimization includes information from both datasets while being cautious about the labels in the LQ set.
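The mechanics of the mean-teacher update, the consistency term and the combination in equation (11) can be sketched in plain Python. This is a minimal illustration, not the implementation from [17] or [19]: weights are flat lists instead of network parameters, the decay value is an assumption, and the confident learning loss is treated as a given number.

```python
def ema_update(teacher_weights, student_weights, decay=0.99):
    """Move each teacher weight towards the student weight.

    The teacher becomes an exponential moving average of successive students.
    """
    return [decay * t + (1.0 - decay) * s
            for t, s in zip(teacher_weights, student_weights)]


def consistency_loss(student_probs, teacher_probs):
    """Voxel-wise mean squared error between student and teacher predictions."""
    squared = [(s - t) ** 2 for s, t in zip(student_probs, teacher_probs)]
    return sum(squared) / len(squared)


def combined_loss(l_s, l_c, l_cl, lambda_c, lambda_cl):
    """Equation (11): supervised + consistency + confident learning loss."""
    return l_s + lambda_c * l_c + lambda_cl * l_cl


# One toy training step: the student has moved, the teacher follows slowly.
teacher = ema_update([0.0, 1.0], [1.0, 1.0], decay=0.9)
l_c = consistency_loss([1.0, 0.0], [0.0, 0.0])
loss = combined_loss(l_s=1.0, l_c=l_c, l_cl=3.0, lambda_c=0.5, lambda_cl=0.1)
```

A decay close to 1 makes the teacher change slowly, so a single noisy batch from the LQ set has little influence on the targets the student is asked to be consistent with.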

In the paper's comparison with two other models, Huang et al. [24] from 2018 and the classic U-Net architecture, the mean-teacher assistant confident learning (MTCL) approach achieves better Dice score, precision (PRE), average surface distance (ASD) and Hausdorff distance (HD) (see Figure 8) [19].

Figure 8: Quantitative results of mean-teacher assistant confident learning (MTCL) compared to two other approaches (Huang et al., U-Net) [19, Table 1]

Overview and Comparison

All four DL approaches mentioned (see Table 1) use the U-Net encoder-decoder structure as a backbone to perform segmentation on medical images. To tackle domain-specific challenges such as the complexity and connectivity of vessel structures, accurate edge detection and the lack of large datasets with high-quality labels, the architecture is extended with additional modules, such as graph connectivity constrained modules [6], transformers [11], reverse edge attention modules [18] or the mean-teacher model [19]. To adjust the learning accordingly, the loss functions become a combination of a classic supervised loss, usually based on the Dice score, with novel terms, such as a connectivity constraints loss [6] or a consistency loss plus a confident learning loss [19]. Both Li et al. [6] and Xu et al. [19] train on CT images of hepatic vessels, Pan et al. [11] generate two new datasets of coronary CT scans and Xia et al. [18] include MRA scans of cerebrovascular vessels as well as microscopy images of nerves. Consequently, the application areas for 3D vessel segmentation are wide, ranging from diseases of the liver to those of the heart and the brain.

Table 1: Overview and comparison of the four analyzed DL approaches regarding network architecture, datasets used and loss functions

Additional Approaches

Zhao et al. [21] follow an approach similar to the graph connectivity constrained network [6] with a hybrid deep neural network that consists of two cascaded subnetworks for segmentation, where the second has two coupled components, namely a traditional CNN-based U-Net and a graph U-Net. The graph is constructed using a preliminary segmentation mask and a probability mask for the occurrence of vessels in an image region, both of which are predicted by the first U-Net. The architecture is evaluated on two coronary vessel data sets and one head and neck artery data set.

Li et al. [7] tackle the limited availability of 3D segmentation maps and the need for robustness to noise with a segmentation network that uses edge profiles as structural priors. Their network architecture includes a main vessel segmentation branch as well as an auxiliary edge prediction branch that generates vessel edge profiles. 3D context is mined by employing LSTM modules, and novel regularization terms suppress noise and encourage local homogeneity.

Personal Review of Papers

The graph connectivity constrained network (GCCN) by Li et al. [6] tackles the widely mentioned challenge of predicting continuous vessels during segmentation. Their paper qualitatively shows the improved ability to achieve connectivity in the segmentation results and to preserve the vessel structure in a graph-like manner. Both datasets of hepatic vessels used in the paper are publicly available. Furthermore, they perform an ablation study for the newly introduced GCCN and detailed experimental comparisons to six other models to show performance improvements with regard to Dice score and Hausdorff distance as well as memory consumption and inference time. This leads us to the shortcomings of the paper: the evaluation metrics are not sufficient to prove quantitatively better performance on small and complex branch structures, for which skeleton-based measures such as clDice are necessary. Additionally, their code is not publicly available, and many samples from one of the datasets were sorted out due to insufficient slice thickness.

Pan et al. [11] use a promising transformer approach for the task of 3D vessel segmentation. They perform an ablation study and a quantitative comparison to six other models, in which the cross-transformer approach outperforms the others on the coronary vessel segmentation task. Fortunately, skeleton precision and recall are reported as well. Besides the novel architecture, the two datasets used are part of their work and were created by labeling 500 and 100 coronary CT scans. Unfortunately, the two in-house datasets are not publicly available and the evaluation was only performed on these two, which limits verifiability and comparability to previous work. Moreover, coronary vessels are usually easier to segment, as their structure is less complex and varies less from patient to patient.

The edge-reinforced network by Xia et al. [18] introduces both a novel neural network architecture and a novel loss function that address the challenge of finding crisp edges. They apply it to publicly available datasets of both cerebrovascular vessels and nerves, making the evaluation more elaborate than in other papers. The code base is publicly available as well. Additionally, they perform an ablation study for both the architecture and the loss function and an experimental comparison to a total of eight other models. Although they can show qualitatively, using prediction results and attention maps, that the model is better able to find crisp edges and segment small vessels, the metrics used (only sensitivity, specificity, Dice and average Hausdorff distance) do not prove better performance on thin vessels. Furthermore, the models it is compared with were no longer state-of-the-art at the time of publication, and the comparison lacks statistical proof of the significance of the results.

The mean-teacher assistant with confident learning approach by Xu et al. [19] addresses the widely mentioned problem of having only little high-quality labeled data and combines two approaches in an interesting way. Instead of either using the noisy labels thoughtlessly or sorting them out, their approach extracts useful information from them and thereby enlarges the training dataset. Both the datasets and the code are publicly available. They perform an ablation study for both the introduction of a mean-teacher and a confident learning loss. Because only the high-quality labeled dataset can be used for evaluation, the validation dataset is small and consists of only ten CT scans. Particularly noteworthy is that all experiments are performed only in 2D because 3D models performed far worse on the given data. However, segmenting only in 2D changes and simplifies the task considerably, even though medical scans are originally 3D. Lastly, the paper lacks a comparison to state-of-the-art models, as it only compares to the classical U-Net and another approach from 2018.

References

[1] Shuhan Chen, Xiuli Tan, Ben Wang, Huchuan Lu, Xuelong Hu, and Yun Fu. Reverse attention-based residual network for salient object detection. IEEE Transactions on Image Processing, 29:3763–3776, 2020.

[2] Özgün Cicek, Ahmed Abdulkadir, Soeren S. Lienkamp, Thomas Brox, and Olaf Ronneberger. 3d u-net: Learning dense volumetric segmentation from sparse annotation. In Sebastien Ourselin, Leo Joskowicz, Mert R. Sabuncu, Gozde Unal, and William Wells, editors, Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016, pages 424–432, Cham, 2016. Springer International Publishing.

[3] Shruti Jadon. A survey of loss functions for semantic segmentation. In 2020 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), pages 1–7, 2020.

[4] Hoel Kervadec, Jihene Bouchtiba, Christian Desrosiers, Eric Granger, Jose Dolz, and Ismail Ben Ayed. Boundary loss for highly unbalanced segmentation. Medical Image Analysis, 67:101851, 2021.

[5] Salman Khan, Muzammal Naseer, Munawar Hayat, Syed Waqas Zamir, Fahad Shahbaz Khan, and Mubarak Shah. Transformers in vision: A survey. ACM Comput. Surv., 54(10s), sep 2022.

[6] Ruikun Li, Yi-Jie Huang, Huai Chen, Xiaoqing Liu, Yizhou Yu, Dahong Qian, and Lisheng Wang. 3d graph-connectivity constrained network for hepatic vessel segmentation. IEEE Journal of Biomedical and Health Informatics, 26(3):1251–1262, 2022.

[7] Xuelu Li, Raja Bala, and Vishal Monga. Robust deep 3d blood vessel segmentation using structural priors. IEEE Transactions on Image Processing, 31:1271–1284, 2022.

[8] Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(2):318–327, 2020.

[9] Fausto Milletari, Nassir Navab, and Seyed-Ahmad Ahmadi. V-net: Fully convolutional neural networks for volumetric medical image segmentation. In 2016 Fourth International Conference on 3D Vision (3DV), pages 565–571, 2016.

[10] Curtis Northcutt, Lu Jiang, and Isaac Chuang. Confident learning: Estimating uncertainty in dataset labels. J. Artif. Int. Res., 70:1373–1411, May 2021.

[11] C. Pan, B. Qi, G. Zhao, J. Liu, C. Fang, D. Zhang, and J. Li. Deep 3D vessel segmentation based on cross transformer network. In 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pages 1115–1120, Los Alamitos, CA, USA, December 2022. IEEE Computer Society.

[12] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. In Nassir Navab, Joachim Hornegger, William M. Wells, and Alejandro F. Frangi, editors, Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, pages 234–241, Cham, 2015. Springer International Publishing.

[13] James Sethian and Alexander Popovici. Three dimensional traveltimes computation using the fast marching method. Geophysics, 64:516–523, March 1999.

[14] Seung Yeon Shin, Soochahn Lee, Il Dong Yun, and Kyoung Mu Lee. Deep vessel segmentation by learning graphical connectivity. Medical Image Analysis, 58:101556, 2019.

[15] S. Shit, J. C. Paetzold, A. Sekuboyina, I. Ezhov, A. Unger, A. Zhylka, J. W. Pluim, U. Bauer, and B. H. Menze. clDice - a novel topology-preserving loss function for tubular structure segmentation. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16555–16564, Los Alamitos, CA, USA, June 2021. IEEE Computer Society.

[16] Carole H. Sudre, Wenqi Li, Tom Vercauteren, Sebastien Ourselin, and M. Jorge Cardoso. Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations. In M. Jorge Cardoso, Tal Arbel, Gustavo Carneiro, Tanveer Syeda-Mahmood, Joao Manuel R.S. Tavares, Mehdi Moradi, Andrew Bradley, Hayit Greenspan, Joao Paulo Papa, Anant Madabhushi, Jacinto C. Nascimento, Jaime S. Cardoso, Vasileios Belagiannis, and Zhi Lu, editors, Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, pages 240–248, Cham, 2017. Springer International Publishing.

[17] Antti Tarvainen and Harri Valpola. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, pages 1195–1204, Red Hook, NY, USA, 2017. Curran Associates Inc.

[18] Likun Xia, Hao Zhang, Yufei Wu, Ran Song, Yuhui Ma, Lei Mou, Jiang Liu, Yixuan Xie, Ming Ma, and Yitian Zhao. 3D vessel-like structure segmentation in medical images by an edge-reinforced network. Medical Image Analysis, 82:102581, 2022.

[19] Zhe Xu, Donghuan Lu, Yixin Wang, Jie Luo, Jayender Jagadeesan, Kai Ma, Yefeng Zheng, and Xiu Li. Noisy Labels are Treasure: Mean-Teacher-Assisted Confident Learning for Hepatic Vessel Segmentation, pages 3–13. Springer, 2021.

[20] Hao Zhang, Likun Xia, Ran Song, Jianlong Yang, Huaying Hao, Jiang Liu, and Yitian Zhao. Cerebrovascular segmentation in MRA via reverse edge attention network. In Anne L. Martel, Purang Abolmaesumi, Danail Stoyanov, Diana Mateus, Maria A. Zuluaga, S. Kevin Zhou, Daniel Racoceanu, and Leo Joskowicz, editors, Medical Image Computing and Computer Assisted Intervention – MICCAI 2020, pages 66–75, Cham, 2020. Springer International Publishing.

[21] Gangming Zhao, Kongming Liang, Chengwei Pan, Fandong Zhang, Xianpeng Wu, Xinyang Hu, and Yizhou Yu. Graph convolution based cross-network multi-scale feature fusion for deep vessel segmentation. IEEE Transactions on Medical Imaging, PP:1–1, September 2022.

[22] Miguel Angel Gonzalez Ballester, Andrew P. Zisserman, and Michael Brady. Estimation of the partial volume effect in MRI. Medical Image Analysis, 6(4):389–405, 2002.

[23] Hyun Woo Goo, Dong-Man Seo, Tae-Jin Yun, Jeong-Jun Park, In-Sook Park, Jae Kon Ko, and Young Hwee Kim. Coronary artery anomalies and clinically important anatomy in patients with congenital heart disease: multislice CT findings. Pediatric Radiology, 39:265–273, 2009.

[24] Qing Huang, Jinfeng Sun, Hui Ding, Xiaodong Wang, and Guangzhi Wang. Robust liver vessel extraction using 3D U-Net with variant dice loss function. Computers in Biology and Medicine, 101:153–162, 2018.
