Written by Eli Gibson, Francesco Giganti, Yipeng Hu, Ester Bonmati, Steve Bandula, Kurinchi Gurusamy, Brian Davidson, Stephen P. Pereira, Matthew J. Clarkson, and Dean C. Barratt

Introduction

Organ segmentation identifies the locations of organs and specifies their boundaries. It can support targeting and navigation in gastrointestinal interventions, and it is used in diagnosis [1], treatment planning [2], treatment delivery [3], and computer-aided diagnosis with biomarker measurement systems [1].


Segmentation methods fall into two groups: those that use registration [4] and those that do not [11]-[14]. The state-of-the-art registration-based methods for organ segmentation are statistical models and multi-atlas label fusion (MALF). Statistical models are built from co-registered images and use statistical modeling for organ segmentation [5]; MALF propagates registered reference segmentations from the training data to the target image and fuses them [6]-[10]. However, both registration-based approaches can struggle with the small endoscopic field of view and the lack of visual orientation cues. Registration also introduces further problems: soft-tissue deformation is highly variable, and segmentation accuracy is bounded by registration accuracy [15]-[17]. These limitations motivate a segmentation approach that avoids registration altogether: fully convolutional neural networks [18]-[31].


Fully convolutional neural networks operate on 3D images, which leads to high memory usage. Two techniques have been tried to mitigate this. The first trains on small patches of the full images, which sacrifices spatial context [21],[24]-[27]; the second limits the network depth, which constrains the receptive field [21],[28]. What is needed, therefore, is a segmentation method that is registration-free and memory-efficient while preserving high resolution.


Previous networks have segmented 19 abdominal organs in 2D slices with majority-voting label fusion [18], segmented 7 organs with a two-stage hierarchical pipeline based on the 3D U-Net [31], segmented 4 organs with a 3D FCN generating probability maps for level-set-based segmentation [19], and segmented organs with a combination of MALF, a 3D FCN, and hand-tuned input features [20]. These methods were tested on the left kidney, gallbladder, liver, right adrenal gland, and aorta. This article focuses on segmentation of the gastrointestinal (GI) tract (esophagus, stomach, duodenum) together with the spleen, pancreas, left kidney, gallbladder, and liver using the Dense V-network (DenseVNet), compared against two existing FCNs. DenseVNet achieves higher-resolution activation maps through memory-efficient dropout and feature reuse, uses batch-wise spatial dropout to lower the memory and computational cost of dropout, and is evaluated with cross-validation over 90 manually segmented images.

 


Methodology


Figure 1: Dense V-network architecture

The Dense V-network comprises five key features [37]:

  1. batch-wise spatial dropout
  2. dense feature stacks
  3. V-network downsampling and upsampling
  4. dilated convolutions
  5. an explicit spatial prior


1. Batch-Wise Spatial Dropout

Batch-wise spatial dropout was introduced to address computational and memory efficiency. Regular dropout regularizes the network by dropping random channels, but it still computes and stores the dropped activations [40]. The proposed approach instead modifies the convolutional kernels so that dropped activations are never computed. In standard spatial dropout, each channel is kept independently, so the number of retained channels follows a binomial distribution; in batch-wise spatial dropout, a fixed-size random subset of channels is retained for each mini-batch and the convolutional output is scaled accordingly, which bounds the maximum memory usage.
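As an illustration, the following minimal NumPy sketch implements the batch-wise selection idea, modelling a 1×1×1 convolution as a channel-mixing matrix multiply; the function name, shapes, and keep fraction are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def batchwise_spatial_dropout_conv(x, weights, keep_fraction=0.5, rng=None):
    """x: (batch, d, h, w, c_in); weights: (c_in, c_out).

    A fixed-size random subset of output channels is chosen once per
    mini-batch; the remaining kernels are never applied, so their
    activations are never computed or stored, bounding peak memory.
    """
    rng = rng or np.random.default_rng()
    c_out = weights.shape[1]
    n_keep = max(1, round(keep_fraction * c_out))   # fixed count, not binomial
    kept = rng.choice(c_out, size=n_keep, replace=False)
    # Apply only the retained kernels, scaling so expected activation
    # magnitudes match the full (undropped) network used at test time.
    return np.einsum('bdhwi,io->bdhwo', x, weights[:, kept]) / keep_fraction

x = np.random.randn(2, 4, 4, 4, 8)                  # 2 volumes, 8 channels in
w = np.random.randn(8, 16)                          # 16 candidate kernels
print(batchwise_spatial_dropout_conv(x, w).shape)   # (2, 4, 4, 4, 8)
```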


2. Dense Feature Stacks

With dense feature stacks, the input of each function is the concatenated output of all preceding functions [27]. Combining dense feature stacks with batch-wise spatial dropout inherently encodes identity functions, as in residual networks [42]; it also combines multiple network depths within a single network, which supports effective gradient propagation and the representation of complex functions. When memory constraints limit the number of activation maps, information from earlier layers is stored once and accessed by later layers rather than recopied into multiple feature maps, giving O(m) memory usage [44].
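A minimal sketch of a dense feature stack is given below, assuming toy 1×1×1 "convolutions" (channel mixes followed by ReLU); the layer widths are arbitrary, and the point is only the concatenation pattern, under which every later layer sees all earlier outputs.

```python
import numpy as np

def dense_feature_stack(x, layers):
    """Each layer receives the channel-wise concatenation of the stack input
    and all preceding outputs; the stack output is the full concatenation."""
    outputs = [x]
    for layer in layers:
        outputs.append(layer(np.concatenate(outputs, axis=-1)))
    return np.concatenate(outputs, axis=-1)

# Hypothetical layers: 1x1x1 channel mixes followed by ReLU.
rng = np.random.default_rng(0)
def mix(c_in, c_out):
    w = rng.standard_normal((c_in, c_out))
    return lambda t: np.maximum(np.einsum('bdhwi,io->bdhwo', t, w), 0.0)

x = rng.standard_normal((1, 4, 4, 4, 8))
out = dense_feature_stack(x, [mix(8, 4), mix(12, 4), mix(16, 4)])
print(out.shape)  # (1, 4, 4, 4, 20): 8 + 4 + 4 + 4 channels
```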


3. V-Network Downsampling and Upsampling

V-network downsampling and upsampling address the resolution problem: downsampling and upsampling subnetworks are joined by skip connections that propagate higher-resolution information to the final segmentation [45]. Typical V-networks use shallow strided-convolution downsampling units followed by shallow transpose-convolution upsampling units, with additive or concatenating skip connections within each resolution. DenseVNet differs in several ways: the downsampling subnetwork is a sequence of three dense feature stacks connected by downsampling strided convolutions; the memory efficiency of the dense feature stacks and batch-wise spatial dropout enables deeper networks at higher resolutions; and the skip connections are bilinearly upsampled to the segmentation resolution, which limits the checkerboard artifacts induced by transpose convolutions [45].
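The sketch below shows this structure with stand-in operations (slicing in place of strided convolution, linear interpolation in place of bilinear upsampling, a channel mix in place of a dense feature stack); the resolutions and channel counts are assumptions for illustration only.

```python
import numpy as np
from scipy.ndimage import zoom

rng = np.random.default_rng(0)

def stage(x, c_out):                        # stand-in dense feature stack
    w = rng.standard_normal((x.shape[-1], c_out))
    return np.maximum(np.einsum('bdhwi,io->bdhwo', x, w), 0.0)

def downsample(x):                          # stand-in strided convolution
    return x[:, ::2, ::2, ::2, :]

def upsample(x, spatial):                   # linear interpolation of a skip
    f = (1,) + tuple(s / d for s, d in zip(spatial, x.shape[1:4])) + (1,)
    return zoom(x, f, order=1)

x = rng.standard_normal((1, 24, 24, 24, 4))
seg_res = (12, 12, 12)                      # assumed segmentation resolution
skips = []
for i, c in enumerate((8, 16, 32)):         # three scales
    x = stage(x, c)
    skips.append(upsample(x, seg_res))      # every scale reaches the output
    if i < 2:
        x = downsample(x)
features = np.concatenate(skips, axis=-1)   # multi-scale feature map
print(features.shape)                       # (1, 12, 12, 12, 56)
```

Interpolating the skips, rather than decoding them with transpose convolutions, is what avoids the checkerboard artifacts noted above.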


4. Dilated Convolutions

Dilated convolutions use sparse convolution kernels to represent functions with a large receptive field but few trainable parameters. Because both resolution and receptive field size depend on the stride, sequential downsampling yields local high-resolution features in the early layers and global low-resolution features after the downsampling layers. If, instead, stride-one convolutions with exponentially increasing dilation rates are applied, the receptive field grows exponentially in the early layers, where the resolution is still high. High-resolution features with large receptive fields in the early layers help detect small structures such as the esophagus.
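The receptive-field arithmetic can be checked with a few lines of Python: for a chain of 3×3×3 convolutions with stride one, each layer extends the receptive field by (k − 1) × dilation, so exponentially increasing dilations give exponential growth without any loss of resolution.

```python
def receptive_field(dilations, kernel=3):
    """Receptive field extent of a chain of stride-1 dilated convolutions."""
    rf = 1
    for d in dilations:
        rf += (kernel - 1) * d   # each layer adds (k - 1) * dilation
    return rf

print(receptive_field([1, 1, 1, 1]))  # 9: undilated stack grows linearly
print(receptive_field([1, 2, 4, 8]))  # 31: dilated stack grows exponentially
```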


5. Explicit Spatial Prior

Spatial segmentation priors arise because medical images are acquired in standard anatomically aligned views with relatively consistent organ positions and orientations. The authors describe the explicit spatial prior in detail in their previous work [21]. Spatial priors can be encoded implicitly, through the boundary effects of convolutions or by providing image coordinates as an input channel [46]; here, the prior is explicit: a low-resolution 3D map of trainable parameters that is bilinearly upsampled to the segmentation resolution and added to the outputs of the V-network.
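A minimal sketch of this mechanism follows, with assumed class count and grid sizes, and linear interpolation standing in for the bilinear upsampling:

```python
import numpy as np
from scipy.ndimage import zoom

n_classes, coarse, fine = 9, (6, 6, 6), (48, 48, 48)   # assumed shapes
prior = np.zeros((n_classes,) + coarse)      # trainable parameters in practice
logits = np.random.randn(n_classes, *fine)   # stand-in V-network output

scale = tuple(f / c for f, c in zip(fine, coarse))
prior_up = np.stack([zoom(p, scale, order=1) for p in prior])
assert prior_up.shape == logits.shape
logits_with_prior = logits + prior_up        # prior biases the per-voxel scores
```

Because the prior is added before the final classification, each voxel's class scores are biased by where that class typically occurs in the anatomically aligned field of view.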



Experiment

 

Data

90 abdominal CT images with corresponding reference-standard segmentations of the spleen, left kidney, gallbladder, esophagus, liver, stomach, pancreas, and duodenum:

  • 43 subjects from the Cancer Imaging Archive Pancreas-CT data set [26], [32], [33], with pancreas segmentations
  • 47 subjects from the 'Beyond the Cranial Vault' (BTCV) data set [16], with segmentations of all organs except the duodenum


Voxel sizes ranged from:

  • 0.6–0.9 mm in the anterior-posterior (AP) and left-right (LR) axes
  • 0.5–5.0 mm in the inferior-superior (IS) axis

Images were manually cropped, giving fields of view ranging over:

  • 172–318 mm AP, 246–367 mm LR
  • 138–283 mm IS


Implementation Details

  • The loss function: the weighted sum of a hinge loss on the segmentation (see the architecture comparison below) and an L2 regularization loss
  • ReLU
  • Trained using the Adam optimizer with ϵ = 0.001
  • Mini-batch size 10 for 5000 iterations
  • Training took approximately 6 hours using Titan X Pascal or P100 GPUs (NVIDIA Corporation, Santa Clara, CA)
  • Tensorflow


Evaluation Metrics and Statistical Methods

Segmentation accuracy was compared using 9-fold cross-validation over the 90 subjects. For each test image in each fold, the segmentation was compared to the ground truth using three metrics:

  • Dice coefficient 
  • symmetric mean boundary distance
  • symmetric 95% Hausdorff distance

The Dice coefficient measures the relative volumetric overlap between segmentations; the symmetric mean boundary distance and the symmetric 95% Hausdorff distance measure segmentation boundary agreement, the latter with high sensitivity to local disagreements.
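For reference, minimal implementations of the three metrics using their standard definitions are sketched below (the paper's exact implementation details may differ; in particular, anisotropic voxel spacing could be handled via the `sampling` argument of `distance_transform_edt`, omitted here):

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def dice(a, b):
    """Relative volumetric overlap of two boolean masks."""
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def boundary_distances(a, b):
    """Distances (in voxels) from each boundary voxel of a to b's boundary."""
    boundary_a = a ^ binary_erosion(a)
    boundary_b = b ^ binary_erosion(b)
    return distance_transform_edt(~boundary_b)[boundary_a]

def symmetric_mean_boundary_distance(a, b):
    return (boundary_distances(a, b).mean()
            + boundary_distances(b, a).mean()) / 2.0

def symmetric_hausdorff95(a, b):
    return max(np.percentile(boundary_distances(a, b), 95),
               np.percentile(boundary_distances(b, a), 95))

# Toy example: two overlapping cubes.
a = np.zeros((32, 32, 32), bool); a[8:20, 8:20, 8:20] = True
b = np.zeros((32, 32, 32), bool); b[10:22, 10:22, 10:22] = True
print(dice(a, b), symmetric_mean_boundary_distance(a, b),
      symmetric_hausdorff95(a, b))
```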


Two experiments were performed:

1. Comparison of DenseVNet to three existing algorithms:

- the deep-learning-based V-Net [25]

- the deep-learning-based VoxResNet [36]

- the multi-atlas-label-fusion-based DEEDS+JLF [34], [35]


Comparison algorithm 1: deep-learning-based V-Net [25]:

The original V-Net was designed for binary segmentation; for this comparison, the loss gradient was modified to handle multiple labels. V-Net uses parametric ReLU and does not use batch normalization. Its downsampling subnetwork comprises residual units, and its upsampling subnetwork uses concatenating skip connections.


Comparison algorithm 2: deep-learning-based VoxResNet [36]:

This algorithm was originally evaluated for segmenting multiple tissue types in the brain; for this work, it was adapted to multi-organ segmentation by adding output channels. Like DenseVNet, it uses batch normalization and ReLU, and it combines upsampled features from multiple scales. It was trained with the same loss function and optimization protocol as DenseVNet.

Comparison algorithm 3: MALF-based DEEDS+JLF [34], [35]:

DEEDS, which yielded the highest registration accuracy in a direct comparison of six publicly available algorithms [15], performs the registrations; joint label fusion then computes a weighted average of the transformed labels.
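A naive sketch of fusion as a weighted per-voxel vote over propagated atlas labels is shown below; the actual joint label fusion weighting [35], derived from local intensity similarity and inter-atlas error correlations, is omitted, and the uniform weights are purely illustrative.

```python
import numpy as np

def fuse_labels(labels, weights, n_classes):
    """labels: (n_atlases, d, h, w) propagated label maps; weights: same shape.
    Returns the per-voxel weighted majority label."""
    votes = np.stack([((labels == c) * weights).sum(axis=0)
                      for c in range(n_classes)])
    return votes.argmax(axis=0)

# Illustrative uniform weights over three toy atlases.
labels = np.random.default_rng(0).integers(0, 3, size=(3, 8, 8, 8))
weights = np.ones_like(labels, dtype=float) / 3.0
print(fuse_labels(labels, weights, n_classes=3).shape)  # (8, 8, 8)
```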


2. Comparison of variations of the proposed DenseVNet architecture

To evaluate the contributions of the hinge loss, the dilated convolutions, the explicit spatial prior, and batch-wise spatial dropout, the network was compared to four corresponding alternative networks:

- Hinge loss: a network replacing the hinge loss with the simpler Dice score, referred to as NoHinge

- Dilated convolutions: a network with each dilated convolution replaced by a standard 3 × 3 × 3 convolution, referred to as NoDil

- Explicit spatial prior: a network with the spatial prior layer omitted, referred to as NoPrior

- Batch-wise spatial dropout: a network using standard Bernoulli-distributed channel-wise spatial dropout [40] within the dense feature stacks with probability p = 0.5, referred to as NoBSDO



Results

 

 

Figure 2: Representative segmentations from the 25th, 50th and 75th percentiles of Dice scores


Figure 2 shows the segmented boundaries produced by the different networks on abdominal CT. The per-organ evaluation metrics in Table 1 show that DenseVNet outperforms the other networks.

Table 1: Whisker plots of the segmentation evaluation metrics for each organ


Primary Analysis: Algorithm Comparison Results

The medians of the segmentation evaluation metrics in Table 2 show that DenseVNet improved on the other methods for every organ except the duodenum: DEEDS+JLF achieved a smaller mean boundary distance, and VoxResNet a smaller 95% Hausdorff distance, for the duodenum. The duodenum's characteristics make it difficult to detect reliably.

Table 2: The medians of the segmentation evaluation metrics for each organ.


Secondary Analysis: Architecture Comparison

The secondary analysis in Table 3 shows that removing the dilated convolutions or the explicit spatial prior did not produce statistically significant differences. Removing batch-wise spatial dropout or one downsampling unit of the V-structure yielded a small loss in accuracy by all metrics for most organs. Eliminating the dense connections or eliminating the V-network entirely yielded substantially less accurate segmentations in all comparisons.

Table 3: The medians of the segmentation evaluation metrics for each organ



Conclusion

 

In conclusion, clinically acceptable segmentation accuracies for guiding abdominal interventions have yet to be defined. DenseVNet improves accuracy for GI tract and pancreas segmentation, with the largest improvements for the smallest organs, such as the gallbladder and esophagus. The architecture experiments indicate that the dense connections and the multi-scale V-network structure significantly improved performance: densely linked layers and a shallow V-network architecture improve segmentation accuracy, batch-wise spatial dropout provides memory efficiency, and the V-network downsampling and upsampling support high-resolution output. The dilated convolutions and the spatial prior, by contrast, proved unnecessary. Future work could evaluate the accuracy of 3D patient-specific anatomical models built from these segmentations to aid endoscopic navigation.


References

[1]. van Ginneken B, Schaefer-Prokop CM, Prokop M. Computer-aided diagnosis: how to move from the laboratory to the clinic. Radiology. 2011; 261(3):719–732. [PubMed: 22095995]
[2]. Sykes J. Reflections on the current status of commercial automated segmentation systems in clinical practice. Journal of Medical Radiation Sciences. 2014; 61(3):131–134. [PubMed: 26229648]
[3]. Howe RD, Matsuoka Y. Robotics for surgery. Annual Review of Biomedical Engineering. 1999; 1(1):211–240.
[4]. Eisen GM, Dominitz JA, Faigel DO, Goldstein JA, Petersen BT, Raddawi HM, Ryan ME, Vargo JJ, Young HS, Wheeler-Harbaugh J, et al. Guidelines for credentialing and granting privileges for endoscopic ultrasound. Gastrointestinal endoscopy. 2001; 54(6):811–814. [PubMed: 11726873]

[5]. Cerrolaza JJ, Reyes M, Summers RM, González-Ballester MÁ, Linguraru MG. Automatic multi-resolution shape modeling of multi-organ structures. MedIA. 2015; 25(1):11–21.
[6]. Okada T, Linguraru MG, Hori M, Summers RM, Tomiyama N, Sato Y. Abdominal multi-organ segmentation from CT images using conditional shape–location and unsupervised intensity priors. MedIA. 2015; 26(1):1–18.
[7]. Xu Z, Burke RP, Lee CP, Baucom RB, Poulose BK, Abramson RG, Landman BA. Efficient multi-atlas abdominal segmentation on clinically acquired CT with SIMPLE context learning. MedIA. 2015; 24(1):18–27.
[8]. Tong T, Wolz R, Wang Z, Gao Q, Misawa K, Fujiwara M, Mori K, Hajnal JV, Rueckert D. Discriminative dictionary learning for abdominal multi-organ segmentation. MedIA. 2015; 23(1):92–104.
[9]. Suzuki M, Linguraru MG, Okada K. MICCAI. Springer; 2012. Multi-organ segmentation with missing organs in abdominal CT images; 418–425.
[10]. Shimizu A, Ohno R, Ikegami T, Kobatake H, Nawano S, Smutek D. Segmentation of multiple organs in non-contrast 3D abdominal CT images. IJCARS. 2007; 2(3):135–142.
[11]. Casiraghi E, Campadelli P, Pratissoli S, Lombardi G. Automatic abdominal organ segmentation from CT images. ELCVIA. 2009; 8
[12]. Saxena S, Sharma N, Sharma S, Singh S, Verma A. An automated system for atlas based multiple organ segmentation of abdominal CT images. BJMCS. 2016; 12:1–14.
[13]. Lombaert H, Zikic D, Criminisi A, Ayache N. MICCAI. Springer; 2014. Laplacian forests: semantic image segmentation by guided bagging; 496–504.
[14]. He B, Huang C, Jia F. Fully automatic multi-organ segmentation based on multi-boost learning and statistical shape model search. VISCERAL Challenge@ ISBI. 2015:18–21.
[15]. Xu Z, Lee CP, Heinrich MP, Modat M, Rueckert D, Ourselin S, Abramson RG, Landman BA. Evaluation of six registration methods for the human abdomen on clinically acquired CT. IEEE Trans Biomed Eng. 2016; 63(8):1563–1572. [PubMed: 27254856]
[16]. Landman B, Xu Z, Igelsias JE, Styner M, Langerak TR, Klein A. MICCAI multi-atlas labeling beyond the cranial vault - workshop and challenge. 2015 accessed July 2017.
[17]. Litjens G, Kooi T, Bejnordi BE, Setio AAA, Ciompi F, Ghafoorian M, van der Laak JAWM, van Ginneken B, Sánchez CI. A survey on deep learning in medical image analysis. arXiv:1702.05747v1. 2017
[18]. Zhou X, Ito T, Takayama R, Wang S, Hara T, Fujita H. MICCAI Workshop on Large-Scale Annotation of Biomedical Data and Expert Label Synthesis. Springer; 2016. Three-dimensional CT image segmentation by combining 2D fully convolutional network with 3D majority voting; 111–120.
[19]. Hu P, Wu F, Peng J, Bao Y, Chen F, Kong D. Automatic abdominal multi-organ segmentation using deep convolutional neural network and time-implicit level sets. IJCARS. 2016:1–13.
[20]. Larsson M, Zhang Y, Kahl F. Robust abdominal organ segmentation using regional convolutional neural networks. Scandinavian Conference on Image Analysis; Springer; 2017. 41–52.
[21]. Gibson E, Giganti F, Hu Y, Bonmati E, Bandula S, Gurusami K, Davidson BR, Pereira S, Clarkson MJ, Barratt DC. Towards image-guided pancreas and biliary endoscopy: automatic multi-organ segmentation on abdominal CT with dense dilated networks. MICCAI. 2017 Sep. accepted. (their previous work)
[22]. Heimann T, Meinzer H-P. Statistical shape models for 3D medical image segmentation: a review. Medical image analysis. 2009; 13(4):543–563. [PubMed: 19525140]
[23]. Cootes TF, Edwards GJ, Taylor CJ. Active appearance models. IEEE Trans Pattern Anal Mach Intell. 2001; 23(6):681–685.
[24]. Çiçek Ö, Abdulkadir A, Lienkamp SS, Brox T, Ronneberger O. MICCAI. Springer; 2016. 3D U-net: learning dense volumetric segmentation from sparse annotation; 424–432.
[25]. Milletari F, Navab N, Ahmadi S-A. V-net: Fully convolutional neural networks for volumetric medical image segmentation. IEEE 3D Vis. 2016:565–571.
[26]. Roth HR, Lu L, Farag A, Shin H-C, Liu J, Turkbey EB, Summers RM. MICCAI. Springer; 2015. Deeporgan: Multi-level deep convolutional networks for automated pancreas segmentation; 556–564.

[27]. Huang G, Liu Z, Weinberger KQ, van der Maaten L. Densely connected convolutional networks. arXiv:1608.06993. 2016
[28]. Yu F, Koltun V. Multi-scale context aggregation by dilated convolutions. arXiv:1511.07122. 2015
[29]. Oktay O, Ferrante E, Kamnitsas K, Heinrich M, Bai W, Caballero J, Guerrero R, Cook S, de Marvao A, Dawes T, O’Regan D, et al. Anatomically constrained neural networks (acnn): Application to cardiac image enhancement and segmentation. 2017
[30]. BenTaieb A, Hamarneh G. Topology aware fully convolutional networks for histology gland segmentation. International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer; 2016. 460–468.
[31]. Roth HR, Oda H, Hayashi Y, Oda M, Shimizu N, Fujiwara M, Misawa K, Mori K. Hierarchical 3d fully convolutional networks for multi-organ segmentation. 2017
[32]. Clark K, Vendt B, Smith K, Freymann J, Kirby J, Koppel P, Moore S, Phillips S, Maffitt D, Pringle M, Tarbox L, et al. The cancer imaging archive (TCIA): Maintaining and operating a public information repository. Journal of Digital Imaging. 2013; 26(6):1045. [PubMed: 23884657]
[33]. Roth HR, Farag A, Turkbey EB, Lu L, Liu J, Summers RM. Data from TCIA Pancreas-CT. 2016
[34]. Heinrich MP, Jenkinson M, Brady M, Schnabel JA. MRF-based deformable registration and ventilation estimation of lung CT. IEEE Trans Med Imag. 2013; 32(7):1239–1248.
[35]. Wang H, Suh JW, Das SR, Pluta JB, Craige C, Yushkevich PA. Multi-atlas segmentation with joint label fusion. IEEE TPAMI. 2013; 35(3):611–623.
[36]. Chen H, Dou Q, Yu L, Qin J, Heng P-A. VoxResNet: Deep voxelwise residual networks for brain segmentation from 3D MR images. NeuroImage. 2017
[37]. Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. IEEE CVPR. 2015:3431–3440.
[38]. Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167. 2015
[39]. Nair V, Hinton GE. Rectified linear units improve restricted Boltzmann machines. Proc ICML. Fürnkranz J, Joachims T, editors. Omnipress; 2010. 807–814.
[40]. Tompson J, Goroshin R, Jain A, LeCun Y, Bregler C. Efficient object localization using convolutional networks. IEEE CVPR. 2015:648–656.
[41]. Gal Y, Ghahramani Z. Dropout as a bayesian approximation: Representing model uncertainty in deep learning. CoRR. 2015 vol. abs/1506.02142.
[42]. He K, Zhang X, Ren S, Sun J. Identity mappings in deep residual networks. arXiv:1603.05027. 2016
[43]. Veit A, Wilber MJ, Belongie S. Residual networks behave like ensembles of relatively shallow networks. NIPS. 2016:550–558.
[44]. Pleiss G, Chen D, Huang G, Li T, van der Maaten L, Weinberger KQ. Memory-efficient implementation of densenets. arXiv preprint arXiv:1707.06990. 2017
[45]. Odena A, Dumoulin V, Olah C. Deconvolution and checkerboard artifacts. Distill. 2016 [Online]. Available: http://distill.pub/2016/deconv-checkerboard.
[46]. Brust C-A, Sickert S, Simon M, Rodner E, Denzler J. Efficient convolutional patch networks for scene understanding. CVPR Scene Understanding Workshop. 2015
[47]. Pereyra G, Tucker G, Chorowski J, Kaiser Ł, Hinton G. Regularizing neural networks by penalizing confident output distributions. arXiv:1701.06548. 2017
[48]. Kroon D-J. Smooth triangulated mesh. MATLAB Central File Exchange. 2010 [accessed 07/04/2017] [Online]. Available: mathworks.com/matlabcentral/fileexchange/26710.
[49]. Gerard PD, Schucany WR. An enhanced sign test for dependent binary data with small numbers of clusters. Computational statistics & data analysis. 2007; 51(9):4622–4632.
[50]. Landman B, Warfield S, editors. MICCAI 2012 Grand Challenge and Workshop on Multi-Atlas Labeling; 2012.


