This is the blog post for the paper "Explainable cardiac pathology classification on cine MRI with motion characterization by semi-supervised learning of apparent flow" [2].





Introduction

Cine MRI

Cine magnetic resonance imaging is a powerful tool to assess cardiac pathology, and there is a great need for automated cardiac pathology identification and classification in clinical practice. The paper proposes a method for cardiac pathology classification on 4D cine MRI images. Cine MRI uses magnetic resonance imaging to image the heart as a 3D volume at different time points within one cardiac cycle. The image shown below presents cine MRI frames covering one cardiac cycle [1]. The first image on the upper left is the end-diastole (ED) frame, where the heart muscle relaxes and blood flows into the heart. The image in the third row, first column, is the end-systole (ES) frame, where the heart muscle contracts and pumps blood out of the heart into the body.

ACDC dataset

The dataset used is the ACDC dataset, which contains a training set of 100 cases and a testing set of 50 cases. One case is one 4D cine MRI image; each case has a different number of slices and frames. The ED and ES frames of each case are labelled with a segmentation mask that divides the heart into four categories: left ventricle cavity (LVC), right ventricle cavity (RVC), left ventricle myocardium (LVM) and background. The training and testing sets are divided into five pathology groups. The defining properties of each group are also provided with the dataset and can serve as a guideline for feature extraction. The five pathology groups and their properties [2] are listed below:

  • dilated cardiomyopathy (DCM): left ventricle cavity (LVC) volume at ED larger than 100 mL/m² and LVC ejection fraction lower than 40%
  • hypertrophic cardiomyopathy (HCM): left ventricle (LV) cardiac mass higher than 110 g/m², several myocardial segments with a thickness higher than 15 mm at ED and a normal ejection fraction
  • myocardial infarction (MINF): LVC ejection fraction lower than 40% and several myocardial segments with abnormal contraction
  • RV abnormality (RVA): right ventricle cavity (RVC) volume higher than 110 mL/m² or RVC ejection fraction lower than 40%
  • normal subjects (NOR)
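Several of these criteria involve the ejection fraction (EF), which measures the fraction of blood ejected from a cavity between ED and ES:

$$\mathrm{EF} = \frac{V_{\mathrm{ED}} - V_{\mathrm{ES}}}{V_{\mathrm{ED}}} \times 100\%$$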

The paper also mentions that the DCM and MINF pathologies both have a low LVC ejection fraction, and that DCM could also lead to MINF, so distinguishing the two is quite a challenge. For this, the authors propose motion-characteristic features.

Related work

The paper also mentions several related works: Khened et al. (2017)[6], Khened et al. (2018)[7], Wolterink et al. (2017)[4], Isensee et al. (2017)[3] and Cetin et al. (2017)[5]. They all follow a similar pipeline: first, the left ventricle cavity, right ventricle cavity and myocardium are segmented; second, features are computed from the segmentation masks; finally, classification based on the selected features is performed. The methods of Khened et al. (2017)[6], Khened et al. (2018)[7], Wolterink et al. (2017)[4] and Cetin et al. (2017)[5] only use features derived from the ED and ES frames. Isensee et al. (2017)[3] is the only ACDC challenge entry that derives features from frames other than ED and ES. However, all these methods use a large network as the classifier, which makes them not straightforward to interpret.

Methodology

The method contains two parts: feature extraction and classification. However, the region of interest (ROI) must be segmented first.

Region of interest (ROI)

The region of interest is determined first so that the flow maps and segmentation masks can later be computed efficiently. The ROI-net is a variant of U-net that takes a 2D slice from the short-axis view as input and outputs a binary segmentation mask, as shown in the image below. However, 3D volumes have to be segmented, so the segmentation begins at the top slice and is then propagated along the long axis until the bottom slice is reached. According to the paper, the ROI-net performs well and is very robust on the ACDC dataset.


Feature extraction

This section gives an overview of the feature extraction. As described in the figure below, it follows four steps. First, the ApparentFlow-net is applied to generate a flow map between a slice from the ED frame and a slice from another frame at the same slice location; the generated flow maps are used later to extract the motion-characteristic features. Second, the 3D heart volumes at the ED and ES frames are segmented into four categories (LVC, RVC, LVM, background). Third, the segmentation masks from the second step are used to calculate the shape-related features, such as the volumes of the left and right ventricle, the ejection fractions and the volume ratio. Finally, the motion-characteristic features, namely the radius motion disparity and the thickness motion disparity, are derived from the generated flow maps and segmentation masks.

Apparent flow generation

As shown in the figure below, the ApparentFlow-net is a variant of U-net. Its input is two slices: one from the ED frame and one from a frame other than ED. The input propagates through the network, and the output is a pixel-wise flow map which describes how each pixel moves from the first slice to the second slice.

The network training is based on two assumptions. The first is intensity consistency, which assumes that an object's intensity does not change during the motion. The second is neighbourhood relation consistency, which states that the spatial relationship between neighbouring pixels does not change during the motion: if one pixel lies on the right-hand side of another, it should still lie on the right-hand side after the motion. Based on these two assumptions, the losses can be derived. The first loss is shown below:

It calculates the intensity difference between two pixels: a pixel in the ED frame and the corresponding pixel in the second frame, whose location is warped by the flow vector.
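In notation, with $I_{\mathrm{ED}}$ and $I_t$ the two input slices and $f$ the predicted flow, this loss can be sketched as follows (a hedged reconstruction; the paper's exact form may differ):

$$\mathcal{L}_{\mathrm{flow}} = \sum_{p} \bigl( I_t(p + f(p)) - I_{\mathrm{ED}}(p) \bigr)^2$$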

The second loss is based on neighbourhood relation consistency and is shown below:

When two pixels swap positions, the derivative of the flow map becomes smaller than −1, so the term one plus the derivative becomes negative. The minimum operation takes the smaller of zero and this term, and the result is squared. Therefore, whenever a swap happens, it is penalised by the loss function; otherwise the loss contribution is zero. The image below illustrates the swap situations: the first row shows swaps in the x-component and the second row in the y-component, with YES or NO indicating whether the pixels are swapped.
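A hedged sketch of this loss, summing over pixels $p$ and penalising both in-plane components of the flow $f = (f_x, f_y)$:

$$\mathcal{L}_{\mathrm{consistency}} = \sum_{p}\left[\min\!\left(0,\,1+\frac{\partial f_x}{\partial x}(p)\right)^{2} + \min\!\left(0,\,1+\frac{\partial f_y}{\partial y}(p)\right)^{2}\right]$$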

However, labelled data is also available. The third loss therefore computes a Dice score between the ground-truth mask on the ED frame and a mask generated via the flow map between the ED and ES frames. The loss is shown below:

The Dice loss is computed between the ground-truth mask on the ED frame and a generated ED-frame mask obtained by applying the flow map to the ES ground-truth mask. In other words, the flow map between ED and ES is used to warp the ES ground-truth mask to the ED frame, and the result is compared with the ground-truth mask at ED.
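A hedged sketch of this loss, writing $M_{\mathrm{ED}}$ for the ED ground-truth mask and $\tilde{M}_{\mathrm{ED}}(p) = M_{\mathrm{ES}}(p + f(p))$ for the ES ground-truth mask warped to the ED frame:

$$\mathcal{L}_{\mathrm{Dice}} = 1 - \frac{2\,\lvert M_{\mathrm{ED}} \cap \tilde{M}_{\mathrm{ED}}\rvert}{\lvert M_{\mathrm{ED}}\rvert + \lvert \tilde{M}_{\mathrm{ED}}\rvert}$$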

The overall loss combines the three losses and is shown below:

An indicator function in front of the third loss makes sure that it is only switched on when ground-truth data is available. This is a semi-supervised setup in which a small amount of labelled data supervises the training. The two assumptions only hold for the middle 60% of the slices, where the in-plane motion is dominant; the top and bottom slices show strong out-of-plane motion that violates both assumptions. The network is therefore only applied to the middle 60% of the slices of each volume.
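Combining the pieces, a hedged sketch of the total loss, with $\lambda_1, \lambda_2$ assumed here as generic trade-off weights and $\mathbb{1}_{\mathrm{GT}}$ the indicator that ground truth exists for the pair:

$$\mathcal{L} = \mathcal{L}_{\mathrm{flow}} + \lambda_1\,\mathcal{L}_{\mathrm{consistency}} + \mathbb{1}_{\mathrm{GT}}\,\lambda_2\,\mathcal{L}_{\mathrm{Dice}}$$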

Segmentation at ED and ES

The segmentation is done by the so-called LVRV-net, another variant of U-net similar to the ROI-net: the segmentation begins at the top slice and is then propagated along the long axis of the heart until the bottom slice is reached. The network receives two inputs: the slice to be segmented and the segmentation mask of the previous slice. When the mask of the previous slice is not available, an all-zero image is used instead. This gives the network not only the 2D in-plane information but also contextual information along the long axis. The network is only trained on and applied to the ED and ES frames, where ground truth is available, so the training is fully supervised.
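A minimal sketch of this spatial propagation, assuming a hypothetical `lvrv_net(slice, prev_mask)` callable that returns a per-slice mask (names and shapes are illustrative, not the paper's actual code):

```python
import numpy as np

def segment_volume(volume, lvrv_net):
    """Segment a 3D short-axis volume slice by slice, feeding each slice
    together with the previous slice's mask (spatial propagation sketch)."""
    n_slices, h, w = volume.shape
    prev_mask = np.zeros((h, w))           # top slice: no previous mask yet
    masks = []
    for i in range(n_slices):              # from top slice to bottom slice
        mask = lvrv_net(volume[i], prev_mask)
        masks.append(mask)
        prev_mask = mask                   # propagate context down the axis
    return np.stack(masks)
```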

Shape-related feature extraction

With the segmentation masks at ED and ES generated in the second step, seven shape-related features can be calculated. They are listed in the table below.

As the table shows, volume is a very important feature: other features such as the ejection fraction and the RVC/LVC volume ratio are derived from it. The volume is calculated as follows. The volume between two adjacent slices can be seen as a truncated cone; the areas of its upper and lower surfaces are obtained from the segmentation masks, and the distance between the slices is known. The volume of each truncated cone can therefore be computed with the standard formula (see below), and the volumes between all adjacent slices are summed up to obtain the final volume.
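With $A_i$ and $A_{i+1}$ the cross-sectional areas on two adjacent slices and $d$ the slice distance, the standard truncated-cone (frustum) formula gives the total volume (assuming this is the formula the paper refers to):

$$V = \sum_i \frac{d}{3}\left(A_i + A_{i+1} + \sqrt{A_i\,A_{i+1}}\right)$$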

Motion-related feature extraction

In the introduction section, the pathology groups were listed. As mentioned there, the DCM and MINF pathologies both have a low LVC ejection fraction, and DCM could also lead to MINF, so they cannot be distinguished by the ejection fraction alone; a more sophisticated feature is needed. From visual observation, the contraction of the myocardium is a key feature to distinguish them. The contraction can be described by the radius of the left ventricle cavity and the thickness of the myocardium (see the image below). However, the contraction motion is not homogeneous: it varies in different directions. To fully characterize the motion, the myocardium is divided into six equally distributed segments, and the contraction is described by the radius and thickness of each segment along the cardiac cycle.

Before the calculation, the middle 10 slices of each volume are selected. Also, 10 frames are sampled from each case such that the first is the ED frame and the last is the final frame of the case; the frames in between are sampled at equal time intervals, so the 10 sampled frames cover the whole cardiac cycle.

To determine the radius and thickness of each segment in each sampled frame, three points must first be determined: B_0 (barycentre of the left ventricle cavity), I_k (inner boundary centre of segment k) and O_k (outer boundary centre of segment k). For the ED frame, the segmentation mask from the LVRV-net tells us which pixels belong to the LVC; B_0 is the average location of all LVC pixels. For I_k, we take all LVC pixels that have a neighbouring pixel in myocardium segment k; I_k is their average location. For O_k, we take all pixels of myocardium segment k that have a neighbouring pixel in the background; O_k is their average location. However, the segmentation mask is only available for the ED and ES frames. For the other frames, the ApparentFlow-net from the first step generates a flow map between the ED frame and the frame in question, and this flow map is used to warp the ED segmentation mask to that frame. In this way, the three points of each segment can be computed in every sampled frame.
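A minimal numpy/scipy sketch of these point computations, following the description above (the label values, the angular-sector segment definition and the function names are assumptions of this sketch):

```python
import numpy as np
from scipy.ndimage import binary_dilation

# Hypothetical label encoding; the actual values depend on the data loader.
BG, RVC, LVM, LVC = 0, 1, 2, 3

def landmark_points(mask, n_segments=6):
    """Compute B_0 and the inner/outer boundary centres I_k, O_k of each
    myocardium segment from a 2D segmentation mask (sketch only)."""
    lvc, lvm, bg = mask == LVC, mask == LVM, mask == BG
    ys, xs = np.nonzero(lvc)
    b0 = np.array([ys.mean(), xs.mean()])      # barycentre of the LVC

    inner = lvc & binary_dilation(lvm)         # LVC pixels touching myocardium
    outer = lvm & binary_dilation(bg)          # myocardium pixels touching background

    def segment_centres(boundary):
        ys, xs = np.nonzero(boundary)
        # Assign each boundary pixel to one of n_segments equal angular
        # sectors around B_0; assumes every sector contains boundary pixels.
        angles = np.arctan2(ys - b0[0], xs - b0[1]) % (2 * np.pi)
        labels = (angles * n_segments / (2 * np.pi)).astype(int)
        return [np.array([ys[labels == k].mean(), xs[labels == k].mean()])
                for k in range(n_segments)]

    return b0, segment_centres(inner), segment_centres(outer)
```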

With these three points, the radius and thickness of each segment can be calculated:

  • The radius of each segment:

  • The thickness of each segment:
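From the point definitions above, a hedged reconstruction of these two quantities (before the body-surface-area normalization mentioned below):

$$R_k = \lVert I_k - B_0 \rVert, \qquad T_k = \lVert O_k - I_k \rVert$$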

The radius and thickness are then normalized by the body surface area so that they lie in the same interval for patients of different height and weight.

Afterwards, the disparity of the radius and of the thickness is calculated as the difference between the maximal and the minimal segment value within one frame:

  • Radius motion disparity in a single frame

  • Thickness motion disparity in a single frame
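Following the description above, with $t$ indexing the sampled frames (a hedged sketch):

$$D_R(t) = \max_k R_k(t) - \min_k R_k(t), \qquad D_T(t) = \max_k T_k(t) - \min_k T_k(t)$$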

The maximal disparity over one cardiac cycle is taken as the final radius motion disparity (RMD) and thickness motion disparity (TMD):

  • Radius motion disparity

  • Thickness motion disparity
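In the same hedged notation:

$$\mathrm{RMD} = \max_t D_R(t), \qquad \mathrm{TMD} = \max_t D_T(t)$$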

We now have seven shape-related features and two motion-related features. As the next step, classification is performed based on these features.


Classification 

For the classification, four binary classifiers are used, ordered by increasing difficulty of their binary task. An overview of the classification diagram is shown below. The classification is done sequentially: the first classifier handles the easiest diagnosis and the last classifier the most difficult one.

Each classifier takes up to three manually selected features. The table below lists the input features for each classifier.
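A minimal sketch of this sequential decision process. The cascade is assumed here to be an ordered list of (pathology, score function, feature subset), and the fall-through default to NOR is an assumption of this sketch; the actual feature assignment per classifier is given in the table.

```python
from typing import Callable, Dict, List, Tuple

def classify(features: Dict[str, float],
             cascade: List[Tuple[str, Callable[[List[float]], float], List[str]]]) -> str:
    """Sequential binary classification (sketch).

    Each cascade entry is (pathology, score_fn, feature_names): score_fn
    returns the linear score of the ridge classifier described next, and a
    positive score means "yes, this pathology".
    """
    for pathology, score_fn, feature_names in cascade:
        x = [features[name] for name in feature_names]
        if score_fn(x) > 0:          # diagnosed: stop here
            return pathology
    return "NOR"                     # no classifier fired: normal subject
```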

Each classifier is a ridge regression, so it can be understood as a one-layer network. It first computes a linear term, in which p_i are the weights, f_i the input features and b the bias; the output of the linear term is the predicted value. If it is greater than 0, the answer is yes, otherwise no. The formula below is the loss.
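A hedged reconstruction of the prediction and loss, with $y_m$ the binary label of case $m$, $f_{m,i}$ its features and $\lambda$ a regularization weight assumed here:

$$\hat{y}_m = \sum_i p_i\,f_{m,i} + b, \qquad \mathcal{L} = \sum_m \bigl(y_m - \hat{y}_m\bigr)^2 + \lambda \sum_i p_i^2$$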

Experiments and Results

ApparentFlow-net

The ApparentFlow-net was trained on 13,672 slice pairs, of which 515 have ground truth. The evaluation was done by calculating the Dice scores of LVC, RVC and LVM on the ED and ES frames. The results are shown below:

In addition to the semi-supervised approach, a model was also trained with only unlabelled data; it performs worse than the semi-supervised version.

LVRV-net

The LVRV-net is trained only on labelled data with 5-fold cross-validation. The results are shown here:

3D Dice    LVM     LVC     RVC
           0.94    0.90    0.89


Classification

The classifiers are trained on the 100 training cases and tested on the 50 testing cases. The performance is shown below:

Comparison with the state of the art

The method was also compared with state-of-the-art methods. The results are shown below:

The last two methods perform better than the proposed method. The paper attributes this to their use of bigger networks in the classification part, which also makes them not straightforward to interpret.

Explainable classifier

The table below shows the formula, weights and input features of each classifier. The weights can be interpreted as follows: a positive weight means the feature contributes towards diagnosing the disease, while a negative weight means it counts against the diagnosis, i.e. the bigger the feature value, the less likely the disease.

Ablation study

The table below shows the results of the ablation study.


First, the inverse classification order was tested: the most difficult diagnosis was classified first and the easiest last. Then different classifiers were tried. Next, all nine features were fed into each classifier, using the classifiers tested in the previous step. Finally, a single-step classification was tested, which takes the nine features as input and outputs scores for the five classes. None of the tested variants outperforms the proposed method.

Conclusion and Discussion

Advantage

The method is interpretable, which is essential in medical applications, where transparency plays a very important role; an explainable model is also important to gain the trust of doctors and patients. The ApparentFlow-net is very powerful for extracting motion-related features. The LVRV-net was also used to segment slices from frames other than ED and ES; however, its results are inconsistent there, since the network is only trained on the ED and ES frames.

Drawback

The method is limited in generalization, since only 20 cases per disease are available and the disease descriptions are inconsistent: different datasets describe each disease slightly differently.

Reference

  1. https://www.researchgate.net/figure/Sample-short-axis-stack-volume-acquisitions-at-end-diastole-in-a-healthy-subject-Two-top_fig1_7858641 (accessed 19/11/2019)
  2. Zheng, Q., Delingette, H., Ayache, N., 2019. Explainable cardiac pathology classification on cine MRI with motion characterization by semi-supervised learning of apparent flow. Medical Image Analysis 56, 80–95.
  3. Isensee, F., Jaeger, P., Full, P., Wolf, I., Engelhardt, S., Maier-Hein, K., 2017. Automatic cardiac disease assessment on cine-MRI via time-series segmentation and domain-specific features, in: Proc. Statistical Atlases and Computational Models of the Heart (STACOM), ACDC challenge, MICCAI'17 Workshop.
  4. Wolterink, J., Leiner, T., Viergever, M., Isgum, I., 2017. Automatic segmentation and disease classification using cardiac cine MR images, in: Proc. Statistical Atlases and Computational Models of the Heart (STACOM), ACDC challenge, MICCAI'17 Workshop.
  5. Cetin, I., Sanroma, G., Petersen, S., Napel, S., Camara, O., Ballester, M., Lekadir, K., 2017. A radiomics approach to computer-aided diagnosis with cardiac cine-MRI, in: Proc. Statistical Atlases and Computational Models of the Heart (STACOM), ACDC challenge, MICCAI'17 Workshop.
  6. Khened, M., Alex, V., Krishnamurthi, G., 2017. Densely connected fully convolutional network for short-axis cardiac cine MR image segmentation and heart diagnosis using random forest, in: Proc. Statistical Atlases and Computational Models of the Heart (STACOM), ACDC challenge, MICCAI'17 Workshop.
  7. Khened, M., Alex, V., Krishnamurthi, G., 2018. Fully convolutional multiscale residual DenseNets for cardiac segmentation and automated cardiac diagnosis using ensemble of classifiers. arXiv preprint arXiv:1801.05173.
  8. Zheng, Q., Delingette, H., Duchateau, N., Ayache, N., 2018. 3D consistent and robust segmentation of cardiac images by deep learning with spatial propagation. IEEE Trans Med Imaging 37(9), 2137–2148.



