Prediction of Disease Progression in Multiple Sclerosis Patients using Deep Learning Analysis of MRI Data

This blog post summarises the paper "Prediction of Disease Progression in Multiple Sclerosis Patients using Deep Learning Analysis of MRI Data", written by Adrian Tousignant, Paul Lemaitre, Doina Precup, Douglas L. Arnold and Tal Arbel.

Introduction

Multiple sclerosis (MS) is a disease that affects both the brain and the spinal cord. The nerves are surrounded by a myelin sheath, which helps to convey signals along the axon. For reasons that are still unknown, the immune system attacks and damages the myelin sheath. As a result of the damage, the electrical signal transported along the nerves is interrupted, causing the symptoms of MS [1].

As a first step towards the diagnosis of MS, blood tests and spinal fluid analysis are performed in order to exclude other pathologies; for the final diagnosis, the patient undergoes an MRI examination [2]. Thanks to the characteristics of the modality, MRI is able to "see" the lesions in the brain and in the spinal cord caused by MS (Figure 1).

The available treatments are only able to slow down the progression of the pathology.

Multiple Sclerosis can appear in various forms, depending on how the symptoms appear and how the disease progresses. In this paper, the so-called Relapsing-Remitting Multiple Sclerosis (RRMS) is considered. This type of MS is characterised by limited periods of time (called “relapses”) in which new symptoms appear and the disability dramatically increases. After the relapse, in the remission period, the new symptoms disappear completely or partially, and the disability is considerably reduced from the level during the relapse [3].

In medicine, different terms are used to describe the development of the disease:

The disease is “active” when a new lesion appears on the MRI scan or during a relapse;
The disease is “worsening” when the results on one examination are worse than the previous ones, possibly related to a relapse;
The disease is “progressing” when the results on one examination are worse than the previous ones, in a time frame in which the disease is not active [4].

The complexity of the pathology and the need for new treatments drive the research on the topic, in particular the search for biomarkers that give information about the stage of the disease. At the moment, few biomarkers are available, and they are not image-based. The goal of the paper is to predict the disease progression in order to offer patients a personalized treatment, depending on the severity of the pathology, and to organize faster and cheaper trials, based on groups of patients with a similar form of the disease.

Related works

Overall, there is a large body of research on the development of MS, but most of the papers address a different problem. For example, in [5], [6] and [7], MRI images are analysed using Deep Learning methods to predict the disease activity or the change in lesion load. These predictions are related to disease progression, but they are not the same thing. On the other hand, in [8] the disability progression is predicted, which is a synonym of disease progression, but the inputs are clinical scores fed to Machine Learning methods such as AdaBoost or Random Forest.

Methodology

The study aims to use a Neural Network to predict the disease progression, using as input a set of MRI images acquired with different sequences. The task is then repeated with the MRI scans plus additional information, in the form of lesion labels, as input. Finally, an uncertainty measure on the result is evaluated. The authors hope that measuring the uncertainty of the network on its results will increase the clinicians' trust in this method.

To predict the disease progression, a longitudinal problem needs to be modelled (Figure 2). The input of the proposed Convolutional Neural Network is a series of MRI images taken at the beginning of the trial ("baseline"), and the output is a binary prediction (p: "progressing" or "not progressing") on the status of the disease one year after the baseline. To train this kind of model, the ground truth is a Progression Label (PL) assigned by a physician to the baseline MRI. The status of the disease described by the PL is evaluated one year after the baseline, considering the available MRI scans and the Expanded Disability Status Scale (EDSS) scores collected during the follow-up visits held every three months.

During the training phase, the cross-entropy loss is used: $\begin{array}{l}L = -[PL\; \log(p) + (1-PL)\; \log(1-p)]\end{array}$
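As a minimal sketch, the cross-entropy loss above can be evaluated for a single patient as follows. The function name and the clamping constant `eps` are assumptions added for illustration; they are not taken from the paper.

```python
import math


def binary_cross_entropy(pl: int, p: float, eps: float = 1e-7) -> float:
    """Cross-entropy between the progression label PL (0 or 1) and the predicted probability p."""
    # Clamp p away from 0 and 1 so that log() stays finite (eps is an assumed safeguard).
    p = min(max(p, eps), 1.0 - eps)
    return -(pl * math.log(p) + (1 - pl) * math.log(1.0 - p))


# A confident correct prediction gives a small loss, a confident wrong one a large loss.
print(binary_cross_entropy(1, 0.9))
print(binary_cross_entropy(1, 0.1))
```

In practice the loss would be averaged over all patients in a batch.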

The proposed network consists of three convolutional blocks and two fully connected layers (Figure 3), with a dropout layer between each of them. The convolutional blocks are based on the Inception net [9] and are composed of four different pathways. Three of the pathways are composed only of 3D convolutional layers, and the last pathway is composed of a 3D max-pooling layer. The different design of the pathways allows the proposed CNN to learn features at different scales, thus considering both the lesion level and the brain structural level. Inside the convolutional blocks, the max-pooling layer helps to propagate the information, and the convolutional kernels have size 3x3x3 to save memory and computation time.

In the dropout layers, Monte Carlo (MC) dropout is performed. As in conventional dropout, the network is trained by training small thinned networks [10]. MC dropout consists in also selecting a thinned network at test time. Here, the method is used to evaluate the uncertainty on the result [11,12]: 20 forward passes are performed with the same input, and the generated values are stored. These values generally differ, because a different thinned network is used to generate each of them. The prediction for the input is then the average of the generated values, and the uncertainty measure is their variance.
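The MC dropout procedure can be illustrated with a toy example. The sketch below replaces the 3D CNN with a hypothetical one-layer model (`stochastic_forward`) so that it stays self-contained. Only the idea of keeping dropout active at test time, running 20 passes and taking the mean and variance comes from the text; the model, weights and dropout probability are assumptions.

```python
import math
import random
import statistics


def stochastic_forward(features, weights, drop_prob, rng):
    """One forward pass of a toy one-layer model with dropout kept active."""
    keep = 1.0 - drop_prob
    total = 0.0
    for x, w in zip(features, weights):
        # Keep each input unit with probability `keep` and rescale it (inverted dropout).
        if rng.random() < keep:
            total += w * x / keep
    return 1.0 / (1.0 + math.exp(-total))  # sigmoid output


def mc_dropout_predict(features, weights, drop_prob=0.5, n_passes=20, seed=0):
    """Return (prediction, uncertainty) as the mean and variance over n_passes stochastic passes."""
    rng = random.Random(seed)
    samples = [stochastic_forward(features, weights, drop_prob, rng) for _ in range(n_passes)]
    return statistics.fmean(samples), statistics.pvariance(samples)


prediction, uncertainty = mc_dropout_predict([1.0, 2.0, 0.5], [0.4, -0.3, 0.8])
print(prediction, uncertainty)
```

With `drop_prob=0.0` every pass uses the full network, so the variance collapses to zero; the spread across passes is what carries the uncertainty information.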

In the internal layers, the activation function is ReLU, whereas in the output layer it is a sigmoid function.

The results of the proposed network are compared with those of a VGG-like 3D Convolutional Neural Network [13]. To generate this baseline network, a 3D version of the VGG model was modified to have approximately the same number of parameters as the proposed network.

Experimental set-up

The dataset is composed of MRI scans acquired during two clinical trials on patients with RRMS. In the paper, only patients who completed the trial and who were in the placebo arm of the trial were considered, in order to predict the real progression of the disease without the influence of the treatment. After the selection process, the cohort consists of 465 patients. During the first trial, the patients had one baseline MRI (at the beginning of the trial) and one at the end of the trial, after one year. In the second trial, the patients had three MRI scans at intervals of six months. In both trials, the patients had follow-up visits every three months, during which the physician evaluated the EDSS.

For each patient, the complete set of MRI scans at one time point consists of: T1 pre-contrast, T1 post-contrast, T2-weighted, Proton Density and FLAIR sequences. For some patients, a T2 lesion mask and a Gadolinium-enhanced lesion mask were available at the baseline. Class imbalance was a relevant problem in the dataset, because there are almost 9 times as many non-progressive patients as progressive ones. Therefore, the minority class was oversampled during training.
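The text does not say how the oversampling was implemented. One common option, shown here purely as an assumption, is to duplicate randomly chosen minority-class samples until both classes are equally frequent in the training set.

```python
import random


def oversample_minority(samples, labels, seed=0):
    """Randomly duplicate minority-class samples until both classes have the same size."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    # Draw extra minority indices with replacement to close the gap.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    indices = majority + minority + extra
    rng.shuffle(indices)
    return [samples[i] for i in indices], [labels[i] for i in indices]


# Toy training set with a 1:9 ratio, similar in spirit to the imbalance described above.
scans = list(range(50))
progression = [1] * 5 + [0] * 45
balanced_scans, balanced_labels = oversample_minority(scans, progression)
print(balanced_labels.count(1), balanced_labels.count(0))
```

Oversampling should only be applied to the training set, so that validation and test metrics still reflect the real class distribution.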

The dataset was split in the following way: 75% in the training set, 15% in the validation set and 10% in the test set.
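A patient-level split with these proportions could look like the sketch below. The shuffling, the seed and the rounding behaviour are assumptions; the paper only states the percentages.

```python
import random


def split_patients(patient_ids, train_frac=0.75, val_frac=0.15, seed=0):
    """Shuffle the patients and split them into training, validation and test sets."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(train_frac * len(ids))
    n_val = int(val_frac * len(ids))
    # Whatever remains after the training and validation sets goes to the test set.
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


train_ids, val_ids, test_ids = split_patients(range(465))
print(len(train_ids), len(val_ids), len(test_ids))
```

Splitting by patient rather than by scan avoids placing images of the same person in both the training and the test set.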

The training took place on a machine with a V100 GPU and 16 GB of memory, and the RMSProp optimizer [14] was used to speed up the training. Training for 100 epochs with a learning rate of 1e-5 took 10 hours in total. To cope with the limited memory available, batches of size 2 were used.

For the statistical evaluation, K-fold cross-validation was performed with K=4, a value chosen because of the size of the dataset.
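The following sketch shows how the indices for a 4-fold cross-validation can be generated. It assumes contiguous folds over an already shuffled patient list; how the folds relate to the percentage split is not detailed in the text.

```python
def k_fold_splits(n_samples, k=4):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    base, remainder = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `remainder` folds receive one extra sample.
        size = base + (1 if fold < remainder else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


for train_idx, test_idx in k_fold_splits(465, k=4):
    print(len(train_idx), len(test_idx))
```

Each patient appears in exactly one test fold, and the reported metric is typically the average over the K folds.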

During training, early stopping was used to reduce overfitting [15]. Here, the training was stopped when the F-score started decreasing after having risen for a while. This metric was chosen because both Precision and Recall are important for this kind of use case.
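The stopping rule can be sketched as follows. The `patience` parameter (how many epochs without improvement are tolerated) is an assumption; the text only says that training stops once the F-score starts to decrease.

```python
def early_stopping_epoch(val_f_scores, patience=3):
    """Return (best_epoch, stop_epoch) for a sequence of per-epoch validation F-scores."""
    best_score = float("-inf")
    best_epoch = 0
    for epoch, score in enumerate(val_f_scores):
        if score > best_score:
            # New best F-score: remember the epoch (a checkpoint would be saved here).
            best_score, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop training.
            return best_epoch, epoch
    return best_epoch, len(val_f_scores) - 1


history = [0.20, 0.30, 0.40, 0.35, 0.33, 0.31, 0.50]
print(early_stopping_epoch(history))  # (2, 5)
```

The weights from `best_epoch` would then be restored for evaluation.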

(1)	$\begin{array}{l}recall = \frac{TP}{TP+FN} \quad \quad \quad precision = \frac{TP}{TP+FP} \\F\text{-}score = \frac{2 \cdot precision \cdot recall}{precision+recall}\end{array}$
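As a small worked example, the three metrics can be computed from the confusion-matrix counts using the standard definitions, in which precision divides by TP + FP and the F-score is the harmonic mean of precision and recall. The counts below are made up for illustration.

```python
def precision_recall_f(tp, fp, fn):
    """Compute precision, recall and F-score from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = precision + recall
    f_score = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f_score


# Hypothetical counts: 8 progressing patients found, 2 false alarms, 8 progressing patients missed.
print(precision_recall_f(tp=8, fp=2, fn=8))
```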

Results

To assess the performance of the proposed network, the Area Under the Curve (AUC) was used, because it is considered to be robust to class imbalance, which heavily affects the dataset. The Receiver Operating Characteristic (ROC) curves used to measure the AUC are built from the True Positive Rate and the False Positive Rate (Figure 4).
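The AUC can be computed without drawing the ROC curve explicitly: it equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below uses this pairwise formulation; it is an illustration, not the evaluation code of the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    # Count a win when the positive is ranked higher, half a win for ties.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Because the value depends only on the ranking of positives against negatives, it does not change when the majority class grows, which is why the metric is regarded as robust to class imbalance.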

The proposed network was evaluated both when only the MRI images were fed as input and when the two lesion masks were provided as additional input (Table 1).

In both cases, the proposed network performed better than the baseline, and providing the MRI together with the lesion masks resulted in better performance than the MRI alone.

The uncertainty was measured using the MC dropout at test time. To evaluate the effectiveness of the measure, different ROC curves were built. For example, when only the 90% most certain results were considered, the AUC was almost 5% higher than that of the conventional ROC curve (Figure 5). In this case, the 10% most uncertain results are discarded, and the fact that the AUC increases means that the uncertainty measure effectively detects the ambiguous results.
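The filtering step can be sketched as follows: the test samples are ranked by their MC-dropout variance and only the most certain fraction is kept before the AUC is recomputed. The function name and the rounding of the cut-off are assumptions.

```python
def most_certain_indices(variances, keep_fraction=0.9):
    """Return the indices of the samples with the lowest variance, in their original order."""
    n_keep = int(round(keep_fraction * len(variances)))
    # Sort sample indices by increasing uncertainty and keep the first n_keep.
    order = sorted(range(len(variances)), key=lambda i: variances[i])
    return sorted(order[:n_keep])


# Hypothetical variances for ten test patients; patient 3 is the most uncertain.
variances = [0.01, 0.02, 0.0, 0.25, 0.03, 0.01, 0.0, 0.02, 0.04, 0.05]
print(most_certain_indices(variances))  # [0, 1, 2, 4, 5, 6, 7, 8, 9]
```

The AUC is then evaluated on the retained patients only, and the discarded cases could be flagged for review by a clinician.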

Final discussion & future works

Overall, the paper is a meaningful contribution to the research on MS, in particular because the dataset is reasonably large and the results for the addressed problem seem promising. However, the proposed methodology is not particularly innovative, and some parts of the study could have been presented more clearly [16].

Moreover, it is not clear why the AUC was not used as the criterion for early stopping, given that it is ultimately the metric used to evaluate the method. In addition, the choice of the baseline network is not the best possible. Indeed, the intrinsic characteristics of the original VGG model and of the original Inception net suggest that the latter would perform better than the former, so the final results are not surprising.

As future work, the authors state that alternative ways to evaluate uncertainty should be explored and that longitudinal clinical information, such as age and disability stage, should be considered as input. Finally, they hope that an accurate analysis of the prediction could help in the discovery of new biomarkers for MS.

References

[1] What is MS?, https://www.nationalmssociety.org/What-is-MS. Last access: 5/12/2019

[2] Diagnosing MS, https://www.nationalmssociety.org/Symptoms-Diagnosis/Diagnosing-MS. Last access: 5/12/2019

[3] The 4 types of MS, https://www.multiplesclerosis.com/us/treatment.php. Last access: 5/12/2019

[4] Part I: Understanding Progression in MS, https://mymsaa.org/publications/understanding-progression-ms/part-i-understanding-progression/. Last access: 10/12/2019

[5] Y. Yoo, L.W. Tang, T. Brosch, D.K.B. Li, L. Metz, A. Traboulsee, and R. Tam. Deep Learning of Brain Lesion Patterns for Predicting Future Disease Activity in Patients with Early Symptoms of Multiple Sclerosis, G. Carneiro et al. (Eds.): LABELS 2016/DLMIA 2016, LNCS 10008, pp. 86–94, 2016.

[6] N. M. Sepahvand, T. Hassner, D.L. Arnold and T. Arbel. CNN Prediction of Future Disease Activity for Multiple Sclerosis Patients from Baseline MRI and Lesion Labels, A. Crimi et al. (Eds.): BrainLes 2018, LNCS 11383, pp. 57–69, 2019.

[7] R. McKinley, L. Grunder, R. Wepfer, F. Aschwanden, T. Fischer, C. Friedli, R. Muri, C. Rummel, R. Verma, C. Weisstanner, M. Reyes, A. Salmen, A. Chan, R. Wiest, F. Wagner. Automatic detection of lesion load change in Multiple Sclerosis using convolutional neural networks with segmentation confidence, arXiv:1904.03041, 2019

[8] M.T.K. Law, A.L. Traboulsee, D.K.B. Li, R.L. Carruthers, M.S. Freedman, S.H. Kolind and R. Tam. Machine learning in secondary progressive multiple sclerosis: an improved predictive model for short-term disability progression, Multiple Sclerosis Journal— Experimental, Translational and Clinical, October–December 2019

[9] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1–9, 2015.

[10] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov. Dropout: A Simple Way to Prevent Neural Networks from Overfitting, Journal of Machine Learning Research 15 (2014)

[11] Y. Gal, Z. Ghahramani. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning, Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 2016. JMLR: W&CP volume 48

[12] Y. Gal. What my deep model doesn't know..., http://mlg.eng.cam.ac.uk/yarin/blog_3d801aa532c1ce.html. Last access: 15/12/2019

[13] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014

[14] G. Hinton. Neural networks for machine learning. Coursera, 2016.

[15] L. Prechelt. Early Stopping - But When?, G.B. Orr, K.-R. Müller (Eds.): Neural Networks: Tricks of the Trade, LNCS 1524, Springer, Berlin, Heidelberg, 1998

[16] Prediction of Progression in Multiple Sclerosis Patients, https://openreview.net/forum?id=rkliARLel4. Last access: 15/12/2019
