Outline
Introduction
Parkinson's Disease
Previous Work
Research Objective
Methodology
Framework Structure
Input Data
Why GCN?
Region of Interest
Brain Geometry Graph
Brain Connectivity Graph
The essence of the Spectral Graph Convolution
View Pooling
Pairwise Matching
Softmax
Experiments and Results
PPMI
Setting
AUC
NMI
Results
Conclusion
My Review
Introduction
This blog post summarises the paper "Multi-View Graph Convolutional Network and Its Applications on Neuroimage Analysis for Parkinson’s Disease", written and researched by Xi Zhang, Lifang He, Kun Chen, Yuan Luo, Jiayu Zhou and Fei Wang, whom I will subsequently refer to as the authors. In the reviewed paper the authors propose a framework that can help to identify Parkinson’s disease by utilising a Multi-View Graph Convolutional Network.
Parkinson’s Disease (PD)
First, I would like to introduce Parkinson’s disease (PD). PD is a progressive nervous system disorder. Its most noticeable and well-known symptoms are tremors (uncontrollable shaking of the limbs), bradykinesia (slowness of movement) and balance difficulties. PD is caused by the impairment or even death of the neurons that produce the neurotransmitter dopamine, primarily in the substantia nigra; the resulting dopamine deficiency then leads to the symptoms mentioned above. Currently, there is no definitive cure for PD, but there are palliative treatments such as Levodopa or Deep Brain Stimulation. There is also no specific test for PD at the moment; it is diagnosed by a neurologist based on medical history and a neurological and physical examination.
Previous Work
Consequently, researchers are trying to find new ways to diagnose PD. One study, conducted by Schrag et al., sought to ascertain whether it is possible to predict cognitive impairment in patients with newly diagnosed Parkinson’s disease by analysing clinical variables and biomarkers 1. Another example is a study led by the authors of the reviewed paper, which aimed to identify subtypes of PD using a sequential deep learning-based approach 2.
Research Objective
Even though there have been numerous computational studies, only a few of them were based on neuroimaging data. Hence the authors of the reviewed paper propose a framework that detects relationships between patients with PD from neuroimaging data by using a Multi-View Graph Convolutional Network.
Methodology
Framework Structure:
The proposed framework consists of three main components: first, the core component, the Multi-View Graph Convolutional Network itself; second, the pairwise matching strategy; and lastly, a softmax predictor.
Input Data
As input data, the authors picked Magnetic Resonance Imaging (MRI) scans to represent brain structure and Diffusion Tensor Imaging (DTI) scans to represent brain connectivity, since both have proven suitable for detecting PD-like patterns in the past. For example, Chen et al. were able to observe that PD patients suffer from a volumetric loss of the olfactory bulbs when comparing MRI scans of diagnosed patients with those of healthy controls 3. Also, decreased fractional anisotropy in the substantia nigra has frequently been spotted in PD patients’ DTI scans 4. (For further biomarkers, check out 5.)
Why GCN?
Because the brain does not have an underlying grid-like structure, one cannot use a standard Convolutional Network to learn its features. Therefore the authors chose to divide the brain into regions of interest and depict these as nodes of a brain graph. This gives us the Graph Convolutional Network (GCN). So what makes this a Multi-View GCN? The term Multi-View GCN stems from the multiple views that are used as input for the GCN. Each of these views is the result of running one particular tractography algorithm on a DTI scan. Essentially, multiple views are used to encapsulate the information from different "angles" into one representation. For further clarification see Figure 2.
Broad workflow of the core MVGCN component:
We start with a patient's acquisition, which consists of one MRI scan and M views of the DTI scan. The first step is graph construction and the second step is feature construction. Two graphs are constructed: the Brain Geometry Graph (BGG), which represents the brain's spatial information, and the Brain Connectivity Graph (BCG), which represents the connectivity between the brain regions and thus provides the features.
[Image: by LarrabeeMGH, own work, CC BY-SA 4.0]
Region of Interest
Before moving on to the graph structure, let us first clarify what is meant by a region of interest. ROIs are brain regions with a certain purpose. For example, the lateral occipital region is responsible for visual perception. In the framework, the scans are parcellated into 84 ROIs using the software FreeSurfer.
Brain Geometry Graph
The Brain Geometry Graph's form is fixed and shared among all acquisitions. Each node represents a region of interest. Edges are bidirectional and defined with the help of the k-NN graph algorithm: nodes v_i and v_j are connected if v_j is in the set of v_i's k nearest nodes, or vice versa. Lastly, the weights are determined by the Gaussian similarity function over the average geometric distances of all nodes in each acquisition. Internally the graph is depicted as an adjacency matrix. Formally, the entries of the adjacency matrix A are defined as follows:
\[ \tilde{a}_{i,j}=\begin{cases}e(v_{i},v_{j}), & \text{if } v_{i}\in N_{j} \text{ or } v_{j}\in N_{i}\\ 0, & \text{otherwise.}\end{cases} \]
Here N_i consists of the k nearest nodes to node v_i, and N_j is defined analogously for v_j.
Gaussian Similarity function:
\[ e(v_{i},v_{j})=\mathrm{exp}\left(-\frac{||v_{i}-v_{j}||^{2}}{2\sigma^{2}}\right) \]
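To make this construction concrete, below is a minimal NumPy sketch of how such a k-NN adjacency matrix with Gaussian weights could be built. The function name, the coordinate input and the values of k and sigma are illustrative assumptions of mine, not details taken from the paper.

```python
import numpy as np

def knn_gaussian_adjacency(coords, k=5, sigma=1.0):
    """Build a symmetric k-NN adjacency matrix with Gaussian edge weights.

    coords : (N, d) array of average ROI coordinates (one row per ROI).
    k      : number of nearest neighbours (illustrative choice).
    sigma  : bandwidth of the Gaussian similarity (illustrative choice).
    """
    n = coords.shape[0]
    # Pairwise squared Euclidean distances between ROI centres.
    diff = coords[:, None, :] - coords[None, :, :]
    dist2 = (diff ** 2).sum(-1)

    # Gaussian similarity e(v_i, v_j) = exp(-||v_i - v_j||^2 / (2 sigma^2)).
    sim = np.exp(-dist2 / (2.0 * sigma ** 2))

    # For each node, keep weights to its k nearest neighbours (excluding itself).
    A = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(dist2[i])
        neighbours = [j for j in order if j != i][:k]
        A[i, neighbours] = sim[i, neighbours]

    # a_ij is nonzero if v_i is in N_j OR v_j is in N_i -> symmetrise with max.
    return np.maximum(A, A.T)

# Example: 84 ROIs with random 3-D centres (placeholder for real FreeSurfer output).
rng = np.random.default_rng(0)
A = knn_gaussian_adjacency(rng.normal(size=(84, 3)), k=10, sigma=2.0)
```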
Brain Connectivity Graph
For each acquisition there are M BCGs, each corresponding to a view produced by one tractography algorithm run on the DTI scan. As with the BGG, the nodes of the Brain Connectivity Graph represent ROIs, and the edges encode the connectivity strength between the nodes. Internally, each view is represented by a similarity matrix, which serves as our input feature matrix.
The essence of the spectral graph convolution
To convolve on graphs, one has to follow these steps:
First, a Laplacian matrix has to be defined on the graph one wants to convolve on. The Laplacian is of the form \[\text{L = D - A}\] where D is the degree matrix and A the adjacency matrix, in our case that of the BGG. Second, one has to compute the eigendecomposition of L, \[\text{L}=\text{U}\Lambda {\text{U}}^{\text{T}},\] whose eigenvectors U serve as the graph Fourier basis. Then the signal x is taken from the feature vectors, in our case from the BCG similarity matrix, and projected into the basis defined by the Laplacian eigenvectors, so that the filter g can be learned:
\[ \text{y}={g}_{\theta }(\text{L})\text{x}={g}_{\theta }(\text{U}\Lambda {\text{U}}^{\text{T}})\text{x}=\text{U}{g}_{\theta }(\Lambda ){\text{U}}^{\text{T}}\text{x,} \]
In reality, however, the eigendecomposition and the evaluation of this equation are very expensive computations. Therefore the authors decided to approximate the filter with the help of Chebyshev polynomials:
\[ {g}_{\theta }(\Lambda )={\displaystyle {\sum }_{p=0}^{s-1}{\theta }_{p}{T}_{p}(\tilde{\Lambda })}, \]
Here \[\tilde{\Lambda }\] denotes the rescaled eigenvalue matrix and T_p the Chebyshev polynomial of order p. The resulting filtering operation looks like this:
\[ {\text{y}}_{j}={\displaystyle {\sum }_{i=1}^{{F}_{in}}{g}_{{\theta }_{i,j}}(L){\text{x}}_{i},} \]
where x_i and y_j correspond to the rows of the input and output feature matrices, respectively.
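To illustrate the idea, here is a small NumPy sketch of a Chebyshev-approximated graph convolution under my own simplifying assumptions (combinatorial Laplacian, eigenvalue rescaling to [-1, 1], random toy inputs); it is not the authors' implementation.

```python
import numpy as np

def chebyshev_graph_conv(A, X, theta):
    """Approximate spectral graph convolution: y = sum_p theta_p T_p(L_tilde) x.

    A     : (N, N) adjacency matrix (e.g. the BGG).
    X     : (N, F_in) input features (e.g. the BCG similarity matrix).
    theta : (S, F_in, F_out) Chebyshev coefficients, S = polynomial order.
    """
    N = A.shape[0]
    # Combinatorial Laplacian L = D - A.
    L = np.diag(A.sum(1)) - A
    # Rescale eigenvalues to [-1, 1]: L_tilde = 2 L / lambda_max - I.
    lam_max = np.linalg.eigvalsh(L).max()
    L_tilde = 2.0 * L / lam_max - np.eye(N)

    S = theta.shape[0]
    # Chebyshev recurrence: T_0 X = X, T_1 X = L_tilde X,
    # T_p X = 2 L_tilde (T_{p-1} X) - (T_{p-2} X).
    T_prev, T_curr = X, L_tilde @ X
    Y = T_prev @ theta[0]
    if S > 1:
        Y += T_curr @ theta[1]
    for p in range(2, S):
        T_next = 2.0 * L_tilde @ T_curr - T_prev
        Y += T_next @ theta[p]
        T_prev, T_curr = T_curr, T_next
    return Y  # (N, F_out)

# Toy usage: 84 ROIs, 84 input features, 128 output features, order-5 filter.
rng = np.random.default_rng(0)
A = np.abs(rng.normal(size=(84, 84)))
A = (A + A.T) / 2
np.fill_diagonal(A, 0)
Y = chebyshev_graph_conv(A, rng.normal(size=(84, 84)),
                         rng.normal(size=(5, 84, 128)) * 0.01)
```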
View Pooling
This operation is performed on all M views, so the resulting M feature matrices need to be aggregated. This is done with an element-wise maximum operation, which yields one joint feature matrix that summarises all views.
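In code, view pooling can be as simple as the following sketch, assuming the M per-view outputs have been stacked into a single array (the shapes below are illustrative):

```python
import numpy as np

# Suppose conv_outputs has shape (M, N, F): M views, N ROIs, F features per ROI.
conv_outputs = np.random.default_rng(0).normal(size=(6, 84, 128))

# Element-wise maximum across the view axis -> one joint (N, F) feature matrix.
pooled = conv_outputs.max(axis=0)
```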
Pairwise Matching
Because MVGCN is very data-hungry and the amount of available neuroimaging data is scarce, the authors chose the approach of pairwise matching instead of classifying individual samples. This means the framework compares all possible pairs of acquisitions and returns the probability of whether they belong to the same class or not. This way, it is possible to “stretch” out the data set 6. Similarity between acquisitions can be a helpful metric to define a pairwise relationship 7. The authors decided to use the following similarity function to aggregate the feature matrices of two acquisitions into one similarity feature vector.
\[ \begin{array}{cc}\text{sim}({\text{z}}_{p}^{i},{\text{z}}_{q}^{i})={\text{z}}_{p}^{i}{}^{T}{\text{z}}_{q}^{i},& i=1,2,\cdots ,n.\end{array} \]
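A minimal sketch of this row-wise inner-product similarity for two pooled acquisition feature matrices might look like this (the function name and shapes are my own illustrative choices):

```python
import numpy as np

def pairwise_similarity(Z_p, Z_q):
    """Row-wise inner products sim(z_p^i, z_q^i) = (z_p^i)^T z_q^i.

    Z_p, Z_q : (N, F) pooled feature matrices of two acquisitions.
    Returns an (N,) similarity vector r that feeds the softmax predictor.
    """
    return np.einsum('if,if->i', Z_p, Z_q)

rng = np.random.default_rng(0)
r = pairwise_similarity(rng.normal(size=(84, 128)), rng.normal(size=(84, 128)))
```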
Softmax:
The final step is to run the resulting similarity vector through a softmax activation function, which yields the class probabilities:
\[ p(y=j|\text{r})=\mathrm{exp}({\text{w}}_{j}^{T}\text{r})/\left[{\displaystyle {\sum }_{k=1}^{K}\mathrm{exp}({\text{w}}_{k}^{\text{T}}\text{r)}}\right], \]
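For completeness, here is a numerically stable sketch of this softmax predictor, with random weights standing in for the learned parameters (all names and shapes are illustrative):

```python
import numpy as np

def softmax_predict(r, W):
    """p(y = j | r) = exp(w_j^T r) / sum_k exp(w_k^T r).

    r : (N,) pairwise similarity vector.
    W : (K, N) weight matrix, one row w_k per class (K = 2: match / no match).
    """
    logits = W @ r
    logits -= logits.max()  # subtract max for numerical stability
    p = np.exp(logits)
    return p / p.sum()

rng = np.random.default_rng(0)
probs = softmax_predict(rng.normal(size=84), rng.normal(size=(2, 84)) * 0.1)
```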
Experiments and Results
PPMI
The clinical input data was provided by the Parkinson's Progression Markers Initiative (PPMI). The objective of the PPMI is to aid breakthroughs in identifying biomarkers for Parkinson's disease; to this end, it provides funding and its biorepository on PD for research.
Setting
- The data used consists of 754 acquisitions. Out of these, 596 had Parkinson's disease and 158 were healthy.
- Because of the pairwise matching there are 283,881 possible pairings, out of which 189,713 are matching pairs and 94,168 are not.
- To train and test the framework, 5-fold cross-validation was used.
- The network is trained by applying backpropagation and stochastic optimisation.
- The order of the Chebyshev polynomials was 30 and the feature dimension 128 x 128.
- As for the different views, the following tractography algorithms were used (for each acquisition):
- Fiber Assignment by Continuous Tracking
- Second order Runge-Kutta
- Interpolated Streamline
- Tensorline
- Orientation distribution function-based deterministic approach
- Hough voting
Area under the Curve (AUC)
AUC is a metric used to evaluate the results of a binary classifier. Plotting the true positive rate against the false positive rate at different classification thresholds results in a curve named ROC; AUC is the area below the ROC curve. 8
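As a quick illustration (with made-up labels and scores, not data from the paper), the AUC of pairwise predictions could be computed with scikit-learn like this:

```python
from sklearn.metrics import roc_auc_score

y_true = [1, 0, 1, 1, 0, 1]               # 1 = matching pair, 0 = non-matching pair
y_score = [0.9, 0.2, 0.7, 0.6, 0.4, 0.8]  # predicted probability of a match
print(roc_auc_score(y_true, y_score))
```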
Normalized Mutual Information (NMI)
Normalized Mutual Information is a metric for evaluating clusterings. It measures how well the found clusters agree with the class labels, i.e. whether each cluster contains only one class label and whether there is a clear split between the clusters. 9
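Again purely as an illustration with placeholder labels, NMI can be computed with scikit-learn as follows:

```python
from sklearn.metrics import normalized_mutual_info_score

true_labels = [0, 0, 1, 1, 1, 0]     # e.g. PD vs. healthy control
cluster_labels = [1, 1, 0, 0, 0, 1]  # labels assigned by a clustering step
print(normalized_mutual_info_score(true_labels, cluster_labels))
```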
Results
Fig. 4
Conclusion
The paper proposes a new and purely computational approach to analysing neuroimaging data to detect Parkinson's disease. Because of the brain's non-grid-like structure, the authors came up with the idea of generalising it into a graph structure, to which spectral graph convolution was subsequently applied. By comparing acquisitions pairwise via their similarities, the authors were able to satisfy the data-hungry Multi-View Graph Convolutional Network, resulting in an accurate classification of acquisitions, and furthermore found interesting feature patterns without the use of explicit clinical data. Since this framework seems rather universal, there is a possibility for it to be used to detect other neurological diseases.
My Review
In my personal opinion, this paper tackles a very difficult and relevant task. While reading, I was especially fond of the precise and clear structure of the paper, which helped me to better understand the complex subject at hand. The same goes for the provided graphics, which visually emphasise the important concepts. Even though my general perception is positive, there are still some points of critique to be made. A small drawback of this framework is the use of pairwise matching, which means that when using the framework to identify PD one always needs a reference acquisition already classified as PD or HC as a prerequisite. Another is that the loss function used is never mentioned in the entire paper, even though it is a crucial part of neural network training. Furthermore, regarding the decision between mean-pooling and max-pooling, it was not clear that max-pooling was superior: while the AUC value for max-pooling was higher, its deviation was higher as well, so it would have been beneficial to see more emphasis on this comparison in the form of a statistical analysis. Also, there is unfortunately no trace of a time complexity analysis of the framework; it would have been very interesting to see the trade-offs between time and accuracy when comparing a single-view GCN to the MVGCN. Lastly, the evaluation of the results was lacking in the following aspects. Firstly, the results were only compared to other approaches instead of other similar studies. Secondly, the interpretation of the results was not precise; the sentence “The observations demonstrate that the learned pairwise feature vectors are consistent with some clinical discoveries…” was especially notable and should have been at least further investigated, or reformulated and stated as a limitation of the framework. Still, I was very pleased to review this paper. It provided me with important insights into this particular field and into how neuroimaging analysis can be approached.
References
1. Anette Schrag, Uzma Faisal Siddiqui, Zacharias Anastasiou, Daniel Weintraub, Jonathan M Schott. Clinical variables and biomarkers in prediction of cognitive impairment in patients with newly diagnosed parkinson’s disease: a cohort study. The Lancet Neurology. 2017;16(1):66–75.
2. University of california san francisco and weill cornell medicine researchers named winners of 2016 parkinson’s progression markers initiative data challenge. https://www.michaeljfox.org/foundation/publication-detail.html?id=625&category=7.
3. Shun Chen, Hong-yu Tan, Zhuo-hua Wu, Chong-peng Sun, Jian-xun He, Xin-chun Li, Ming Shao. Imaging of olfactory bulb and gray matter volumes in brain areas associated with olfactory function in patients with parkinson’s disease and multiple system atrophy. European journal of radiology. 2014;83(3):564–570.
4. Claire J Cochrane, Klaus P Ebmeier. Diffusion tensor imaging in parkinsonian syndromes: a systematic review and meta-analysis. Neurology. 2013;80(9):857–864.
5. Usman Saeed, Jordana Compagnone, Richard I Aviv, Antonio P Strafella, Sandra E Black, Anthony E Lang, Mario Masellis. Imaging biomarkers in Parkinson's disease and parkinsonian syndromes: current and emerging concepts. Translational Neurodegeneration. 2017;6(1):8.
6. Gregory Koch. 2015. Siamese neural networks for one-shot image recognition.
7. Sofia Ira Ktena, Sarah Parisot, Enzo Ferrante, Martin Rajchl, Matthew Lee, Ben Glocker, Daniel Rueckert. Distance metric learning using graph convolutional networks: Application to functional brain networks. In International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer; 2017. pp. 469–477.
8. Machine Learning Course by Google: https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc
9. Northeastern University Khoury College of Computer Sciences, Normalized Mutual Information, https://course.ccs.neu.edu/cs6140sp15/7_locality_cluster/Assignment-6/NMI.pdf
10. Rui Gao, Guangjian Zhang, Xueqi Chen, Aimin Yang, Gwenn Smith, Dean F Wong, Yun Zhou. CSF biomarkers and its associations with 18F-AV133 cerebral VMAT2 binding in Parkinson's disease: a preliminary report. PLoS ONE. 2016;11(10).