Introduction

Graphs have long been used to model pairwise relations between objects and to solve related problems effectively. In particular, graphs provide a natural representation of complex data types such as citation networks (e.g., CiteSeer) (Giles et al., 1998), social networks (e.g., ego-twitter) (Rossi and Ahmed, 2015), molecules (e.g., the Tox21 dataset) (Mayr et al., 2016; Huang et al., 2016), and point clouds. Recently, graph representation learning has advanced rapidly in modeling medical data, such as brain connectivity networks (Bullmore and Sporns, 2009), image patch graphs (Sun et al., 2021), vessel graphs (Paetzold et al., 2021), and operating room scene graphs (Ozsoy et al., 2021). Modeling such complex data effectively is key to exploiting graph learning, which has been shown to capture topological characteristics and incorporate geometric information (Bronstein et al., 2017).

Methods for constructing brain graphs from structural (T1-w/T2-w MRI) and functional magnetic resonance imaging (fMRI) modalities are rapidly expanding. Brain graphs not only represent the topological information in the brain but also encode pairwise relationships between different brain regions, which is not easily done with conventional methods (e.g., convolutional neural networks). Encoding these relationships is particularly important for biomarker detection in medicine, because anomalies in these relationships are often an indicator of certain diseases. To construct a brain graph, the brain surface is first parcellated into cortical regions, and the characteristics of these regions are measured (e.g., surface area, mean principal curvature, Gaussian curvature, cortical thickness, sulcal depth). FreeSurfer (Fischl, 2012) is an advanced and reliable neuroimaging tool that has been in use for a decade; it specializes in cortical parcellation of the human brain and provides region-level cortical measurements. Pairwise relations between cortical regions are usually derived from correlations, a covariance matrix, or absolute differences between the cortical measurements of two regions, and each of these pairwise relations is an edge between two nodes of the brain graph. Since there are multiple measurements per region, there are multiple pairwise relations (i.e., graph edges) between each pair of nodes, yielding a brain multi-graph (Fig. 1).

Figure 1. The process of generating a structural brain connectivity graph.
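The construction described above can be sketched in a few lines of plain Python. This is a minimal illustration that assumes the absolute-difference variant of the pairwise relation; the function names (`build_multigraph`, `stack_views`) and the region values are hypothetical and are not taken from FreeSurfer or from any dataset.

```python
def build_multigraph(measurements):
    """Build one adjacency matrix ("view") per cortical measurement.

    measurements maps a measurement name to a list with one value per region.
    Entry [i][j] of a view is the absolute difference between regions i and j.
    """
    views = {}
    for name, values in measurements.items():
        n = len(values)
        views[name] = [[abs(values[i] - values[j]) for j in range(n)]
                       for i in range(n)]
    return views


def stack_views(views):
    """Stack the views into an n x n x (number of views) tensor (nested lists)."""
    names = sorted(views)
    n = len(views[names[0]])
    return [[[views[name][i][j] for name in names] for j in range(n)]
            for i in range(n)]


# Hypothetical measurements for three cortical regions (illustrative only).
regions = {
    "cortical_thickness": [2.5, 3.0, 2.0],
    "sulcal_depth": [1.0, 0.5, 1.5],
}
multigraph = build_multigraph(regions)
tensor = stack_views(multigraph)
```

Each view is symmetric with a zero diagonal, and `tensor[i][j]` holds all edge features between regions `i` and `j`.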

Another research direction that benefits from graphs in medical imaging is 3D image patch graphs. The main goal of patch graphs is to incorporate anatomical context, consisting of landmark points that are highly informative to clinicians (Sun et al., 2021). To create this type of graph, the original image is first registered to an atlas that is split into patches, and the patches are then mapped back onto the original image. However, these patches have different coordinates on the original image because of distortions introduced during registration. These distortions also cause patches to overlap; overlapping patches are represented as edges of the patch graph and patch centers as nodes (Sun et al., 2021), as shown in Fig. 2.

Figure 2. The process of generating an anatomical 3D image patch graph for lung CT.

Background

A graph is typically composed of node features ($n_i$) and edge features ($e_{i,j}$), where $i$ is the index of a node and $j$ is the index of one of its neighbors. A multi-graph is a graph in which two nodes can be connected by multiple edges, yielding multiple adjacency matrices that can be stacked and stored in a tensor (i.e., a multi-dimensional matrix). Processing or learning from graph data poses additional challenges that depend on the graph type, such as sparse graphs, multi-graphs, directed graphs, and graphs with disconnected nodes, and a methodology should be designed with all of these complications in mind. The field of graph learning has therefore gained popularity and achieved breakthroughs in overcoming various graph representation challenges. Recently, various methods have been proposed for different tasks on graphs, such as graph classification (e.g., toxicity prediction) (Mayr et al., 2016; Huang et al., 2016), node classification (Kipf and Welling, 2016), link prediction (e.g., recommendation systems) (Ying et al., 2018), and node clustering (e.g., community detection) (Girvan and Newman, 2002). To perform such tasks, a graph neural network may consist of graph pooling layers, graph normalization layers, and graph convolutional layers. The main idea behind the graph convolutional operation is the message passing paradigm (Battaglia et al., 2016; Gilmer et al., 2017): a convolutional operation is performed on a selected node ($n_i$) to learn the relations between $n_i$ and its neighbors $n_j$. A 'message' is a set of learned features between a pair of nodes and is often denoted by $m_{i,j}$, as shown in Fig. 3.
Ideally, all local information ($n_i$, $n_j$, and $e_{i,j}$) should contribute to the computation of the message $m_{i,j}$, with non-linearity usually introduced by a learned neural network layer (Gilmer et al., 2017). Since a node can have multiple neighbors and multiple edges, each node $n_i$ usually receives multiple messages. These messages are then aggregated at the node level, yielding a latent matrix in which each row is a vector of learned attributes for the corresponding node. In Fig. 3, the aggregation function is the average, but maximum, median, and sum are also widely used.

Figure 3. Message passing paradigm in graph convolutional operations.
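To make the paradigm concrete, the following is a minimal sketch of a single message passing step with average aggregation. It assumes scalar node and edge features and fixed weights (`w_self`, `w_nbr`, `w_edge`) in place of a learned neural network layer; these names and the toy graph are hypothetical and only illustrate the data flow.

```python
def relu(x):
    """Simple non-linearity applied to every message."""
    return max(0.0, x)


def message_passing_step(node_feats, edges, w_self=1.0, w_nbr=1.0, w_edge=1.0):
    """Run one message passing step with mean aggregation.

    node_feats: list of scalar node features n_i.
    edges: dict mapping (i, j) to the edge feature e_ij; node i receives
           a message from its neighbor j.
    The weights are fixed here; in a real layer they would be learned.
    """
    incoming = [[] for _ in node_feats]
    for (i, j), e_ij in edges.items():
        # The message m_ij depends on n_i, n_j, and e_ij.
        m_ij = relu(w_self * node_feats[i] + w_nbr * node_feats[j] + w_edge * e_ij)
        incoming[i].append(m_ij)
    # Aggregate all messages of a node by averaging (0.0 for isolated nodes).
    return [sum(ms) / len(ms) if ms else 0.0 for ms in incoming]


# Toy graph: node 0 is connected to nodes 1 and 2 in both directions.
node_feats = [1.0, 2.0, -4.0]
edges = {(0, 1): 0.5, (1, 0): 0.5, (0, 2): 1.0, (2, 0): 1.0}
updated = message_passing_step(node_feats, edges)
```

Replacing the mean in the last line of the function with `max` or `sum` gives the other aggregation functions mentioned above.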

For many years, the machine learning field relied heavily on high-quality labeled data. However, self-supervised learning has advanced recently, promising to reduce this over-reliance on labels. The main goal is to obtain signals and features from the data itself by exploiting its underlying structure. Ideally, the obtained signals are similar to those learned in a supervised manner. As shown in Fig. 4, the ground truth is derived from the training dataset itself through various sample selection methods. The objective function in self-supervised learning is generally to minimize the distance between two samples that are assumed to be similar because they come from the same data distribution. However, representations learned with self-supervision alone are often not discriminative enough to predict the ground truth accurately (e.g., to perform a diagnosis task). Therefore, they depend on a disjoint classifier, usually trained with classic machine learning methods (e.g., logistic regression, random forest); this additional training is often referred to as the 'downstream task'.

Figure 4. Self-supervised learning overview.

Contrastive learning, a popular type of self-supervised learning, has attracted a surge of interest. Here, the model is expected to learn representations from data by minimizing the distance between similar samples and maximizing it between dissimilar samples. A pair of similar samples is called a 'positive pair'; for images, positive pairs are generated with task-specific augmentation methods (e.g., zooming, rotating, cropping), as shown in Fig. 5. A pair of dissimilar samples is called a 'negative pair' and comprises data coming from two different samples. However, the assumption that different samples are dissimilar has drawbacks, since some samples in the dataset may be quite similar to each other, especially when they belong to the same supervised class (e.g., two cat images). To mitigate this, a temperature parameter ($\tau$) is used: a hyperparameter that adjusts the penalty given to negative pairs and helps prevent overfitting. Moreover, contrastive learning is more challenging in graph learning than in image processing, because it is not always clear how to define a positive pair and it is usually difficult to define augmentations for graphs.

Figure 5. Contrastive learning overview.
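The role of positive pairs, negative pairs, and the temperature can be illustrated with a small InfoNCE-style loss, a common contrastive objective. The sketch below assumes cosine similarity between embedding vectors and a single anchor; the function names and the toy vectors are illustrative assumptions rather than the formulation of any specific paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, tau=0.5):
    """InfoNCE-style loss for one anchor.

    The loss is small when the anchor is close to its positive and far from
    the negatives. A smaller tau sharpens the penalty on similar negatives.
    """
    pos = math.exp(cosine(anchor, positive) / tau)
    neg = sum(math.exp(cosine(anchor, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy 2-D embeddings (illustrative only).
anchor = [1.0, 0.0]
other = [0.0, 1.0]
loss_good = info_nce(anchor, anchor, [other])  # positive matches the anchor
loss_bad = info_nce(anchor, other, [anchor])   # positive is far from the anchor
```

In this toy case the loss is much lower when the positive sample coincides with the anchor than when the roles of positive and negative are swapped.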

Related Work

1. Context Aware Self-supervised Learning for Medical Images Using Graph Neural Network (Sun et al., 2021)

1.1. Main Idea

Conventional image processing methods, which have long leveraged convolutional neural networks, have proven to be effective on many tasks such as image classification, semantic segmentation, object detection, and image-to-image translation. However, convolutional operations on images have several drawbacks: locality (i.e., they cannot learn dependencies between different regions of an image) and a lack of rotation and size invariance. A series of methods have been proposed to mitigate these drawbacks, mainly by enriching the data with augmentations (e.g., zooming, affine transformations, cropping, padding, rotating). These approaches not only increase the computational cost but can also affect model performance and robustness in unforeseeable ways, which is undesirable in medical computing (Weihsbach et al., 2022). More importantly, none of them captures the morphological features of an organ (e.g., anatomical landmarks) or incorporates the anatomical information of the patient, even though inspecting the relations between different regions within an organ is of paramount importance for clinicians. Graphs are better suited to represent these relationships and to encode the 'anatomical context'. To obtain such graphs, the authors use an atlas registration method and a patch mapping function that first splits the atlas into patches and then transfers the patch coordinates to the transformed image. Due to distortions on the transformed image, the patches may overlap. A patch graph is then constructed by connecting the centers of every two overlapping patches with a graph edge.
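The edge rule of the patch graph can be sketched as follows. The example makes a simplifying assumption that is not stated in the paper: each mapped patch is approximated by an axis-aligned cube given by its center and half side length. The names `boxes_overlap` and `build_patch_graph` are hypothetical.

```python
from itertools import combinations


def boxes_overlap(a, b):
    """Check whether two axis-aligned cubes overlap.

    Each patch is a (center, half_size) pair with a 3-D center tuple.
    Approximating patches as cubes is an assumption of this sketch.
    """
    (center_a, half_a), (center_b, half_b) = a, b
    return all(abs(x - y) < half_a + half_b for x, y in zip(center_a, center_b))


def build_patch_graph(patches):
    """Nodes are patch centers; an edge links every pair of overlapping patches."""
    nodes = [center for center, _ in patches]
    edges = [(i, j) for i, j in combinations(range(len(patches)), 2)
             if boxes_overlap(patches[i], patches[j])]
    return nodes, edges


# Three hypothetical patches: the first two overlap, the third is isolated.
patches = [((0, 0, 0), 2), ((3, 0, 0), 2), ((10, 0, 0), 2)]
nodes, edges = build_patch_graph(patches)
```

The resulting edge list can be passed to a graph library as an undirected adjacency structure; a real pipeline would derive the patch extents from the registration transform instead of fixed cubes.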

1.2. Methodology

The objective of learning an anatomical context with patch graphs is achieved by leveraging contrastive learning. The authors propose a hierarchical model with two levels (Fig. 6): one that learns patch-level representations with a conditional encoder, and one that learns subject-level anatomical representations with graph convolutional layers. The graph convolutional layers use the GCNConv implementation from PyTorch Geometric, which follows the method proposed by Kipf and Welling (2016). Patch-level and subject-level features are optimized with separate objective functions. A positive pair consists of an image and a random augmentation of it, and a negative pair consists of images from two different samples. The authors adopt the InfoNCE loss (Oord et al., 2018) to optimize similarities and dissimilarities at both the patch level and the subject level. Finally, the learned features are passed to a logistic regression model, a disjoint classifier that performs disease classification as the downstream task.

Figure 6. Context-aware representation learning model overview.

1.3. Experiments and Results

The authors evaluated their method on three datasets:

Chronic Obstructive Pulmonary Disease (COPD) dataset, which includes chest CT images from patients with COPD, a lung disease that causes breathing difficulties. This is a relatively large dataset with 9180 patients. (Regan et al., 2011)
COVID dataset (MosMed) collected from hospitals in Moscow. This lung CT dataset was collected specifically to identify COVID-related abnormalities in 1110 patients. Five levels of lung damage severity are reported. (Morozov et al., 2020)
A separate COVID dataset created by the authors and collected from multiple hospitals. This is a mixed dataset with roughly balanced classes: 45 healthy and 35 positive, for a total of 80 lung CT images.

For the COPD dataset, the authors measured multiple metrics to evaluate the effectiveness of their model and benchmarked it against well-known supervised and unsupervised models, including K-means, MedicalNet, and Subject2Vec. The evaluation includes two continuous measures, logFEV1pp and logFEV1/FVC, which are indicators of the patient's respiratory capacity; R-squared is reported between the measured lung capacity and the model prediction (Table 1). Since this dataset also provides lung damage severity categories, the authors performed downstream classification tasks on six measures (GOLD, CLE, para-septal, AE history, future AE, and mMRC) and reported the accuracy in Table 1.

Table 1. Results on COPD dataset.

For the MosMed dataset, the authors benchmarked against one supervised and three unsupervised methods and reported accuracy on the COVID diagnosis task (Table 2).

Table 2. Results on MosMed dataset.

For the third dataset, the authors evaluated their model after pretraining it on each of the other two datasets (Table 3). Since the MosMed dataset includes more COVID-related information, the model pretrained on it achieves relatively higher accuracy than the model pretrained on the COPD dataset.

Table 3. Results on COVID dataset.

2. Deep Graph Normalizer (Gurbuz and Rekik, 2020)

2.1. Main Idea

A connectional brain template (CBT) is a holistic representation of a brain network population. Ideally, it is located at the 'center' of the population, minimizing the total distance to all samples, and can therefore be regarded as an 'average' brain graph (Fig. 7). Learning such a representation is challenging because of the underlying topological characteristics of multi-edged, heterogeneous graphs. Maintaining the topological information is essential and cannot be achieved by conventional approaches, which makes graph neural networks particularly advantageous for learning better representations. Moreover, connectional brain templates are used in biomarker detection, so they should ideally be discriminative between two populations with different brain states (e.g., healthy and disordered), as demonstrated in Fig. 7. By comparing the CBTs learned from two different populations, one can identify the distinctive regions and connectivities associated with certain diseases.

Figure 7. Properties of connectional brain templates.

2.2. Methodology

The deep graph normalizer is also trained in a self-supervised manner. To learn a connectional brain template, the authors pass the brain graph through three NNConv graph convolutional layers (Gilmer et al., 2017), as outlined in Fig. 8. These layers output a latent representation of the graph: a vector with the same length as the number of nodes in the graph. The authors then construct an adjacency matrix by taking the pairwise absolute differences between the learned node attributes. Since a brain graph is a fully connected, undirected graph, the result is a symmetric matrix that represents the connectional brain template at the subject level (or batch level). Next, another batch is randomly selected from the dataset to calculate the subject-level loss between the selected subjects and the refined CBT. This loss uses a distance measure based on the Frobenius norm, which is reported to be effective in non-Euclidean domains.

Figure 8. Deep graph normalizer model overview.
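The two steps after the graph convolutional layers can be sketched as follows: turning the learned node attributes into a CBT and measuring its Frobenius distance to randomly selected subjects. This is a simplified illustration that assumes one adjacency matrix per subject and an unweighted mean over the batch; the original loss may weight the individual views differently. All names and values are hypothetical.

```python
import math
import random


def cbt_from_embedding(z):
    """Build a symmetric CBT from one learned attribute per node."""
    n = len(z)
    return [[abs(z[i] - z[j]) for j in range(n)] for i in range(n)]


def frobenius_distance(a, b):
    """Frobenius norm of the element-wise difference of two matrices."""
    return math.sqrt(sum((x - y) ** 2
                         for row_a, row_b in zip(a, b)
                         for x, y in zip(row_a, row_b)))


def subject_level_loss(cbt, subjects, batch_size, rng):
    """Mean Frobenius distance between the CBT and a random batch of subjects."""
    batch = rng.sample(subjects, batch_size)
    return sum(frobenius_distance(cbt, s) for s in batch) / batch_size


# Hypothetical learned attributes for a three-node brain graph.
cbt = cbt_from_embedding([0.0, 1.0, 3.0])
subjects = [
    [[0.0, 1.0, 3.0], [1.0, 0.0, 2.0], [3.0, 2.0, 0.0]],  # identical to the CBT
    [[0.0, 1.0, 3.0], [1.0, 0.0, 0.0], [3.0, 0.0, 0.0]],  # differs in one edge
]
loss = subject_level_loss(cbt, subjects, batch_size=2, rng=random.Random(0))
```

In training, this loss would be backpropagated through the graph convolutional layers that produced the node attributes; the sketch covers only the forward computation.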

2.3. Experiments and Results

The authors evaluated their model on two datasets:

Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, including a total of 77 patients with two brain states (i.e., diagnosis classes): 41 patients diagnosed with Alzheimer's Disease (AD) and 36 patients with Late Mild Cognitive Impairment (LMCI), two conditions that are clinically difficult to distinguish from one another. (Mueller et al., 2005)
Autism Brain Imaging Data Exchange I (ABIDE I) dataset, including a total of 310 subjects with two brain states (i.e., diagnosis classes): 155 subjects with Autism Spectrum Disorder (ASD) and 155 normal controls (NC). (Di Martino et al., 2014)

Both datasets provide four cortical measurements per subject: cortical thickness, mean principal curvature, maximum principal curvature, and sulcal depth. The graphs therefore have four different edge features, yielding a 'brain multi-graph'. The reported values are measured separately for the left and right hemispheres in both datasets.

An ideal connectional brain template should have the two properties discussed above: well-centeredness and discriminativeness. To evaluate how centered the learned CBT is, the authors measured the mean Frobenius distance between the predicted CBT and all subjects in the test set, and they report p-values to show that the improvement over the benchmark is statistically significant (Fig. 9). Since the deep graph normalizer is a pioneering method (i.e., the first to adopt graph convolutions) for the graph normalization task, the benchmark in Fig. 9 is a non-graph-convolutional method, netNorm (Dhifallah and Rekik, 2020). As demonstrated in Fig. 9, the graph convolutional method achieves a significant gain over the previous state of the art.

Figure 9. Centeredness evaluation of deep graph normalizer against the non-graph convolutional benchmark netNorm (Dhifallah and Rekik, 2020)

Another evaluation criterion for connectional brain templates is discriminativeness. To evaluate it, the authors trained a disjoint multi-kernel support vector machine as a separate classification task and then measured the overlap between the most discriminative regions detected by this classifier and those identified by the CBT-based methods, as reported in Table 4.

Table 4. Overlap rate between detected regions with CBT-based methods and the multi-kernel support vector classifier.
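An overlap rate of this kind can be computed as sketched below. The region scoring is an assumption made for illustration: a region is scored by the summed absolute difference of its connections between the CBTs of two populations, and the top-k regions are compared with a reference set such as the one selected by a classifier. The exact scoring used in the paper is not described here, and all names and values are hypothetical.

```python
def top_k_regions(cbt_a, cbt_b, k):
    """Return the k regions whose connections differ most between two CBTs."""
    n = len(cbt_a)
    scores = [sum(abs(cbt_a[i][j] - cbt_b[i][j]) for j in range(n))
              for i in range(n)]
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def overlap_rate(regions_a, regions_b):
    """Fraction of regions in regions_a that also appear in regions_b."""
    return len(regions_a & regions_b) / max(len(regions_a), 1)


# Hypothetical CBTs of two populations over three regions.
cbt_population_1 = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
cbt_population_2 = [[0, 3, 1], [3, 0, 1], [1, 1, 0]]
cbt_regions = top_k_regions(cbt_population_1, cbt_population_2, k=2)

# Hypothetical regions selected by a separate classifier.
classifier_regions = {1, 2}
rate = overlap_rate(cbt_regions, classifier_regions)
```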

3. Contrastive Functional Connectivity Graph Learning (Wang et al., 2022)

3.1. Main Idea

The authors of this paper address the intra-class variance problem, a persistent challenge in machine learning. In medical computing, diseases affect the human body to varying degrees across patients. Assume there are two brain populations, one consisting of patients with a particular disease and the other of healthy subjects. Many neurological diseases affect the brain differently in each patient, causing high variance within the disease group (Fig. 10). Diagnosing such diseases with medical imaging techniques alone is therefore not straightforward. Another notorious problem in medical imaging is that only small datasets are available for rare diseases such as schizophrenia and amyotrophic lateral sclerosis (ALS). With high intra-class variance and too few samples, training a machine learning model on such datasets becomes very challenging.

Figure 10. Overview of intra-class variance problem.

3.2. Methodology

Contrastive learning is rapidly becoming a popular type of self-supervised learning. However, contrastive learning on graphs is not as straightforward as on images because it is difficult to preserve topological information under graph augmentation. Therefore, the authors propose a contrastive graph learning (CGL) model that creates positive pairs by selecting different edge features from the same subject instead of performing graph augmentation (Fig. 11). The aim is to learn a representative feature vector for each patient, containing one learned attribute per node. To obtain such features, the authors implemented a graph neural network with Chebyshev graph convolutions (Defferrard et al., 2016).

Figure 11. Contrastive learning for graphs.
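The pairing rule can be sketched without any graph augmentation. The example assumes that every subject comes with a list of views (one graph per edge feature) and enumerates all view pairs, labeling those from the same subject as positive and those from different subjects as negative. The function name, subject IDs, and view names are hypothetical.

```python
from itertools import combinations


def make_pairs(subject_views):
    """Split all view pairs into positive and negative pairs.

    subject_views maps a subject ID to a list of views of that subject.
    Each returned item is a pair of (subject ID, view index) keys.
    """
    keys = [(subject, v)
            for subject, views in subject_views.items()
            for v in range(len(views))]
    positives, negatives = [], []
    for first, second in combinations(keys, 2):
        same_subject = first[0] == second[0]
        (positives if same_subject else negatives).append((first, second))
    return positives, negatives


# Two hypothetical subjects with two views each (placeholders for graphs).
subject_views = {
    "sub-01": ["view_a", "view_b"],
    "sub-02": ["view_a", "view_b"],
}
positives, negatives = make_pairs(subject_views)
```

With two subjects and two views each, this yields two positive pairs and four negative pairs; a contrastive loss would then pull the embeddings of the positive pairs together and push those of the negative pairs apart.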

Additionally, the authors propose the dynamic graph classifier (DGC), a disjoint model for the downstream classification task that exploits dynamic edge convolutions (Wang et al., 2019), as outlined in Fig. 12. First, the DGC model generates the learned attribute vector for the test subject with graph convolutional layers. Next, the authors measure similarities between patients by constructing a KNN graph with a top-K algorithm. Finally, the test subject is classified based on its KNN distances to the different groups.

Figure 12. Overview of dynamic graph classifier model (DGC)
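The neighbor-based decision can be illustrated with a plain K-nearest-neighbor majority vote. This is a deliberately simplified stand-in, not the DGC model itself: it assumes Euclidean distance between learned attribute vectors and omits the dynamic edge convolutions. The vectors and labels are hypothetical.

```python
import math
from collections import Counter


def euclidean(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def knn_classify(test_vec, train_vecs, train_labels, k=3):
    """Assign the majority label among the k nearest training subjects."""
    ranked = sorted(range(len(train_vecs)),
                    key=lambda i: euclidean(test_vec, train_vecs[i]))
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical learned attribute vectors for four training subjects.
train_vecs = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
train_labels = ["NC", "NC", "ADHD", "ADHD"]

pred_a = knn_classify([0.2, 0.3], train_vecs, train_labels, k=3)
pred_b = knn_classify([5.5, 5.0], train_vecs, train_labels, k=3)
```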

3.3. Experiments and Results

To evaluate their model, the authors used one dataset: the attention deficit/hyperactivity disorder (ADHD-200) dataset from the Preprocessed Connectomes Project (PCP) under the International Neuroimaging Data-sharing Initiative (INDI) (Bellec et al., 2017). The dataset includes resting-state functional magnetic resonance images (rs-fMRI) of 596 subjects with two diagnoses, attention deficit hyperactivity disorder (ADHD) and normal control (NC), collected at 3 different data collection sites. The dataset is divided into train, validation, and test splits with a 7:1:2 ratio, and the classes are balanced in all splits.
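A split with this ratio that keeps the class proportions in every part can be produced with a stratified split. The following sketch is one possible way to do it and is not taken from the paper; the function name, the seed, and the toy labels are assumptions.

```python
import random


def stratified_split(labels, ratios=(0.7, 0.1, 0.2), seed=0):
    """Split sample indices into train/validation/test parts per class.

    Splitting each class separately keeps the class proportions of the
    full dataset in every part.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(ratios[0] * len(indices))
        n_val = round(ratios[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test part
    return train, val, test


# Toy labels: 10 subjects per class (illustrative only).
labels = ["ADHD"] * 10 + ["NC"] * 10
train, val, test = stratified_split(labels)
```

With 10 subjects per class, each class contributes 7 training, 1 validation, and 2 test subjects.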

After training the contrastive graph learning model, the authors evaluated the discriminativeness of the features learned by each model on a downstream classification task. The benchmark models are fully supervised, whereas CGL (Variant 2) is evaluated with a disjoint KNN classifier trained on its learned attributes, because CGL is self-supervised and does not predict class labels. As shown in Table 5, the classifier that uses CGL-based features outperforms the fully supervised models.

Table 5. Classification results of the downstream task performed with learned features from different models on 3 data collection sites.

References

Giles, C. L., Bollacker, K. D., & Lawrence, S. (1998, May). CiteSeer: An automatic citation indexing system. In Proceedings of the third ACM conference on Digital libraries (pp. 89-98).

Rossi, R., & Ahmed, N. (2015, March). The network data repository with interactive graph analytics and visualization. In Proceedings of the AAAI conference on artificial intelligence (Vol. 29, No. 1).

Mayr, A., Klambauer, G., Unterthiner, T., & Hochreiter, S. (2016). DeepTox: toxicity prediction using deep learning. Frontiers in Environmental Science, 3, 80.

Huang, R., Xia, M., Nguyen, D. T., Zhao, T., Sakamuru, S., Zhao, J., ... & Simeonov, A. (2016). Tox21Challenge to build predictive models of nuclear receptor and stress response pathways as mediated by exposure to environmental chemicals and drugs. Frontiers in Environmental Science, 3, 85.

Bullmore, E., & Sporns, O. (2009). Complex brain networks: graph theoretical analysis of structural and functional systems. Nature reviews neuroscience, 10(3), 186-198.

Paetzold, J. C., McGinnis, J., Shit, S., Ezhov, I., Büschl, P., Prabhakar, C., ... & Menze, B. H. (2021). Whole brain vessel graphs: a dataset and benchmark for graph learning and neuroscience (vesselgraph). arXiv preprint arXiv:2108.13233.

Özsoy, E., Örnek, E. P., Eck, U., Tombari, F., & Navab, N. (2021). Multimodal semantic scene graphs for holistic modeling of surgical procedures. arXiv preprint arXiv:2106.15309.

Bronstein, M. M., Bruna, J., LeCun, Y., Szlam, A., & Vandergheynst, P. (2017). Geometric deep learning: going beyond euclidean data. IEEE Signal Processing Magazine, 34(4), 18-42.

Fischl, B. (2012). FreeSurfer. NeuroImage, 62(2), 774-781.

Sun, L., Yu, K., & Batmanghelich, K. (2021, May). Context matters: Graph-based self-supervised representation learning for medical images. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 35, No. 6, pp. 4874-4882).

Kipf, T. N., & Welling, M. (2016). Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.

Ying, R., He, R., Chen, K., Eksombatchai, P., Hamilton, W. L., & Leskovec, J. (2018, July). Graph convolutional neural networks for web-scale recommender systems. In Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining (pp. 974-983).

Girvan, M., & Newman, M. E. (2002). Community structure in social and biological networks. Proceedings of the national academy of sciences, 99(12), 7821-7826.

Battaglia, P., Pascanu, R., Lai, M., & Jimenez Rezende, D. (2016). Interaction networks for learning about objects, relations and physics. Advances in neural information processing systems, 29.

Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., & Dahl, G. E. (2017, July). Neural message passing for quantum chemistry. In International conference on machine learning (pp. 1263-1272). PMLR.

Weihsbach, C., Hansen, L., & Heinrich, M. (2022, November). XEdgeConv: Leveraging graph convolutions for efficient, permutation-and rotation-invariant dense 3D medical image segmentation. In Geometric Deep Learning in Medical Image Analysis (pp. 61-71). PMLR.

Oord, A. V. D., Li, Y., & Vinyals, O. (2018). Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748.

Regan, E. A., Hokanson, J. E., Murphy, J. R., Make, B., Lynch, D. A., Beaty, T. H., ... & Crapo, J. D. (2011). Genetic epidemiology of COPD (COPDGene) study design. COPD: Journal of Chronic Obstructive Pulmonary Disease, 7(1), 32-43.

Morozov, S. P., Andreychenko, A. E., Pavlov, N. A., Vladzymyrskyy, A. V., Ledikhova, N. V., Gombolevskiy, V. A., ... & Chernina, V. Y. (2020). Mosmeddata: Chest ct scans with covid-19 related findings dataset. arXiv preprint arXiv:2005.06465.

Gurbuz, M. B., & Rekik, I. (2020). Deep graph normalizer: a geometric deep learning approach for estimating connectional brain templates. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2020: 23rd International Conference, Lima, Peru, October 4–8, 2020, Proceedings, Part VII 23 (pp. 155-165). Springer International Publishing.

Mueller, S. G., Weiner, M. W., Thal, L. J., Petersen, R. C., Jack, C., Jagust, W., ... & Beckett, L. (2005). The Alzheimer's disease neuroimaging initiative. Neuroimaging Clinics, 15(4), 869-877.

Di Martino, A., Yan, C. G., Li, Q., Denio, E., Castellanos, F. X., Alaerts, K., ... & Milham, M. P. (2014). The autism brain imaging data exchange: towards a large-scale evaluation of the intrinsic brain architecture in autism. Molecular psychiatry, 19(6), 659-667.

Dhifallah, S., Rekik, I., & Alzheimer's Disease Neuroimaging Initiative. (2020). Estimation of connectional brain templates using selective multi-view network normalization. Medical image analysis, 59, 101567.

Wang, X., Yao, L., Rekik, I., & Zhang, Y. (2022). Contrastive graph learning for population-based fmri classification. arXiv preprint arXiv:2203.14044.

Defferrard, M., Bresson, X., & Vandergheynst, P. (2016). Convolutional neural networks on graphs with fast localized spectral filtering. Advances in neural information processing systems, 29.

Wang, Y., Sun, Y., Liu, Z., Sarma, S. E., Bronstein, M. M., & Solomon, J. M. (2019). Dynamic graph cnn for learning on point clouds. ACM Transactions on Graphics (tog), 38(5), 1-12.

Bellec, P., Chu, C., Chouinard-Decorte, F., Benhajali, Y., Margulies, D. S., & Craddock, R. C. (2017). The neuro bureau ADHD-200 preprocessed repository. Neuroimage, 144, 275-286.
