5: Brain Age Estimation From MRI Using Cascade Networks With Ranking Loss

Jian Cheng, Member, IEEE, Ziyang Liu, Hao Guan, Zhenzhou Wu, Haogang Zhu, Jiyang Jiang, Wei Wen, Dacheng Tao, Fellow, IEEE, and Tao Liu.

Blog post written by: Zaid Efraij

Introduction

The aging process of the human brain is known to be biologically complex [1]. The brain ages at different rates in different people, and the process can sometimes be disrupted by underlying neurodegenerative diseases. It is therefore crucial to identify biomarkers that can predict age-related deterioration and cognitive decline. Structural and functional changes in the brain usually occur long before cognitive decline sets in, making them potential indicators of age-related diseases. One such biomarker is the brain age: an estimate of the chronological age derived from neuroimaging features alone. The deviation of the brain age from the chronological age, called the brain age gap, could serve as a sign of age-related cognitive decline. The brain age gap has gained attention because it provides insight into accelerated or decelerated brain aging in individuals and facilitates disease risk screening. Moreover, understanding brain aging and its associated biomarkers holds promise for early diagnosis and intervention in neurodegenerative disorders such as Alzheimer's disease (AD) [2] and mild cognitive impairment (MCI) [3].

Current Approaches and Limitations

Magnetic resonance imaging (MRI) has emerged as a non-invasive and sensitive tool for detecting these changes. State-of-the-art deep neural network-based machine learning methods have gained traction in estimating "brain age" from MRI data, offering a high-dimensional regression model for predicting chronological age. However, current approaches have some limitations. For instance, they do not explore feature maps from different scales. Furthermore, most methods only consider the loss between estimated and true ages of individual samples, without using any ranking loss for a set of samples. A good estimation, however, should also show a low ranking loss for a batch of samples. Another limitation observed in most methods is that they do not consider biological sex labels of MRIs, while males and females have different brain structures and age differently [4].

Contributions

To address some limitations of current architectures, the authors propose a novel 3-dimensional convolutional neural network (CNN) architecture with ranking losses for brain age estimation. The architecture is a two-stage cascaded network called the two-stage-age-network (TSAN). The first-stage network produces a rough estimate of the brain age, and the second-stage network produces a more accurate estimate based on the discretized brain age from the first stage. The biological sex label is used as an additional input to both networks. To combine feature maps at different scales, a novel scaled dense (ScaledDense) network architecture, inspired by DenseNet [5], is used in both stages. Finally, two ranking losses are proposed alongside the commonly used mean square error (MSE) loss to regularize the training process.

Brain age estimation frequently shows a systematic bias: brain age tends to be overestimated in younger subjects and underestimated in older subjects, while it is predicted more accurately for participants whose age is close to the mean age [6]. For that reason, the authors apply a statistical bias correction, based on a linear regression model, to the estimated brain age and brain age gap. To demonstrate an application of brain age estimation, the authors also use the brain age gap as the only input variable of a support vector machine (SVM) that classifies healthy control (HC) subjects, mild cognitive impairment (MCI) patients and Alzheimer's disease (AD) patients.

Proposed Architecture

The proposed TSAN architecture cascades two network stages. The first-stage network provides a rough estimate of the brain age from a sex label and an MRI. The second-stage network then refines this estimate using not only the sex label and the MRI but also the discretized brain age obtained from the first stage. The TSAN architecture is illustrated in the following figure:

Fig. 1: Two-Stage-Age-Network (TSAN).

TSAN uses $\begin{array}{l}\delta_d\end{array}$ to discretize the estimated brain age in the first stage. Let $\begin{array}{l}\hat{y}\end{array}$ be the estimated brain age, and $\begin{array}{l}y\end{array}$ be the true chronological age. Then, the discretized brain age $\begin{array}{l}D(\hat{y})\end{array}$ is defined as:

$\begin{array}{l}D(\hat{y})= \begin{cases} Round(\frac{\hat{y}}{\delta_d}), \quad \delta_d > 0 \\ \hat{y}, \quad \delta_d = 0\ \end{cases}\end{array}$

where $\begin{array}{l}\delta_d\end{array}$ is a non-negative parameter that tunes the degree of discretization, and $\begin{array}{l}Round(·)\end{array}$ denotes the rounding operator. Note that the first-stage network together with a positive $\begin{array}{l}\delta_d\end{array}$ can be seen as a multi-class classifier that determines which age range the ground-truth age $\begin{array}{l}y\end{array}$ belongs to.
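As a minimal sketch, the discretization step can be written in a few lines of Python. The function name `discretize_age` and the choice to round halves up are assumptions made for this illustration. The function follows the formula exactly as written above, so for a positive $\begin{array}{l}\delta_d\end{array}$ it returns a bin index rather than an age in years.

```python
import math


def discretize_age(y_hat: float, delta_d: float) -> float:
    """Discretize a first-stage estimate following D(y_hat) as written above.

    Hypothetical helper for illustration; not the authors' code.
    """
    if delta_d < 0:
        raise ValueError("delta_d must be non-negative")
    if delta_d == 0:
        return y_hat  # delta_d = 0 switches discretization off
    # Assumption: halves are rounded up. Python's built-in round() would
    # round halves to the nearest even integer instead.
    return float(math.floor(y_hat / delta_d + 0.5))


print(discretize_age(63.4, 5))  # bin index for 5-year bins
print(discretize_age(63.4, 0))  # unchanged estimate
```

With a bin width of 5 years, an estimate of 63.4 years falls into bin 13. Multiplying the bin index by the bin width (an extra step that is not part of the formula above) would map it back to 65 years.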

ScaledDense Block

To extract information from different scales of feature maps, the authors propose a novel convolution block, called the ScaledDense block. Inspired by DenseNet [5], the densely connected paths in the ScaledDense block combine feature maps of different scales from preceding layers by resizing them (e.g., with max pooling) and concatenating them.

Fig. 2: The ScaledDense block, where feature maps from different scales are combined by using pooling and concatenation.

As seen in Fig. 2, the ScaledDense block uses various non-linear transformations. Each layer of the ScaledDense block is constructed from two asymmetric convolution (AC) blocks [7], batch normalization [8], the exponential linear unit (ELU) activation function, a squeeze-and-excitation (SE) block [9], and max pooling.

Loss Function

Two standard losses used by existing methods for training deep networks in brain age estimation are the mean absolute error (MAE) and the mean square error (MSE). These losses measure the error of individual samples. However, relationships between two or more samples are also important, including the difference between two ages and the ranking order of ages within a single batch. To measure the loss with respect to the difference between two ages, the age difference loss $\begin{array}{l}\mathcal{L}_d\end{array}$ is proposed. Given $\begin{array}{l}N_p\end{array}$ pairs of samples $\begin{array}{l}(i,j)\end{array}$ with true ages $\begin{array}{l}y_i\end{array}$ and $\begin{array}{l}y_j\end{array}$ , the age difference loss is the MSE between the estimated brain age difference $\begin{array}{l}\hat{y}_i - \hat{y}_j\end{array}$ and the true age difference $\begin{array}{l}y_i - y_j\end{array}$ :

$\begin{array}{l}\displaystyle \mathcal{L}_d= \frac{1}{N_p} \sum_{(i,j)}{((\hat{y}_i - \hat{y}_j)-(y_i - y_j))^2}\end{array}$
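The age difference loss is simple enough to sketch in plain Python. The function name and the choice to use all unordered pairs within a batch are assumptions for this illustration, since the text above only says that a set of pairs is given. The ages are made-up numbers.

```python
from itertools import combinations


def age_difference_loss(y_hat, y):
    """MSE between estimated and true pairwise age differences.

    Assumption: the pairs are all unordered pairs i < j in the batch.
    """
    pairs = list(combinations(range(len(y)), 2))
    if not pairs:
        return 0.0
    total = sum(((y_hat[i] - y_hat[j]) - (y[i] - y[j])) ** 2 for i, j in pairs)
    return total / len(pairs)


true_ages = [20, 40, 60]
print(age_difference_loss([25, 45, 65], true_ages))  # constant +5 offset
print(age_difference_loss([22, 40, 58], true_ages))  # compressed age range
```

The first call returns zero although every estimate is off by five years, which shows that this loss only looks at differences between samples and therefore complements the MSE loss rather than replacing it.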

To measure the loss with respect to the ranking within a set of samples, an additional loss based on Spearman’s rank correlation coefficient (SRCC) is proposed:

$\begin{array}{l}\displaystyle \mathcal{L}_r= \sum_{i}{(Rank(\hat{y}_i) -Rank(y_i))^2}\end{array}$

where $\begin{array}{l}Rank(·)\end{array}$ is the rank operator. This rank operator, however, is not differentiable, so a loss that uses it in this form cannot be optimized via gradient descent. One way to overcome this issue is to follow the SoDeep approach [10]. SoDeep trains a differentiable network $\begin{array}{l}R\end{array}$ on synthetic ranking data as an approximation of the rank operator $\begin{array}{l}Rank(·)\end{array}$ . The pre-trained differentiable network $\begin{array}{l}R\end{array}$ then replaces the rank operator in the loss function, making the loss differentiable and practical for training.
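To see why the exact rank operator is a problem for gradient descent, the loss can be evaluated with hard ranks in plain Python. This is only an illustration with made-up ages. It is not the SoDeep surrogate, and the helper names are hypothetical.

```python
def hard_rank(values):
    """Return 1-based ranks (smallest value gets rank 1); ties broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def rank_loss(y_hat, y):
    """Sum of squared differences between estimated and true ranks."""
    return sum((a - b) ** 2 for a, b in zip(hard_rank(y_hat), hard_rank(y)))


true_ages = [30, 50, 70, 20]
print(hard_rank(true_ages))                         # [2, 3, 4, 1]
print(rank_loss([35, 48, 66, 40], true_ages))       # 2
print(rank_loss([35.1, 48, 66, 40], true_ages))     # still 2
```

Nudging an estimate from 35 to 35.1 leaves the loss unchanged because no two samples swap places. The hard-rank loss is piecewise constant, so its gradient is zero almost everywhere, which is the motivation for replacing the rank operator with a learned differentiable approximation.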

Combining the MSE Loss with the above losses, the total loss function $\begin{array}{l}\mathcal{L}\end{array}$ used for training the network is:

$\begin{array}{l}\displaystyle \mathcal{L}= \mathcal{L}_{MSE} + \lambda_1 \mathcal{L}_d + \lambda_2 \mathcal{L}_r\end{array}$

where $\begin{array}{l}\lambda_1\end{array}$ and $\begin{array}{l}\lambda_2\end{array}$ are regularization parameters. An ideal estimation network that reaches $\begin{array}{l}\mathcal{L}_{MSE} = 0\end{array}$ also satisfies $\begin{array}{l}\mathcal{L}_{d} = 0\end{array}$ and $\begin{array}{l}\mathcal{L}_{r} = 0\end{array}$ , so adding the two ranking losses does not change the optimal solution. The proposed ranking losses $\begin{array}{l}\mathcal{L}_d\end{array}$ and $\begin{array}{l}\mathcal{L}_r\end{array}$ therefore act as regularizers of the training process.

Experiments

The authors evaluated TSAN on a large dataset of 6586 T1-weighted MRI scans from three public datasets, namely the Open Access Series of Imaging Studies (OASIS), the Alzheimer’s Disease Neuroimaging Initiative (ADNI-1), and Predictive Analytics Competition 2019 (PAC-2019). These datasets have subjects with different age ranges.

Fig. 3: The chronological age histogram of the combined dataset.

Furthermore, three evaluation criteria were used for evaluating results of brain age estimation, including the mean absolute error (MAE), Pearson’s correlation coefficient (PCC), and Spearman’s rank correlation coefficient (SRCC).
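For readers who want to reproduce these criteria, the following is a minimal standard-library sketch of the three metrics. The function names and the toy ages are assumptions for illustration. SRCC is computed here as the Pearson correlation of the ranks, with tied values sharing their average rank.

```python
import math


def mae(y_hat, y):
    """Mean absolute error between estimated and true ages."""
    return sum(abs(a - b) for a, b in zip(y_hat, y)) / len(y)


def pearson(x, y):
    """Pearson's correlation coefficient (PCC)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation coefficient (SRCC)."""
    return pearson(average_ranks(x), average_ranks(y))


true_ages = [30, 50, 70, 20]
estimates = [35, 48, 66, 40]
print(mae(estimates, true_ages))       # 7.75
print(pearson(estimates, true_ages))
print(spearman(estimates, true_ages))  # approximately 0.8
```

MAE measures how far the estimates are from the true ages, while PCC and SRCC measure how well the estimates follow the linear trend and the ordering of the true ages, respectively.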

Brain age estimation experiments

For the brain age estimation experiments, only healthy subjects from the combined data were used for training. The performance of TSAN was compared with two state-of-the-art methods: a 3D convolutional neural network (CNN) [11] and a spatial fully convolutional network (SFCN) [12]. Furthermore, TSAN with the total loss $\begin{array}{l}\mathcal{L}\end{array}$ was trained three times from different random initializations, and the ensemble of these three models (the average of their estimated brain ages) was also included in the comparisons. Beyond comparing architectures, the experiments also examine the effect of using the sex label as input, the effect of the ranking losses, and the effect of linear bias correction.

Dementia Classification Experiment

The brain age gap has been shown to be associated with neurodegenerative disorders. To investigate the brain age gap as a biomarker for dementia classification, the authors use it as the only feature of an SVM classifier that distinguishes healthy control subjects, mild cognitive impairment patients and Alzheimer's disease patients. Only the ADNI-1 dataset was used for these experiments.

Results

Architecture Comparisons

The results of the experiments show that the ensemble TSAN outperformed both CNN and SFCN in terms of mean absolute error (MAE) and Pearson's correlation coefficient (PCC).

Fig. 4: Scatter diagrams of estimated brain ages in the test data by different network models.
(A): CNN in [11] with the MSE loss. (B): The SFCN-transfer model in [12] with KL divergence and ranking losses.
(C): TSAN with the total loss $\begin{array}{l}\mathcal{L}\end{array}$ . (D): Ensemble TSAN with the total loss $\begin{array}{l}\mathcal{L}\end{array}$ .

Ranking Loss and Sex Label

The ranking loss was found to be more effective than the traditional mean square error (MSE) loss in capturing the relationship between the brain age and the chronological age. The experiments showed that the ranking loss not only improved the performance of TSAN but also enhanced the performance of existing methods such as CNN and SFCN. The experiments also show that considering sex labels in networks improves the brain age estimation results.

Bias Correction

After estimating the brain age, a significant age-related variation is observed in the brain age gap versus chronological age when no bias correction is applied. The authors then perform bias correction by fitting a linear model on the training dataset and removing the linear bias from the test data. After the bias correction, the correlation between brain age gap and chronological age was close to zero, which indicates that the bias correction method successfully removes the age-related bias from the brain age estimation results.
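A linear bias correction of this kind can be sketched as follows. The post does not spell out the exact regression, so this sketch assumes one common formulation: regress the brain age gap on chronological age in the training set, then subtract the fitted trend from each test subject's gap. All names and numbers are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def correct_gap(age, brain_age, slope, intercept):
    """Remove the fitted linear age trend from one subject's brain age gap."""
    return (brain_age - age) - (slope * age + intercept)


# Made-up training data: young subjects overestimated, old ones underestimated.
train_age = [20, 40, 60, 80]
train_brain_age = [26, 42, 58, 74]
train_gap = [b - a for a, b in zip(train_age, train_brain_age)]

slope, intercept = fit_line(train_age, train_gap)
print(slope, intercept)

# A test subject aged 70 with an estimated brain age of 71.
print(correct_gap(70, 71, slope, intercept))
```

In this toy example the raw gap of the 70-year-old test subject is only 1 year, but the training data suggest that subjects of that age are underestimated by about 4 years, so the corrected gap comes out at about 5 years.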

Fig. 5: Plots of brain age gap versus chronological age on test data set, without and with linear bias correction. The dashed yellow line indicates the ideal estimation reference.
(A): without bias correction. (B): with bias correction.

Dementia Classification

The brain age gap estimated from T1-weighted MRI data is associated with brain aging. As the disease progresses, the brain age becomes greater than the chronological age. In individuals with mild cognitive impairment, the mean brain age gap was approximately 3 years higher than in healthy individuals. Similarly, in patients with Alzheimer's disease, the brain age gap was approximately 7 years higher than in healthy individuals. This distinction allows for effective differentiation between AD or MCI patients and healthy subjects.

Fig. 6: Comparison of brain age gap values of HC subjects, MCI patients (yellow box) and AD patients (red box).

Furthermore, the experiments demonstrate that the proposed ensemble TSAN with the total loss and bias correction obtains the best dementia classification result compared with the other models.

Conclusion

The authors introduce a novel 3D convolutional network called the two-stage-age-network (TSAN) for estimating brain age from T1-weighted MRIs. TSAN uses a two-stage cascade architecture, where the second-stage network refines the estimated age based on the discretized output of the first-stage network. To capture information from different scales, they propose a ScaledDense block in both stages, which concatenates feature maps. Additionally, the network takes both MR images and sex labels as inputs. Notably, TSAN is the first work to incorporate novel ranking losses, in addition to the traditional MSE loss, for brain age estimation. These ranking losses consider the age differences between paired samples and the ranking order, based on Spearman's rank correlation coefficient (SRCC), within a batch of training samples.

Furthermore, a bias correction is performed through linear regression to enhance the accuracy of brain age estimation and brain age gap calculation. The performance of TSAN is evaluated on 6586 MRIs from three datasets, resulting in a mean absolute error (MAE) of 2.428, a Pearson correlation coefficient (PCC) of 0.985, and an SRCC of 0.976 between the estimated brain age and the chronological age. The authors demonstrate experimentally that the proposed ensemble TSAN with the total loss and bias correction obtains the best result in age estimation.

Finally, the authors validate the brain age estimation by utilizing brain age gap as the only input variable for SVMs to classify healthy control subjects, patients with mild cognitive impairment, and Alzheimer's disease patients. The ensemble TSAN, after bias correction, consistently demonstrates superior classification performance. These results affirm that brain age serves as a promising biomarker for dementia classification and early-stage dementia risk screening.

References

[1] C. López-Otín, M. A. Blasco, L. Partridge, M. Serrano, and G. Kroemer, “The hallmarks of aging,” Cell, vol. 153, no. 6, pp. 1194–1217, Jun. 2013.
[2] J. Dukart, M. L. Schroeter, K. Mueller, and The Alzheimer's Disease Neuroimaging Initiative, “Age correction in dementia – matching to a healthy brain,” PLOS ONE, vol. 6, no. 7, Art. no. e22193, 2011. https://doi.org/10.1371/journal.pone.0022193
[3] I. Driscoll, C. Davatzikos, Y. An, X. Wu, D. Shen, M. Kraut, and S. M. Resnick, “Longitudinal pattern of regional brain volume change differentiates normal aging from MCI,” Neurology, vol. 72, no. 22, pp. 1906–1913, Jun. 2009.
[4] K. Franke, G. Ziegler, S. Klöppel, and C. Gaser, “Estimating the age of healthy subjects from T1-weighted MRI scans using kernel methods: Exploring the influence of various parameters,” NeuroImage, vol. 50, no. 3, pp. 883–892, Apr. 2010.
[5] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 2261–2269.
[6] S. M. Smith, D. Vidaurre, F. Alfaro-Almagro, T. E. Nichols, and K. L. Miller, “Estimation of brain age delta from brain imaging,” NeuroImage, vol. 200, pp. 528–539, Oct. 2019.
[7] X. Ding, Y. Guo, G. Ding, and J. Han, “ACNet: Strengthening the kernel skeletons for powerful CNN via asymmetric convolution blocks,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 1911–1920.
[8] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” 2015, arXiv:1502.03167. [Online]. Available: http://arxiv.org/abs/1502.03167
[9] J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 7132–7141.
[10] M. Engilberge, L. Chevallier, P. Perez, and M. Cord, “SoDeep: A sorting deep net to learn ranking loss surrogates,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 10784–10793.
[11] J. H. Cole et al., “Predicting brain age with deep learning from raw imaging data results in a reliable and heritable biomarker,” NeuroImage, vol. 163, pp. 115–124, Dec. 2017.
[12] H. Peng, W. Gong, C. F. Beckmann, A. Vedaldi, and S. M. Smith, “Accurate brain age prediction with lightweight deep neural networks,” Med. Image Anal., vol. 68, Feb. 2021, Art. no. 101871.
