Nicola K. Dinsdale, Mark Jenkinson, Ana I.L. Namburete
Blog post written by: Efe Berk Ergüleç
Abstract
Today, many neuroimaging datasets are available on the internet. Since more data generally improves accuracy, it is tempting to assume that pooling several neuroimaging datasets together will directly boost performance. In practice, however, naive pooling can also reduce accuracy, because datasets acquired under different conditions have different image characteristics. This blog post will guide you through a solution to the data harmonization and confound removal problems.
Motivation
Many multi-site, multi-scanner MRI neuroimaging datasets have become available for research use. Combining data from different scanners is essential for increasing statistical power, and pooled MRI datasets can help address many problems, including the detection of neurological conditions and diseases. On the other hand, these datasets often differ systematically due to various factors, such as different devices and different resolutions. To reduce the impact of scanner-based errors, the article proposes a deep-learning training strategy built on domain adaptation techniques, which uses an iterative update scheme to create scanner-invariant features while maintaining performance on the primary task of interest. [1]
The main problem with combining datasets is the increase in variance and bias in the data. When researchers investigated why this happens, they confirmed that these problems arise from scanner-based variables. [2] Therefore, many neuroimaging studies require the removal of this scanner-induced variance.
Contributions
The article offers a comprehensive solution to this problem while still letting the user focus on their primary task. First, let us define the two main concepts of this research: data harmonization and confound removal.
Data Harmonization: Combining data from different domains by building a model that removes domain-specific features and keeps the universal variables.
Confound Removal: Removing features that are not related to the main task; in this setting, these are chiefly the features tied to the scanner.
In our case, data harmonization aims to remove the scanner-induced variance while retaining the biologically meaningful variables. There are several common approaches, but this research uses domain adaptation to harmonize the data. Domain adaptation offers a useful perspective on the problem: given a source domain (DS) with source labels, it iteratively aligns the feature distribution of the source data with that of the target domain (DT). The discrepancy between source and target domains is called the domain shift. The goal of domain adaptation is to find a feature representation that is invariant to the domain yet informative for the task of interest. The article builds on an adversarial method called the Domain Adversarial Neural Network [3].
The Domain Adversarial Neural Network structure is shown in Figure 1. It has three main parts: a feature extractor, a label predictor, and a domain classifier. The feature extractor (green part) creates a feature representation; as mentioned above, we want this representation to be free of scanner-based features. The label predictor (blue part) handles the main task, which in this paper is either a brain-related segmentation or regression task. Lastly, the domain classifier (pink part) takes the output of the feature extractor, predicts the confound (the scanner), and sends gradients back to the feature extractor via backpropagation. Over multiple iterations, the network minimizes the loss of the label predictor while maximizing the loss of the domain classifier.
Figure 1: Domain Adversarial Neural Networks (DANN) [3]
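The three-part structure described above can be sketched in PyTorch. This is a hedged, minimal sketch, not the paper's exact architecture: the input dimension, feature dimension, number of scanners, and layer choices below are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Green part: maps the input to a shared feature representation."""
    def __init__(self, in_dim=64, feat_dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
    def forward(self, x):
        return self.net(x)

class LabelPredictor(nn.Module):
    """Blue part: main-task head (here a single regression output, e.g. age)."""
    def __init__(self, feat_dim=32):
        super().__init__()
        self.net = nn.Linear(feat_dim, 1)
    def forward(self, f):
        return self.net(f)

class DomainClassifier(nn.Module):
    """Pink part: predicts which scanner (domain) produced the input."""
    def __init__(self, feat_dim=32, n_domains=3):
        super().__init__()
        self.net = nn.Linear(feat_dim, n_domains)
    def forward(self, f):
        return self.net(f)

x = torch.randn(8, 64)               # toy batch of 8 inputs
feats = FeatureExtractor()(x)        # shared representation, shape (8, 32)
age = LabelPredictor()(feats)        # main-task prediction, shape (8, 1)
scanner = DomainClassifier()(feats)  # scanner logits, shape (8, 3)
```

Both heads consume the same shared features, which is what makes the adversarial game possible: the label predictor pulls the features toward task relevance while the domain classifier exposes any remaining scanner information.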
This paper aims to harmonize an arbitrary number of datasets simultaneously, producing features that transfer to any of them at test time. The extracted features should be independent of the domain (the scanner) yet relevant to the task. Crucially, performance on the task of interest must be preserved throughout the procedure.
Methodology & Core Results
Because the paper is fairly involved, this section is divided into two parts, mirroring the paper: the Age Prediction Task and the Segmentation Task.
Age Prediction Task
The first goal of the paper is to provide a solution for one of the most common regression problems, the age prediction task. The overall architecture is shown in Figure 2.
Figure 2: Regression architecture of the paper. [1]
There are three main steps to train the architecture on Figure 2:
- Optimizing the feature extractor and the label predictor for the main task.
- Optimizing the domain classifier to identify the scanner information remaining.
- Optimizing the feature extractor to confuse the domain predictor and remove scanner information.
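The three steps above can be sketched as a single training iteration. This is a hedged sketch under assumptions: the loss functions (MSE for age regression, categorical cross-entropy for the domain, cross-entropy against a uniform distribution for confusion), the optimizer split, and all hyperparameters are illustrative, not the paper's exact settings.

```python
import torch
import torch.nn as nn

def training_iteration(encoder, label_head, domain_head,
                       opt_main, opt_domain, opt_conf,
                       x, y_task, y_domain, alpha=1.0, beta=1.0):
    mse = nn.MSELoss()          # main-task loss Lp (age regression)
    ce = nn.CrossEntropyLoss()  # domain loss Ld

    # Step 1: train encoder + label predictor on the main task (Lp).
    opt_main.zero_grad()
    lp = mse(label_head(encoder(x)), y_task)
    lp.backward()
    opt_main.step()

    # Step 2: freeze the main branch (detach the features) and train
    # only the domain classifier to recognize the scanner (Ld).
    opt_domain.zero_grad()
    ld = alpha * ce(domain_head(encoder(x).detach()), y_domain)
    ld.backward()
    opt_domain.step()

    # Step 3: train the encoder to confuse the frozen domain classifier
    # (only the encoder's parameters are in opt_conf, so the classifier
    # stays fixed). Lconf is cross-entropy against a uniform target.
    opt_conf.zero_grad()
    logits = domain_head(encoder(x))
    uniform = torch.full_like(logits, 1.0 / logits.size(1))
    lconf = beta * -(uniform * torch.log_softmax(logits, dim=1)).sum(1).mean()
    lconf.backward()
    opt_conf.step()
    return lp.item(), ld.item(), lconf.item()
```

Each optimizer owns only the parameters its step is allowed to update, which is how the "freezing" in steps 2 and 3 is realized without toggling `requires_grad`.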
1. Optimizing the feature extractor and the label predictor for the main task.
In this step, we simply ignore the domain classifier, as shown in Figure 2.1, and train only the encoder and the label predictor. This step behaves exactly like a standard CNN. Lp denotes the main-task loss computed in this first step.
Figure 2.1: First step [1]
2. Optimizing the domain classifier to identify the scanner information remaining
After this standard training step, the network freezes the whole main-task branch and concentrates only on the domain classifier, shown in Figure 2.2. The domain classifier is trained with categorical cross-entropy (the domain loss Ld) on the features coming from the feature extractor, so that it learns to identify which scanner each input came from. A well-trained domain classifier exposes whatever scanner information remains in the features, which is exactly what the next step will remove.
Figure 2.2: Second step [1]
3. Optimizing the feature extractor to confuse the domain predictor and remove scanner information
The last step of the training procedure optimizes the feature extractor while freezing both the main-task branch and the domain classifier (Figure 2.3). The domain classifier trained in the previous step is now used adversarially: the confusion loss (Lconf) takes its predictions as input and penalizes the feature extractor whenever the scanner can still be identified, driving the features toward scanner invariance.
Figure 2.3: Third step [1]
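A toy computation makes the confusion loss concrete. A common formulation, assumed here, is the cross-entropy between the frozen domain classifier's softmax output and a uniform distribution over scanners: the loss bottoms out exactly when every scanner is predicted with equal probability, i.e. when the features carry no scanner information. The specific logits below are made up for illustration.

```python
import math
import torch

logits = torch.tensor([[2.0, 0.1, -1.0]])  # toy scanner logits for 1 sample
n = logits.size(1)                         # number of scanners
log_probs = torch.log_softmax(logits, dim=1)
# Cross-entropy against uniform targets (probability 1/n per scanner):
lconf = -(1.0 / n) * log_probs.sum(dim=1).mean()
# lconf >= log(n), with equality only when the prediction is uniform,
# so minimizing it pushes the encoder toward scanner-ambiguous features.
```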
Overall, the architecture is trained with the combined loss function shown in Figure 2.4. α and β are the weights of the individual loss terms, which let the model balance the objectives for a given task. The weights are not fixed constants, because the architecture is designed to adapt to any kind of task.
Figure 2.4: Architecture with the loss function [1]
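In the notation used so far, the combined objective from Figure 2.4 takes the form of a weighted sum of the three losses (writing θ_repr, θ_p, θ_d for the parameters of the feature extractor, label predictor, and domain classifier respectively):

```latex
L_{total} = L_p(\mathbf{X}, \mathbf{y};\, \theta_{repr}, \theta_p)
          + \alpha\, L_d(\mathbf{X}, \mathbf{d};\, \theta_{repr}, \theta_d)
          + \beta\, L_{conf}(\mathbf{X};\, \theta_{repr}, \theta_d)
```

Here X are the inputs, y the main-task labels, and d the scanner (domain) labels; α and β trade off scanner-information removal against main-task performance.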
The architecture also gives the user the opportunity to remove other categorical confounds by providing an additional confound input (Xc), shown in Figure 2.5.
Figure 2.5: Network with the input images for confounding. [1]
Results
Overall, combining the datasets worked very well. As Table 1 shows, three different cases were tested, and in each condition pooling all datasets together gave the best performance. In addition to Table 1, Table 2 shows the effect of applying confound removal to the dataset: applying unlearning clearly reduces the mean absolute error (MAE).
Table 1: Results comparing unlearning to training the network in different combinations on the datasets. Mean absolute error is reported in years. Scanner accuracy is the accuracy achieved by a domain classifier given the fixed feature representation at convergence, evaluating only for the datasets the network was trained on. Number in brackets indicates random chance. B = Biobank, O = OASIS, W = Whitehall. p values can be found in the supplementary material. Bold indicates the experiment with the best average across the datasets. [1]
Table 2: Results – Effect of Unlearning [1]
Segmentation Task
The second task the article proposes a solution for is segmentation. As Figure 3 shows, the segmentation architecture is almost identical to the regression one; the only difference is that the feature extractor is a U-Net. Unlearning is applied fully at the bottleneck (B), but because scanner information can be re-learned during upsampling, the network also takes the bottleneck features and concatenates them with the upsampled features before they enter the domain classifier.
Figure 3: Architecture for the segmentation task [1]
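The bottleneck-plus-final-convolution idea can be sketched as follows. This is a hedged sketch, not the paper's implementation: the tensor shapes, pooling choice, and the linear domain classifier are all illustrative assumptions.

```python
import torch
import torch.nn as nn

# Global average pooling collapses each feature map to one value per channel.
pool = nn.AdaptiveAvgPool2d(1)

bottleneck = torch.randn(4, 128, 8, 8)   # B: deepest U-Net feature map (toy shape)
final_conv = torch.randn(4, 16, 64, 64)  # feature map before the output layer (toy shape)

# Pool both feature maps and concatenate them channel-wise, so the domain
# classifier sees the bottleneck AND any scanner information re-learned
# during upsampling.
f = torch.cat([pool(bottleneck).flatten(1), pool(final_conv).flatten(1)], dim=1)

domain_clf = nn.Linear(128 + 16, 3)      # e.g. 3 scanners/datasets
scanner_logits = domain_clf(f)           # shape (4, 3)
```

Feeding both locations to one classifier means the confusion loss in step 3 penalizes scanner information wherever it survives in the decoder, not only at the bottleneck.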
Results
Segmentation performance is measured with the Dice score rather than MAE. Table 3 shows the benefit of attaching the domain classifier to both the bottleneck and the final convolution together. However, Table 4 shows that applying unlearning for segmentation improves accuracy only modestly. Lastly, Figure 4 plots the results and supports the same conclusion: unlearning still makes a difference, but the change is small.
Table 3: Dice scores comparing different locations for attaching the domain classifier in the network [1]
Table 4: Dice scores comparing unlearning to training the network on different combinations of the datasets, averaged across the tissues types. [1]
Figure 4: Dice scores for the two datasets for each method, broken down by tissue type. CSF = Cerebrospinal Fluid, WM = White Matter, GM = Grey Matter [1]
Discussion
The proposed solution has both advantages and disadvantages. Its most important benefit is that it is applicable to nearly all medical imaging problems. Additionally, separating the loss functions and training steps into an iterative procedure helps the system stabilize itself over the long term. Unlearning scanner-based features leads to an increase in accuracy, and the system is suitable for any kind of neuroimaging dataset; adding new datasets to the network is even encouraged. As Table 1 shows, this also yields strong results.
On the other hand, the system has some weaknesses too. First, even though performance is good at test time, harmonization adds considerable training cost due to the heavy load of the extra optimization steps. Second, each medical task requires its own harmonization training; in other words, the method is not suited to executing multiple tasks simultaneously. Additionally, the architecture can remove necessary information when there is no overlap between the datasets' distributions: for example, if one splits a single dataset in two and feeds the halves to the network as two different datasets, the network will remove requisite information as well.
Review and Personal Comments
From my personal perspective, this article presents a powerful method for combining heterogeneous datasets, and it approaches the problem from a very unique angle. Its application area is broad, yet the method remains efficient in terms of performance. The proposed method works for both MRI harmonization and confound removal, and the results support the network's effectiveness across various neuroimaging tasks and data scenarios. Thanks to its flexible structure, the method is easily applied to many different kinds of problems, such as segmentation, classification, and regression.
The article presents tailored solutions for both tasks, and the authors' attention to detail is striking. For example, improving accuracy by concatenating the bottleneck and the final convolution in the segmentation task is a notable and effective trick. Likewise, the flexibility to supply additional confound images (Xc) for removing other categorical confounds in the regression task is a very nice feature. Lastly, supporting multiple harmonization settings while concentrating on the main task would be a promising direction for future work. To sum up, this article offers a comprehensive solution applicable to almost every medical imaging problem.
Bibliography
[1] N. K. Dinsdale, M. Jenkinson and A. I. L. Namburete, "Deep learning-based unlearning of dataset bias for MRI harmonisation and confound removal," NeuroImage, 2021.
[2] X. Han et al., "Reliability of MRI-derived measurements of human cerebral cortical thickness: the effects of field strength, scanner upgrade and manufacturer," NeuroImage, vol. 32, pp. 180-194, 2006.
[3] Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand and V. Lempitsky, "Domain-Adversarial Training of Neural Networks," 2015.