This is a blog post about the paper 'A robust deep neural network for denoising task-based fMRI data: An application to working memory and episodic memory' by Zhengshi Yang, Xiaowei Zhuang, Karthik Sreenivasan, Virendra Mishra, Tim Curran, and Dietmar Cordes (2019).


Introduction & Problem Definition

Today we see neural networks helping to overcome a diverse range of problems. They enable researchers in fields ranging from space exploration to autonomous driving to tackle problems that classical methods could not handle. One such area is medicine, where getting correct readings from medical devices is essential for arriving at a sensible diagnosis. One of the biggest problems with sensor readings in such medical devices is noise. In the reviewed paper, the authors propose a way to denoise fMRI data with a DNN.


What is fMRI?

fMRI stands for Functional Magnetic Resonance Imaging. MRI is a medical imaging technique used to visualize the body, for example to take a picture of the brain at a single timepoint. The 'functional' in fMRI means we are recording the brain over a span of time, like a video. fMRI makes use of the BOLD signal (Bianciardi et al., 2009; Caballero-Gaudes and Reynolds, 2017), which stands for Blood Oxygen Level Dependent and is an indirect measure of brain activity. Like all signals, it is contaminated by noise, which can be due to:

  • Thermal noise from the fMRI device circuits
  • Head motion
  • Cardiac & respiratory oscillations
  • Changes in blood pressure

Several methods have been proposed to denoise the fMRI signal. Nuisance regression can be used to detect non-neural signals; using a GLM (General Linear Model), these detected signals are removed from the original signal, leaving the denoised fMRI data. However, this method is not able to detect all noise. To overcome this, methods based on PCA and ICA techniques have also been developed. CompCor (Behzadi et al., 2007) is one method that uses PCA to denoise fMRI data. GLMdenoise uses PCA as well, with the difference that it requires multiple runs from a single subject, which is not as convenient.
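To make the nuisance-regression idea concrete, here is a minimal numpy sketch: nuisance regressors (e.g. motion parameters) are fitted to each voxel with least squares and the residual is kept as the "denoised" signal. All data below are random placeholders, not a real pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 120
nuis = rng.standard_normal((T, 6))        # e.g. six motion parameters
vox = rng.standard_normal((T, 1000))      # voxel time series, one per column

X = np.column_stack([np.ones(T), nuis])   # GLM design matrix with intercept
beta, *_ = np.linalg.lstsq(X, vox, rcond=None)
denoised = vox - X @ beta                 # residual = data minus nuisance fit
```

By construction the residuals are orthogonal to the nuisance regressors, i.e. everything the GLM can explain with them has been removed.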

The DNN approach proposed in the paper does not try to explicitly model the noise, in contrast to previous work in this field. This provides flexibility and better generalization. Another benefit of the proposed model is that it tunes its parameters for each individual subject.

Methodology

Architecture

The proposed DNN consists of 4 layers: a convolutional layer, followed by an LSTM layer (Hochreiter and Schmidhuber, 1997), a time-distributed fully connected layer, and finally a selection layer. fMRI data is composed of T time points, so the input to the network has shape NxT, where N is the number of voxels. In previous work on denoising fMRI data, high-pass and low-pass filters were used to cut out the noisy parts of the BOLD signal. These filters need a threshold, which requires human intervention. Some studies (Boubela et al., 2013; Chen and Glover, 2015) showed that even above the most commonly used threshold of 0.1 Hz (Cordes et al., 2001), significant parts of the BOLD signal can still be found. Since the convolutional layer learns the filtering by itself, it eliminates the need for expert intervention and adapts to each individual subject. The Conv layer has 4 filters, each with size 5 and stride 1. The output of the Conv layer is fed to an LSTM block.

A Long Short-Term Memory network is a type of Recurrent Neural Network (Bengio et al., 1994), a family of models that generally works well for sequential data. In the use case of text processing, for example, RNNs are able to express the dependency between a word and its neighbours. In addition to the standard units of RNNs, LSTMs have a memory cell, which helps them model long-term temporal dependencies and makes them more powerful. The fMRI dataset used in this paper consists of time series representing the BOLD signal for each voxel. The BOLD signal has temporal autocorrelation, which means that a data point at a given time depends on previous time points. Since LSTMs are good at representing these dependencies and perform well on time series data, they are used as the second layer of the architecture.

Since fMRI data has a serial structure and we don't want to lose that information, a time-distributed fully connected layer is used next. A normal fully connected layer would ignore the sequentiality of the data and feed the whole time series to the hidden nodes at once. A time-distributed fully connected layer, on the other hand, applies the same fully connected layer to each time point separately, which allows the network to preserve the temporal information. There are K hidden nodes in this layer, so its output has the shape NxTxK, where N is the number of voxels, T is the number of time points, and K is the number of candidate time series. Each of the K time series is a candidate for the denoised fMRI data. The final layer is a selection layer, which uses the correlation between each candidate time series and the task design matrix to determine which one is most closely related to the neural response signal. The selected time series is taken to be the denoised fMRI data.
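The layer stack above can be sketched in PyTorch. The filter count (4), kernel size (5), and stride (1) follow the post; the LSTM hidden size and K are assumptions, and the selection layer is left out since it depends on the task design.

```python
import torch
import torch.nn as nn

class DenoiseNet(nn.Module):
    """Sketch of the architecture: Conv -> LSTM -> time-distributed dense.
    Hidden size and K are assumptions; selection layer omitted."""
    def __init__(self, k=8, hidden=16):
        super().__init__()
        self.conv = nn.Conv1d(1, 4, kernel_size=5, stride=1, padding=2)
        self.lstm = nn.LSTM(input_size=4, hidden_size=hidden, batch_first=True)
        self.fc = nn.Linear(hidden, k)    # same weights at every time step

    def forward(self, x):                 # x: (N, T) voxel time series
        h = self.conv(x.unsqueeze(1))     # -> (N, 4, T)
        h, _ = self.lstm(h.transpose(1, 2))  # -> (N, T, hidden)
        return self.fc(h)                 # -> (N, T, K) candidate series
```

Applying a plain `nn.Linear` to the (N, T, hidden) tensor acts per time point, which is exactly the time-distributed behaviour described above.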




Architecture of the model

Special Loss Function

Loss functions are the evaluators of a DNN. After each iteration, the loss value determines how much and in which direction the parameters should change. Loss values usually depend on ground truth values, since in the end a DNN should converge towards them. In the case of fMRI data, however, it is hard to create a loss function in the usual way, because the ground truth, i.e., what the real BOLD signal should look like, is not known. To create a special loss function, the authors made use of the fact that the brain in an fMRI image can be segmented into 3 tissue types:

  • Gray Matter
  • White Matter
  • CSF (Cerebrospinal Fluid)


It is expected that non-Gray-Matter voxels show no neural response. Given that, if a time series is similar to the Gray Matter data, it is more likely to be an actual neural response signal rather than noise. On the other hand, if it correlates more with data from the non-Gray-Matter areas, which are White Matter and CSF, it is more likely to be noise. The loss is expressed in terms of Ỹ, the data from the non-GM areas, Y, the GM data, and X, the task design matrix obtained with the GLM.


In short, the loss function tries to maximize the difference between the correlations of GM and non-GM voxels with the task design.
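From this description, a plausible form of the loss (my reconstruction of the notation, not copied from the paper) is:

```latex
\mathcal{L} \;=\; \overline{\operatorname{corr}}\big(f(\tilde{Y}), X\big) \;-\; \overline{\operatorname{corr}}\big(f(Y), X\big)
```

where f denotes the network output and the bar a mean correlation with the task design X; minimizing this loss maximizes the GM vs. non-GM correlation difference.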

Simulation

To evaluate the previous work and the proposed DNN, data with a known ground truth is needed. The main idea is to add task-based activity to resting-state fMRI data. The simulation consists of the following steps:

  • Decide on the active regions of the brain in the simulation data. (This is the step that fixes the ground truth.)
  • Convolve the binary task design with a uniform HRF (uniHRF) and a variable HRF (varHRF), where HRF stands for Hemodynamic Response Function, to obtain a task-like BOLD signal.
  • Add the created signal to the resting-state signal of every voxel chosen to be active; leave the other voxels unchanged.
  • This results in two types of signal with known active regions:
    • uniHRF simulation data, where a single response function is shared by all voxels.
    • varHRF simulation data, where the response function varies across voxels.
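The steps above can be sketched in a few lines of numpy. The HRF shape, task timing, and noise level here are illustrative stand-ins, not the paper's exact settings:

```python
import numpy as np

def hrf(t):
    # crude gamma-shaped haemodynamic response peaking around 5 s
    # (illustrative only, not the paper's HRF)
    return (t ** 8.6) * np.exp(-t / 0.547)

tr, n_t = 2.0, 100
task = np.zeros(n_t)
task[::10] = 1.0                               # binary task design (assumed)
kernel = hrf(np.arange(0.0, 30.0, tr))
bold = np.convolve(task, kernel)[:n_t]         # task-evoked BOLD-like signal
bold /= bold.max()

rng = np.random.default_rng(0)
rest = rng.standard_normal(n_t)                # stand-in for resting-state data
active_voxel = rest + 0.5 * bold               # signal added only if active
inactive_voxel = rest                          # inactive voxels stay as rest
```

For the varHRF case, the same recipe would be repeated with a kernel whose shape varies from voxel to voxel.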

Training Setup

For training, the authors used Xavier initialization (Glorot and Bengio, 2010), which helps the network start from a good point in the search space so that the gradients do not vanish. The optimizer of choice was the popular Adam optimizer (Kingma and Ba, 2014), which utilizes first- and second-moment estimates of the gradients in the search for the minimum. The data was split into 90% training and 10% validation. Training the DNN takes 30 epochs, where one epoch is a full traversal of all data points. A single batch contains 500 samples.
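A minimal PyTorch sketch of this setup; only the initialization, optimizer, epoch count, and batch size follow the post, while the model, data, and loss are random placeholders:

```python
import torch
import torch.nn as nn

# Placeholder model; the real network is the Conv/LSTM architecture.
model = nn.Sequential(nn.Linear(10, 4), nn.Tanh(), nn.Linear(4, 1))
for p in model.parameters():
    if p.dim() > 1:
        nn.init.xavier_uniform_(p)       # Glorot/Xavier initialization

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(500, 10), torch.randn(500, 1)   # one batch of 500 samples

for epoch in range(30):                  # 30 epochs, as in the post
    opt.zero_grad()
    loss = ((model(x) - y) ** 2).mean()  # placeholder loss, not the paper's
    loss.backward()
    opt.step()
```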

Results

Several methods were tested on the simulated and the real data, and the results were compared to demonstrate the DNN's capabilities. FIX (Salimi-Khorshidi et al., 2014) is a method that uses ICA to detect the components of the signal and denoise it. The tested methods are FIX, FIX + Temporal Filtering (TF) (Chai et al., 2012), DNN, and FIX + DNN.

Simulated Data Results

As the evaluation metric, the area under the ROC curve is used. The AUC is calculated only over the region where the false positive rate is smaller than 0.1 and reported as a percentage (%AUC). The higher the %AUC, the better the model. The results are as follows.
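This partial AUC can be computed with a short numpy-only function (a sketch, not the paper's implementation); `labels` marks truly active voxels and `scores` could be, e.g., correlations after denoising:

```python
import numpy as np

def partial_auc_pct(labels, scores, max_fpr=0.1):
    """%AUC over the region FPR <= max_fpr, normalized to percent."""
    order = np.argsort(-scores)                      # sort by score, descending
    labels = np.asarray(labels, dtype=float)[order]
    tpr = np.concatenate(([0.0], np.cumsum(labels) / labels.sum()))
    fpr = np.concatenate(([0.0], np.cumsum(1 - labels) / (1 - labels).sum()))
    tpr_cut = np.interp(max_fpr, fpr, tpr)           # cut curve at max_fpr
    m = fpr <= max_fpr
    f = np.append(fpr[m], max_fpr)
    t = np.append(tpr[m], tpr_cut)
    area = np.sum(np.diff(f) * (t[1:] + t[:-1]) / 2) # trapezoid rule
    return 100.0 * area / max_fpr
```

A perfect classifier scores 100, chance level scores about 5 under this normalization.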



%AUC           Raw     FIX     FIX + TF   DNN     FIX + DNN
uniHRF data    62.4    67.8    72.4       69.2    75.2
varHRF data    58.2    63.3    68.2       67.8    72.5



As we can see from the table above, even though the vanilla DNN denoises the fMRI data slightly worse than FIX+TF, it is superior when combined with FIX.

In the figure below, raw signals from six different regions of the brain are shown. The binary task is known for each region and can be seen at the top of the figure. Within each gray square, voxels vary along the y-axis and time along the x-axis. As the image shows, FIX+DNN is superior at removing the noise: the task-related lines can easily be seen in the denoised image.

Again in the figure below, we see that the area under the curve is largest when FIX and DNN are applied together.


Another way to evaluate the methods is to look at the difference between the correlations of active and inactive voxels with the task design matrix: the larger the difference, the better the data is denoised. As we can see in the figure below, FIX+DNN has the biggest correlation difference.

Real Data Results

The same methods were also applied to the real data. Unlike the simulated data, the real data has no ground truth. The hypothesis was that voxels with a lower average correlation to the design matrix are more likely to be inactive. The voxels were therefore ordered by their correlation with the design matrix; the top 90 percent were labeled active and the remaining 10 percent inactive.
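This labeling scheme can be sketched with numpy; the design regressor and voxel data below are synthetic placeholders:

```python
import numpy as np

def corr_with_design(voxels, design):
    # Pearson correlation of each voxel time series with the task regressor
    v = voxels - voxels.mean(axis=1, keepdims=True)
    d = design - design.mean()
    return (v @ d) / (np.linalg.norm(v, axis=1) * np.linalg.norm(d))

rng = np.random.default_rng(1)
design = np.sin(np.linspace(0, 6 * np.pi, 100))   # placeholder task regressor
vox = rng.standard_normal((50, 100))              # placeholder voxel data

r = corr_with_design(vox, design)
idx = np.argsort(-r)                              # rank voxels by correlation
active, inactive = idx[:45], idx[45:]             # top 90% vs. bottom 10%
diff = r[active].mean() - r[inactive].mean()      # mean correlation difference
```

The mean correlation difference `diff` is then the figure of merit compared across denoising methods.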

Again, FIX+DNN yields the biggest mean correlation difference, which makes it the best of the compared methods.

Discussion

Both the simulated and the real data show that the DNN outperforms the other methods and is most effective when applied together with FIX. This was the first deep-neural-network approach to denoising fMRI data. The DNN's good performance is achieved thanks to its convolutional and LSTM layers. ICA-based FIX combined with Temporal Filtering performed better than the vanilla DNN alone: FIX is better at modeling and extracting structured noise, while the DNN detects variable, random noise. In combination with FIX, the DNN therefore had the best performance.

Student's View

The paper is well written, but in my opinion some minor aspects of the method could be improved. The parameters of the DNN were chosen heuristically, without optimization; applying a grid search might have produced a better set of parameters and hence better results. The proposed method unfortunately cannot be applied to resting-state fMRI data, since no task design matrix exists there, and the architecture is not modular but heavily dependent on the type of data. On the other hand, I find several aspects of the paper successful. No expert intervention is necessary, as the method adapts itself to the data. What fascinated me most about the approach was the idea of using the correlation difference as the loss function. To sum up, the performance speaks for itself: the method works very well in combination with FIX.


References

Behzadi, Y., Restom, K., Liau, J., Liu, T.T., 2007. A component based noise correction method (CompCor) for BOLD and perfusion based fMRI. NeuroImage 37 (1).

Salimi-Khorshidi, G., Douaud, G., Beckmann, C.F., Glasser, M.F., Griffanti, L., Smith, S.M., 2014. Automatic denoising of functional MRI data: combining independent component analysis and hierarchical fusion of classifiers.

Kasper, L., Bollmann, S., Diaconescu, A.O., Hutton, C., Heinzle, J., Iglesias, S.,Hauser, T.U., Sebold, M., Manjaly, Z.-M., Pruessmann, K.P., 2017. The PhysIO toolbox for modeling physiological noise in fMRI data. J. Neurosci. Methods 276, 56–72

Bianciardi, M., Fukunaga, M., van Gelderen, P., Horovitz, S.G., de Zwart, J.A.,Shmueli, K., Duyn, J.H., 2009. Sources of functional magnetic resonance imaging signal fluctuations in the human brain at rest: a 7 T study. Magn. Reson. Imaging 27, 1019–1029.

Hochreiter, S., Schmidhuber, J., 1997. Long short-term memory. Neural Comput. 9, 1735–1780.

Boubela, R.N., Kalcher, K., Huf, W., Kronnerwetter, C., Filzmoser, P., Moser, E., 2013. Beyond noise: using temporal ICA to extract meaningful information from high-frequency fMRI signal fluctuations during rest. Front. Hum. Neurosci. 7, 168.

Chai, X.J., Castañón, A.N., Öngür, D., Whitfield-Gabrieli, S., 2012. Anticorrelations in resting state networks without global signal regression. Neuroimage 59, 1420–1428

https://kendrickkay.net/GLMdenoise/














Comment

  1. Unknown user (ga65rot) says:

    With regard to the selection level: Did the authors go on to explain how they estimated the optimal design matrices from the set of potential design matrices?

    And concerning FIX: Do I understand correctly that, also making use of an LSTM layer and a time-distributed fully-connected layer, the DNN is best at removing spatially specific structured noise, while FIX is actually accountable for removing global structured noise?

    Does FIX need expert intervention?