Based on: Qiang, N., Dong, Q., Ge, F., Liang, H., Ge, B., Zhang, S., ... & Liu, T. (2020). Deep variational autoencoder for mapping functional brain networks. IEEE Transactions on Cognitive and Developmental Systems, 13(4), 841-852.

This blog post was written by: Ekaterina Semenova

Introduction


Many of us have heard that specific brain regions are activated when performing certain tasks. You might imagine it like this: when you move your right arm, a particular part of the brain in the left hemisphere activates - and you would be indeed right. Numerous brain regions are engaged in different activities. Some regions are responsible for processing visual or auditory information, while others are active when you are resting. These regions are known in the scientific community as Functional Brain Networks (FBNs).

However, we still lack a gold standard for these FBNs due to the complexity of the human brain. Each person's brain is unique, with variations in structure and function influenced by genetics, experience, and environment. Moreover, FBNs can change over time within the same individual due to the brain's plasticity and adaptability. Understanding FBNs can be incredibly useful for diagnosing neurological and psychiatric disorders like ADHD, autism, schizophrenia, and depression. Additionally, this knowledge can help in creating Brain-Computer Interfaces (BCI) that enable direct communication between the brain and external devices, which can be revolutionary for people with disabilities and allow them to control prosthetics or communication devices.

In this blog post, we will explore a generative machine learning approach for mapping FBNs: Deep Variational Autoencoder (DVAE). We will delve into understanding this novel technique and see how it can help identify FBNs from fMRI scans. I hope you are as excited about this topic as I am and are ready to dive DEEPer!

Exploring the Concept of Deep Variational Autoencoder (DVAE)


As we have already mentioned, DVAE is a generative machine learning approach, which means that it creates new data that looks like patterns from a dataset. Imagine drawing a house: each drawing is unique but can still be recognized as a house. This is what a generative approach does - it creates new, yet similar, data. Now, let's explore what is inside a DVAE.

Think back to drawing a house. What did you think about before drawing it? Most probably about some features which most houses have: windows and an entrance door. You may imagine it with a balcony or a chimney. It was probably light-colored. These are the features, or latent variables (let's call this dataset Z), that we have learned from all the houses we have ever seen in our lives (dataset X). Based on these features, we can easily draw another house. This is exactly what an autoencoder does! The encoder part learns to extract latent variables Z from the dataset X, and the decoder can reproduce the input dataset by running the encoder's algorithm in reverse. Of course, we are not particularly interested in recreating the exact same dataset we provided as input; rather, this approach allows us to train our encoder to produce the best possible features by minimizing the error (the difference) between the input X and the reconstructed output X̄.

We know that almost all houses have structural features like windows and entrance doors, while not all have balconies or chimneys. Most houses are light-colored and roughly rectangular, but there are of course exceptions. So, some features may be present or not, and this introduces a probability space over our latent variables, which is what the "variational" in DVAE stands for! The encoder of a DVAE learns the probabilistic distributions of these features, not their exact values. This makes a lot of sense, right? If our model learned the exact features of a small dataset, it would fail to generalize. This is especially helpful for small datasets. For example, if our training dataset consists of only two houses, a white one and a blue one, a model with hard features might learn that all houses must be either white or blue. When a house is green, such a model would struggle to recognize it as a house. However, by learning probabilistic distributions, the model understands that houses can have various colors with certain probabilities. This way, it can generalize better and recognize a green house as a valid variation.

Let's take a look at the basic mathematical terms behind the generative process. In Figure 1, you may have noticed q(Z|X) under the Encoder and p(X|Z) under the Decoder. This reflects what we have just discussed: first, we learn the posterior probability distribution of the latent variable Z given the input X, denoted q(Z|X). Once we have the distribution p(Z) of our latent variable, we can sample Z from it and generate data through p(X|Z), the likelihood of generating the data X given the latent variable Z.
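To make this concrete, here is a minimal sketch of the encode-sample-decode cycle in plain NumPy. The linear "networks" and all dimensions below are made up for illustration; a real DVAE learns deep, nonlinear versions of these maps:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "networks" standing in for the encoder and decoder.
# All weights here are random stand-ins; a real DVAE learns them.
W_enc = rng.normal(size=(4, 2))   # maps a 4-D input to 2-D latent stats
W_dec = rng.normal(size=(2, 4))   # maps a 2-D latent Z back to data space

x = rng.normal(size=4)            # one input sample

# Encoder q(Z|X): produce the parameters of a normal distribution over Z.
mu = x @ W_enc                    # mean of q(Z|X)
log_var = np.zeros(2)             # variance fixed to 1 for simplicity

# Sample a latent Z from q(Z|X).
z = mu + np.exp(0.5 * log_var) * rng.normal(size=2)

# Decoder p(X|Z): map the sampled latent back to data space.
x_reconstructed = z @ W_dec
print(x_reconstructed.shape)      # (4,)
```

The important structural point is that the encoder outputs distribution parameters (here mu and log_var), a latent Z is sampled from that distribution, and only then does the decoder turn Z back into data.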

As mentioned earlier, our goal in training the DVAE is to minimize the reconstruction error, which is the difference between the input data and the output data. You can think of it like this: you have drawn a beautiful house and, because it is such a nice drawing, you describe it to your friends over the phone so they can draw it too. Later, when you meet, you compare your drawings. To measure how similar they are, you calculate the mean squared difference for each part of the drawing. Mathematically, this is described as follows:

\text{Reconstruction Error} = \frac{1}{n} \sum_{k=1}^n (X_k - \bar{X}_k)^2

In this formula, Xk represents a signal sample of the input, where k ranges from 1 to n, with n being the total number of samples in the input data, and X̄k denotes the corresponding reconstructed sample.
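In code, this is just the mean squared error between the two signals. Here is a tiny NumPy example with made-up input and reconstruction values:

```python
import numpy as np

# Hypothetical input signal X and its reconstruction X_bar;
# any two equal-length arrays would do here.
X = np.array([1.0, 2.0, 3.0, 4.0])
X_bar = np.array([1.1, 1.9, 3.2, 3.8])

# Mean squared error, exactly as in the formula above.
reconstruction_error = np.mean((X - X_bar) ** 2)
print(reconstruction_error)
```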

To understand what is inside the encoder and decoder, let's move to the next section. There, we will explore how the DVAE concept can solve real-world problems, not just creating pictures of houses (which is of course also nice).

Application Example: DVAE in Mapping Functional Brain Networks


Now we know how DVAE works in general. It is time to take a closer look at how it can be implemented for reconstructing FBNs from fMRI data and go step by step through this reconstruction process. 

Step 1: Data Preprocessing

First, we need to preprocess our data: fMRI scans. These scans have three spatial dimensions and one temporal dimension - they are taken over time to capture changes in brain activity, so each voxel's signal varies across time points. After standard signal preprocessing steps such as motion correction, band filtering, normalization, and masking (there are plenty of tools that handle this!), the 4D fMRI data is converted into a 2D matrix, as you can see in the picture below. Here, t is the number of time points at which each subject was scanned, and n is the number of voxels - the 3D pixels used in medical imaging to represent a small volume of the brain. T is the total number of time points across all N subjects, equal to t * N.
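This flattening step can be sketched in a few lines of NumPy. The dimensions below are made up for illustration (real fMRI volumes are far larger):

```python
import numpy as np

# Hypothetical dimensions: a tiny 4-D "scan" per subject
# (x, y, z spatial axes plus t time points), N subjects.
x_dim, y_dim, z_dim, t = 4, 4, 4, 10
N = 3
n = x_dim * y_dim * z_dim          # voxels per volume

subjects = [np.random.rand(x_dim, y_dim, z_dim, t) for _ in range(N)]

# Flatten each subject's 4-D data to a (t, n) matrix:
# one row per time point, one column per voxel.
matrices = [s.reshape(n, t).T for s in subjects]

# Stack all subjects along time: the final (T, n) matrix, T = t * N.
X = np.vstack(matrices)
print(X.shape)   # (30, 64)
```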

Once we have the input data in a suitable form, we can feed it into our DVAE.

Step 2: DVAE Training 

You can see the architecture of the DVAE we will use in the picture below (Figure 3). It has an input layer, an output layer, and 5 hidden layers. The number below each layer indicates the number of nodes it has. This multi-layered architecture gives the name "deep" to our DVAE and is used for capturing complex relationships and dependencies within the data.

For our posterior probability distribution q(Z|X), we assume that it takes the form of a normal distribution. Each sample xk from our input data X, where k is again in {1, 2, ..., n}, has its own normal distribution with mean μk and variance σk². This can be expressed as follows:

\log q(Z|X) = \log \mathcal{N}(Z; \mu, \sigma^2 I)

During training, the encoder aims to learn these distribution parameters, μk and σk, for each input data sample xk, as well as the encoder weights, while minimizing the error. In our case, this error has two components: the reconstruction error, which was explained previously, and an additional term known as the KL divergence. The KL divergence simply measures the difference between two probability distributions, so minimizing the overall error involves reducing this difference. In other words, during training we push our posterior distribution q(Z|x) towards a standard normal distribution, N(0, 1). This helps the model ignore outliers and improves its generalization capabilities.
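The two ingredients just described - sampling from q(Z|X) and the KL penalty - can be sketched in NumPy. The closed-form KL divergence between N(μ, σ²) and N(0, 1) used below is the standard one for diagonal Gaussians; the parameter values are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical distribution parameters the encoder might output
# for one input sample: a mean and log-variance per latent dimension.
mu = np.array([0.5, -0.2])
log_var = np.array([-0.1, 0.3])

# Reparameterization trick: sample Z = mu + sigma * eps, so the
# sampling step stays differentiable with respect to mu and sigma.
eps = rng.normal(size=mu.shape)
z = mu + np.exp(0.5 * log_var) * eps

# Closed-form KL divergence between N(mu, sigma^2) and N(0, 1),
# summed over latent dimensions.
kl = 0.5 * np.sum(mu**2 + np.exp(log_var) - log_var - 1.0)
print(round(kl, 4))
```

During training, this kl term is added to the reconstruction error; pushing it down is exactly what nudges q(Z|x) towards N(0, 1).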

Okay, let's consider that we have trained our encoder well, resulting in good latent variables. What comes next?

Step 3: Reconstruction of the Coefficient Matrix

In a standard autoencoder, we can directly use an encoder weight matrix learned during training. This matrix contains information about spatial relationships and features within the data, allowing us to construct so-called spatial maps from fMRI data. These spatial maps represent the distribution of brain activity and highlight regions that are functionally connected, essentially representing our FBNs - which is exactly what interests us!

But with the DVAE, as we already learned, we learn the distribution of latent variables, not the variables themselves, so we cannot directly use its weight matrix for reconstructing FBNs. To solve this problem, we use lasso regression to create a sparse coefficient matrix, which we can then use in place of the encoder weight matrix. Lasso regression helps us find a simpler model by retaining only the most important features, making the coefficient matrix sparse (many coefficients are zero), which highlights only the key connections. With that, we reconstruct the coefficient matrix as follows:

\min_B \left( \frac{1}{2T} \|Z - XB\|_2^2 + \lambda \|B\|_1 \right)

After performing this operation, we obtain a matrix B in a format similar to our encoder's weight matrix, so we transpose it to proceed with the estimation of spatial maps described in the next step (see the picture below). Here, d is the dimension (i.e., the number) of our latent variables - in our case, 80.
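For illustration, here is a plain-NumPy sketch of this lasso problem solved by coordinate descent. The paper does not specify a solver, so this is just one common choice, and all sizes and data below are synthetic:

```python
import numpy as np

def soft_threshold(x, thresh):
    """Soft-thresholding operator, the workhorse of lasso updates."""
    return np.sign(x) * np.maximum(np.abs(x) - thresh, 0.0)

def lasso_coordinate_descent(X, Z, lam, n_iter=200):
    """Minimise (1/2T)||Z - XB||^2 + lam * ||B||_1 by coordinate descent.

    X: (T, n) fMRI signal matrix, Z: (T, d) latent time series.
    Returns the sparse coefficient matrix B of shape (n, d).
    A plain-NumPy sketch, not the authors' implementation.
    """
    T, n = X.shape
    d = Z.shape[1]
    B = np.zeros((n, d))
    col_sq = np.sum(X**2, axis=0) / T        # per-feature curvature
    for _ in range(n_iter):
        for j in range(n):
            # Residual excluding feature j's current contribution.
            r = Z - X @ B + np.outer(X[:, j], B[j])
            rho = X[:, j] @ r / T
            B[j] = soft_threshold(rho, lam) / col_sq[j]
    return B

# Tiny synthetic demo with made-up sizes: only two true nonzero
# coefficients, which the L1 penalty should single out.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
B_true = np.zeros((8, 2)); B_true[0, 0] = 2.0; B_true[3, 1] = -1.5
Z = X @ B_true + 0.01 * rng.normal(size=(50, 2))
B_hat = lasso_coordinate_descent(X, Z, lam=0.1)
print(B_hat.shape)   # (8, 2), with most entries shrunk to zero
```

The key behaviour to notice is the sparsity: entries whose correlation with the residual falls below the penalty lam are set exactly to zero, which is what makes the resulting coefficient matrix interpretable as a small set of key connections.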

Now we have the sparse coefficient matrix, which holds all the key features learned from our data. We are just one step away from our goal!

Step 4: Back to 3D Space

As you can see in Figure 4, the number of rows of the matrix B corresponds to the number of latent variables d. So, we can take each row of B and map it back to the original 3D brain image space using the initial masking process from data preprocessing (provided by the external tool). Finally, this process results in the creation and visualization of 80 FBNs!
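This scatter-back operation can be sketched with a boolean mask in NumPy. The mask and values below are random stand-ins for a real brain mask and a real row of B:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical brain mask: True where a voxel is inside the brain.
shape = (4, 4, 4)
mask = rng.random(shape) > 0.3        # stand-in for a real brain mask
n = int(mask.sum())                   # number of in-brain voxels

# One row of the coefficient matrix B: one value per in-brain voxel.
row = rng.normal(size=n)

# Scatter the row back into the 3-D volume; out-of-brain voxels stay 0.
volume = np.zeros(shape)
volume[mask] = row
print(volume.shape)   # (4, 4, 4)
```

Repeating this for all d rows yields d such volumes - the spatial maps that are then visualized as FBNs.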

Results


To validate our method, we conduct a proof of concept by comparing FBNs derived from our DVAE with those obtained from Sparse Dictionary Learning (SDL) using the same dataset. SDL is a widely used technique in machine learning and signal processing that seeks a sparse representation of data (remember our discussion on Lasso regression and sparse coefficient matrices?). 

As we said at the very beginning of our journey, there is no gold standard for FBNs. However, by benchmarking against SDL - a reliable (though not perfect) method - we can assess the reliability of our DVAE-based approach in mapping FBNs. To measure how much the FBNs derived by these two methods overlap, the overlap rate (OR) is used. It is defined as follows:

\text{OR}_{\text{FBN(DVAE), FBN(SDL)}} = \frac{\sum_{i=1}^{n} |N(1)_i \cap N(2)_i|}{\sum_{i=1}^{n} |N(1)_i \cup N(2)_i|}

Here, N(1)i = 1 if voxel i is active in the DVAE-derived FBN, and 0 otherwise; N(2)i is defined analogously for the SDL-derived FBN. An OR closer to 1 indicates greater similarity between the two networks.
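The OR metric is straightforward to compute on binarised networks. A toy NumPy example with two made-up 10-voxel FBNs:

```python
import numpy as np

# Two hypothetical binarised FBNs over the same 10 voxels:
# 1 = the voxel is active in that network.
fbn_dvae = np.array([1, 1, 0, 0, 1, 0, 1, 0, 0, 1])
fbn_sdl  = np.array([1, 0, 0, 0, 1, 1, 1, 0, 0, 0])

intersection = np.sum(fbn_dvae & fbn_sdl)   # voxels active in both
union = np.sum(fbn_dvae | fbn_sdl)          # voxels active in either
overlap_rate = intersection / union
print(overlap_rate)   # 0.5
```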

The average of the 80 highest ORs for all FBNs across the two methods is 0.313, and for the 30 highest FBNs, it is 0.4. This might not seem impressive at first glance, but keep in mind that SDL-derived FBNs are not the standard. The variability could be where DVAE outperforms SDL!

To verify this, we compare both SDL and DVAE-derived FBNs with Resting-State Networks (RSN) templates. RSN templates are standardized FBNs for when the subject is at rest and not performing any specific task. They are derived using classical statistical and signal-processing techniques and, while not perfect for every individual, they are widely used.

We compare the pairwise differences between SDL-derived FBNs and RSNs as well as DVAE-derived FBNs and RSNs using our OR metric. The results show that DVAE is actually outperforming SDL in 8 out of 10 FBNs! This shows that DVAE-derived FBNs align more closely with RSN templates, highlighting the potential of DVAE to better capture the underlying neural patterns in fMRI data.

Conclusion

With these results, we can see that using DVAE is a valid and promising approach for mapping FBNs.

Looking ahead, the multi-layered architecture of DVAE offers exciting possibilities for further research. For example, it could be used to derive a hierarchy of FBNs: higher-level latent variables could identify more general FBNs (such as those responsible for a movement in general), while lower layers could pinpoint more specific FBNs (like "writing with the right hand"). This hierarchical approach could significantly enhance our understanding of brain function! Additionally, DVAE's potential extends to the diagnosis of neurological and psychiatric disorders, as well as improving BCIs. These applications could lead to better diagnostic tools and more effective assistive technologies.

I hope you enjoyed reading this exploration of DVAE and its applications in neuroscience. Until next time, and ciao-ciao!

ChatGPT Prompts

  1. Correct grammar mistakes in the following text: ...
  2. Rewrite this equation in the LaTeX format: ...
  3. Explain better this part of the paper: ...
  4. Define ... and explain it with simpler language.
  5. What can I add to the text ... to make it sound better?
  6. Go deeper in the explanation of the topic ...
  7. Replace ... with another word.

 

References

  1. Qiang, N., Dong, Q., Ge, F., Liang, H., Ge, B., Zhang, S., ... & Liu, T. (2020). Deep variational autoencoder for mapping functional brain networks. IEEE Transactions on Cognitive and Developmental Systems, 13(4), 841-852.
  2. Lv, J., et al. (2015). Sparse representation of whole-brain fMRI signals for identification of functional networks. Medical Image Analysis, 20(1), 112-134.
  3. Blog-Post about Variational Autoencoders by Jakub Tomczak [https://jmtomczak.github.io/blog/4/4_VAE.html]
  4. Online "Learning Book in Machine Learning" by School of Data Analysis [https://education.yandex.ru/handbook/ml]
  5. fMRI Picture used in Figure 2 from the website: https://kryptonite.global/blogs/difference-between-mri-fmri/


