Author: Senad-Leandro Lemes Galera
Supervisor: Miruna-Alexandra Gafencu
1. Motivation
Medical imaging, a set of methods to image and treat internal structures of the body, has, due to its non-invasive nature, become a very important tool in medicine for treating diseases such as brain disorders like Alzheimer's or Parkinson's [14,15]. Consider treating a disease with transcranial ultrasound, where we apply energy to certain parts of the brain to remove harmful cells. It becomes clear that the energy must be increased only at the parts of the brain where the disease is located and not in healthy tissue. Simulating how the energy is distributed across different areas of the brain for a given configuration is therefore essential for the success of the treatment. To calculate this energy map, we can use so-called partial differential equations (PDEs), which define e.g. how a wave propagates through a medium. Nevertheless, solving these equations with traditional numerical methods is computationally expensive and has some serious drawbacks (section 3.2). Luckily, the required derivatives can also be computed via an algorithm called backpropagation. This algorithm is mainly used in neural networks, which leads us to the topic of this blog: physics-informed neural networks in medical imaging.
Physics-informed neural networks (PINNs), neural networks that predict e.g. wave propagation by adding PDEs and boundary conditions to the loss term, are the rising star in areas where we have prior knowledge in the form of PDEs and only a sparse amount of data. They surpass traditional data-driven neural networks when only little training data is available. Producing a reliable wavefield for a given PDE while needing only a small amount of data makes them a perfect fit for medical imaging, an area where data acquisition is expensive and prior knowledge in the form of PDEs and boundary constraints is available, which lets us train accurate PINNs. To explore the impact of PINNs on medical imaging more deeply, this blog answers the questions: why use PINNs (section 3), how to use PINNs (section 4), and whether they can fulfill expectations (section 5). Since we need to be demanding even with promising technology, we critically discuss the topic in section 6.
2. Background
This section briefly defines the concepts needed to understand the rest of the blog post. If you have sufficient background on PINNs and PDEs, skip ahead to section 3.
2.1. Physics-informed neural networks
Physics-informed neural networks were first introduced by Raissi et al. [7] in 2019. Instead of treating the neural network as a black box, as older physics-informed machine learning approaches did, they used the advantages of backpropagation to incorporate the partial differential equation directly into the loss term. This invention opened the possibility of applying deep learning in contexts where data acquisition is expensive.
Classical deep learning approaches need a huge amount of data to perform well [16]. Introducing prior knowledge via partial differential equations (PDEs) reduces the solution space to a minimum, so we can find robust solutions with far less data. The three main reasons to use PINNs are:
- Data is sparse and expensive to acquire
- The problem is bound to known physical laws
- Generalization beyond the training data is needed
Medical imaging, especially non-invasive forms like ultrasound, is well suited to PINNs. We often do not have a large amount of data available [17], and acquiring new data is expensive. The measurements must obey known physical laws, which can be written as partial differential equations such as the wave equation or the Navier-Stokes equations. Traditional mesh-based numerical wave solvers have high computational cost and a high discretization error, which limits their ability to solve these equations accurately. In such cases a PINN can reach high accuracy at lower computational cost. Moreover, a PINN can be understood as an unsupervised learning technique: we do not need labeled data, which in medical imaging in particular would be very expensive to obtain.
2.2. Mathematical Background
2.2.1. PDE
A partial differential equation is an equation relating an unknown function of one or more variables to some of its partial derivatives. We can define it mathematically as:
\begin{equation} F(D^Ku(x), D^{K-1}u(x), \dots, Du(x), u(x), x) = 0 \end{equation}
We solve this equation by finding all u for which the equation above holds. Differential equations come in the forms: linear, semilinear, quasilinear and nonlinear. In the later sections we will focus on linear and nonlinear differential equations. An equation is called nonlinear if it depends nonlinearly on the highest-order derivative [2].
In medical imaging, one of the most commonly used PDEs is the wave equation:
\begin{equation} \frac{\partial^2 u}{\partial t^2} = c^2 \nabla^2 u \end{equation}
Here u is the wavefield and c the wave speed; written with the Laplacian, the equation holds in any number of spatial dimensions, and for background purposes this form is sufficient. A highly valuable property of this equation is that, with appropriate initial and boundary conditions, it has a unique and stable solution. In ultrasound tomography (UT), for example, imaging amounts to solving the inverse problem of this wave equation [1].
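To make the wave equation concrete, the following sketch checks numerically that a known travelling-wave solution satisfies it. The residual u_tt - c² u_xx is exactly the quantity a PINN later drives to zero through its loss term. All names, the wave speed, and the step size are illustrative choices, not taken from the cited papers.

```python
import numpy as np

c = 1.5          # wave speed (arbitrary illustrative choice)
h = 1e-4         # finite-difference step

def u(x, t):
    # u(x, t) = sin(x - c*t) is a classic travelling-wave solution
    return np.sin(x - c * t)

def residual(x, t):
    # second-order central differences approximate u_tt and u_xx
    u_tt = (u(x, t + h) - 2 * u(x, t) + u(x, t - h)) / h**2
    u_xx = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / h**2
    return u_tt - c**2 * u_xx

x = np.linspace(0.0, 2 * np.pi, 50)
t = np.linspace(0.0, 1.0, 50)
print(np.max(np.abs(residual(x, t))))  # close to zero
```

For the exact solution the residual vanishes up to finite-difference and floating-point error; a trained PINN aims for the same behavior at arbitrary collocation points.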
2.2.2. MESH
A mesh divides the domain to be analyzed into small, regularly shaped regions [18].
3. Medical Imaging and PINNs
In this section we look at why we can use PINNs in medical imaging and what their advantages and disadvantages are, especially compared to previously used methods.
3.1. Why use PINNs in Medical Imaging
To check whether a PINN suits our problem, we can compare the characteristics of ultrasound imaging against the three main reasons to use a PINN presented in the introduction. First, sparse data that is expensive to acquire: this is the case for ultrasound imaging. Second, a problem bound to physical laws: clearly fulfilled, since we can e.g. use the wave equation to properly simulate blood flow through the skull. Third, the need to generalize beyond the training data: every new patient presents a geometry the model has not seen, so good generalization is essential.
- Data acquisition can take weeks
- We can use the wave equation to simulate blood flow
- Good generalization needed on new data
3.2. PINN vs. Numerical Methods
Before the rise of neural networks and deep learning, there were other methods to solve PDEs. One of these is the finite element method (FEM), used by Vafaeian et al. in [10] to solve the wave equation. One problem with this method is the effort needed to create the mesh on which the PDE is solved. Compared to these traditional mesh-based approaches, a PINN is more flexible and can be applied to different forms of the wave equation and different types of boundary constraints. Additionally, traditional numerical methods suffer from so-called numerical dispersion artefacts, a disadvantage of the FEM. Song et al. [9] showed in 2021 that this numerical problem does not occur when using PINNs. Of course, using a PINN has drawbacks too: we face a non-convex optimization problem, so we can never be sure to reach the optimum, in contrast to traditional numerical solvers, which always yield a unique solution [5]. The drawbacks of PINNs are essentially those of neural networks in general; in particular, finding good hyperparameters can be expensive and may need multiple iterations. These points are summarized in the table below. Even though PINNs have some drawbacks, the advantages outweigh them, since the unique-solution and hyperparameter problems can in most cases be managed; in particular, finding a good solution is achievable, as we will see in section 5.
| | Computational cost | Flexible | Numerical problems | Unique solution | Hyperparameter search |
|---|---|---|---|---|---|
| PINN | medium | yes | no | no | yes |
| Numerical method | high | no | yes | yes | no |
3.3. PINN vs. Data-Driven Approaches
One could still think it is sufficient to simply train a deep neural network against recorded data. However, the traditional learning process needs a large amount of data, which is not available in areas like medical imaging, so traditional approaches trained the "normal" way lose accuracy. Backing this statement up with actual results, He et al. [3] showed directly that with sparse data the PINN approach significantly outperforms traditional data-driven approaches.
4. Methodology
In this section we briefly look at the methodology used by the three papers [4, 12, 8]. Methodology, in the case of training a PINN, means which PDE we use in the loss term and how the neural network is built. We also look at the training process, which is fairly similar for all the PINNs considered: we input spatial coordinates with a time stamp. The time stamps can be sampled randomly, since we want to solve the PDE in general and not only for special cases. The output is then the wavefield at this spatial coordinate at the given time. After training, we compare the output wavefield against ground-truth data and measure how much they differ.
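As a minimal sketch of this setup, the composite loss a PINN minimizes can be assembled as below. The tiny network, the wave speed, the finite-difference derivatives, and the stand-in "measurements" are all illustrative assumptions, not the architecture or data of any of the three papers.

```python
import numpy as np

rng = np.random.default_rng(0)
c, h = 1.0, 1e-3

# a small 2-16-16-1 fully connected network with tanh activations
W1, b1 = rng.normal(size=(2, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 16)), np.zeros(16)
W3, b3 = rng.normal(size=(16, 1)), np.zeros(1)

def net(x, t):
    # maps spatial coordinate x and time t to a wavefield value u
    z = np.stack([x, t], axis=-1)
    z = np.tanh(z @ W1 + b1)
    z = np.tanh(z @ W2 + b2)
    return (z @ W3 + b3).squeeze(-1)

def physics_loss(x, t):
    # wave-equation residual u_tt - c^2 u_xx via central finite differences
    u_tt = (net(x, t + h) - 2 * net(x, t) + net(x, t - h)) / h**2
    u_xx = (net(x + h, t) - 2 * net(x, t) + net(x - h, t)) / h**2
    return np.mean((u_tt - c**2 * u_xx) ** 2)

def data_loss(x, t, u_obs):
    return np.mean((net(x, t) - u_obs) ** 2)

# randomly sampled collocation points, plus a few sparse "measurements"
x_f, t_f = rng.uniform(0, 1, 100), rng.uniform(0, 1, 100)
x_d, t_d = rng.uniform(0, 1, 10), rng.uniform(0, 1, 10)
u_obs = np.sin(x_d - c * t_d)            # stand-in ground-truth data

loss = data_loss(x_d, t_d, u_obs) + physics_loss(x_f, t_f)
print(loss)  # a gradient-based optimizer such as Adam would minimize this
```

In the actual studies the derivatives are obtained via backpropagation rather than finite differences, and the weights are updated by Adam until the loss falls below a threshold.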
4.1. Goals
The three studies pursue different goals.
Wang et al.
- Predict transcranial ultrasound wave propagation to treat brain disorders.
Roy et al.
- Predict pulse wave propagation to diagnose arterial stiffness.
Liang et al.
- Predict the blood flow velocity and pressure fields, which can be used to treat and diagnose vascular diseases.
4.2. Training Process
The training process is similar to that of a normal neural network, as briefly described above: we minimize the loss function until it falls below a certain threshold or we reach a specified number of iterations. The following table shows the hyperparameters of the networks we will analyze. All networks use Adam as optimizer and fairly similar learning rates. However, the loss functions differ, and Liang et al. use Swish as activation function, defined as x · sigmoid(x). They use it because Dung et al. showed in [13] that for PINNs the highest accuracy is achieved with the Swish function. They also showed that the accuracy of tanh is sufficiently good at lower computational cost, so Wang et al.'s and Roy et al.'s choice of tanh is understandable. The higher number of iterations needed by Liang et al. is described in more depth in section 4.3.2.3; in short, they have a bigger network.
| | Loss function | Optimizer | Activation functions | Learning rate | Batch size | Iterations | Physical system |
|---|---|---|---|---|---|---|---|
| Wang et al. | MSE | Adam | tanh + identity | 3 × 10^{-4} | 2000 | x | Ultrasound wave propagation |
| Roy et al. | Residual loss | Adam | tanh | 10^{-3} | x | 20000 | Pulse wave propagation |
| Liang et al. | RMSE | Adam | Swish [13] | 10^{-3} | 5000/100/100 | 10^6 / 2×10^3 / 2×10^5 | Blood flow |
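The Swish activation used by Liang et al. is simply x times the sigmoid of x; a minimal definition sketch next to the tanh used by the other two studies:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def swish(x):
    # Swish: smooth, slightly non-monotonic near the origin, unbounded above
    return x * sigmoid(x)

x = np.linspace(-5, 5, 11)
print(swish(x))
print(np.tanh(x))  # tanh saturates at ±1 for large |x|
```

The saturation of tanh for large inputs is the disadvantage mentioned later in section 4.3.2.2, while Swish behaves like the identity for large positive inputs.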
4.3. Architecture
4.3.1. Partial Differential Equations
Liang et al. [4] use the Navier-Stokes equations in their PINN implementation. In contrast to this non-linear equation, Roy et al. [8] use a linearized 1D partial differential equation (PDE), and Wang et al. [12] use a two-dimensional acoustic wave equation.
These equations differ in several respects; we will later see that the more complex Navier-Stokes equations also need a deeper network to perform properly. To train the PINN, these equations are added to the loss function. We look at how to do this using the example of the Navier-Stokes equations in Liang et al. First, the governing equations for the problem in this study, blood flow, are written as residuals:
\begin{equation} \begin{aligned} f_1 &= u_t + ( \mathbf{U} \cdot \nabla ) u + p_x - \theta \cdot \nabla^2 u \\ f_2 &= v_t + ( \mathbf{U} \cdot \nabla ) v + p_y - \theta \cdot \nabla^2 v \\ f_3 &= u_x + v_y \end{aligned} \end{equation}
The loss function is then:
\begin{equation} \begin{aligned} \text{Loss}_f =\ & \alpha_1 \sum_{k=1}^{N} \left| u^p(x_k, y_k, t_k) - u(x_k, y_k, t_k) \right|^2 \\ &+ \alpha_2 \sum_{k=1}^{N} \left| v^p(x_k, y_k, t_k) - v(x_k, y_k, t_k) \right|^2 \\ &+ \alpha_3 \sum_{i=1}^{3} \sum_{k=1}^{N} \left| f_i(x_{f_k}, y_{f_k}, t_{f_k}) \right|^2 \end{aligned} \end{equation}
This is a sum of squared errors (SSE) with additional weights $\alpha_i$.
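The bookkeeping of this weighted loss can be sketched from precomputed quantities: u^p/v^p are network predictions, u/v the measured velocity components, and f the stack of Navier-Stokes residuals f_1, f_2, f_3 at the collocation points. The arrays and the weight values here are random placeholders, not the paper's data or weights.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200
alpha = (1.0, 1.0, 0.1)           # illustrative weights alpha_1..alpha_3

u_p, u = rng.normal(size=N), rng.normal(size=N)   # predicted vs measured u
v_p, v = rng.normal(size=N), rng.normal(size=N)   # predicted vs measured v
f = rng.normal(size=(3, N))       # residuals f_i at the collocation points

loss_f = (alpha[0] * np.sum((u_p - u) ** 2)
          + alpha[1] * np.sum((v_p - v) ** 2)
          + alpha[2] * np.sum(f ** 2))
print(loss_f)
```

The two data terms pull the network toward the measurements, while the residual term enforces the physics everywhere the collocation points are sampled.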
The loss function differs depending on the wave equation used. Wang et al. [12], for example, used a mean squared error (MSE) loss. Which type of loss function to use depends heavily on the problem one is trying to solve; Roy et al. [8] used a completely different kind of loss term, a so-called residual loss.
In conclusion, to properly train a PINN we add the PDE to the loss term and combine it with the given data loss; which PDE to use depends on the problem we want to solve.
4.3.2. Neural Network Architecture
The architecture of physics-informed neural networks is almost always based on a fully connected feedforward neural network. This is because Panghal et al. proved in [6] that a feedforward neural network is sufficient to solve a PDE. All of the approaches we looked at use such a simple feedforward architecture. We will focus on the networks used by Roy et al. [8] and Liang et al. [4] to get a feeling for how these PINNs are built; the network used by Wang et al. is fairly similar and described only briefly.
4.3.2.1. Wang et. al
In their paper, Wang et al. implemented two PINNs, one for a homogeneous model and one for an inhomogeneous model; in their case, in-/homogeneous means constant/non-constant velocity. Their homogeneous model consists of 280 neurons and is deeper than the one presented by Roy et al. [8]. Their inhomogeneous model is even deeper, with 8 hidden layers of 50 neurons each, i.e. 400 neurons in total.
4.3.2.2. Roy et. al
In Fig. 1 we can see that the main network consists of three fully connected layers with tanh activation functions, which is rather simple compared to the state-of-the-art transformer architecture [11] used in the latest large language models (LLMs). Using tanh activations is also somewhat old-fashioned, due to disadvantages like saturation; in current state-of-the-art models a ReLU is normally used. Nevertheless, as already noted, [13] showed that tanh and Swish are sufficient for training a PINN.
The architecture shown in Fig. 1 is a so-called multi-scale Fourier network, which consists of a Fourier-feature mapping at the start with the MLP connected right after. With only 192 neurons, this is the smallest of the networks considered here.
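A Fourier-feature mapping of this kind projects the inputs onto random frequencies before they enter the MLP, with one frequency matrix per scale. The sizes, scales, and random frequencies below are illustrative assumptions, not the configuration used by Roy et al.

```python
import numpy as np

rng = np.random.default_rng(2)

def fourier_features(x, B):
    # project inputs onto random frequencies B and embed via cos/sin
    proj = 2.0 * np.pi * x @ B.T
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=-1)

x = rng.uniform(size=(5, 2))               # five (x, t) input points
scales = [1.0, 10.0]                       # one frequency matrix per scale
Bs = [sigma * rng.normal(size=(8, 2)) for sigma in scales]

features = [fourier_features(x, B) for B in Bs]
print([f.shape for f in features])         # 16 features per point per scale
```

Each scale's feature vector is then fed into the shared MLP, letting the network capture both slow and fast variations of the wavefield.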
4.3.2.3. Liang et. al
The deepest network was used by Liang et al. [4], consisting of 15 hidden layers with 250 neurons each, making theirs by far the biggest network with 3750 neurons, larger than the model used by Wang et al. [12] by a factor of about 9.4.
There are two reasons why their network is this much bigger than those in the other studies. First, their initial input was recorded with u-UIV and was fairly sparse, so they used a 5-layer MLP to refine it before feeding the data into the actual PINN. This can be seen in Fig. 2, where $L_{t}$ is the network that refines the input and $L_{C}$ is the actual network that predicts the output. However, even ignoring the $L_{t}$ part, the network still has 10 layers with 250 neurons each, a 6.25x scale factor compared to the biggest network of the other studies. The second reason is the Navier-Stokes PDE itself, which is non-linear and far more complex than the equations studied in the other papers.
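The neuron counts and scale factors quoted in this subsection can be sanity-checked in a few lines (the dictionary labels are ours):

```python
# hidden-neuron totals as layers x neurons per layer, from the text above
hidden = {
    "Wang (inhomogeneous)": 8 * 50,     # 8 hidden layers x 50 neurons = 400
    "Liang (full)": 15 * 250,           # 15 hidden layers x 250 neurons = 3750
    "Liang (without L_t)": 10 * 250,    # refinement MLP L_t excluded = 2500
}
print(hidden["Liang (full)"] / hidden["Wang (inhomogeneous)"])         # 9.375
print(hidden["Liang (without L_t)"] / hidden["Wang (inhomogeneous)"])  # 6.25
```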
5. Results
In this section we look at the results from the presented studies in section 4.
5.1. Wang et. al
Based on the results of Wang et al. [12], their PINNs achieve great performance. They implemented two models, one homogeneous and one inhomogeneous. The first achieved a maximum error below 3% when predicting the amplitude and position of the wave. For the inhomogeneous model the maximum error was a bit higher, around 8%, due to its inherent complexity: the velocity model has discontinuities, which lead to discontinuities in the second-order gradients of the wavefields, and these are harder to learn.
Additionally, this paper directly showed that PINNs can generalize well; the homogeneous PINN in particular reflected this in its results.
In Fig. 3, we can see the results of the homogeneous (a) and inhomogeneous (b) models. The first row is the reference, the second row the prediction, and the last row the difference, so the darker the blue, the better. It is directly visible that the homogeneous model performs better than the inhomogeneous one, but both lead to a low difference, as seen in the last row.
5.2. Roy et. al
Roy et al. [8] also achieved promising results, with an error rate near zero. They ran an in-silico study (simulated) and a phantom study (an actual physical object) and compared both against the values predicted by the PINN.
For the in-silico study, they achieved a maximum difference of around 0.5-0.6%. The results can be seen in Fig. 4, where a greenish color means a perfect result (bar on the right). We can directly see that the wall velocity predicted by the PINN looks very similar to the simulated wall velocity. The phantom study also achieved great performance, with a similar maximum error rate of around 0.6%.
5.3. Liang et. al
The results in the paper by Liang et al. [4] also show that a PINN can improve performance. This study prepared three types of experiments: simulation, in-vitro (phantom) and in-vivo (rabbit).
First, with the simulated data, they mainly wanted to show that the refinement network is worth the effort. The results in Table 2 (smaller is better) directly show that the refined network (HR column) outperforms the network that only interpolates the data.
Second, for the in-vitro study, velocity and pressure had to be handled differently. Velocity could be measured directly, and the values predicted by the refined neural network were similar to the theoretical values. The pressure could not be measured directly on the phantom, so they had to reuse the simulated data; here too the PINN achieved an error below the allowable clinical error range.
Last, the velocity predicted by the PINN, compared to the values measured in an actual body, in this case a rabbit, also showed improved resolution while preserving accuracy.
6. Discussion
We have seen that physics-informed neural networks are a promising technology in medical imaging. The fact that they outperform traditional numerical methods as well as deep, complex neural network architectures makes them especially interesting. That a simple MLP architecture is sufficient further supports the use of PINNs, as it keeps the implementation and design of the model fairly simple.
Additionally, being able to use PDEs to inject prior knowledge into these models, enabling them to perform very well despite sparse data, is a huge advantage, and one particularly valuable for medical imaging, where sparse data is common and data acquisition is non-trivial.
However, we should not ignore the disadvantages of PINNs, and especially of the PINNs used in the presented studies. The studies used relatively simple PDEs, which allowed a straightforward architecture and kept the training process uncomplicated; two of the studies (Wang et al., Roy et al.) even used a linear PDE, the simplest form of PDE. Furthermore, PINNs share the problems of all other neural networks: we can never be sure to find a unique solution, and identifying the right hyperparameters is not as trivial as it may seem.
Additionally, the studies considered here would need to be extended to give us a broader overview of the limitations of PINNs in medical imaging. For example, the study by Wang et al. [12] needed to retrain the PINN for each velocity model. So the studies give us a rough feeling for what is possible, but should be taken with a grain of salt.
7. References
[1] Mohamed Almekkawy, Vesna Zderic, Jie Chen, Michael D. Ellis, Dieter Haemmerich, David R. Holmes, Cristian A. Linte, Dorin Panescu, John Pearce, and Punit Prakash. Therapeutic Systems and Technologies: State-of-the-Art Applications, Opportunities, and Challenges. IEEE Reviews in Biomedical Engineering, 13:325–339, 2020.
[2] Lawrence C. Evans. Partial differential equations. American Mathematical Society, Providence, R.I., 2010.
[3] QiZhi He, David Barajas-Solano, Guzel Tartakovsky, and Alexandre M. Tartakovsky. Physics-informed neural networks for multiphysics data assimilation with application to subsurface transport. Advances in Water Resources, 141:103610, July 2020.
[4] Meiling Liang, Jiacheng Liu, Hao Wang, Hanbing Chu, Mingting Zhu, Liyuan Jiang, Yujin Zong, and Mingxi Wan. High-resolution hemodynamic estimation from ultrafast ultrasound image velocimetry using a physics-informed neural network. Physics in Medicine & Biology, 70(2):025001, January 2025.
[5] Lu Lu, Xuhui Meng, Zhiping Mao, and George Em Karniadakis. DeepXDE: A Deep Learning Library for Solving Differential Equations. SIAM Review, 63(1):208–228, January 2021.
[6] Shagun Panghal and Manoj Kumar. Optimization free neural network approach for solving ordinary and partial differential equations. Engineering with Computers, 37(4):2989–3002, October 2021.
[7] M. Raissi, P. Perdikaris, and G.E. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, February 2019.
[8] Tuhin Roy, Paul Kemper, Nima Mobadersany, and Elisa E. Konofagou. A physics-informed neural network approach for determining spatially varying arterial stiffness using ultrasound imaging: Finite Difference simulation and experimental plaque phantom validation. In 2024 IEEE Ultrasonics, Ferroelectrics, and Frequency Control Joint Symposium (UFFC-JS), pages 1–4, Taipei, Taiwan, September 2024. IEEE.
[9] Chao Song, Tariq Alkhalifah, and Umair Bin Waheed. Solving the frequency-domain acoustic VTI wave equation using physics-informed neural networks. Geophysical Journal International, 225(2):846–859, February 2021.
[10] B. Vafaeian, M. El-Rich, T. El-Bialy, and S. Adeeb. The finite element method for microscale modeling of ultrasound propagation in cancellous bone. Ultrasonics, 54(6):1663–1676, August 2014.
[11] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention Is All You Need, August 2023. arXiv:1706.03762 [cs].
[12] Linfeng Wang, Hao Wang, Lin Liang, Jian Li, Zhoumo Zeng, and Yang Liu. Physics-informed neural networks for transcranial ultrasound wave propagation. Ultrasonics, 132:107026, July 2023.
[13] Duong V. Dung, Nguyen D. Song, Pramudita S. Palar, and Lavi R. Zuhal. On The Choice of Activation Functions in Physics-Informed Neural Network for Solving Incompressible Fluid Flows. In AIAA SCITECH 2023 Forum, National Harbor, MD & Online, January 2023. American Institute of Aeronautics and Astronautics.
[14] Alex E. Roher et al. Transcranial doppler ultrasound blood flow velocity and pulsatility index as systemic indicators for Alzheimer's disease. Alzheimer's & Dementia, 7(4):445-455, 2011.
[15] A. Gaenslen, B. Unmuth, J. Godau, I. Liepelt, A. Di Santo, K.J. Schweitzer, T. Gasser, H.-J. Machulla, M. Reimold, K. Marek, and D. Berg. The specificity and sensitivity of transcranial ultrasound in the differential diagnosis of Parkinson's disease: a prospective blinded study. Lancet Neurology, 7(5):417-424, 2008.
[16] A. Halevy, P. Norvig, and F. Pereira. The Unreasonable Effectiveness of Data. IEEE Intelligent Systems, 24(2):8-12, March-April 2009.
[17] H. Greenspan, B. van Ginneken, and R. M. Summers. Guest Editorial Deep Learning in Medical Imaging: Overview and Future Promise of an Exciting New Technique. IEEE Transactions on Medical Imaging, 35(5):1153-1159, May 2016.
[18] O. C. Zienkiewicz and R. L. Taylor. The Finite Element Method: Volume 1: The Basis. Elsevier, 6th edition, 2005.