Introduction

To analyze a dynamic system, both numerical simulations and experimental measurements can be performed. However, the results of the two approaches can deviate significantly. A possible reason for this mismatch is uncertainty in the input parameters of the numerical model (e.g. Young's modulus, mass density). A model updating method can therefore be used to decrease the discrepancy between experiments and simulation. In this project the Bayesian updating method is investigated.

Theory

In the following sections, some theoretical aspects of the Bayesian updating procedure are addressed.

Dynamics of the System

As described in Modal Analysis, the free vibration of a linear elastic system is governed by the following equation:

\left( - \omega_i^2 \mathbf{M} + \mathbf{K} \right) \pmb{\phi}_i = \mathbf{0},

where \mathbf{M} is the mass matrix, \mathbf{K} is the stiffness matrix, and \omega_i and \pmb{\phi}_i are the ith circular eigenfrequency and the corresponding eigenvector. The system matrices can, for example, be obtained from a FEM discretization.

Now assume an experiment was performed and measured modal data were obtained. The dynamic equation can then be modified as follows:  

(1) \left( - \omega_{m,i}^2 \mathbf{M} + \mathbf{K} \right) \pmb{\phi}_{m,i} = \pmb{\varepsilon}_i,

where \omega_{m,i} and \pmb{\phi}_{m,i} are the ith measured eigenfrequency and eigenvector and \pmb{\varepsilon}_i is the ith error vector. Note that \pmb{\varepsilon}_i is a vector; considering all measured eigenfrequencies and eigenmodes, eq. (1) yields a matrix with components \varepsilon_{ij}. This matrix can be used as a measure of the mismatch between the measured data and the modal properties computed from the FEM discretization. If \omega_{m,i} and \pmb{\phi}_{m,i} coincide with \omega_i and \pmb{\phi}_i, i.e. the numerical results correspond to the measurements, the components \varepsilon_{ij} will be zero.
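For illustration, a minimal sketch (Python/NumPy; the function name and the array layout are assumptions for this illustration, not part of the original implementation) of how the error matrix \varepsilon_{ij} of eq. (1) can be assembled:

import numpy as np

def error_matrix(M, K, omega_m, phi_m):
    """eps[i, j] = j-th component of (-omega_m_i^2 M + K) phi_m_i, see eq. (1).

    omega_m : (n_modes,) measured circular eigenfrequencies
    phi_m   : (n_dof, n_modes) measured eigenvectors, one mode per column
    """
    n_modes = len(omega_m)
    eps = np.zeros((n_modes, M.shape[0]))
    for i in range(n_modes):
        eps[i, :] = (-omega_m[i] ** 2 * M + K) @ phi_m[:, i]
    return eps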

Bayesian Updating

The Bayesian updating method is based on Bayes' theorem. It can be derived by applying the rule for conditional probabilities and is written as follows:

(2) P(\mathbf{E}|\mathbf{D})=\frac{ P(\mathbf{E}) \cdot P(\mathbf{D}|\mathbf{E})}{P(\mathbf{D})},

where \mathbf{E} is the vector of parameters to be updated and \mathbf{D} stands for a matrix containing the measured eigenfrequencies and eigenvectors. The term P(\mathbf{E}) is the probability density function (PDF) of the parameter vector \mathbf{E}, often called the prior distribution. P(\mathbf{D}|\mathbf{E}) is the probability of the data for a given parameter vector, also referred to as the likelihood. P(\mathbf{E}|\mathbf{D}) is the posterior probability distribution and expresses the updated distribution of the parameter vector \mathbf{E} given the measured data. P(\mathbf{D}) is the normalization factor. In the subsequent sections some of these terms are explained in more detail.

Prior Probability Distribution Function

The prior distribution represents the belief about the probability distribution of the parameter vector \mathbf{E} in the absence of any data. In this project a multivariate normal distribution was considered; its probability density function (PDF) is defined as follows:

(3) f_{\mathbf{E}}(\mathbf{E})=\frac{1}{ \sqrt{\left( 2 \pi \right)^{k} \left|\mathbf{\Sigma} \right| }} \mathrm{exp} \left[ -\frac{1}{2}(\mathbf{E}-\pmb{\mu})^T\mathbf{\Sigma}^{-1}(\mathbf{E}-\pmb{\mu}) \right],

where k is the dimension of the vector of parameters, \pmb{\mu} is the mean vector and \mathbf{\Sigma} is the covariance matrix. 

For the examples investigated later, the components of the parameter vector \mathbf{E} are assumed to be independent. In this case, eq. (3) reduces to

(4) P(\mathbf{E}) = f_{\mathbf{E}}(\mathbf{E}) = \frac{1}{Z_E(\alpha)} \mathrm{exp} \left[ -\frac{1}{2} \sum_{i=1}^{k} \alpha_i (E_i - \mu_i)^2 \right].

In eq. (4) the coefficients \alpha_i=1/\mathrm{Var}(E_i) represent the inverse of the variance of the individual parameters and the normalization factor is abbreviated as Z_E(\alpha). The coefficients \alpha_i are introduced to follow the notation of [1].
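As a sketch, the exponent of eq. (4) can be evaluated directly; the normalization constant Z_E(\alpha) cancels later in the Metropolis ratio, so only the logarithm of the unnormalized prior is needed (the function name is illustrative):

import numpy as np

def log_prior_gaussian(E, mu, alpha):
    """Logarithm of the unnormalized Gaussian prior of eq. (4)."""
    E, mu, alpha = (np.asarray(x, dtype=float) for x in (E, mu, alpha))
    return -0.5 * np.sum(alpha * (E - mu) ** 2)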

A drawback of the multivariate normal distribution is that negative values of the parameters \mathbf{E} are also allowed. If the individual components are material properties (e.g. Young's modulus or mass density), negative values would be nonphysical. For such cases a different prior distribution should be chosen. A convenient option is the lognormal distribution, whose marginal PDF is defined as follows [2]:

(5) f_E(E)= \frac{1}{E \sigma_{lnE}\sqrt{2\pi}}\mathrm{exp} \left[ -\frac{1}{2} \left( \frac{\mathrm{ln}E-\mu_{lnE}}{\sigma_{lnE}} \right) ^2 \right].

Here \mu_{lnE} and \sigma_{lnE} are the parameters of the lognormal distribution. Note that these parameters are the mean and standard deviation of the natural logarithm of the variable E. They can be computed from the mean \mu_{E} and standard deviation \sigma_{E} of the variable E itself with the following formulas:

\mu_{lnE} = \mathrm{ln} \left( \frac{(\mu_E)^2}{\sqrt{(\mu_E)^2+(\sigma_E)^2}} \right) ,\ \sigma_{lnE} = \sqrt{ \mathrm{ln} \left( 1 + \frac{(\sigma_E)^2}{(\mu_E)^2} \right)}.

As the components of the vector \mathbf{E} are assumed to be independent, the joint PDF is computed as a product of the marginal PDFs

(6) f_{\pmb{E}}(\pmb{E}) = \prod_{i=1}^k f_{E_i}(E_i).
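As a brief sketch (assuming independent components as in eq. (6); the function names are illustrative), the conversion of \mu_E and \sigma_E into the lognormal parameters and the evaluation of the joint log-prior could read:

import numpy as np

def lognormal_params(mu_E, sigma_E):
    """Parameters of ln(E) from the mean and standard deviation of E."""
    mu_ln = np.log(mu_E ** 2 / np.sqrt(mu_E ** 2 + sigma_E ** 2))
    sigma_ln = np.sqrt(np.log(1.0 + (sigma_E / mu_E) ** 2))
    return mu_ln, sigma_ln

def log_prior_lognormal(E, mu_ln, sigma_ln):
    """Joint log-PDF of independent lognormal components, eqs. (5) and (6)."""
    E = np.asarray(E, dtype=float)
    return np.sum(-np.log(E * sigma_ln * np.sqrt(2.0 * np.pi))
                  - 0.5 * ((np.log(E) - mu_ln) / sigma_ln) ** 2)

# lognormal_params(100.0, 10.0) gives approximately (4.6002, 0.0998),
# the values used later in the 2DOF example.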

Likelihood Distribution Function

The likelihood term P(\mathbf{D}|\mathbf{E}) describes how well a given parameter vector represents the measured data. In the scope of this project the likelihood function is defined using the error matrix \varepsilon_{ij} discussed in section Dynamics of the System. It reads:

(7) P(\mathbf{D}|\mathbf{E}) = \frac{1}{Z_D(\beta)} \mathrm{exp} \left[ - \beta \sum_{i=1}^{k} \sum_{j=1}^{k} \varepsilon_{ij}^2 \right] = \frac{1}{Z_D(\beta)} \mathrm{exp} \left[ - \beta \sum_{i=1}^{k} \sum_{j=1}^{k} {\left[\left( - \omega_{m,i}^2 \mathbf{M} + \mathbf{K} \right) \pmb{\phi}_{m,i} \right]_j}^2 \right] ,

where \beta is a coefficient expressing the contribution of the measured modal data and Z_D(\beta) is the normalization factor which can be calculated as follows:

Z_D(\beta) = \int \mathrm{exp} \left[ - \beta \sum_{i=1}^{k} \sum_{j=1}^{k} {\left[\left( - \omega_{m,i}^2 \mathbf{M} + \mathbf{K} \right) \pmb{\phi}_{m,i} \right]_j}^2 \right] \mathrm{d} \mathbf{D}.

Note that in eq. (7) the components of the system matrices \mathbf{K} and \mathbf{M} depend on the parameter vector \mathbf{E}, therefore the likelihood is a function of the parameter vector. 

If the system matrices correspond to the measured modal properties, the components of the error matrix \varepsilon_{ij} will be zero and for such a parameter vector the likelihood term is maximal.
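Analogously, a sketch of the logarithm of the unnormalized likelihood of eq. (7), reusing the hypothetical error_matrix helper sketched in section Dynamics of the System:

import numpy as np

def log_likelihood(M, K, omega_m, phi_m, beta):
    """Logarithm of the unnormalized likelihood of eq. (7)."""
    eps = error_matrix(M, K, omega_m, phi_m)   # error matrix from eq. (1)
    return -beta * np.sum(eps ** 2)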


Normalization Factor

The normalization term ensures that the posterior is a proper probability distribution, i.e. that it integrates to one:

P(\mathbf{D}) = \int P(\mathbf{E}) \cdot P(\mathbf{D}|\mathbf{E}) \,\mathrm{d} \mathbf{E} = \int \frac{1}{Z_E(\alpha)} \mathrm{exp} \left[ -\frac{1}{2} \sum_{i=1}^{k} \alpha_i (E_i - \mu_i)^2 \right] \cdot \frac{1}{Z_D(\beta)} \mathrm{exp} \left[ - \beta \sum_{i=1}^{k} \sum_{j=1}^{k} {\left[\left( - \omega_{m,i}^2 \mathbf{M} + \mathbf{K} \right) \pmb{\phi}_{m,i} \right]_j}^2 \right] \mathrm{d} \mathbf{E} = \frac{1}{Z_E(\alpha) \cdot Z_D(\beta)} \int \mathrm{exp} \left[ -\frac{1}{2} \sum_{i=1}^{k} \alpha_i (E_i - \mu_i)^2 - \beta \sum_{i=1}^{k} \sum_{j=1}^{k} {\left[\left( - \omega_{m,i}^2 \mathbf{M} + \mathbf{K} \right) \pmb{\phi}_{m,i} \right]_j}^2 \right] \mathrm{d} \mathbf{E}.

Posterior Probability Distribution Function 

By substituting the individual terms into eq. (2), the posterior distribution is obtained:

(8) P(\mathbf{E}|\mathbf{D})=\frac{ P(\mathbf{E}) \cdot P(\mathbf{D}|\mathbf{E})}{P(\mathbf{D})} = \dfrac{ \frac{1}{Z_E(\alpha)} \mathrm{exp} \left[ -\frac{1}{2} \sum_{i=1}^{k} \alpha_i (E_i - \mu_i)^2 \right] \cdot \frac{1}{Z_D(\beta)} \mathrm{exp} \left[ - \beta \sum_{i=1}^{k} \sum_{j=1}^{k} {\left[\left( - \omega_{m,i}^2 \mathbf{M} + \mathbf{K} \right) \pmb{\phi}_{m,i} \right]_j}^2 \right] } { \frac{1}{Z_E(\alpha) \cdot Z_D(\beta)} \int \mathrm{exp} \left[ -\frac{1}{2} \sum_{i=1}^{k} \alpha_i (E_i - \mu_i)^2 - \beta \sum_{i=1}^{k} \sum_{j=1}^{k} {\left[\left( - \omega_{m,i}^2 \mathbf{M} + \mathbf{K} \right) \pmb{\phi}_{m,i} \right]_j}^2 \right] \mathrm{d} \mathbf{E}} = \dfrac{ \mathrm{exp} \left[ -\frac{1}{2} \sum_{i=1}^{k} \alpha_i (E_i - \mu_i)^2 - \beta \sum_{i=1}^{k} \sum_{j=1}^{k} {\left[\left( - \omega_{m,i}^2 \mathbf{M} + \mathbf{K} \right) \pmb{\phi}_{m,i} \right]_j}^2 \right] }{\int \mathrm{exp} \left[ -\frac{1}{2} \sum_{i=1}^{k} \alpha_i (E_i - \mu_i)^2 - \beta \sum_{i=1}^{k} \sum_{j=1}^{k} {\left[\left( - \omega_{m,i}^2 \mathbf{M} + \mathbf{K} \right) \pmb{\phi}_{m,i} \right]_j}^2 \right] \mathrm{d} \mathbf{E}}.

If the number of parameters to be updated is large, the normalization integral in the latter equation might be difficult to compute. Therefore, a numerical method that samples from the posterior distribution is used instead.

Markov Chain Monte Carlo (MCMC)

The Bayesian approach described above can be used to update a model so that it predicts a set of measured properties \mathbf{y} of a system, e.g. modal properties, from a set of updated parameters \mathbf{E}_i. A Monte Carlo technique generates as many samples of these parameter vectors as needed, which makes it possible to obtain distributions of the predicted properties of the system. This is especially useful for estimating the precision of a model with respect to a set of measurements or for evaluating the error of that model. The objective is to obtain a set of updated parameters that correctly predicts the measured properties. The probability of \mathbf{y} given \mathbf{D} can be written as

(9) P(\mathbf{y}|\mathbf{D})=\int P(\mathbf{y}|\mathbf{E}) P(\mathbf{E}|\mathbf{D}) \,\mathrm{d}\mathbf{E}.

Again, this equation is difficult to compute for a large number of parameters, as it depends on eq. (8), so a Markov Chain Monte Carlo (MCMC) approach can be used to approximate it. This iterative method is especially useful for the estimation of probability distributions, as it allows one to obtain a distribution of the proposed parameters and of their predictions.

Markov chain approach

MCMC is an iterative method based on consecutive sampling from a proposal distribution of the parameters. Each newly sampled model is then accepted or rejected based on a specific criterion. This succession of models (or states) composes the so-called random walk of the method. If the chain is well constructed, the random walk converges to an equilibrium distribution, and the states of the chain after reaching equilibrium are taken as the distribution of the model.

Illustration of Markov Chain Monte Carlo from [4]

In our case, the MCMC algorithm proceeds as follows:

  1. A measured set of modal properties (eigenfrequencies and eigenmode shapes) is obtained. This can be from any source: experimental measurements, simulations, etc. 
  2. We set up a theoretical distribution of the \bf{E} parameters to draw samples from. 
  3. A first state is calculated from a sample drawn from \bf{E}.
  4. A criterion for the acceptance of this set of parameters is applied. This criterion must ensure the convergence of the MCMC. In our case, we will use the Metropolis-Hastings algorithm.
  5. If the sample is accepted, the distribution is updated with it.
  6. Steps 3 to 5 are repeated for a set number of iterations or until some convergence criterion is met.

When the algorithm is finished, the random walk is taken as the distribution of the set of parameters. These distributions can be constructed for the model parameters \mathbf{E}_i or for the modal properties \mathbf{y}_i, and a statistical analysis of them can be performed afterwards. Equation (9) then yields the following approximation of the vector of modal properties

(10) \mathbf{\tilde{y}}\approx \dfrac{1}{L}\sum_{i=R+1}^{R+L}G(\mathbf{E}_i)

where G(\mathbf{E}_i) is the outcome of the model, L is the number of retained states and R the number of discarded initial states ("burn-in" steps, see section Considerations on the convergence of the chain). The mean of the distribution of the sampled modal properties is thus taken as an approximation of the real ones. If the method is constructed correctly, this approximation converges to the measured values. The parameters \mathbf{E}_i themselves, however, do not necessarily converge to the values used to generate the data, due to the contribution of the prior probability.
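A small sketch of the estimator of eq. (10), assuming the accepted samples are stored row-wise and G is a callable mapping a parameter vector to the predicted modal properties (both names are illustrative):

import numpy as np

def posterior_prediction(samples, G, R=0):
    """Average of the model outputs over the retained states, eq. (10)."""
    retained = samples[R:]                        # discard the first R burn-in states
    outputs = np.array([G(E) for E in retained])
    return outputs.mean(axis=0)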

Metropolis-Hastings algorithm

The Metropolis-Hastings algorithm, or simply Metropolis algorithm, is one of the most widely used methods for generating a convergent Markov chain. It is based on the probabilities introduced in the previous sections. The algorithm is as follows:

(11) If\text{ }P_{new}(\mathbf{E}|\mathbf{D})>P_{old}(\mathbf{E}|\mathbf{D}),\text{ } accept \text{ }state \text{ }\mathbf{E}_{new}\\ else,\text{ } accept \text{ }\mathbf{E}_{new}\text{ }with\text{ }probability\text{ }\dfrac{P_{new}(\mathbf{E}|\mathbf{D})}{P_{old}(\mathbf{E}|\mathbf{D})}

The use of the Metropolis algorithm ensures the convergence of the chain, although the speed of convergence is not known beforehand. However, in contrast to some rejection sampling methods, the Metropolis algorithm does not suffer from the "curse of dimensionality": the rejection probability does not increase exponentially with the number of parameters, which makes it suitable for stochastic problems with a large number of parameters. Nevertheless, some factors affecting the speed of convergence of the chain still have to be considered.

In some cases, the probabilities take very small values that are problematic to represent with floating-point numbers. For those situations, given that the probabilities in the chosen approach are modelled as exponential functions, the Metropolis algorithm can adopt a logarithmic form:

(12) If\text{ }\mathrm{ln}(P_{new}(\mathbf{E}|\mathbf{D}))>\mathrm{ln}(P_{old}(\mathbf{E}|\mathbf{D})),\text{ } accept \text{ }state \text{ }\mathbf{E}_{new}\\ else,\text{ } accept \text{ }\mathbf{E}_{new}\text{ }if\text{ }\mathrm{ln}(u)<\mathrm{ln}(P_{new}(\mathbf{E}|\mathbf{D}))-\mathrm{ln}({P_{old}(\mathbf{E}|\mathbf{D})}),\text{ }u \sim U(0,1)

In this formulation, only the exponents of the probabilities have to be computed, which is more suitable when the probabilities take very small values. Moreover, sums are computationally much cheaper than divisions, which is relevant for iterative processes such as MCMC.
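A minimal sketch of the acceptance test of eq. (12) (function name and variable names are illustrative):

import numpy as np

rng = np.random.default_rng()

def metropolis_accept(log_post_new, log_post_old):
    """Metropolis acceptance in logarithmic form, eq. (12)."""
    if log_post_new >= log_post_old:
        return True
    # otherwise accept with probability exp(log_post_new - log_post_old)
    return np.log(rng.uniform()) < log_post_new - log_post_old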

Considerations on the convergence of the chain

First, it can be observed that the first samples of the chain are quite unstable and usually deviate greatly from the target equilibrium. If these samples are taken into account, the resulting distribution may contain outliers from those steps, which artificially deteriorates the quality of the distribution and delays reaching the equilibrium. In many cases, a first arbitrary number of accepted samples is therefore disregarded, the so-called "burn-in" period, and the final distribution is only evaluated after some stabilization of the chain is observed. This is especially relevant when the proposal takes the standard deviation of the samples into account for the updating. In this project, the standard deviation of the proposal distribution is kept constant and only the mean of the distribution from which the parameters are sampled is updated. Therefore, the benefit of a burn-in period is negligible and it is not used.

Another important factor in the convergence of the algorithm is the rejection rate and its relationship with the variance \sigma^2 of the proposal distribution of the parameters. The rejection rate is the number of rejected samples with respect to the total number of proposed ones. According to Roberts, Gelman and Gilks [5], the optimal acceptance rate of a random-walk Metropolis algorithm is roughly 44% for a one-dimensional target and approximately 23.4% for high-dimensional jointly normal targets. A proposal variance that is too large results in a very high rejection rate, which slows down convergence. If the variance is too low, almost every step is accepted but the chain moves in very small steps and stabilizes near its starting region rather than at the equilibrium, so the convergence of the MCMC is also slowed down. If the variances are prescribed by a real model, the orders of magnitude of \alpha and \beta have to be chosen carefully, not only to reflect the influence of their respective components but also to accelerate convergence.

Pseudocode

For a better understanding of the procedure, the workflow of the code is described in this section; a corresponding code sketch is given after the list.

  1. Define the mean vector and covariance matrix of the prior distribution of the vector of parameters \mathbf{E}
  2. Obtain measured eigenfrequencies \omega_{m,i} and eigenvectors of the system \pmb{\phi}_{m,i}
  3. Draw a sample \mathbf{E}_p from the prior distribution 
  4. \mathbf{for} \ i = 1: \mathrm{N_{samples}}
    1. \mathbf{if} \ i > 1
      1. Draw a sample from the proposal distribution \mathbf{E}_p \sim N(\mathbf{E}_a, \mathbf{I})
    2. Compute system matrices \mathbf{K} and \mathbf{M} based on the sample \mathbf{E}_p
    3. Evaluate the error matrix components \varepsilon_{ij} from eq. (1)
    4. Compute the prior P(\mathbf{E}_p) and the likelihood P(\mathbf{D}|\mathbf{E}_p) for the current sample \mathbf{E}_p based on eq. (4) and eq. (7)
    5. Evaluate the posterior (neglecting the normalization term) as: P_{new}(\mathbf{E}_p|\mathbf{D}) = P(\mathbf{E}_p) \cdot P(\mathbf{D}|\mathbf{E}_p)
    6. \mathbf{if} \ i = 1
      1. Rename the sample: \mathbf{E}_a=\mathbf{E}_p
      2. Save the posterior value as: P_{old}(\mathbf{E}_p|\mathbf{D}) = P_{new}(\mathbf{E}_p|\mathbf{D})
    7. \mathbf{else} \ \rightarrow Apply Metropolis-Hastings algorithm
      1. \mathbf{if} \ P_{new}(\mathbf{E}_p|\mathbf{D}) > P_{old}(\mathbf{E}_p|\mathbf{D}) \ \rightarrow  sample accepted
        1. Rename the sample: \mathbf{E}_a=\mathbf{E}_p
        2. Save the posterior value as: P_{old}(\mathbf{E}_p|\mathbf{D}) = P_{new}(\mathbf{E}_p|\mathbf{D})
      2. \mathbf{elseif} \ P_{new}(\mathbf{E}_p|\mathbf{D}) / P_{old}(\mathbf{E}_p|\mathbf{D}) > \mathrm{rand}(0,1) \ \rightarrow  sample accepted
        1. Rename the sample: \mathbf{E}_a=\mathbf{E}_p
        2. Save the posterior value as: P_{old}(\mathbf{E}_p|\mathbf{D}) = P_{new}(\mathbf{E}_p|\mathbf{D})
      3. \mathbf{else} \ \rightarrow sample not accepted
    8. Save the last accepted sample: \mathbf{E}_{all}(i,:)=\mathbf{E}_a
  5. Analyze the accepted samples
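The pseudocode above could translate, for example, into the following sketch (Python/NumPy; it reuses the log_prior_gaussian and log_likelihood helpers sketched in the Theory section, compute_system_matrices is a placeholder for the problem-specific assembly of \mathbf{K} and \mathbf{M}, and the proposal standard deviation prop_std is an assumed tuning parameter, not a value from the original implementation):

import numpy as np

def run_mcmc(mu_prior, alpha, beta, omega_m, phi_m,
             compute_system_matrices, n_samples, prop_std=1.0, seed=0):
    """Random-walk Metropolis sampling of the unnormalized posterior."""
    rng = np.random.default_rng(seed)
    mu_prior = np.asarray(mu_prior, dtype=float)
    alpha = np.asarray(alpha, dtype=float)

    def log_posterior(E):
        K, M = compute_system_matrices(E)
        return (log_prior_gaussian(E, mu_prior, alpha)
                + log_likelihood(M, K, omega_m, phi_m, beta))

    E_a = rng.normal(mu_prior, 1.0 / np.sqrt(alpha))   # first sample from the prior
    log_post_old = log_posterior(E_a)
    samples = np.zeros((n_samples, mu_prior.size))
    for i in range(n_samples):
        E_p = rng.normal(E_a, prop_std)                # proposal ~ N(E_a, prop_std^2 I)
        log_post_new = log_posterior(E_p)
        if (log_post_new >= log_post_old
                or np.log(rng.uniform()) < log_post_new - log_post_old):
            E_a, log_post_old = E_p, log_post_new      # accept the proposed sample
        samples[i] = E_a                               # store the last accepted sample
    return samples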

Examples and Results

To test the Bayesian updating method, two examples were considered. The first example is a simple 2DOF system, on which the basic ideas of the Bayesian updating method can be investigated and well understood. As a second example, a more complex system was analyzed: a beam discretized using the finite element method.

It was not possible to obtain real measured eigenfrequencies and mode shapes, so artificially generated modal data were used instead.

2DOF System

The system is depicted in the figure below [3].

The system matrices are given as follows: 

\mathbf{K}=\begin{bmatrix} k_1+k_2 & -k_2 \\ -k_2 & k_2 \end{bmatrix} , \ \mathbf{M}=\begin{bmatrix} m_1 & 0 \\ 0 & m_2 \end{bmatrix} .

For simplicity, only the spring stiffnesses k_1 and k_2 were chosen as random variables and the masses were considered to be deterministic. The prior distribution is taken as in eq. (4) with mean vector \pmb{\mu}=[\mu_{k_1} \ \mu_{k_2}]^T and coefficients \pmb{\alpha} = [\alpha_1 \ \alpha_2]^T . The likelihood function was taken as in eq. (7) with the coefficient \beta as a parameter.

The artificial measured data were generated using modified inputs \bar{k}_1 and \bar{k}_2 for the system matrices, from which the eigenvalues and eigenvectors were computed. The MCMC sampling was performed with N_s samples. The specific input values are summarized in the following table and the prior distribution is shown; a short code sketch for this setup follows the table.

Parameter | Value
\mu_{k_1} | 100~ [\mathrm{N/m}]
\mu_{k_2} | 100~ [\mathrm{N/m}]
\alpha_1 | 0.01~ [\mathrm{m}^2/\mathrm{N}^2]
\alpha_2 | 0.01~ [\mathrm{m}^2/\mathrm{N}^2]
m_1 | 100~ [\mathrm{kg}]
m_2 | 100~ [\mathrm{kg}]
\beta | 1
\bar{k}_1 | 1.3 \cdot \mu_{k_1} = 130~ [\mathrm{N/m}]
\bar{k}_2 | 1.3 \cdot \mu_{k_2} = 130~ [\mathrm{N/m}]
N_s | 10^5
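For this example, the system-matrix assembly and the generation of the artificial "measured" data could be sketched as follows (Python/NumPy/SciPy; the function names are illustrative and the resulting arrays can be passed to the MCMC sketch above):

import numpy as np
from scipy.linalg import eigh

m1 = m2 = 100.0                                     # deterministic masses [kg]

def system_matrices_2dof(E):
    """K and M of the 2DOF system for the stiffness parameters E = [k1, k2]."""
    k1, k2 = E
    K = np.array([[k1 + k2, -k2],
                  [-k2,      k2]])
    M = np.diag([m1, m2])
    return K, M

# artificial "measured" data from the modified stiffnesses k1 = k2 = 130 N/m
K_bar, M_bar = system_matrices_2dof([130.0, 130.0])
lam, phi_m = eigh(K_bar, M_bar)                     # solves K phi = lambda M phi
omega_m = np.sqrt(lam)                              # measured eigenfrequencies [rad/s]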

In the following figure the histograms of the saved samples of k_1 and k_2 are plotted.

In the table below, the posterior mean estimates \pmb{\mu}_p=[\mu_{p,k_1} \ \mu_{p,k_2}]^T and standard deviation estimates \sigma_{p,k_1} and \sigma_{p,k_2} are compared with the mean and standard deviation of the prior distribution. Furthermore, the values of the modified input stiffnesses used for the generation of the artificial data are shown.

Prior | Data | Posterior
\mu_{k_1} = 100~ [\mathrm{N/m}] | \bar{k}_1 = 130~ [\mathrm{N/m}] | \mu_{p,k_1} = 121.0035~ [\mathrm{N/m}]
\mu_{k_2} = 100~ [\mathrm{N/m}] | \bar{k}_2 = 130~ [\mathrm{N/m}] | \mu_{p,k_2} = 128.6601~ [\mathrm{N/m}]
\sigma_{k_1} = 10~ [\mathrm{N/m}] | - | \sigma_{p,k_1} = 6.3482~ [\mathrm{N/m}]
\sigma_{k_2} = 10~ [\mathrm{N/m}] | - | \sigma_{p,k_2} = 3.6250~ [\mathrm{N/m}]

One can notice how the distribution of the parameters was updated based on the measured data. How much the posterior distribution differs from the prior one strongly depends on the coefficients \alpha_i and \beta.

Parameter Study

In the following section the influence of both coefficients on the posterior mean and standard deviation estimates is studied. For convenience, a single value \alpha was used for both k_1 and k_2, i.e. \alpha_i=\alpha.

In the figure below, the coefficient \alpha varies in the range (0.001,5)~ [\mathrm{m}^2/\mathrm{N}^2] while \beta=1 is kept constant.


As described before, \alpha represents the inverse of the variance of the prior distribution; in other words, it indicates how much trust is put into the prior belief about the individual parameters. Small values of \alpha mean a large variance of the prior distribution and therefore the parameters are updated more significantly. This is also clear from the figure above: for \alpha=0.001~ [\mathrm{m}^2/\mathrm{N}^2] the mean estimates of both stiffnesses are very close to the modified inputs \bar{k}_1 and \bar{k}_2 from which the artificial measured modal data were computed. On the other hand, for large values of \alpha the prior variance is small and therefore only a small update is allowed. A similar trend can be observed for the standard deviations.

Now the impact of \beta is examined. It expresses the influence of the likelihood function on the posterior distribution. In the next figure, the coefficient \beta varies in the interval (0.01,10) while \alpha=0.1~ [\mathrm{m}^2/\mathrm{N}^2] is kept constant.

As can be seen, for increasing \beta the posterior mean estimates approach the stiffness values corresponding to the measured data. This can also be explained based on eq. (7), where \beta multiplies the term -\sum_{i=1}^{k} \sum_{j=1}^{k} \varepsilon_{ij}^2. In this way \beta can be seen as a penalization factor. However, for large values of \beta numerical instability can occur, because the likelihood values decrease to zero very rapidly and the floating-point precision of the computer is limited.

Lognormal Prior

As discussed in section Prior Probability Distribution Function, a lognormal prior distribution might be more suitable for the description of the parameters k_1 and k_2, since a stiffness can only take positive values. In the following, the previously investigated system was computed with the lognormal prior of eq. (5) and eq. (6). The specific values of the distribution parameters are stated in the following table and the related prior distribution is plotted and compared to the normal prior.



Normal distribution | Lognormal distribution
\mu_{k_1} = 100~ [\mathrm{N/m}] | \mu_{lnk_1} = 4.6002~ [-]
\mu_{k_2} = 100~ [\mathrm{N/m}] | \mu_{lnk_2} = 4.6002~ [-]
\sigma_{k_1} = 10~ [\mathrm{N/m}] | \sigma_{lnk_1} = 0.0998~ [-]
\sigma_{k_2} = 10~ [\mathrm{N/m}] | \sigma_{lnk_2} = 0.0998~ [-]

In the figure below, histograms of the sampled parameters k_1 and k_2 are plotted.

A comparison of the results with different prior distributions is provided in the following table. The deviation of the results is insignificant since the difference between the prior distributions for the specific parameters is minor.

Posterior distribution parameter | Gaussian prior | Lognormal prior
\mu_{p,k_1} | 121.0035~ [\mathrm{N/m}] | 121.9600~ [\mathrm{N/m}]
\mu_{p,k_2} | 128.6601~ [\mathrm{N/m}] | 129.4454~ [\mathrm{N/m}]
\sigma_{p,k_1} | 6.3482~ [\mathrm{N/m}] | 7.2798~ [\mathrm{N/m}]
\sigma_{p,k_2} | 3.6250~ [\mathrm{N/m}] | 3.9279~ [\mathrm{N/m}]

Beam example

The Bayesian update using MCMC can also be applied to more complex examples, such as a beam with defects. The analyzed case is a beam of length L fixed at both ends, with mass density \rho, Young's modulus E_Y and a rectangular cross-sectional area A_{acc}. However, due to a manufacturing defect or holes drilled in it, the cross-sectional area A_{nacc} of a part of the beam is not accurately known. This case can be solved with a Finite Element (FE) model with a discretization like the one in the image.

In this case, the beam has been discretized with n=11 elements, of which the white ones have the nominal area A_{acc} and the red ones the area A_{nacc}. The modal properties of interest are the first 4 eigenfrequencies. Without model updating, the "real" modes cannot be accurately known beforehand, so this technique allows a better approach to the problem. All the variables are modelled with normal distributions with a set mean \mu and standard deviation \sigma. The theoretical sides of the cross section are, based on the values used by Marwala in [1], a=0.0292 m and b=0.0096 m, and the length of the beam is L=1.1 m. The set of updating parameters is the following:

E | Variable | Mean | Standard deviation | Units
E_1 | A_{acc} | 2.8032\times10^{-4} | A_{acc}/20=4.4016\times10^{-5} | \mathrm{m}^2
E_2 | E_Y | 700\times10^8 | 20\times10^8 | \mathrm{N/m}^2
E_3 | \rho | 2700 | 25 | \mathrm{kg/m}^3
E_4 | A_{nacc} | 2.8032\times10^{-4} | A_{nacc}/20=4.4016\times10^{-5} | \mathrm{m}^2

Set-up of the problem

A random sample from this model is taken as the "true" set of parameters. This sample could also have been chosen arbitrarily, as in the previous example. One possible set of variables drawn is, for example:

E | Variable | Value | Units
E_1 | A_{acc} | 2.8770\times10^{-4} | \mathrm{m}^2
E_2 | E_Y | 719.15\times10^8 | \mathrm{N/m}^2
E_3 | \rho | 2687.7 | \mathrm{kg/m}^3
E_4 | A_{nacc} | 2.8032\times10^{-4} | \mathrm{m}^2

For this set of parameters, a FEM model of the beam (modelled in Matlab® with the package provided for the course Modelling and Simulation in Structural Mechanics) gives the following natural frequencies \omega and the associated eigenmodes with the plotted shapes (a simplified sketch of such a beam model is given after the eigenfrequency table):

Eigenfrequency | Value [rad/s]
\omega_1 | 795
\omega_2 | 2191
\omega_3 | 4289
\omega_4 | 7085
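The Matlab® package used in the course is not reproduced here, but a minimal sketch of an equivalent Euler-Bernoulli FE model (Python/NumPy/SciPy, clamped-clamped beam, consistent element matrices) can illustrate how such eigenfrequencies are computed. The indices of the defective elements and the bending axis are assumptions made for this illustration only:

import numpy as np
from scipy.linalg import eigh

def beam_eigenfrequencies(E_Y, rho, A_acc, A_nacc, defective=(4, 5, 6),
                          L=1.1, n_el=11, a=0.0292, n_modes=4):
    """First n_modes eigenfrequencies [rad/s] of a clamped-clamped beam."""
    le = L / n_el
    n_dof = 2 * (n_el + 1)                          # deflection + rotation per node
    K = np.zeros((n_dof, n_dof))
    M = np.zeros((n_dof, n_dof))
    for e in range(n_el):
        A = A_nacc if e in defective else A_acc     # assumed defective elements
        I = A * a ** 2 / 12.0                       # rectangle, assumed bending depth a
        Ke = (E_Y * I / le ** 3) * np.array(
            [[12.0, 6 * le, -12.0, 6 * le],
             [6 * le, 4 * le ** 2, -6 * le, 2 * le ** 2],
             [-12.0, -6 * le, 12.0, -6 * le],
             [6 * le, 2 * le ** 2, -6 * le, 4 * le ** 2]])
        Me = (rho * A * le / 420.0) * np.array(
            [[156.0, 22 * le, 54.0, -13 * le],
             [22 * le, 4 * le ** 2, 13 * le, -3 * le ** 2],
             [54.0, 13 * le, 156.0, -22 * le],
             [-13 * le, -3 * le ** 2, -22 * le, 4 * le ** 2]])
        dofs = slice(2 * e, 2 * e + 4)
        K[dofs, dofs] += Ke
        M[dofs, dofs] += Me
    free = slice(2, n_dof - 2)                      # clamp both ends
    lam, _ = eigh(K[free, free], M[free, free])
    return np.sqrt(lam[:n_modes])

With the "true" parameter set listed above, this sketch yields a first eigenfrequency of the same order as the tabulated 795 rad/s, but it is only an illustration of the model structure, not the course package itself.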

This will be the "truth" against which the full MCMC procedure will be measured. For setting the MCMC, first the \alpha and \beta parameters have to be chosen. In this case, the chosen parameters are:

Parameter | Variable | Value
\alpha_1 | A_{acc} | 100
\alpha_2 | \rho | 100
\alpha_3 | E_Y | 100
\alpha_4 | A_{nacc} | 30
\beta | \varepsilon_{ij} | 15

These coefficients were chosen such that the probabilities P(\mathbf{E}) and P(\mathbf{D}|\mathbf{E}) have the same order of magnitude and the rejection rate is reasonable. Also, \alpha_4 is chosen significantly smaller than the other \alpha values so that the defective areas are updated more often. However, this is not sufficient for the convergence of the MCMC to a reasonable value. It was observed that the MCMC tended to update the Young's modulus more often than the other variables and to drive the cross-sectional areas towards 0. This was thought to be related to the large difference between the orders of magnitude of the parameters. To reduce this effect, a normalization of the error was introduced. The prior probability function is modified to

(13) P(\mathbf{E}) = \mathrm{exp} \left[ -\frac{1}{2} \sum_{i=1}^{k} \alpha_i \left(\dfrac{E_i - \mu_i}{\mu_i}\right)^2 \right].

This is equivalent to dividing every \alpha_i by the square of the respective mean \mu_i of the theoretical distribution of the parameters; when the standard equation is applied, this normalization could therefore be absorbed into the \alpha coefficients. For the likelihood probability distribution, we proceed analogously:

(14) P(\mathbf{D}|\mathbf{E}) = \mathrm{exp} \left[ - \beta \sum_{i=1}^{k} \sum_{j=1}^{k} \left[\dfrac{\left( - \omega_{m,i}^2 \mathbf{M} + \mathbf{K} \right) \pmb{\phi}_{m,i} }{\omega_{m,i}^2}\right]_j^2 \right] ,

In this case, the contribution of every mode is normalized with respect to its measured eigenfrequency. Once more, this is equivalent to applying a different \beta_i to each mode. It ensures that the updating criterion is uniform for every mode, although it could be modified if any of the modes were of special interest. This normalization varies between iterations only slightly, so the convergence of the MCMC is not compromised. It also does not introduce a bias in the solution, since the error still vanishes when the computed modes coincide with the measured ones.
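A sketch of eqs. (13) and (14) in logarithmic form (illustrative names, following the same structure as the helpers sketched in the Theory section):

import numpy as np

def log_prior_normalized(E, mu, alpha):
    """Logarithm of eq. (13): parameter residuals scaled by the prior means."""
    E, mu, alpha = (np.asarray(x, dtype=float) for x in (E, mu, alpha))
    return -0.5 * np.sum(alpha * ((E - mu) / mu) ** 2)

def log_likelihood_normalized(M, K, omega_m, phi_m, beta):
    """Logarithm of eq. (14): each modal residual scaled by omega_m_i^2."""
    total = 0.0
    for i, w in enumerate(omega_m):
        r = (-w ** 2 * M + K) @ phi_m[:, i] / w ** 2
        total += np.sum(r ** 2)
    return -beta * total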

Analysis of results

The MCMC is run for 50000 iterations, resulting in a rejection rate of 19%. The results of this simulation are represented in the following plots. In all line plots, the truth is shown in red, the mean with a 95% confidence interval is shown in blue (solid line for the mean, dashed for the confidence interval) and the random walk in green. First, we analyze the convergence of the parameter vector \mathbf{E}:

From the random walk, we can see that the lack of a burn-in period stops having a significant influence after the first 10000 samples. Also, the areas are modified more often than the other parameters, probably because their standard deviation is larger relative to the mean of their respective distribution. This is in line with a realistic model, as the density and the Young's modulus of the material are usually known more accurately than the cross-sectional areas of the beam. If we focus only on the last steps, we obtain the following plots:

Here, the effect of the coefficient \alpha is observable. The variation of the least accurately known cross-sectional area E_4 is greater than that of E_1 because of its smaller \alpha. In these plots the convergence of the parameters can also be appreciated. If we plot the relative error with respect to the ground truth, we obtain the following plots:

In these plots we see the general convergence of all the parameters. This is especially true for the Young's modulus and the density of the beam, which converge towards the true values. The areas still have room for improvement; however, their error is below 5%, which is acceptable. Taking into account that the parameters do not necessarily need to converge to the original ones (see the discussion below), the convergence of the model can be considered sufficient.

If we plot the histograms of the parameters, we can see that the distributions have the expected normal shapes. The measured value is indicated in red once more. Summing up, the results of the sampled set can be gathered in the following table:

E | Truth | Standard deviation (theoretical) | Mean (sampled) | Standard deviation (sampled) | Relative error
E_1 | 2.8770\times10^{-4} | 4.4016\times10^{-5} | 2.8078\times10^{-4} | 3.1318\times10^{-5} | -2.405\%
E_2 | 719.15\times10^8 | 20\times10^8 | 717.72\times10^8 | 189.63\times10^8 | -0.199\%
E_3 | 2687.7 | 25 | 2685.6 | 317.5 | -0.078\%
E_4 | 2.8032\times10^{-4} | 4.4016\times10^{-5} | 2.7823\times10^{-4} | 5.5\times10^{-5} | -3.679\%

Surprisingly, the parameters that are most precise with respect to the mean value of the distribution have sampled standard deviations greater than the theoretical ones. This can be an indicator that the random walk is trying more extreme values which, together with the smaller variation of the path, help the convergence to the mean value. With more samples, the standard deviation of the variables is expected to approach the theoretical one. An analysis of the correlation of the parameters can also be performed.

The scatter plot shows in this case that the variables seem to be independent, with their dispersion related to the standard deviations of the variables themselves.

In practice, however, the true parameters of the model are not known beforehand (otherwise no model updating would be needed). Usually only the modal properties of the system are known, so it is relevant to repeat the same analysis for the first four eigenfrequencies.

In the case of the eigenfrequencies, the convergence seems to be faster, and the random walk appears to be truly random. Also, all the eigenfrequencies are modified roughly in the same way, as expected. The relative error plots show the same trend:

As can be observed, the relative error oscillates around 0%, showing that the update of the model with respect to the eigenfrequencies has already converged. This is confirmed when analyzing the contributions of the prior and the likelihood to the total error: in the samples where the model is almost converged, the likelihood component of the error vanishes.

The histograms again show approximately normal distributions, centred around the true values indicated with the red line. This is confirmed by the comparison with the ground truth:

Eigenfrequency | Truth | Sampled | Standard deviation | Relative error
\omega_1 [rad/s] | 795 | 804 | 117 | 1.212\%
\omega_2 [rad/s] | 2191 | 2213 | 322 | 1.019\%
\omega_3 [rad/s] | 4289 | 4342 | 626 | 1.232\%
\omega_4 [rad/s] | 7085 | 7163 | 1034 | 1.103\%

All the values of the eigenfrequencies are reasonable. The error is only slightly above 1%, which is a good approximation for an eigenfrequency. The standard deviation is also reasonable, being around 15% of the mean. Therefore, the updated model can accurately predict the measured eigenfrequencies. However, we must take into account that a finite set of modes does not correspond to a unique set of parameters: different parameter sets can produce the same solution of the eigenvalue problem when the number of measured modes is smaller than the number of degrees of freedom. This explains the variability observed in the posterior distribution of the parameters while the posterior distribution of the eigenfrequencies remains accurate.

The scatter plots reflect the linear dependency between the eigenfrequencies. It is remarkable, however, that the first eigenfrequencies are more independent of each other; updates of the parameters seem to affect the first two eigenfrequencies the most.

In any case, this example shows the applicability of Bayesian updating with the Markov Chain Monte Carlo method to realistic models such as a FEM beam. The results could be improved with further tuning of the coefficients and a larger number of samples, but the results obtained are satisfactory regardless.

References

[1] Marwala, Tshilidzi & Sibisi, Sibusiso. (2005). Finite Element Model Updating Using Bayesian Framework and Modal Properties. Journal of Aircraft. 42. 10.2514/1.11841.

[2] Papaioannou I., Stochastic Finite Element Methods, Lecture notes, Engineering Risk Analysis Group, Technical University of Munich, 2020

[3] Dumond, Patrick & Baddour, Natalie. (2014). A structured approach to design-for-frequency problems using the Cayley-Hamilton theorem. SpringerPlus. 3. 10.1186/2193-1801-3-272.

[4] Jin, Seung-Seop & Ju, Heekun & Jung, Hyung-Jo. (2019). Adaptive Markov chain Monte Carlo algorithms for Bayesian inference: recent advances and comparative study. Structure and Infrastructure Engineering. 10.1080/15732479.2019.1628077.

[5] Roberts, G. O.; Gelman, A.; Gilks, W. R. (1997) Weak convergence and optimal scaling of random walk Metropolis algorithms. Ann. Appl. Probab. 7 , no. 1, 110-120. doi:10.1214/aoap/1034625254.  https://projecteuclid.org/euclid.aoap/1034625254




















