Term: Summer Semester 2022 - Winter Semester 2022/2023

Authors: Unknown User (ge32qad), Berkay Oralalp, Tanima Ahmed Hridi

Supervisors: Unknown User (ga97pel), Sebastian Schopper

Introduction

Numerical simulation is a fundamental tool for studying a variety of complex physical systems, including large-scale dynamical systems. The growing need for accuracy has led to ever larger and more complex models of such systems. However, the large scale of these models often places considerable demands on computational resources. For example, many-query analyses like uncertainty quantification or optimization require repeated finite element simulations, which are usually computationally expensive. Model order reduction reduces the complexity of the finite element models: using reduced models is an efficient and fast way to obtain good approximations. In design, control, optimization, and uncertainty quantification, parametric model order reduction is particularly useful because these applications require repeated model evaluations over the parameter space. [1]

Machine learning algorithms rely on large amounts of data to make predictions on previously unseen datasets. This report presents how parametric model order reduction (pMOR) can be used to obtain good reduced models for different realizations of the model parameters. It also combines pMOR methods with machine learning algorithms (neural networks) and investigates the efficiency of these methods in the context of uncertainty quantification and optimization.

Model Order Reduction

Model order reduction has been extensively used to reduce the order of high-order dynamical systems and make them computationally less expensive. Reduced order models (ROMs) can, for instance, represent flow fields with significantly fewer parameters. When a mathematical model is derived, it often takes the form of a very detailed system of high-order differential equations. For analysis and design purposes, it is sometimes essential to determine whether lower-order equations of the same type can adequately reflect the original system's main characteristics. The objective of this simplification is to obtain a low-order model of the existing high-order model such that both are equivalent in terms of system response and close to each other in some physical sense. [2]

To obtain a model of lower order, a significant number of methods have been proposed. A method that maintains the parametric dependency of the system in the reduced matrices is called a parametric model order reduction (pMOR) method. pMOR targets the broader class of problems for which the governing equations depend on a set of parameters, for example parameterized partial differential equations and large-scale systems of parameterized ordinary differential equations. Generating low-cost but accurate models that describe the system response for different values of the parameters is the main objective of pMOR.

These methods can be divided into two categories: intrusive and non-intrusive.

Projection-based methods are intrusive: the full system is projected onto a lower-dimensional subspace, and the accuracy of the results depends on the choice of the reduced basis. A widely used example is proper orthogonal decomposition (POD), in which the reduced basis is built from snapshots of the system's response at various frequencies.

There are different approaches to achieve parametric MOR. In the global approach, the reduced bases computed for different parameters are concatenated into one global reduced basis. Local approaches instead interpolate either between the reduced bases or between the reduced matrices for different sampled parameter values to obtain the reduced basis/matrices at an unqueried point. [3]

Figure 1: Projection procedure

One of the problems with intrusive methods is that the full-order system matrices must be known to perform the projection. If this information is not available, non-intrusive methods can be used.

Machine learning is one of the non-intrusive approaches: it tries to learn the relation between the input and output parameters. Neural networks are among the most popular methods today because of their general applicability.

Figure 2: A general Neural Network Architecture

Problem Definition

Figure 3: Geometry and setup of CLT-element, all measures in m [4]

We investigate a cross-laminated timber plate with dimensions l \times b \times t = 2.5 m \times 0.8 m \times 0.081 m, consisting of three layers of crosswise glued timber. The plate is excited by an impedance hammer. It is modeled as a homogeneous, orthotropic plate under the assumptions of Reissner-Mindlin theory, and the material parameters are randomly sampled.

The governing equations for the system are:

(1) \begin{align} M\underline{\ddot{u}}+C\underline{\dot{u}}+K\underline{u}&=&\underline{f}\\ V^{H}MV\underline{\ddot{u}}+V^{H}CV\underline{\dot{u}}+V^{H}KV\underline{u}&=&V^{H}\underline{f} \\ M_{r}\underline{\ddot{u}_{r}}+C_{r}\underline{\dot{u}_{r}}+K_{r}\underline{u_{r}}&=&\underline{{f}_{r}} \end{align}

Here the matrices K, C and M denote the stiffness, damping and mass matrix with parametric uncertainty. A hysteretic damping model is used, u is the unknown, uncertain, time-dependent displacement vector, and f is the deterministic time-dependent force vector acting on the system. [4] The objective is to find an appropriate projection matrix V (with conjugate transpose V^{H}) such that the reduced system in the last line can be solved much more rapidly than the full one.

Data Workflow

The data workflow for the project is as follows:

Figure 4: Data Workflow

Methodology

The Venn diagram in Figure 5 shows all the methods that have been investigated to reach the objective of this report. For local model order reduction, the Second-Order Arnoldi algorithm (SOAR) has been implemented to compute the second-order Krylov subspace. For the reduction by a global basis, the objective was to build libraries of snapshots at different frequencies to generate a global basis; the output of this step was then used to interpolate the coefficients of the global reduced basis. Matrix interpolation is performed to find a common basis for the most significant modes. Finally, a Cholesky decomposition has been performed to ensure the positive definiteness of the reduced matrices.

Figure 5: Venn Diagram of methods

Sampling Strategy and Procedure

Independent Material Parameters

As mentioned before, the material model of the system corresponds to a homogenized cross-laminated timber (CLT) plate, treated as an orthotropic material. From a mathematical perspective, a dynamic analysis of the system requires 11 parameters. However, according to [2], [3], after estimating the distributions of the parameters from impedance-test measurements, six parameters, namely the elastic moduli in x and y direction (E_{x}, E_{y}), the shear moduli (G_{xy}, G_{zx}, G_{yz}) and the hysteretic damping coefficient (\eta), have the greatest influence on the final response of the system after an impact excitation. For the studied case, these parameters have the probability distributions shown in Table 1, which are the same as those used in [4].

Table 1. Descriptors of distribution of relevant material parameters of the modeled CLT plate.

Variable | Distribution | Mean | Coefficient of Variation
E_{x} | Lognormal | 1.061 \times 10^{10} | 0.1
E_{y} | Lognormal | 0.85 \cdot 7.605 \times 10^{8} | 0.1
G_{xy} | Lognormal | 0.7 \cdot 6.9 \times 10^{8} | 0.1
G_{zx} | Lognormal | 1.725 \times 10^{8} | 0.1
G_{yz} | Lognormal | 9.857 \times 10^{7} | 0.1
\eta | Lognormal | 2 \times 10^{-2} | 0.3

Multiplicative constants

The constants 0.85 and 0.7 applied to E_{y} and G_{xy} account for the missing gluing on the narrow edges in the cross direction; see [4] and [7].

Latin Hypercube Sampling (LHS)

Latin Hypercube Sampling (LHS) is a statistical sampling technique used to generate a set of random points that span a multi-dimensional data space. The basic idea behind LHS is to divide each dimension of the data space into equal intervals and then randomly select one point from each interval such that each point is unique. This results in a set of points that are evenly distributed across the entire data space, avoiding clustering and ensuring that all regions of the data space are well represented. A visual representation of this technique, for the case of a 2D variable sampling, is shown in Figure 6. The strata shown on the right of Figure 6, which correspond to the evenly divided data space, can be mapped to the sample space (left of Figure 6) through an iso-probabilistic transformation.


Figure 6. Visualization of a 2-variable sampling procedure with LHS. Extracted from [1].


The rank measure can be replaced by dividing a uniform distribution \sim \mathcal{U}[0,1] per dimension and then using the Cumulative Density Function (CDF). The calculation is shown in (2). Suppose a random variable \mathcal{X} is to be sampled and \mathcal{F}_{\mathcal{X}} is the associated CDF of this variable, whose inverse can be computed. Further, suppose \mathcal{U} is a uniformly distributed random variable in [0,1]. By applying the inverse of \mathcal{F}_{\mathcal{X}}, given that \mathcal{F}_{\mathcal{U}}(\mathcal{U}) is a known quantity (through LHS), a sample of the random variable \mathcal{X} can be generated.


(2) \mathcal{X} = \mathcal{F}^{-1}_{\mathcal{X}} \left ( \mathcal{F}_{\mathcal{U}} (\mathcal{U}) \right)


In general, LHS is widely used in a variety of fields, including computer experiments, simulation, and sensitivity analysis. It is particularly useful in these applications because it provides a more representative sample of the data space than simple random sampling. In the scope of model order reduction, this sampling method is simple to use, and with a large number of samples the sample space can be covered [1]. Although other techniques like greedy search or local sensitivity methods could, for certain MOR approaches, be more effective, they require a priori knowledge of a local MOR model or an estimate of the local sensitivity via a Taylor approximation (which might be unstable and inaccurate); therefore, this type of sampling is used.

For the present case, in which the whole variable space is composed of lognormally distributed variables, an example implementation in Python is shown below, together with the pseudo-algorithm.

Pseudo Algorithm

  1. Compile a set of input data, including uncertain parameters and any other relevant information.
  2. Select the desired number of samples.
  3. Generate a Latin Hypercube design by randomly sorting the parameters into equal intervals.
  4. Apply the isoprobabilistic transformation to create independent samples from the random parameters.


Sampling with LHS
# Import the libraries
import numpy as np
from scipy.stats import qmc, lognorm

# Arrays storing the parameters of the lognormal distributions (one entry per material parameter)
SIGMA_ARRAY = [...] # standard deviations
MEAN_ARRAY = [...] # means

# Set a number of samples
n_samples = 10  # arbitrary
# Get the number of dimensions (size of the array of standard deviations, equal to the array of means)
dimens_ = len(SIGMA_ARRAY)

# Set a random seed (the generator of uniformly distributed samples is pseudo-random and requires a seed);
# the qmc engine uses its own generator, so the seed is also passed to it below
np.random.seed(2023)

# Generate an object that sets the number of dimensions to sample with LHS
sampler = qmc.LatinHypercube(d=dimens_, seed=2023)

# Generate the samples in the unit hypercube [0, 1]^d
sample = sampler.random(n=n_samples)

# Reshape into an (n_samples, dimens_) array
sample = np.array(sample).reshape((n_samples, dimens_))

# Perform the isoprobabilistic transformation, eq. (2)
# (note: scipy's lognorm uses s = sigma of the underlying normal and scale = exp(mu))
re_sampled_array = lognorm(SIGMA_ARRAY, scale=MEAN_ARRAY).ppf(sample)

Pythonic Coupling with ANSYS APDL

The sampling process is complemented by generating the corresponding finite element model for the sampled material parameters. For this purpose, the parameter sampling and the model generation have to work together, and therefore a single procedure was implemented in Python. To control the generation of the FEM models, the library PyANSYS was used [8]. In previous work on the same problem, a macro was written in MATLAB: one random combination of material parameters was generated and a batch file was created. The batch file was executed from the command window and instructed the local computer to launch ANSYS MAPDL with a set of commands, which were saved as text files and also edited from a MATLAB script. This procedure is summarized on the left side of Figure 7.

Figure 7. Synthesis of previous and implemented procedure for FEM Model Generation.

However, during the starting phase of the project, the batch file did not work properly because of issues with locating the root folder of the ANSYS MAPDL launcher, so ANSYS could not be executed. Hence, a new approach was developed based on two premises:

  1. Avoid setting up batch files, because, depending on the administrator rights of a user, errors were sometimes thrown, and the issue of locating the ANSYS launcher in its root folder was cumbersome.
  2. Sample generation with the "MATLAB approach" would require generating several text files, such that the batch file would have to be modified for every new sample and could not be run inline efficiently.

Based on these ideas, a macro within a Python framework was designed. Using PyANSYS brings further advantages, such as control over the files being generated and the possibility to add checkpoints to the procedure, for example a mesh check and a mesh visualization to ensure the macro was running correctly. The result files could then be generated afterward.

PyANSYS works as an interface between the client, which requests information, and a background ANSYS MAPDL process working as a server. The connection is performed through a two-way gRPC channel. Figure 8 shows a synthetic representation of the architecture behind the PyMAPDL library and how it interacts with an ANSYS MAPDL module.

Figure 8. Interaction Representation between Python Environment and ANSYS MAPDL Environment. Extracted from [8].

Usage of PyMAPDL

The gRPC channel only works with recent versions of ANSYS MAPDL, normally ANSYS 19.2 and newer. Otherwise, the CORBA interface has to be used, which limits the capacity and operability of PyMAPDL. It is also recommended to work with Python 3.6 or higher.

In general, the PyMAPDL library is simple to use, as the commands are similar to those of regular ANSYS MAPDL. The following block shows a comparison between two code versions with the same purpose (the left corresponds to APDL commands for ANSYS MAPDL, the right to the Python implementation). The set of lines instructs ANSYS MAPDL to run a local file called "Parameter_batch", a macro that sets material parameters as environment variables and then fills the values of the global material parameters of the model with the values set before. Notice on the right that there are two possibilities to call the same commands as on the left: 1) use mapdl.run, where the argument of the run operation is the exact APDL command; 2) use the pythonic implementation that some MAPDL commands have as methods of the MAPDL object. In the example below, such a command is mp (which defines a material property). The PyMAPDL library supports both kinds of syntax, which can be checked at [9].

/INPUT,'Parameter_batch','mac',,,0
mp,ex,1,ex
mp,ey,1,ey
mp,ez,1,ez
mp,nuxy,1,nu_xy
mp,nuxz,1,nu_xz
mp,nuyz,1,nu_yz
mp,gxy,1,gxy
mp,gyz,1,gyz
mp,gxz,1,gxz
mp,dens,1,rho
mp,dmpr,1,damp
# Import library
from ansys.mapdl.core import launch_mapdl

# Generate the MAPDL object
mapdl = launch_mapdl(loglevel="WARNING")

mapdl.run("/INPUT,'Parameter_batch','mac',,,0")
mapdl.mp("ex", 1, "ex")
mapdl.mp("ey", 1, "ey")
mapdl.mp("ez", 1, "ez")
mapdl.mp("nuxy", 1, "nu_xy")
mapdl.mp("nuxz", 1, "nu_xz")
mapdl.mp("nuyz", 1, "nu_yz")
mapdl.mp("gxy", 1, "gxy")
mapdl.mp("gyz", 1, "gyz")
mapdl.mp("gxz", 1, "gxz")
mapdl.mp("dens", 1, "rho")
mapdl.mp("dmpr", 1, "damp")

PyANSYS Additional Contents

For more insight into the syntax and capabilities of PyANSYS, the reader is recommended to check the library's website and the linked GitHub repository.


Sampling and Model Generation Coupling

The process for generating the samples couples the aforementioned steps (LHS sampling and the PyANSYS macro). A simplified representation of the procedure is shown in Figure 9. In the code designated to generate samples, the required inputs are the number of samples, the type of macro to be used, the mesh size, and the frequency range of the analysis. At the moment, these inputs have to be hardcoded into the functions.

Figure 9. Coupled Sampling-Model Generation Procedure


In the part of the flowchart where the PyANSYS macro fits, two different kinds of macros were written, depending on the analysis to be performed: harmonic analysis and modal analysis. The latter is not discussed further, as it was designed as a benchmark to check the capability of the PyANSYS code: with the material parameters set to the values in Table 1, it reproduces the first six modes of the system shown in Appendix C of Schneider et al. [7]. In this step, the macro saves the information to be recovered later, namely the material properties set for the model, the actual samples in the sampling space (the result of the LHS), and the frequency range used for the simulation. Finally, all generated files (in this workflow as well as in the harmonic analysis macro) are saved in a zipped folder with a previously defined random name. The random name is generated with a fixed number of characters, chosen large enough that any newly generated name has a low probability of being repeated; the string length of each random name was set to 10.


The harmonic analysis macro, on the other hand, generates the solutions of the DoFs \left[ u_{x}, u_{y}, u_{z}, \phi_{x}, \phi_{y}, \phi_{z} \right]^{T} at designated positions of the plate, as well as processing files comprising the mass and stiffness matrices and the force vector (right-hand side). The macro can be broken down into the steps shown in Figure 10. In addition to exporting the matrices, the macro performs the harmonic analysis and exports the solution at designated positions. These positions correspond to the ones marked by Schneider et al. [4] and shown in Figure 3. The exported solutions serve as a check of the macro, as these points are not used in the next steps; the purpose is to keep the direct nodal displacements of the FEM model for comparison with the implemented MOR models. Although the direct nodal solutions of the harmonic analysis could be extracted from the saved full ANSYS database, for round-off reasons the nodal solution of the problem was recomputed by assembling the full model in Python from the exported matrices and using the solvers implemented in scipy (and numpy).

Degrees of Freedom

As the elements used for the mesh are SHELL281 elements of ANSYS, the actual degrees of freedom (DoFs) comprise the translational displacements \left[ u_{x}, u_{y}, u_{z} \right] and the rotations \left[ \phi_{x}, \phi_{y}, \phi_{z} \right].

Panel - Harmonic Analysis Macro

Figure 10. Flowchart of the macro used for "Harmonic Analysis".

About Matrix Formats and Node Mapping

ANSYS provides the command HBMAT, which exports one of the system matrices passed as a parameter. The matrices are saved as sparse matrices in the Harwell-Boeing format. This command also enables the export of a mapping file, which maps the degrees of freedom (displacement vector) of the dynamical system to the corresponding nodes of the mesh.

Importing matrices from ANSYS into Python Framework

Currently, scipy does not provide a method to import a symmetric sparse matrix stored in the Harwell-Boeing (HB) format. For this reason, the macro also saved the matrices in the Matrix Market (MM) format, for which scipy has a complete implementation. Since the matrices resulting from the FEM discretization are symmetric, they were retrieved into the Python session as scipy sparse matrices through the mapdl interface and then written to disk with scipy's routines for exporting sparse matrices in the MM format. Because the matrices are saved twice (in MM and HB formats), sample generation is slow and requires large storage space.
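As a minimal illustration of the import step on the Python side, the sketch below reads MM-formatted matrices into scipy sparse objects; the file names are placeholders, not the actual names used by the macro.

# Minimal sketch for loading exported Matrix Market matrices into Python (file names are placeholders)
from scipy.io import mmread
from scipy.sparse import csr_matrix

K = csr_matrix(mmread("stiffness.mtx"))  # stiffness matrix
M = csr_matrix(mmread("mass.mtx"))       # mass matrix

# Both matrices come from an FEM discretization and should therefore be symmetric
print(K.shape, M.shape)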

Artificial Neural Networks and Additional Models

Introduction

An artificial neural network consists of an input layer, one or more hidden layers, and an output layer. The input layer receives the data fed into the neural network. The hidden layers perform computations on the input data: each neuron in a hidden layer takes inputs from the previous layer, multiplies them by weights, adds a bias term, and applies an activation function to produce an output. The output layer produces the final output of the neural network. For illustrative purposes, consider the neural network shown in Figure 11, which has two hidden layers whose outputs are denoted \mathbf{x}^{(1)} and \mathbf{x}^{(2)}. Assuming that a neural network can have an arbitrary number of layers, the output of the n-th layer is denoted \mathbf{x}^{(n)}.


Figure 11. Sample 2 hidden layer Feed Forward Neural Network (FNN). Extracted from [11]


During forward propagation, the input data is fed into the input layer, and the computations are performed through the hidden layers until the output layer produces a result. Let's analyze what happens layer by layer. There are sets of weights \mathbf{A}_{1}, \mathbf{A}_{2}, \cdots, \mathbf{A}_{n-1}, \mathbf{A}_{n}, \mathbf{A}_{n+1} , \cdots, \mathbf{A}_{N+1} (in matrix form) that connect the output from the layer  n-1 to the layer  n. Then the relationship between outputs of consecutive layers is:

(3) \mathbf{x}^{(n+1)} = \mathbf{\sigma}^{(n+1)}\left(\mathbf{A}_{n+1}\mathbf{x}^{(n)} + b^{(n+1)} \right)

In (3) the term b^{(n+1)} represents the bias of the (n+1)-th layer and the function \sigma^{(n+1)}(\cdot) is called the activation function. With this definition, the output can be written as:

(4) \mathbf{y} = \sigma^{(N+1)}\left(\mathbf{A}_{N+1}\mathbf{x}^{(N)} + b^{(N+1)} \right)

By using (3) and (4) recursively, it can be seen that a neural network is a non-linear function represented as nested affine maps and activations:

(5) \mathbf{y} = \sigma^{(N+1)}\left(\mathbf{A}_{N+1} \sigma^{(N)} \left( \mathbf{A}_{N} \sigma^{(N-1)} \left( \mathbf{A}_{N-1} \sigma^{(N-2)}(\cdots ) + b^{(N-1)} \right) + b^{(N)} \right) + b^{(N+1)} \right)

(5) shows that the output depends on the full set of weights as well as on the input. It can therefore be stated that \mathbf{y}=f \left(\mathbf{A}_{1}, \mathbf{A}_{2}, \cdots, \mathbf{A}_{N+1},\mathbf{x}^{(0)} \right). Analogous to the standard regression problem [11] [14], there should be an optimal set of weights \mathbf{A}_{1}^{*}, \mathbf{A}_{2}^{*}, \cdots, \mathbf{A}_{N+1}^{*} for which a certain error in the output is minimized. To obtain these optimal weights, the neural network has to be trained. For this, the output of the neural network is compared to the true output, and a loss function is used to calculate the difference between the predicted and true outputs. As with the activation functions, the loss function is arbitrary and is chosen depending on the type of problem for which the neural network is devised. For regression purposes, the most common choice, as in linear regression, is the mean square error (6).

About activation functions

\sigma (\cdot) can be any function. However, in the literature, there are sets of functions that are commonly used for these applications. For more info, check [13] and [14] for definitions and typical functions as well as some motivations for why those functions are used in the context of Neural Networks.

About activation functions at the output layer

Although in most neural-network textbooks the output is simply written as \mathbf{y} = \mathbf{A}_{N+1}\mathbf{x}^{(N)} + b^{(N+1)}, i.e. the function \sigma^{(N+1)}(\cdot) is the identity, the more general notation is adopted for special purposes within this report.



(6) J = \frac{1}{2} \sum_{j=1}^{r} \left({y_{true}}_{j}-{y_{NN}}_{j} \right) ^{2}


The Backpropagation Algorithm is then used to adjust the weights and biases of the neurons in the network based on the calculated loss.

Let's suppose we have a feedforward neural network with  N layers. The backpropagation algorithm proceeds in two phases: the forward pass and the backward pass. In the forward pass, we compute the output of each layer given the input and the weights. In the backward pass, we compute the gradients of the loss with respect to the weights, starting from the output layer and moving back toward the input layer.

Let's start with the forward pass. For the n-th layer of the network, we denote the input as  \mathbf{x}^{(n-1)}, the output as \mathbf{x}^{(n)} , and the weights as \mathbf{A}_{n} and the biases as b^{(n)}. The output of the n -th layer is computed as:

(7) \mathbf{z}^{(n)} = \mathbf{A}_{(n)} \mathbf{x}^{(n-1)} + b^{(n)}
(8) \mathbf{x}^{(n)} = \sigma^{(n)}(\mathbf{z}^{(n)})


The output of the network can then be computed as

(9) \mathbf{y} = \sigma^{(N+1)}(\mathbf{z}^{(N+1)})


Next, we move on to the backward pass. We want to compute the gradients of the loss function with respect to the weights in each layer, starting from the output layer and moving back toward the input layer. We use the chain rule to propagate the gradients backward. For the final layer, the gradient of the loss with respect to the output is:

(10) \frac{\partial J}{\partial \mathbf{y}}


The gradient of the loss with respect to the input of layer n is computed as:

(11) \frac{\partial J}{\partial \mathbf{x}^{(n)}} = (\mathbf{A}_{n+1})^T \left( \frac{\partial J}{\partial \mathbf{x}^{(n+1)}} * \sigma'^{(n+1)}(\mathbf{z}^{(n+1)}) \right)

where * denotes element-wise multiplication and \sigma'^{(n+1)}(\cdot) is the derivative of the activation function of the (n+1)-th layer. Equation (11) shows that the backpropagation algorithm works as a cascade of derivatives, as the error is propagated from the final layers back to the initial layers of the network.

(12) \frac{\partial J}{\partial \mathbf{A}_{n}} = \frac{\partial J}{\partial \mathbf{z}^{(n)}} {\mathbf{x}^{(n-1)}}^{T}
(13) \frac{\partial J}{\partial \mathbf{x}^{(n-1)}} = \left(\frac{\partial \mathbf{z}^{(n)}}{\partial \mathbf{x}^{(n-1)}}\right)^{T} \frac{\partial J}{\partial \mathbf{z}^{(n)}} = \mathbf{A}_{n}^{T}\frac{\partial J}{\partial \mathbf{z}^{(n)}}
(14) \frac{\partial J}{\partial \mathbf{z}^{(n)}} = \frac{\partial J}{\partial \mathbf{x}^{(n)}} \frac{\partial \mathbf{x}^{(n)}}{\partial \mathbf{z}^{(n)}}=\frac{\partial J}{\partial \mathbf{x}^{(n)}}\frac{\partial \sigma^{(n)}}{\partial \mathbf{z}^{(n)}}

The backpropagation algorithm is used to determine the gradient of the error with respect to all the weights and biases of the network (i.e. \frac{\partial J}{\partial \mathbf{A}_{n}}), which drives the weight update process. This is achieved by coupling (11) with (12), (13) and (14). The weight update is performed by assuming a gradient descent update. Suppose that the weights contained in all the matrices \mathbf{A}_{n}, as well as the biases b^{(n)}, are collected in a column vector \mathbf{w}_{k}, which is the state of the weights at some iteration k. Then the weights are updated as follows:

(15) \mathbf{w}_{k+1}=\mathbf{w}_{k} - \delta \nabla J

In this equation \nabla J corresponds to the gradient of the error function with respect to all the weights and biases, and \delta is a hyperparameter known as the "learning rate" [13]. This process is repeated until convergence. At every step, the update moves against the gradient \nabla J so that the loss function J decreases toward a minimum. This turns out to be an optimization problem that can be intractable depending on the size of the network [11], and there is a high chance that the optimization algorithm ends up in a local minimum due to the non-convexity of the loss function [13]. Hence, coupling different optimizers with different learning rates \delta may result in different trained networks.
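To make (7)-(15) concrete, the following minimal numpy sketch performs one forward and one backward pass for a network with a single ReLU hidden layer and an identity output, followed by one gradient descent step; the layer sizes and data are purely illustrative assumptions.

import numpy as np

# Illustrative sketch: one hidden layer, ReLU activation, identity output, MSE loss (6)
rng = np.random.default_rng(0)
x = rng.random((9, 1))          # input vector x^(0) (e.g. 9 normalized inputs)
y_true = rng.random((1, 1))     # target output

A1, b1 = rng.random((40, 9)), np.zeros((40, 1))   # weights/bias of the hidden layer
A2, b2 = rng.random((1, 40)), np.zeros((1, 1))    # weights/bias of the output layer
delta = 0.05                                      # learning rate

# Forward pass, eqs. (7)-(9)
z1 = A1 @ x + b1
x1 = np.maximum(z1, 0.0)        # ReLU activation
y_nn = A2 @ x1 + b2             # identity activation at the output

# Backward pass, eqs. (10)-(14)
dJ_dy = y_nn - y_true           # dJ/dy for the MSE loss (6)
dJ_dz2 = dJ_dy                  # identity output activation
dJ_dA2 = dJ_dz2 @ x1.T          # eq. (12) for the output layer
dJ_dx1 = A2.T @ dJ_dz2          # eq. (13)
dJ_dz1 = dJ_dx1 * (z1 > 0.0)    # eq. (14), derivative of ReLU
dJ_dA1 = dJ_dz1 @ x.T           # eq. (12) for the hidden layer

# Gradient descent update, eq. (15)
A1 -= delta * dJ_dA1
A2 -= delta * dJ_dA2
b1 -= delta * dJ_dz1
b2 -= delta * dJ_dz2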

Applied Models

Approach 1: Pure Data Based Neural Network

In this method, a neural network is devised to predict the frequency-dependent magnitude of the displacement ||\hat{u}_{z}(x,y,\omega)|| in z-direction, where the variables x and y represent the position on the plate and \omega the frequency of excitation. Contrary to the notation used for the harmonic analysis, in which the displacement is a vector, the magnitude of the displacement is represented as a scalar so that the network has the smallest possible number of outputs (one in this case). This, however, requires having the position on the plate as an input. Due to its simplicity, this approach was adopted as an exploratory step to check whether a neural network could "capture" the parametric dependency of the dynamical system and work as a surrogate.

The mathematical representation of the Neural Network of this approach as a function is the following:

(16) f_{NN,\text{Approach 1}} \left( E_{x}, E_{y},G_{xy},G_{zx},G_{yz},\eta ,x ,y , \omega \right) = -\log_{10} \left(||\hat{u}_{z}|| \right)

The output was chosen as the negative base-10 logarithm because, for certain excitation frequencies, the displacement is extremely small and close to zero. Although the magnitude of a complex number must be positive, if the output were set to the plain magnitude of the displacement, the network could predict negative values. With the transformation shown in (16), all outputs are bounded in the range [-3, 8], and by applying the inverse transformation (a power of ten) to the output, the predicted magnitude of the displacement is always positive.

As for the inputs, for heuristic reasons they were normalized, as recommended in [14]. The material parameters \left( E_{x}, E_{y},G_{xy},G_{zx},G_{yz},\eta \right) were normalized by using their corresponding sampling values in the LHS grid space, which is advantageous as these values are bounded by [0,1]. The spatial inputs (x,y) were normalized by dividing by the length and breadth of the plate, respectively. The frequency sweep was performed from 0 to 150 Hz, so all frequency values were divided by 150 Hz to obtain values in [0,1].
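The input normalization and the output transformation of (16) can be written compactly as in the sketch below; the plate dimensions and the frequency range come from the problem setup, while the function names and array layout are assumptions.

import numpy as np

L_PLATE, B_PLATE, F_MAX = 2.5, 0.8, 150.0   # plate length, breadth [m] and maximum frequency [Hz]

def normalize_inputs(lhs_sample, x, y, omega):
    """Assemble one normalized input row for the Approach-1 network.

    lhs_sample : the 6 material parameters expressed as their LHS grid values in [0, 1]
    x, y       : position on the plate [m]
    omega      : excitation frequency [Hz]
    """
    return np.concatenate([lhs_sample, [x / L_PLATE, y / B_PLATE, omega / F_MAX]])

def transform_output(u_z_magnitude):
    """Output transformation of eq. (16)."""
    return -np.log10(u_z_magnitude)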

This approach also required a special way to organize the data. The way the databases were generated to work with this approach is shown in Figure 12.





Figure 12. Flowchart of procedure to generate databases to work with Approach 1.

The management of databases was split into two parts: the first database stores the material parameters of each model; the second stores the variables of interest, namely the positions, frequencies, and magnitudes of the displacements. This split reduces the global storage requirement, as some of the data would otherwise be repeated or redundant. When feeding data into the neural networks, both databases were merged using the folder identifier as the primary key. The merged database was generated temporarily in memory to be partitioned (into training, validation, and testing sets) and modified.


About the size of databases generated for Approach 1

The size of the databases generated for Approach 1 can be large, on the order of ~1 GB, so care should be taken. Notably, the most popular Python library for tabular data is Pandas, yet the number of rows of Database 2 shown in Figure 12 is on the order of 10^{6}, which may cause complications because Pandas, for the sake of speed, loads all entries of a database into memory. For this reason, the library Dask was used instead of Pandas. Dask avoids loading all entries into memory and works in parallel, progressively loading only chunks of the database; therefore, the processing of the databases and the merging procedure were performed with Dask.

In addition, due to their large size, the databases were saved in the parquet format, a binary format that is efficient for both saving and loading.
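A minimal sketch of this merging step with Dask is shown below; the file names and column names (e.g. folder_id) are assumptions, while read_parquet, merge, and sample are standard Dask DataFrame operations.

import dask.dataframe as dd

# Database 1: one row per model (material parameters); Database 2: positions, frequencies, displacements
df_params = dd.read_parquet("database_1_material_parameters.parquet")
df_results = dd.read_parquet("database_2_displacements.parquet")

# Merge on the folder identifier (primary key), lazily and chunk-wise
df_full = dd.merge(df_results, df_params, on="folder_id", how="inner")

# Materialize only what is needed, e.g. a training split
train_df = df_full.sample(frac=0.8, random_state=2023).compute()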

Approach 2: Global Basis Reduction coupled with Neural Networks

For this approach, a definition of a global basis is used. For simple MOR approaches (without parametric dependency), the simplest reduction consists of performing a proper orthogonal decomposition of a series of snapshots of a dynamical system, from which the modes of the system are identified. The reduction is then performed by representing the system as a combination of the most relevant modes [11].

In the context of pMOR, this approach is also used with a snapshot matrix in which snapshots for different times (or frequencies) and different parameters are stacked. Let \mathbf{\hat{x}}(\mathbf{\mu}^{(l)},\omega) be a column of the snapshot matrix \mathbf{X}, where \mathbf{\mu}^{(l)} corresponds to a combination of parameters of the model and \omega is an arbitrary frequency. The snapshot matrix thus collects snapshots for different combinations of parameters and frequencies. According to Benner et al. [1], this method has the drawback that it is not optimized with respect to the error induced by reducing the system, yet it is easy to implement because the reduction is non-intrusive, or purely data based. The global basis obtained this way loses the parametric dependency and attempts to represent the "general modes of the system" [10] [16].

The mathematical basis of this reduction method is the Galerkin Projection, which states that the state variables of a dynamic system can be decomposed as a linear combination of modes [11]

(17) \hat{\mathbf{x}} \left(\mathbf{\mu},\omega \right)= \sum_{l=1}^{N} \mathbf{\psi}_{l} a_{l}(\mathbf{\mu},\omega)

In (17), the vectors \mathbf{\psi}_{l} are the basis vectors and encompass the spatial dependency of the system, whereas the coefficients a_{l} vary and represent the dependence of the modes on the parameters and the frequency. N is the number of degrees of freedom of the full system. With this idea, a reduced order approximation can be found by selecting a subset of r modes, where r \ll N, as follows:

(18) \hat{\mathbf{x}} \left(\mathbf{\mu},\omega \right) \approx \sum_{l=1}^{r} \mathbf{\psi}_{l} a_{l}(\mathbf{\mu},\omega)

As a first step to get the global reduced basis, suppose that the snapshot matrix can be decomposed by using a Singular Value Decomposition (SVD):

(19) \mathbf{X} = \mathbf{V} \mathbf{\Sigma} \mathbf{N}^{H}

\mathbf{V} and \mathbf{N} represent orthonormal matrices. The columns of \mathbf{V} and \mathbf{N} are called left and right singular vectors of \mathbf{X}, and \mathbf{\Sigma} is a diagonal matrix whose elements are called singular values.

\mathbf{\Sigma} = \begin{pmatrix} \sigma_{1} & 0 & 0 & 0 & \hdots & 0 \\ 0 & \sigma_{2} & 0 & 0 & \hdots & 0 \\ 0 & 0 & \sigma_{3} & 0 & \hdots & 0 \\ 0 & 0 & 0 & \sigma_{4} & \hdots & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & 0 & \hdots & \sigma_{N} \end{pmatrix}

Considering that the singular values are ordered as \sigma_{1} > \sigma_{2} > \cdots > \sigma_{N}, and that each column of \mathbf{V} is linked to a singular value, a good approximation to a reduced basis is given by the first r columns of \mathbf{V}, which will be called \mathbf{\phi}_{l}. Notably:

\mathbf{\phi}_{l} \approx \mathbf{\psi}_{l} \ \forall l \leq r

The value of r is arbitrary, yet it can be defined via the singular values. The singular values can be interpreted as "weights" expressing the importance of the corresponding singular vectors. The sum of the squares of the singular values expresses the variance retained when reconstructing the full model from a reduced one [1]. A guiding inequality is the following:

(20) \kappa \leq \frac{\sum_{i}^{r} \sigma_{i}^{2}}{\sum_{i}^{N} \sigma_{i}^{2}}

In (20), the value of \kappa is user defined; by iterating over possible values of r until the inequality holds, the reduced order r can be determined.

Now, an actual projection matrix can be built from the basis vectors \mathbf{\phi}_{l}. Let \mathbf{\Phi} = \left[ \mathbf{\phi}_1, \mathbf{\phi}_2, \cdots , \mathbf{\phi}_r \right]; then the projection matrix \mathbf{P} can be built as

(21) \mathbf{P} = \mathbf{\Phi}

so that the reduced coordinates are obtained as \hat{\mathbf{x}}_{\text{RB}} = \mathbf{\Phi}^{H} \hat{\mathbf{x}}_{\text{Full}} and the full-space solution is recovered with

(22) \hat{\mathbf{x}}_{\text{Full}} \approx \mathbf{P} \hat{\mathbf{x}}_{\text{RB}}

At this point, the projection matrix can be used together with the dynamical system. Most interestingly, however, samples of \hat{\mathbf{x}}_{\text{RB}} can be generated by varying the parameters and then used to train surrogate models, such as a neural network, to predict new outcomes. In this second approach, a neural network is therefore used to predict the reduced-space variables.
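The construction of the global basis, the truncation criterion (20), and the projections (21)-(22) can be sketched with numpy as follows; the snapshot matrix X (degrees of freedom by snapshots) is assumed to be already assembled.

import numpy as np

def global_basis(X, kappa=0.99):
    """Build a global reduced basis Phi from a snapshot matrix X (DoFs x snapshots)."""
    V, s, _ = np.linalg.svd(X, full_matrices=False)   # eq. (19)
    energy = np.cumsum(s**2) / np.sum(s**2)
    r = int(np.searchsorted(energy, kappa) + 1)       # smallest r satisfying eq. (20)
    return V[:, :r]                                   # Phi = [phi_1, ..., phi_r]

# Projection between full and reduced space, eqs. (21)-(22):
# Phi = global_basis(X)
# u_rb = Phi.conj().T @ u_full      # reduced coordinates
# u_full_approx = Phi @ u_rb        # reconstruction in the full space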


(23) f_{NN,\text{Approach 2}} \left( E_{x}, E_{y},G_{xy},G_{zx},G_{yz},\eta, \omega \right) = \hat{\mathbf{u}}_{\text{RB}}

The vector \hat{\mathbf{u}}_{\text{RB}} corresponds to the reduced-space representation of || \hat{\mathbf{u}}_{\text{Full space}}||, which is the vector of the displacement magnitudes of the full model. This choice was based on the statistical interpretation of the outputs of a neural network given by Bishop [13]: if the real and imaginary parts of the displacements were represented directly by a neural network, errors might appear because, in most cases, the real and imaginary parts are not necessarily correlated.

The notation \hat{\mathbf{u}}_{\text{RB}} is used without ||\cdot|| because some entries of \hat{\mathbf{u}}_{\text{RB}} can be negative, even though all entries of the full-space vector are positive. Future work on this problem might consider other surrogates that can handle negative values. One promising and incipient research field is neural networks with complex-valued weights. Despite some progress in this area, the approach was not used here because the most common neural network toolboxes, such as Torch and TensorFlow, work primarily with real-valued weights, so a new toolbox would have had to be written from scratch.


The workflow can be summarized as follows:

Offline Phase
  1. Generate high-fidelity simulations, in this case the CLT-plate displacements for different material parameters and varying frequencies (already available from Approach 1).
  2. Construct the snapshot matrix \mathbf{X}
  3. Perform the SVD on \mathbf{X}
  4. Set r based on (20)
  5. Build the reduced basis \mathbf{\Phi} = \left[ \mathbf{\phi}_1, \mathbf{\phi}_2, \cdots , \mathbf{\phi}_r \right]
  6. Compute samples of \hat{\mathbf{u}}_{\text{RB}} from the high-fidelity simulation results by projecting them onto the reduced basis (cf. (21)-(22)). Arrange the samples of \hat{\mathbf{u}}_{\text{RB}} in a database table as shown below:



    Folder Identifier | \omega | \hat{u}_{\text{RB}_{1}} | \hat{u}_{\text{RB}_{2}} | ... | \hat{u}_{\text{RB}_{r}}
    1 | ... | ... | ... | ... | ...

    This table is similar to Database Table 2 shown in Figure 12. There are more outputs to be written to the table, yet with this approach the dependency on the position on the plate as a direct input vanishes.

  7.   Train the neural network f_{NN,\text{Approach 2}} \left( E_{x}, E_{y},G_{xy},G_{zx},G_{yz},\eta, \omega \right) with subsets of all the generated samples. The inputs are normalized the same way as stated in Approach 1.
Online Phase
  1. Evaluate the output f_{NN,\text{Approach 2}} \left( E_{x}, E_{y},G_{xy},G_{zx},G_{yz},\eta, \omega \right) of the network for the input vector [E_{x}, E_{y},G_{xy},G_{zx},G_{yz},\eta, \omega]^{T}.
  2. Compute the full space solution by ||\hat{\mathbf{u}}_{\text{Full}}|| =  \mathbf{P} \hat{\mathbf{u}}_{\text{RB}}


Approach 3: Learning the Cholesky Decomposition of Matrices

The Cholesky decomposition is a factorization of a Hermitian positive-definite matrix into the product of a lower triangular matrix and its conjugate transpose, which can be represented as:

(24) \mathbf{A} = \mathbf{L} \mathbf{L}^{H}

where \mathbf{A} is the original matrix, \mathbf{L} is the lower triangular matrix, and \mathbf{L}^{H} is the conjugate transpose of \mathbf{L} (if \mathbf{A} is real valued, then \mathbf{L}^{H}=\mathbf{L}^{T}).

Finite element discretizations of mechanical continuum models produce matrices that depend on the material properties (i.e. the elasticity tensor and the mass density). Therefore, a neural network that receives the material parameters as inputs can be trained to predict the factor \mathbf{L} of a particular problem, given that the size of the matrix \mathbf{A} is fixed. Mathematically, this approach can be described as:

(25) f_{NN,\text{Approach 3}} \left( E_{x}, E_{y},G_{xy},G_{zx},G_{yz},\eta \right) = [l_{1,1},l_{2,1},l_{2,2},\cdots,l_{r,r-1},l_{r,r}]^{T}

In this case, the number of outputs is \frac{N(N+1)}{2}, where N is the number of rows/columns of \mathbf{A}.
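As a small illustration of the output layout of (25), the sketch below packs and unpacks the lower-triangular factor with numpy, assuming a real-valued reduced matrix of size N.

import numpy as np

def pack_cholesky(L):
    """Flatten the lower-triangular factor L into the N(N+1)/2 output vector of eq. (25)."""
    idx = np.tril_indices(L.shape[0])
    return L[idx]

def unpack_cholesky(l_vec, N):
    """Rebuild L from the network output and recover A = L L^T (real-valued case)."""
    L = np.zeros((N, N))
    L[np.tril_indices(N)] = l_vec
    return L, L @ L.T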

A related approach is shown in [12], in which 6\times 6 stiffness matrices of simple FEM models were predicted. In that study, the material parameters were used as inputs to the neural network, and the authors exploited material symmetries to enforce additional conditions on the network and obtain better predictions.

This approach can be used in a general sense for any dynamical model whose matrices are positive definite and symmetric. In the context of pMOR, however, the idea is to couple an intrusive pMOR method that yields reduced matrices with a neural network that predicts the interpolated reduced matrices, which can then be used to run simulations with the reduced state-space variables. The intrusive pMOR method should be chosen with care, as some methods have issues with preserving positive definiteness. The matrix interpolation approach described in Panzer et al. [15] uses an algorithm that, in general, preserves the positive definiteness of the full matrices.

SOAR Algorithm

The Second-Order Arnoldi procedure is used to compute the orthonormal basis of the second-order Krylov subspace. The pseudo-code is as follows.

Inputs: M, D, K, n, r_0
Output: Q_n

q_1 = r_0 / ||r_0||
w = 0
for j = 1, 2, ..., n do
    r = -K^{-1} (D q_j + M w)
    h_j = Q_j^T r
    r = r - Q_j h_j
    h_{j+1,j} = ||r||_2
    stop if h_{j+1,j} = 0
    q_{j+1} = r / h_{j+1,j}
    solve H_j(2:j+1, 1:j) g = e_j for g
    w = Q_j g
end for

The function takes the starting vector r_0, the reduced order n, the mass matrix M, the damping matrix D, and the stiffness matrix K as input. The goal of the procedure is to obtain the projection matrix Q_n, which is used to reduce the order of the full model.

The complexity of this procedure is:

  • Memory → (n + 2)N
  • Flops (floating point operations) → (3/2)Nn(n + 4/3)

Here we assume that only the Arnoldi vectors are stored and there are no zero columns in Q_n.
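A minimal dense-matrix sketch of the SOAR procedure above (following the pseudo-code directly, without deflation handling) might look as follows; the explicit inverse of K is used purely for brevity, and in practice a (sparse) factorization would be used instead.

import numpy as np

def soar(M, D, K, n, r0, tol=1e-6):
    """Second-Order Arnoldi: build an orthonormal basis Q_n of the second-order Krylov subspace."""
    N = K.shape[0]
    dtype = np.result_type(M, D, K, r0, float)
    Q = np.zeros((N, n + 1), dtype=dtype)
    H = np.zeros((n + 1, n), dtype=dtype)
    Q[:, 0] = r0 / np.linalg.norm(r0)
    w = np.zeros(N, dtype=dtype)
    K_inv = np.linalg.inv(K)                             # brevity only; use a factorization in practice
    for j in range(n):
        r = -K_inv @ (D @ Q[:, j] + M @ w)
        H[: j + 1, j] = Q[:, : j + 1].conj().T @ r       # orthogonalize against the current basis Q_j
        r = r - Q[:, : j + 1] @ H[: j + 1, j]
        H[j + 1, j] = np.linalg.norm(r)
        if H[j + 1, j] < tol:                            # breakdown/deflation, not handled in this sketch
            return Q[:, : j + 1]
        Q[:, j + 1] = r / H[j + 1, j]
        e_j = np.zeros(j + 1, dtype=dtype); e_j[j] = 1.0
        g = np.linalg.solve(H[1 : j + 2, : j + 1], e_j)  # solve H_j(2:j+1, 1:j) g = e_j
        w = Q[:, : j + 1] @ g
    return Q[:, :n]

# Reduction with the resulting projection matrix (cf. the next section):
# Q_n = soar(M, D, K, n=16, r0=f)
# M_n, D_n, K_n = Q_n.T @ M @ Q_n, Q_n.T @ D @ Q_n, Q_n.T @ K @ Q_n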

Structure-Preserving Dimension Reduction

In this part, we will present a structure-preserving dimension reduction method based on the SOAR procedure.

The main interest of this algorithm is to approximate the full model around a pre-selected expansion point s_0 \neq 0. The transfer function can then be rewritten as follows:

h(s) = l^T (s^2 M + sD + K)^{-1} f = l^T ((s-s_0)^2 \widetilde{M} + (s-s_0) \widetilde{D} + \widetilde{K})^{-1} f

Here \widetilde{M} = M, and \widetilde{D} and \widetilde{K} can be expressed as

\widetilde{D} = 2 s_0 M + D \quad \text{and} \quad \widetilde{K} = s_0^2 M + s_0 D + K

Inputs: M, D, K, n, s_0
Output: Q_n

\widetilde{D} = 2 s_0 M + D
\widetilde{K} = s_0^2 M + s_0 D + K
r_0 = \widetilde{K}^{-1} f
q_1 = r_0 / ||r_0||
w = 0
for j = 1, 2, ..., n do
    r = -\widetilde{K}^{-1} (\widetilde{D} q_j + M w)
    h_j = Q_j^T r
    r = r - Q_j h_j
    h_{j+1,j} = ||r||_2
    stop if h_{j+1,j} = 0
    q_{j+1} = r / h_{j+1,j}
    solve H_j(2:j+1, 1:j) g = e_j for g
    w = Q_j g
end for

The pseudo-code of the structure-preserving dimension reduction algorithm is given above. After obtaining the projection matrix, the reduction is performed by projecting the full model matrices and vectors with the projection matrix and its transpose:

M_n = Q_n^T M Q_n, \quad D_n = Q_n^T D Q_n, \quad K_n = Q_n^T K Q_n, \quad l_n = Q_n^T l, \quad f_n = Q_n^T f

Matrix interpolation

The main goal here is to find one interpolation function for all matrix entries.

In this project, we have different parameter values for our samples. When a parameter changes, some deformation modes switch positions, and it does not make sense to interpolate between different deformation modes. Therefore, we have to find a common basis to which all individual bases can be transformed.

To find a common basis, all of the bases obtained from the samples are put next to each other in one matrix:

V_{all} = [V_1, V_2, \dots, V_k]

After concatenating all the bases, we apply a singular value decomposition:

V_{all} = U \Sigma N^T

Here, the most significant modes are the first r columns of U, where r is the size of the reduced model; these columns form the common basis R.

After obtaining the most significant modes, we have to transform the reduced bases of the individual samples into the common coordinate system:

T_i = R^T V_i

L_i = (V_i^T R)^{-1}

Now, we can transform our individual reduced matrices into a common coordinate system:

M_{r,i}^{*} = L_i M_{r,i} T_i^{-1}
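A compact numpy sketch of the interpolation preprocessing above is shown below; V_list and M_list are assumed lists holding the individual reduced bases V_i and reduced matrices M_{r,i}, and R is the matrix of the r most significant columns of U.

import numpy as np

def common_basis(V_list, r):
    """SVD of the concatenated individual bases; the first r left singular vectors form R."""
    V_all = np.hstack(V_list)            # V_all = [V_1, V_2, ..., V_k]
    U, _, _ = np.linalg.svd(V_all, full_matrices=False)
    return U[:, :r]

def to_common_coordinates(R, V_i, M_ri):
    """Transform one reduced matrix M_{r,i} into the common coordinate system."""
    T_i = R.T @ V_i                      # T_i = R^T V_i
    L_i = np.linalg.inv(V_i.T @ R)       # L_i = (V_i^T R)^{-1}
    return L_i @ M_ri @ np.linalg.inv(T_i)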

Inconsistencies

By computing the angle between the basis vectors, we can judge whether the transformation of the individual reduced bases to the common basis was successful. The transformed reduced basis should be as similar to R as possible. Therefore, we can measure the angle between each transformed basis vector and the corresponding basis vector in R, for example via the standard relation \cos \theta_j = \frac{|\tilde{v}_j^{T} r_j|}{\lVert \tilde{v}_j \rVert \, \lVert r_j \rVert}, where \tilde{v}_j denotes the j-th transformed basis vector and r_j the j-th column of R.

A general threshold for the angle is still to be found; here, the threshold is selected as 45 degrees.

Results

Approaches 1 and 2 with Neural Networks

Table 2. Summary of best configurations of networks used for Approaches 1 and 2.

Model | Learning rate \delta | Hidden Layer Configuration | Activation Function | Optimizer | Loss Function | Number of training epochs | Number of outputs
Approach 1 | 0.05 | [40,40,40,40] | ReLU | ADAM | MSE | 25 | 1
Approach 2 | 0.3 | [50] | ReLU | ADAM | MSE + (reg*) | 5000 | 7

For Approaches 1 and 2, two different networks were generated. Parameters such as the hidden layer configuration and the learning rate, shown in Table 2, were found through cross-validation. The number of training epochs also differs due to the different sample sizes. The main downside of Approach 1 is the large number of samples, which also increases the memory demand; since many samples are seen by the network, fewer training epochs are required, and the training history on the validation data converges after the fifth epoch. For Approach 2, where a reduced basis has to be determined, it was found using (20) that 7 modes are enough to capture 99% of the variance. For both approaches, the ReLU activation function yielded the best results.


Figure 13. Comparison of results for Approaches 1 and 2. On the right, there are results that use the Global Basis Approach (Approach #2) and include some regularization.

Given the best-performing configurations, Figure 13 (left) shows a comparison of the approaches with the high-fidelity FEM result, taking as reference the displacement of one extremal node. Visually, both models perform well at low frequencies, but for frequencies above 70 Hz the performance is poor and the plots diverge strongly from the FEM results. From a numerical perspective, this might be caused by the minuscule displacements at the end of the frequency sweep. Nevertheless, Approaches 1 and 2 are able to predict the frequencies at which the peaks occur. Another important observation is that Approach 2 displays a wiggly, oscillatory behavior. This is due to the instability of the model: the plot shows the absolute value of the projection onto the full space, and the projection yielded some negative values, which are physically infeasible for this model.

For this last reason, a regularization term was added to the loss function of the network of Approach 2 (in the spirit of a PINN approach [14]) in order to penalize weights that generate negative values after projection. The loss function in this case was:

(26) J = \frac{1}{2} \sum_{j=1}^{r} \left({y_{true}}_{j}-{y_{NN}}_{j} \right) ^{2} + \lambda \text{ReLU} \left(-\mathbf{P} f_{NN,\text{Approach 2}} \right)

In (26), the loss function combines the typical mean square error with a penalty term controlled by a parameter \lambda. This loss function was implemented for Approach 2, and a sample result is shown on the right of Figure 13. It can be seen that the regularization introduces new issues: the predicted displacement magnitude is strongly averaged, yet the location of the peak is preserved.
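A hedged PyTorch sketch of the regularized loss in (26) is given below; P is assumed to be the projection matrix \mathbf{\Phi} converted to a tensor of shape (N, r), y_nn and y_true are batches of reduced-space outputs, and lam corresponds to \lambda.

import torch

def approach2_loss(y_nn, y_true, P, lam):
    """MSE plus a ReLU penalty on negative full-space reconstructions, cf. eq. (26)."""
    mse = 0.5 * torch.sum((y_true - y_nn) ** 2)
    # Project the predicted reduced coordinates back to the full space and
    # penalize negative entries (physically infeasible displacement magnitudes)
    u_full = y_nn @ P.T                 # (batch, r) -> (batch, N)
    penalty = torch.sum(torch.relu(-u_full))
    return mse + lam * penalty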

Model Order Reduction with SOAR Algorithm

First Approach

In our first approach to the problem, the reduced order is taken as n = 16. The expansion point selected for the SOAR algorithm is s_0 = 80. The plots cover the range 0-150 Hz. Lastly, the deflation tolerance of the SOAR algorithm is set to 10^{-6}.

The first plot is the plot of the transfer functions. The formula used for calculation:

h_n(s) = l_n^T (s^2 M_n + s D_n + K_n)^{-1} f_n

Figure 14: Transfer Function plot over frequency

The second plot is the accelerance plot.  The formula used for accelerance calculation:

a_n(s) = s^2 \, h_n(s) = s^2 \, l_n^T (s^2 M_n + s D_n + K_n)^{-1} f_n

Figure 15: Accelerance plot over frequency

Lastly, the relative error plot:

Figure 16: Relative Error plot over frequency

As can be seen in Figure 16, the true error of the reduced model is in an acceptable range. 150 frequency sample points have been used for calculating the transfer function and accelerance.
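For reference, the following sketch evaluates the reduced transfer function and accelerance over a frequency sweep; the reduced matrices and vectors are assumed to come from the SOAR projection, and the relation between s and the physical frequency (e.g. s = i omega) is an assumption of the sketch.

import numpy as np

def transfer_and_accelerance(M_n, D_n, K_n, f_n, l_n, s_values):
    """Evaluate h_n(s) and a_n(s) = s^2 h_n(s) of the reduced model (1-D vectors f_n, l_n assumed)."""
    s_values = np.asarray(s_values, dtype=complex)
    h = np.empty(s_values.shape, dtype=complex)
    for k, s in enumerate(s_values):
        u = np.linalg.solve(s**2 * M_n + s * D_n + K_n, f_n)
        h[k] = l_n @ u
    return h, s_values**2 * h

# Example sweep over 150 sample points between 0 and 150 Hz (assuming s = i*omega):
# omega = 2 * np.pi * np.linspace(0.0, 150.0, 150)
# h_n, a_n = transfer_and_accelerance(M_n, D_n, K_n, f_n, l_n, 1j * omega)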

After the reduction process, we continue with the matrix interpolation. As discussed before, in this step we can find inconsistencies in the common basis. The angles are listed in Table 3.

Column of basis vector | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16
Angle achieved in interpolation | 4.4^\circ | 1.81^\circ | 1.08^\circ | 0.27^\circ | 0.68^\circ | 0.59^\circ | 0.96^\circ | 2.98^\circ | 5.93^\circ | 16.57^\circ | 65.72^\circ | 77.56^\circ | 67.16^\circ | 72.97^\circ | 84.5^\circ | 80.36^\circ

Table 3: Comparison between basis vectors

It was stated before that columns with angles above a certain threshold can be deleted. In practice, however, we achieved the best result by deleting the last 4 columns of the basis. After the Cholesky decomposition, we can plot the reduced transfer function using the reduced matrices obtained in the common coordinate system. As shown in Figure 17, we still achieve a very similar approximation to the full model after the matrix interpolation.

Figure 17: Accelerance plot over frequency, Matrix interpolation included


However, even after deleting the inconsistencies in the common basis, we still had problems training the neural network because the order was still high and we did not have enough samples.

Second Approach

As stated before, the training of the NN did not converge well, so we had to look for further ways to reduce the order of the model. The approach here is very similar to the previous one, but the new reduced order is selected as n = 7. After the implementation, we achieved a good approximation of the full model in the 0-75 Hz range. We then continued with the matrix interpolation. In this step, the hope was to reduce the order further by finding inconsistencies. However, in this case all of the angles were below 45 degrees, so the reduced order of the system stayed at seven.

Figure 18: Transfer Function plot over frequency


Figure 19: Accelerance plot over frequency

Figure 20: Relative error plot over frequency

Neural Network Approach 3

The network design used for this approach was taken from Jekel et al. [12]. In contrast to those authors, the loss function was evaluated not on the stiffness matrix (or any other matrix) directly but on the terms of its Cholesky decomposition. The authors additionally coupled a "feedback check" to keep the prediction of the neural network positive definite: a small number (in their example 10^{-8}) is added to the terms of the main diagonal of the Cholesky factor. Table 4 shows the properties of the best configuration of the network.

Table 4. Summary of best configurations of networks for Approach 3

Model | Learning rate \delta | Hidden Layer Configuration | Activation Function | Optimizer | Loss Function | Number of training epochs | Number of outputs
Approach 3 | 0.03 | [42] | Softmax | ADAM | MSE | 25000 | 21 (7 \times 7 matrix)

After obtaining the lower triangular factors of the mass, stiffness, and damping matrices and the force vector from the NN, a mean case was generated to test the prediction. Below is the plot of the full model, SOAR, and the NN prediction for the mean case.

Figure 21: Transfer Function plot over frequency, NN model included

It can be stated that the result that we got from the NN is accurate in approximating the SOAR and Matrix interpolation steps. This is the case where we used the mean values of the input parameters.

The next question is what happens if we select input parameters outside the parameter range used to train the NN. For this test, we generated some samples, and the plots are shown below.

Figure 22: Transfer Function plot over frequency, NN model included

Figure 23: Transfer Function plot over frequency, NN model included

Lastly, the same model plotted between 0-150 Hz

Figure 24: Transfer Function plot over frequency, NN model included

Conclusions

Neural networks as surrogate models work well for the type of problem addressed in this report, but "traditional" pMOR strategies can guarantee smaller errors. Coupling pMOR with neural networks helps to reduce the computational time involved in training the networks. The Second-Order Arnoldi algorithm made the reduced model more than 20 times faster than the full model. Matrix interpolation was able to find a common basis for the most significant modes. The Cholesky decomposition of the interpolated matrices ensured the positive definiteness of the reduced system.

Future Work

There are many other, more complex neural network architectures that can be combined with reduced order models, such as ROM-net, physics-informed neural networks (PINN), physics-reinforced neural networks (PRNN), Runge-Kutta neural networks, and many others. Support vector machines (SVM), a supervised learning model, can be used for data analysis and classification. Besides the Second-Order Arnoldi algorithm (SOAR), other ROM algorithms are available for interpolation and can be applied to obtain the reduced matrices.

References

  1. P. Benner, S. Gugercin, and K. Willcox, ‘A survey of projection-based model reduction methods for parametric dynamical systems’, SIAM Review, vol. 57, pp. 483–531, 2015.
  2. Daniel, T., Casenave, F., Akkari, N. et al. Model order reduction assisted by deep neural networks (ROM-net). Adv. Model. and Simul. in Eng. Sci. 7, 16 (2020)

  3. S. Schopper, Chair of Structural Mechanics, https://www.cee.ed.tum.de/en/bm/research/fields-of-research/parametric-model-order-reduction/
  4. F. Schneider, I. Papaioannou, M. Ehre, and D. Straub, “Polynomial chaos based rational approximation in linear structural dynamics with parameter uncertainties,” Comput Struct, vol. 233, p. 106223, Jun. 2020, doi: 10.1016/j.compstruc.2020.106223.
  5. C. J. Sallaberry, J. C. Helton, and S. C. Hora, ‘Extension of Latin hypercube samples with correlated variables’, Reliab. Eng. Syst. Saf., vol. 93, pp. 1047–1059, 2008.
  6. I. Papaioannou, ‘Stochastic Finite Element Methods’. Technical University of Munich-TUM, 11 2020.
  7. F. Schneider, I. Papaioannou, D. Straub, C. Winter, and G. Müller, “Bayesian parameter updating in linear structural dynamics with frequency transformed data using rational surrogate models,” Mech Syst Signal Process, vol. 166, p. 108407, Mar. 2022, doi: 10.1016/j.ymssp.2021.108407.
  8. A. Kaszynski, “pyansys/pymapdl: v0.60.3”. Zenodo, Nov. 25, 2021. doi: 10.5281/zenodo.5726008.

  9. Ansys® Academic Research Mechanical, Release 18.2, Help System, ANSYS Mechanical APDL Theory Reference, ANSYS, Inc.
  10. Q. Wang, J.S. Hesthaven and D. Ray, "Non-intrusive reduced order modeling of unsteady flows using artificial neural networks with application to a combustion problem", Journal of Computational Physics, vol. 384, May 2019, doi:10.1016/j.jcp.2019.01.031.
  11. S. L. Brunton and J. N. Kutz, Data-Driven Science and Engineering. Cambridge University Press, 1 2019.
  12. C. F. Jekel, K. E. Swartz, D. A. White, D. A. Tortorelli, and S. E. Watts, ‘Neural Network Layers for Prediction of Positive Definite Elastic Stiffness Tensors’, 3 2022.

  13. C. Bishop, Pattern recognition and machine learning. Springer, 2006.
  14. S. Kollmannsberger, D. D’Angella, M. Jokeit, and L. Herrmann, Deep Learning in Computational Mechanics, vol. 977. Springer International Publishing, 2021.
  15. H. Panzer, J. Mohring, R. Eid, and B. Lohmann, ‘Parametrical Model Order Reduction by Matrix Interpolation’, At-Automatisierungstechnik, vol. 58, pp. 475–484, 8 2010.
  16. J. S. Hesthaven and S. Ubbiali, ‘Non-intrusive reduced order modeling of nonlinear problems using neural networks’, Journal of Computational Physics, vol. 363, pp. 55–78, 6 2018.
  17. Bai, Z., Meerbergen, K., Su, Y. (2005). Arnoldi Methods for Structure-Preserving Dimension Reduction of Second-Order Dynamical Systems. In: Benner, P., Sorensen, D.C., Mehrmann, V. (eds) Dimension Reduction of Large-Scale Systems. Lecture Notes in Computational Science and Engineering, vol 45. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-27909-1_7