Introduction

The goal of this project is to find the governing differential equation of a system based on measurements from it, such that the following conditions are satisfied:

  • the resulting equation should be sparse, i.e. consist of the minimum number of terms that describe the system with a desired accuracy
  • the algorithm should be scalable to multiple dimensions and degrees of freedom
  • noisy measurements of the same system should yield similar results

Our work is based on "Discovering governing equations from data: Sparse identification of nonlinear dynamical systems" (Brunton), which explores linear regression techniques such as the Least Absolute Shrinkage and Selection Operator (LASSO) and sequential thresholding to find the optimal linear combination of a set of ansatz functions (function library) that best matches the time derivatives of the measurements. A brief description of the relevant regression techniques is presented in the next section, followed by a discussion of the main challenges of such an approach. Lastly, results are presented for various systems using our implementation in Python.

Linear regression

To use a compatible notation, the differential equation in question is assumed to take the form of a first order system. Note that this assumption does not introduce any constraint since any differential equation can be transformed into a system of first order differential equations. Although the scope of this project is limited to Ordinary Differential Equations (ODEs), the approach trivially generalizes to Partial Differential Equations (PDEs) as detailed in the source paper.

As mentioned in the introduction, the basic idea is to find a linear combination of ansatz functions g_j(x,t) that sufficiently approximate the derivatives \dot{x} of the given measurements x.

\dot{x}_i = f_i(x,t) \approx \sum_j \theta_{ij} g_j(x,t)

The first, and one of the most important, tasks is to choose the set of ansatz functions used in the approximation. Naturally, a function library that is unable to properly approximate the system will not lead to useful results. Furthermore, including linearly dependent functions degrades the results and makes them difficult to interpret, or can even prevent some methods from converging on a solution. If one has no intuition about the system in question, a truncated polynomial function space can prove to be a good starting point. Many physical systems can be well approximated by polynomials, or even linear terms, around their equilibrium points. Even if this is not the case, the Taylor expansion of simple systems may be discovered by closely investigating the result of a high-order polynomial function library.

An implementation that generates the complete set of n-dimensional monomials up to order p, evaluated at the provided time series, is available in our Python package and can be used as follows:

polynomialLibrary = SystemID.FunctionLibrary.PolynomialFunctionLibrary( derivatives, p )
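For illustration, a minimal sketch of such a generator (with hypothetical names, not the interface of the actual implementation) could be built from itertools:

import itertools
import numpy as np

def polynomial_library(X, p):
    # X has shape (n_samples, n_variables); one column per monomial is returned,
    # starting with the constant term
    n_samples, n_vars = X.shape
    columns = [np.ones(n_samples)]
    for order in range(1, p + 1):
        for combo in itertools.combinations_with_replacement(range(n_vars), order):
            columns.append(np.prod(X[:, list(combo)], axis=1))
    return np.column_stack(columns)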


In the following subsections, three relevant linear regression methods are briefly introduced and evaluated with respect to their applicability for this project.

Least squares

Like any other regression method, the least squares fit minimizes a loss function L(\theta), which in this case is the squared 2-norm of the difference between the derivatives of the measured variables \dot{x} and the linear combination of the ansatz functions.

L(\theta) = \sum_i \sum_k \left( \dot{x}_{i}(t_k) - \sum_j \theta_{ij} g_j(x_i(t_k),t_k) \right)^2
\theta = argmin_{\theta} L(\theta) = argmin_{\theta} \sum_i \sum_k \left( \dot{x}_{i}(t_k) - \sum_j \theta_{ij} g_j(x_i(t_k),t_k) \right)^2

The resulting coefficients \theta define the linear combination of ansatz functions that best approximates \dot{x} for the given measurements. Compared to the other considered methods, the computational expense is minimal, since only a single singular value decomposition has to be performed, which provides the analytical solution to the optimization problem. However, this method is relatively sensitive to noise in the measurements and, more importantly, yields non-zero coefficients for all ansatz functions. As a result, the differential equation has as many terms as there are functions in the function library, violating the goal of a sparse solution.
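As a minimal sketch (assuming the library matrix Theta and the derivative matrix dX have already been assembled), such a fit can be computed with NumPy:

import numpy as np

# Theta: (n_samples, n_functions) evaluated function library
# dX:    (n_samples, n_dof) numerically computed derivatives
# lstsq solves min ||Theta @ coeffs - dX||_2 using an SVD-based routine
coeffs = np.linalg.lstsq(Theta, dX, rcond=None)[0]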

Least absolute shrinkage and selection operator

The LASSO method addresses the shortcomings of the least squares fit by adding an extra term \lambda ||\theta||_1 in the loss function that penalizes all non-zero coefficients.

(1) L(\theta) = \lambda ||\theta||_1 + \sum_i \sum_k \left( \dot{x}_{i}(t_k) - \sum_j \theta_{ij} g_j(x_i(t_k),t_k) \right)^2

Since the penalty term is proportional to the 1-norm of the coefficient vector \theta, the solution tends to have few non-zero coefficients. By adjusting the regularization parameter \lambda, the ratio between the two terms of the loss function can be controlled, which in turn influences the sparsity and overall magnitude of the coefficients. Choosing an optimal regularization parameter is not trivial and will be addressed later in this document. Unfortunately, no analytical solution exists for the general case of this optimization problem, so an iterative scheme must be used, making the LASSO method computationally more expensive than a least squares fit.
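A sketch of such a fit with scikit-learn (where the regularization parameter \lambda is called alpha), with Theta and dX assumed as before:

from sklearn.linear_model import MultiTaskLasso

# alpha plays the role of the regularization parameter lambda in Eq. (1)
lasso = MultiTaskLasso(alpha=0.1, max_iter=10000)
lasso.fit(Theta, dX)
coeffs = lasso.coef_.T     # (n_functions, n_dof); many entries are exactly zero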

Sequential thresholding

Sequential thresholding is an iterative regression meta-method that relies on other regression schemes, such as a least squares or LASSO fit, to provide a solution at each step. Ansatz functions with coefficients smaller than a threshold \lambda are discarded, and the regression is performed on the remaining functions in the library. This loop continues until no functions are discarded in a step.

A well-chosen threshold naturally leads to a sparse solution; however, a possible issue, which was not encountered in this project, is discarding important ansatz functions in early stages of the iteration. While the LASSO optimization loop always considers the full ansatz space at each step, sequential thresholding only ever shrinks its parameter space, effectively limiting its trajectories during optimization. On the other hand, this behaviour ensures that the number of iterations is bounded by the product of the number of degrees of freedom and the number of ansatz functions.

Note that the range of viable thresholds can be limited by normalizing the input data and ansatz functions.
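A minimal sketch of this meta-method built on top of a least-squares fit (assuming the normalized Theta and dX as before) could be:

import numpy as np

def sequential_threshold(Theta, dX, threshold):
    # start from the plain least-squares solution on the full library
    coeffs = np.linalg.lstsq(Theta, dX, rcond=None)[0]
    while True:
        small = np.abs(coeffs) < threshold
        if not small[coeffs != 0.0].any():        # nothing left to discard: done
            return coeffs
        coeffs[small] = 0.0                       # drop small coefficients
        for j in range(dX.shape[1]):              # refit each degree of freedom
            active = ~small[:, j]
            if active.any():
                coeffs[active, j] = np.linalg.lstsq(
                    Theta[:, active], dX[:, j], rcond=None)[0]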

Computation of the derivatives

To obtain a differential equation, a relation between the data and its derivatives has to be found. However, measurements often provide only the values of one variable and no information about the derivatives. Therefore, the derivatives have to be computed numerically using finite differences (e.g. the central difference method). Here we assume that the time step between two entries of the data vector is constant; otherwise, special treatment would be necessary.
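Assuming a uniform time step, this can be done for instance with NumPy's gradient function, which uses central differences in the interior of the array (a sketch; time and x denote the sampling times and the measured signal):

import numpy as np

dt = time[1] - time[0]          # uniform sampling assumed
dx = np.gradient(x, dt)         # first derivative
ddx = np.gradient(dx, dt)       # second derivative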

However, data from measurements is always subject to inaccuracies and thus contains noise, which in turn amplifies the error in numerically computed derivatives. To mitigate this, a filter can be applied to the data before computing the derivatives, for example total variation regularization (Brunton). The total variation (TV) of a discrete signal u is given by (Chambolle)

J(u) = \sum_{i=1}^n |u_{i+1} - u_i |.

Given a noisy signal g, a denoised function u is obtained by solving the following minimization problem (Chambolle):

(2) \min_{u \in \mathbb{R}^n} \frac{|| u - g ||^2}{2 \lambda} + J(u).

The first term ensures that the denoised function u stays close to the original signal g, whereas the second term enforces smoothness in u. Depending on the choice of \lambda, more emphasis can be put on one of the two terms: for smaller values of \lambda, u fits g more closely but is also more irregular. Hence, this weighting factor is problem-dependent and has to be adapted to the signal g.
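With the scikit-image function used later in this project, the filtering reduces to a single call (a sketch; g is the noisy signal from Eq. (2) as a 1D array):

from skimage.restoration import denoise_tv_chambolle

# a larger weight gives a smoother result (analogous to the role of lambda in Eq. (2))
u = denoise_tv_chambolle(g, weight=0.1)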

The performance of the regularization is tested on the oscillation of a pendulum, where a point mass is attached to a massless string of length L. The solution for the angle \theta is given by

\theta = \theta_0 \cdot \cos(\omega t),

where \theta_0 denotes the initial angle and the natural frequency is \omega = \sqrt{\frac{g}{L}}. To simulate real behaviour, artificial noise is added, drawn from a uniform distribution bounded by a certain percentage of \theta_0. The result is shown in the figure below.

First, the general effect of the TV regularization is examined. For this purpose, the filter is applied to the noisy data shown above with different values of the weight \lambda to assess the influence of this parameter:


For high values of \lambda, the resulting function is very smooth but does not follow the original data very well, especially at the peaks. The blue line, showing the result for the smallest chosen weight, approximates the original data very well but consequently also follows the noise. In between, however, there is a weight for which an accurate approximation with little noise is obtained, which is the optimal outcome of the filtering. Achieving a result of this quality is very important, as the derivatives are directly used to compute the differential equation.

In order to assess the error that is generally introduced by a numerical computation of the derivatives, the first and the second derivative are computed with finite differences using the analytical data without noise. The relative error compared to the analytical derivative is then computed via

\epsilon = \frac{|\dot{x}_{FD} - \dot{x}_{analytic}|}{\max | \dot{x}_{analytic} |},

where \dot{x}_{FD} denotes the derivative obtained from finite differencing and \dot{x}_{analytic} the analytical derivative. Depending on the number of samples, the maximum error lies between 10^{-1} and 10^{-2} for the first derivative, while for the second it lies between 10^{-1} and 10^{-4}. A likely reason is that forward differences were used for the first derivative, whereas the second derivative exploits central differences.
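Evaluated on arrays, this error measure is a one-liner (dx_fd and dx_analytic denote the finite-difference and analytical derivatives at the sample points):

import numpy as np

eps = np.abs(dx_fd - dx_analytic) / np.max(np.abs(dx_analytic))
max_error = eps.max()           # maximum relative error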

 

For noisy input data, the relative error introduced by finite differencing becomes dramatically worse, as expected. It is also interesting to note that the behaviour of the error is now inverted: the maximum and average error grow with an increasing number of samples.

Now, the total variation regularization as described in Eq. (2) is applied to the data before the derivatives are computed via finite differencing, and then again to the derivatives themselves. For this purpose, the function denoise_tv_chambolle from the scikit-image package for Python is used (scikit.denoise_tv_chambolle). The error for various numbers of samples now looks as follows:

The results are dramatically improved compared to the unfiltered derivatives, but are still one order of magnitude (first derivative) and two orders of magnitude (second derivative) larger than the smallest possible error achieved by finite differencing on the analytical data.

Optimization

Cross-validation

To find the hyperparameter \lambda in the LASSO model (1), cross-validation (CV) can be used. In CV, the whole dataset is divided into a training set and a test set. The regression model is computed on the training set, and its accuracy is assessed by comparing the model's prediction for the test set to the data that was left out. In order not to waste any data, this process is repeated such that each part of the data serves as the test set exactly once. Partitioning the data into k parts yields k such folds, which gives this method the name k-fold CV (Scikit Cross-Validation). An example for 5 folds is shown below.

The package used in this project, scikit-learn, already provides a function that performs the LASSO algorithm and simultaneously finds an optimal hyperparameter \lambda with respect to CV:

from sklearn.linear_model import MultiTaskLassoCV

model = MultiTaskLassoCV(eps=1e-3, n_alphas=100).fit(X, Y)

The argument eps defines the ratio \frac{\lambda_{min}}{\lambda_{max}} within which the optimal parameter is searched, and n_alphas is the number of values that are tested for \lambda. However, this method shows two major drawbacks in our application. Firstly, the choice of eps and n_alphas strongly influences the final result and also has to be tuned for a good solution, which is exactly the problem that should have been avoided. Secondly, the cross-validation is performed using the derivatives computed from the measured data. As already mentioned in the previous section, these derivatives are computed numerically and thus contain errors. Therefore, it is questionable to use them for validating the choice of the parameter \lambda.

Direct optimization

The alternative to cross-validation is to perform a scalar optimization on the hyperparameter. To limit the range of viable parameters to [0,1], both the input data and the function library are normalized. Defining the objective of the optimization is not straightforward; since the exact coefficients are not known beforehand, it should be based on how much the measurements and the solution of the fitted differential equation differ. An analytical solution cannot be computed in general, so a numerical solution \tilde{x} is computed at the measured time points.

A naive objective would be a norm of the difference between the numerical solution and the measurement, but this approach has some obvious disadvantages. Consider, for example, a solution that is identical to the measurement but shifted in time. Even though the behaviour of such a solution is nearly identical to the measurement, this objective still assigns it a large error.

An alternative approach, which proved to perform well for our test cases, is to transform both the numerical solution and the measurements into the frequency domain and compute the norm of the difference between their magnitudes.

f(x,\tilde{x}) = \left| \left| | \int_{-\infty}^{\infty} x e^{-i \omega t} dt | - | \int_{-\infty}^{\infty} \tilde{x} e^{-i \omega t} dt | \right| \right|
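A sketch of this objective using the discrete Fourier transform (x and x_tilde are the measured and the simulated time series, sampled at the same instants):

import numpy as np

def frequency_objective(x, x_tilde):
    # compare the magnitude spectra of measurement and simulated solution
    mag_measured = np.abs(np.fft.rfft(x, axis=0))
    mag_simulated = np.abs(np.fft.rfft(x_tilde, axis=0))
    return np.linalg.norm(mag_measured - mag_simulated)

This scalar objective can then be handed to a standard bounded scalar optimizer, for example scipy.optimize.minimize_scalar with bounds (0, 1), to select \lambda.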

An obvious disadvantage of directly optimizing \lambda is the enormous computational overhead. Not only does every iteration require a new fit, but a numerical solution has to be computed for the resulting differential equation as well. While this is manageable for ODEs, it might prove to be unsuitable for PDEs.

Results

Generating data

All data used for the validation of the algorithm was generated computationally, because no real measurement data was available to us. For most of the test cases (linear pendulum, free vibration of an SDOF system, and harmonic excitation of an SDOF system) an analytic solution exists, which was implemented to create the data. For the other problems (nonlinear pendulum, arbitrary excitation of an SDOF system) the differential equation was solved numerically. This can be done using the function odeint from the package scipy.integrate (scipy.integrate.odeint). First, the differential equation has to be defined as a function, which for the nonlinear pendulum looks as follows:

def diffEqNonlinearPendulum(y, t, L):

    theta, dtheta = y
    dydt = [dtheta, - (9.81 / L) * np.sin(theta)]

    return dydt

Inside this function, the second-order differential equation is rewritten as a system of first order. The function is then passed as the first argument to odeint. The other arguments are the initial values, a vector of the time instances at which the differential equation is to be evaluated, and any additional arguments of the differential equation:

time = np.linspace( 0.0, 50, num=10000 )

# solve differential equation
sol = odeint(diffEqNonlinearPendulum, theta0, time, args=(pendulumLength,))

In case of an inhomogeneity in the differential equation, a function representing the right-hand side at a given instance in time has to be defined additionally. An excitation function such as f(t) = 2 - \frac{t}{50} would simply be programmed as

def excitationF(t):

	return 2 - t / 50

This function then needs to be passed to the differential equation as an additional argument, and the value of the inhomogeneity at each time step is simply the return value of the excitation function:

def diffEqForcedVibration(z, t, m, c, k, excitationF):

	f = excitationF(t)
	dzdt = (z[1], (1/m) * (-c * z[1] - k * z[0] + f))

	return dzdt

The solution is obtained with

sol = odeint(diffEqForcedVibration, w0, t, args=(m, c, k,  excitationF))

Linear Pendulum

The simplest test system on which we tried our implementation is the linear pendulum, which due to its simplicity is ideal for demonstrating the main challenges of the identification process. The governing differential equation and its first-order form are as follows:

\ddot{x}(t) + \frac{g}{L}x(t) = 0
\begin{bmatrix} \dot{x}(t) \\ \ddot{x}(t) \\ \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ -\frac{g}{L} & 0 \\ \end{bmatrix} \begin{bmatrix} x(t) \\ \dot{x}(t) \end{bmatrix}

Using a polynomial ansatz space up to order 5 and the analytical derivatives, all methods yield the exact coefficients up to round-off errors. In a real application, however, the numerically computed derivatives have to be used, which introduce errors into the solution. Therefore, the first task is to obtain more accurate derivatives. Apart from dealing with noisy measurements as mentioned earlier, the only way to get more accurate derivatives is to increase the sampling frequency during measurement. This has the added benefit that regression techniques generally become more accurate and robust with a larger sample size. The figure below demonstrates how the error of the solution coefficients decreases with an increasing number of samples within a fixed time frame.

The other main factor influencing the quality of the results is the choice of the hyperparameter \lambda. Although its exact effect is unique to each regression method, a higher \lambda generally leads to a sparser solution at the expense of accuracy. However, the solution error often has a minimum at some \lambda \neq 0, as is the case for the linear pendulum. The graph below shows, for a LASSO fit, how the number of terms in the differential equation (blue line) decreases as a function of the regularization parameter \lambda while the error (orange line) has a minimum.

Finding an optimal hyperparameter is therefore essential to a good solution; this task is addressed in the section on optimization.

Nonlinear pendulum

For the nonlinear pendulum, the assumption of small angles, namely \sin(\theta) \approx \theta, is not made. Hence, the differential equation reads

\frac{\text{d}^2 \theta}{\text{d} t^2} + \frac{g}{L} \sin(\theta) = 0

First, only polynomials of \theta up to order 9 are included in the function library. In this case, the exact differential equation cannot be found, as \sin(\theta) is not part of the library of possible terms. Hence, the algorithm will approximate the true right-hand side as well as possible. For an initial angle of 90° and an evaluation of the angle at 15,000 equally spaced points in time within t \in [0, 50], the following coefficients are found:

\ddot{\theta} = -9.6518 \cdot \theta + 1.3152 \cdot \theta^3 + 0.07775 \cdot \theta^5 - 2.5592 \cdot 10^{-3} \cdot \theta^7 - 7.4303 \cdot 10^{-5} \theta^9.

As \sin(\theta) is an odd function, it makes sense that only polynomials with odd exponents appear. Furthermore, one could expect the algorithm to deliver the Taylor series expansion of the sine function, which is

\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \dots.

Multiplying this approximation by the factor -\frac{g}{L}, the ideal result for the nonlinear pendulum approximated with polynomials should be

-\frac{g}{L} \sin(\theta) \approx -9.81 \cdot \theta + 1.635 \cdot \theta^3 - 0.08175 \cdot \theta^5 + 1.946 \cdot 10^{-3} \cdot \theta^7 - 2.703 \cdot 10^{-5} \cdot \theta^9,

thus, apart from the high-order terms, the result is relatively accurate and strongly hints at the need for \sin(\theta) in the function library.

For the following results, the function library contains only \theta and \sin(\theta), which are the terms appearing in the linear and the nonlinear pendulum equation, respectively. The results are obtained from solving the differential equation numerically with different initial values. The initial angular velocity remains 0 for all tests, whereas the initial angle is varied up to 90°. In the table below, the computed differential equation is listed for selected initial angles; in the accompanying figure, the absolute values of the coefficients, their sum, and the reference value g = 9.81 are plotted over the initial angle:

initial angle θ0    computed differential equation
15°                 \ddot{\theta} = -9.7159 \cdot \theta - 0.0002 \cdot \sin(\theta)
30°                 \ddot{\theta} = -8.2753 \cdot \theta - 1.2353 \cdot \sin(\theta)
45°                 \ddot{\theta} = -4.4563 \cdot \theta - 4.9844 \cdot \sin(\theta)
60°                 \ddot{\theta} = -0.7994 \cdot \theta - 8.8815 \cdot \sin(\theta)
75°                 \ddot{\theta} = -0.1395 \cdot \theta - 9.6263 \cdot \sin(\theta)
90°                 \ddot{\theta} = -0.0456 \cdot \theta - 9.7367 \cdot \sin(\theta)

For the two extremes, the solution is very accurate: for small initial angles \theta, the small-angle approximation is valid, and even though the nonlinear differential equation was solved, \ddot{\theta} can be approximated well by -\frac{g}{L} \theta. Likewise, for large initial angles, the sine term dominates the solution. Furthermore, one can deduce the range of validity of the linearized model from this plot: beyond an initial angle of approximately 25°, the influence of \sin(\theta) increases drastically, which indicates that for larger initial angles the nonlinear nature of the sine function can no longer be neglected.

Single degree of freedom (SDOF) system

Free Vibration

The free vibration of an SDOF system is described by the equation of motion with a homogeneous right-hand side:

m \ddot{w} + c \dot{w} + kw = 0.

Depending on the damping ratio D=\frac{c}{2\sqrt{km}}, the system is either under-critically (D<1), over-critically (D>1) or critically (D=1) damped, which leads to different solutions for the displacement.

To assess the performance of the algorithm, it was tested for all three cases by setting m=1, k=16 and varying c over 2.0 (D < 1), 8.0 (D = 1) and 14.0 (D > 1). As data, the analytic solution for the displacement was computed and the derivatives were obtained by finite differencing. Filtering was not necessary in this case, as no artificial noise was added to the data. The function library included the displacement w and the velocity \dot{w} as well as the superfluous quadratic terms w^2, w \dot{w} and \dot{w}^2. Since the differential equation is linear, the algorithm had no problems obtaining the correct solution in all three cases:

\begin{align} \text{Case 1: } m=1, k=16, c&=2 \rightarrow D < 1 \\[5mm] \ddot{w} &= -16.0052 \cdot w - 2.0008 \cdot \dot{w} \\[5mm] \text{Case 2: } m=1, k=16, c&=8 \rightarrow D = 1 \\[5mm] \ddot{w} &= -15.9940 \cdot w - 8.0152 \cdot \dot{w} \\[5mm] \text{Case 3: } m=1, k=16, c&=14 \rightarrow D > 1 \\[5mm] \ddot{w} &= -15.9224 \cdot w - 14.0242 \cdot \dot{w} \end{align}

Arbitrary Excitation

The equation of motion for a forced vibration is given by

(3) m \ddot{w} + c \dot{w} + kw = f(t).

For all previously regarded differential equations, the right-hand side could be represented by functions of the data or its derivatives. Due to the inhomogeneity, this is no longer possible. Therefore, the function library needs to be extended with terms built from the time vector t. Here, polynomials are again a good first approach and can be adapted if a different behaviour is suspected, as sketched below.
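A possible sketch of such an extended library (w, dw, and t denote the measured displacement, its derivative, and the time vector; the columns match the numerical example below):

import numpy as np

# library columns: w, w_dot, a constant, t, and a superfluous quadratic term t^2
Theta = np.column_stack([w, dw, np.ones_like(t), t, t**2])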

To validate the performance of the algorithm the following numerical example was chosen:

\ddot{w} + 0.15 \dot{w} + 4 w = 2 - 0.02t

Building the function library from the exact terms w, \dot{w}, t and a vector containing ones as well as a superfluous term t^2, the result found by the algorithm is fairly accurate:

\ddot{w} = -3.9991 \cdot w - 0.1553 \cdot \dot{w} - 0.0180 \cdot t + 1.9791

Note that for solutions with a constant term, it is crucial that the intercept is computed; otherwise, the constant term will be omitted.

Also, the function of the excitation can only be found as long as it can be represented, or at least approximated well, by the functions in the function library. If f(t) were, for example, a sawtooth function, the algorithm could not find a combination of polynomials representing this behaviour. For the special case of arbitrary periodic functions like the sawtooth, a possible remedy could be to include terms of the Fourier series; however, this was not regarded in the scope of this project. For general transient loads, there is no approach other than "guessing" the exact function of the excitation.

Harmonic Excitation

For a harmonic excitation, the load can be written as

f(t) = f_0 \cos(\Omega t + \varphi_f) = f_0 \left[ \cos(\Omega t) \cos(\varphi_f) - \sin(\Omega t) \sin(\varphi_f) \right],

with the amplitude f_0 , excitation frequency \Omega and phase shift \varphi_f . The equation of motion thus reads

(4) m \ddot{w} + c \dot{w} + kw = f_0 \cos(\Omega t + \varphi_f).

In the following, we will only consider the particular solution, i.e. we assume that the homogeneous solution has completely decayed. In this case, the displacement and the velocity are

(5) w(t) = \frac{f_0}{k} V(\eta) \cdot \cos(\Omega t + \varphi_w) \\ \dot{w}(t) = -\frac{f_0}{k} V(\eta) \cdot \Omega \cdot \sin(\Omega t + \varphi_w),

where the amplification function and the phase of the response are computed as follows:

V(\eta) = \frac{1}{\sqrt{(1-\eta^2)^2 + (2D\eta)^2}} \qquad \text{with } \eta = \frac{\Omega}{\omega} \\ \varphi_w = \varphi_f + \Delta \varphi \qquad \text{with } \Delta \varphi = \text{atan2}(-c\Omega , k - m\Omega^2)

Rewriting (5) leads to

w(t) = \frac{f_0}{k} V(\eta) \cdot \left[ \cos(\Omega t) \cos(\varphi_w) - \sin(\Omega t) \sin(\varphi_w) \right] \\ \dot{w}(t) = -\frac{f_0}{k} V(\eta) \cdot \Omega \cdot \left[ \sin(\Omega t) \cos(\varphi_w) + \cos(\Omega t) \sin(\varphi_w) \right].

It can be seen that the excitation f(t) can be expressed as a linear combination of w(t) and \dot{w}(t). Hence, the algorithm will not be able to identify all parameters of the underlying differential equation, because f(t) is linearly dependent on w(t) and \dot{w}(t). To emphasize this, let us have a closer look at the outcome. To extract the \sin(\Omega t) and \cos(\Omega t) parts from w(t) and \dot{w}(t), consider the following expressions:

\begin{align} w(t) \cdot \sin(\varphi_w) + \frac{\dot{w}(t)}{\Omega} \cdot \cos(\varphi_w) &= - \frac{f_0}{k} V(\eta) \cdot \sin(\Omega t) \\ w(t) \cdot \cos(\varphi_w) - \frac{\dot{w}(t)}{\Omega} \cdot \sin(\varphi_w) &= \frac{f_0}{k} V(\eta) \cdot \cos(\Omega t). \end{align}

Thus, f(t) can be expressed in terms of w(t) and \dot{w}(t) as follows:

f(t) = \frac{k}{V(\eta)} \cos(\varphi_f) \cdot \left( w(t) \cdot \cos(\varphi_w) - \frac{\dot{w}(t)}{\Omega} \sin(\varphi_w) \right) + \frac{k}{V(\eta)} \sin(\varphi_f) \cdot \left( w(t) \cdot \sin(\varphi_w) + \frac{\dot{w}(t)}{\Omega} \cos(\varphi_w) \right).

Inserting this into (3) and solving for \ddot{w}(t) results in

(6) \ddot{w}(t) = \frac{1}{m} \cdot \left[ \dot{w}(t) \left( - c + \frac{k}{V(\eta)} \frac{1}{\Omega} \sin(\varphi_f - \varphi_w) \right) + w(t) \left( -k + \frac{k}{V(\eta)} \cos(\varphi_f - \varphi_w) \right) \right].

This means that the initial differential equation (4) can be condensed into 

\ddot{w} = \frac{1}{m} \left[ -\hat{c} \dot{w}(t) - \hat{k} w(t) \right],

with

\begin{align} -\hat{c} &= - c + \frac{k}{V(\eta)} \frac{1}{\Omega} \sin(\varphi_f - \varphi_w)\\ -\hat{k} &= -k + \frac{k}{V(\eta)} \cos(\varphi_f - \varphi_w), \end{align}

which no longer contains any explicit information about the excitation; it is only implicitly included in \hat{c} and \hat{k}.

Let us regard the following numerical example to underline this:

\ddot{w} + 0.4 \dot{w} + 20 w = 7 \cos(\Omega t + 0.2).

The excitation frequency is varied from 1 to 10: \Omega \in \{1, 2, ..., 10 \}. The function library consists of w, \dot{w} and \cos(\Omega t + \varphi_f). Inserting these values into (6) gives the analytic relation, which is compared to the numerical result obtained from the algorithm in the following table:

Excitation frequency Ω    Analytic solution    Numerical solution
1                         -w                   -0.9991w
2                         -4w                  -3.9993w
3                         -9w                  -8.9995w
4                         -16w                 -15.9998w
5                         -25w                 -24.9997w
6                         -36w                 -35.9992w
7                         -49w                 -48.9985w
8                         -64w                 -63.9977w
9                         -81w                 -80.9967w
10                        -100w                -99.9956w

So even though all correct terms from the equation of motion (4) are included in the function library, the complete underlying differential equation cannot be unraveled, as the load is a linear combination of w(t) and \dot{w}(t).

However, these results are obtained under the assumption that the data contains only the particular solution of the displacement, i.e. the homogeneous part is completely damped out. Therefore, initial conditions shall now be regarded. In this case, the total displacement is a superposition of the homogeneous and the particular part. The homogeneous part also consists of sine and cosine functions, but multiplied with an exponential function that damps the displacement and its derivatives to 0 over time. With this additional characteristic, the displacement and the velocity can be clearly distinguished from the excitation by the algorithm, provided the homogeneous part has enough influence on the total solution. Thus, the performance of the algorithm depends on the contribution of those two parts to the complete solution. This is illustrated with the same numerical example as before, but now with an initial displacement:

\ddot{w} + 0.4 \dot{w} + 20 w = 7 \cos(2 t + 0.2).

The function library again consists of all exact terms w, \dot{w} and \cos(\Omega t + \varphi_f).

Initial displacement w_0    Analytic solution                               Numerical solution
-0.01                       -0.4 \dot{w} - 20 w + 7 \cos(2 t + 0.2)         -3.9921w
-0.1                        -0.4 \dot{w} - 20 w + 7 \cos(2 t + 0.2)         -0.3703 \dot{w} - 18.8415w + 6.4930 \cos(2 t + 0.2)
-1                          -0.4 \dot{w} - 20 w + 7 \cos(2 t + 0.2)         -0.3703 \dot{w} - 18.8415w + 6.4930 \cos(2 t + 0.2)
-10                         -0.4 \dot{w} - 20 w + 7 \cos(2 t + 0.2)         -0.3996 \dot{w} - 20.0w + 6.9964 \cos(2 t + 0.2)

It can be seen that for a sufficiently large initial displacement, the exact result is found, containing very good approximations of all parameters of the original differential equation. With decreasing w_0, the computed solution tends towards the result obtained for the particular solution only. As in the comparison between the linear and the nonlinear pendulum, the results deteriorate in the range between the two extremes and hence have to be interpreted carefully.

Frequency Domain

Up to now, only data acquired over time, x(t), was regarded. Now, let us consider data given in the frequency domain, x(\Omega). In the time domain, we are looking for the underlying differential equation. By applying a Fourier transformation to change to the frequency domain, the sought equation of motion is no longer a differential equation but an algebraic one. Hence, the outcome of the algorithm would be the transfer function h(\Omega) such that

w(\Omega) = h(\Omega) f(\Omega).

In the case of a harmonic excitation of an SDOF system, the absolute value of the transfer function reads

\begin{align} | h(\Omega) | &= \frac{1}{k \sqrt{(1-\eta^2)^2 + (2D\eta)^2}} \\ \text{with } \qquad \eta &= \frac{\Omega}{\omega} \\ \omega &= \sqrt{\frac{k}{m}} \\ D &= \frac{c}{2 \sqrt{km}} \\ \end{align}

In contrast to the equation of motion of a forced vibration (3), the transfer function is highly nonlinear in terms of \Omega. Hence, it cannot be expressed as a superposition of individual, relatively simple terms like the differential equation. Therefore, the choice of functions for the function library that would have to be provided to compute the transfer function with our algorithm is not intuitive at all, which makes a computation for a practical application impossible. For this reason, data given in the frequency domain was not regarded further in the scope of this project.

Conclusion

The LASSO algorithm proved to be a very good method for finding linear differential equations from measurement data and a function library constructed from that data. Using finite differences and a filter such as total variation regularization, the derivatives can be approximated sufficiently well for accurate results. By varying the hyperparameter of the LASSO algorithm or the threshold of the sequential thresholding, respectively, the complexity of the solution, i.e. the number of non-zero terms from the function library, can be adjusted. To find an optimal hyperparameter for the LASSO algorithm, several methods were investigated, such as cross-validation on the derivatives or integrating the identified equation and comparing its solution to the original data in the frequency domain. The latter seems to be more appropriate for this application and also relies on less user input that influences the final result.

For nonlinear differential equations, the method only works to a certain degree. The key aspect is that accurate solutions for nonlinear differential equations can only be obtained if the true differential equation can be approximated sufficiently well by the functions in the function library. This is, for example, the case for sine or cosine terms, which can be written as sums of polynomials. Data given in the frequency domain, however, is connected to the load via complicated transfer functions that cannot be represented as a superposition of simple terms of the excitation frequency \Omega. Furthermore, for differential equations in which some of the terms consist of linearly dependent functions, the complete differential equation cannot be unraveled. The reason is that in this case one function can be expressed as a linear combination of the others; the algorithm then finds this simplification, in which the linearly dependent terms are replaced by their basis.

For future work, the algorithm could be extended to partial differential equations. Furthermore, the choice of functions in the function library heavily influences the final result and currently has to be made by the user. To circumvent this, a hierarchic approach is suggested in (Brunton), in which additional terms are added to the function library until convergence is achieved.

References

Brunton, Steven L., Joshua L. Proctor, and J. Nathan Kutz. "Discovering governing equations from data by sparse identification of nonlinear dynamical systems." Proceedings of the national academy of sciences 113.15 (2016): 3932-3937.

Chambolle, Antonin. "An algorithm for total variation minimization and applications." Journal of Mathematical imaging and vision 20.1-2 (2004): 89-97.

scikit-image documentation, denoise_tv_chambolle: https://scikit-image.org/docs/dev/api/skimage.restoration.html#skimage.restoration.denoise_tv_chambolle

scikit-learn documentation, Cross-validation: https://scikit-learn.org/stable/modules/cross_validation.html