Problem Set #4 - Data-Driven Reduced Order Modeling
Problem 1: Reduced Order Modeling (50 points)
In chapter 12 of Data-Driven Science and Engineering, the authors discuss the concept of Reduced Order Modeling (ROM). The idea is to approximate the solution of a high-dimensional system of equations by projecting the solution onto a low-dimensional subspace. This is particularly useful when the high-dimensional system is computationally expensive to solve.
In systems where the variable depends on space and time \((x, t)\), the basic idea is to decompose the solution into a sum of spatial and temporal modes:
\[u(x, t) \approx \sum_{k=1}^{n} a_k(t) \psi_k(x)\]where \(\psi_k(x)\) are the spatial modes and \(a_k(t)\) are the temporal modes. This eigenfunction expansion is the basic assumption behind the Proper Orthogonal Decomposition (POD) when applied to partial differential equations. One possibility for choosing the spatial modes is to use the Fourier basis, i.e. \(\psi_k(x) = \exp(i 2\pi k x/L)\). This is reminiscent of the analytical solution of the heat equation, where the solution can be written as a sum of sines and cosines.
When solution data for \(u(x, t)\) are given, the spatial modes can be obtained by performing a Singular Value Decomposition (SVD) of the data matrix (where each column is a snapshot of the solution at a given time): \(X = U \Sigma V^T\). The spatial modes are then given by the left singular vectors \(U\), and the temporal modes are given by the right singular vectors \(\Sigma V^T\) (see section 12.6).
In this exercise, we will explore a few approaches described in the section 12.6, and apply them to the video of an oscillating spring.
The question we want to answer is: given the first \(m\) frames of the video of a spring oscillation, can we predict the next \(n\) frames?
a) Read the “spring-oscillation-video” with the following OpenCV python code. The code makes the frames black and white, and decreases the resolution. Crop the frames and decrease the resolution as needed to decrease computational cost.
b) Stack the frames into a data matrix \(X\) where each column is a snapshot of the solution at a given time. Remove the mean frame from the data matrix and perform a Singular Value Decomposition (SVD) of the data matrix \(X = U \Sigma V^T\). The spatial modes are then given by the left singular vectors \(U\), and the temporal modes are given by the right singular vectors \(\Sigma V^T\). Plot the first 5 spatial modes, and the first 5 temporal modes. Note that
\[\Sigma V^T = \begin{bmatrix} \vert & \vert & & \vert \\ \mathbf a_1 & \mathbf a_2 & \ldots & \mathbf a_m \\ \vert & \vert & & \vert \\ \end{bmatrix}\]where \(\mathbf a_i\) is the \(i\)-th snapshot of the original data over which the simplified model will be constructed.
c) How many modes do you need to capture 95\% of the energy (variance) of the system? Plot the cumulative energy (eigenvalues) of the system as a function of the number of modes.
d) Use the number of modes (\(p\)) discovered in the previous question to fit a neural network time-stepper that maps \(\mathbf a^{(p)}_k\) to \(\mathbf a^{(p)}_{k+1}\), where \(\mathbf a^{(p)}\) indicates the first \(p\) time-modes. For this question, use a fully connected neural network \(f_\mathbf{w}\) to fit the model:
\[\mathbf a^{(p)}_{k+1} = f_\mathbf{w}(\mathbf a^{(p)}_k)\]Use the first 70% of the frames as training data, the next 20% as validation data, and the last 10% as test data. Be careful to maintain the temporal structure of the data if you shuffle it.
e) Having obtained the weights \(\mathbf{w}\) of the neural network, evaluate your model on the validation data. Start with the last time step of your training data, and use the $p$ dominant singular vectors that you obtained from the training set \(U_p\). Use the neural network to predict the frames for all the time steps in the validation data, by iterating through \(f_\mathbf{w}()\). Once you obtain the predictions \([\mathbf a_{r+1}, \mathbf a_{r+2}, \ldots, \mathbf a_{r+n}]\), use the spatial modes to reconstruct the predicted frames.
- Plot the absolute difference between predicted frames and the actual frames for the last 5 frames of your validation set.
- Does your loss improve if you use more modes?
- Adjust your hyperparameters (network architecture, number of epochs etc.) to improve your results, and finally evaluate your model on the test data.
- Plot the mean square error between the predicted frames and the actual frames as a function of time for your test set, where the first frame of the test set is given.
- Increase \(p\) and evaluate its effect on the performance of the model.
f) Repeat the same exercise using a different time-series model for the latent variable (e.g. LSTM, Autoregressive model, SINDy, etc.). Compare the results with the previous question and comment on your results.
Problem 2: Dynamical systems with Neural Networks (20 points)
The Lorenz system is a system of ordinary differential equations (ODEs) that was developed by Edward Lorenz in the 1960s to describe the behavior of a simple climate model. The system is often used as an example of chaotic systems, or what is known in popular media as the ``butterfly effect’’. It is given by the following set of ODEs:
\[\begin{align*} \frac{dx}{dt} &= \sigma(y-x) \\ \frac{dy}{dt} &= x(\rho-z) - y \\ \frac{dz}{dt} &= xy - \beta z \end{align*}\]where \(x\), \(y\), and \(z\) are the state variables, and \(\sigma\), \(\rho\), and \(\beta\) are parameters. For \(\rho = 28\), \(\sigma = 10\), \(\beta = 8/3\), the state variable \(\mathbf x = [x, y, z]\) is known to be chaotic.
NOTE: If you’ve had enough of the Lorenz system, feel free to use any other system of ODEs or even PDEs that you find interesting.
a) Solve the differential equation numerically using the initial conditions \(\mathbf x = [1, 1, 1]\), a simulation time of \(t_{end}=100\) seconds and \(dt = 0.01\).
b) Given the simulated time series above, and assuming that we don’t know the underlying differential equation, we would like to fit a model that predicts the state \(\mathbf x \equiv \mathbf x(t_i)\) at a given time \(t_i\) based on the previous time-steps:
\[\mathbf x_i = f_\mathbf{w}(\mathbf x_{i-1}, \mathbf x_{i-2}, \ldots, \mathbf x_{i-n})\]where \(f_\mathbf{w}\) is a neural network. Define the input (feature vector) and output for \(n>1\).
c) Assume \(n=1\). Use a fully connected neural network architecture to train your input-output model. Take the first 80\% of the time series as a training data, and the last 20\% as a test data. Plot your results. How does your solution compare with the previous question?
d) Repeat the same exercise for \(n>1\). How does your solution compare with the previous question? (Hint: in the first layer, use Flatten()
to transform your \(n \times 3\) input to a \(3n\) dimensional input. You have to integrate the solution in a for loop: given the initial condition \([\mathbf x_1, \mathbf x_2, \mathbf x_3, \mathbf x_4, \mathbf x_5]\), predict \(\mathbf x_6\), then given \([\mathbf x_2, \mathbf x_3, \mathbf x_4, \mathbf x_5, \mathbf x_6]\), predict \(\mathbf x_7\), etc.).
e) Download Salesforce’s Merlion foundation model using pip install salesforce-merlion
and use it to perform the same task. The model offers methods for fine-tuning and prediction. Use the training data to fine tune the model and compare the predictions with the previous question. Feel free to use any other model you find interesting.