Syllabus - Data-Driven Modeling in Science and Engineering / Spring 2025

Course Info

Lectures: Tue, Thu, 11:00 AM - 12:15 PM @ Bechtel 202
Instructor: Joseph Bakarji
Office: Bechtel 418
Office hours: Thu 1:30 PM - 3:30 PM

Description

How do we go from a high-dimensional, noisy, nonlinear, complex, and multiscale universe to simple and predictive mathematical models? This course introduces modern machine learning techniques through a wide variety of examples in the physical, social, and biological sciences. It covers data-driven approaches that build on recent advances in machine learning, including sparse identification of differential equations, dynamic mode decomposition, and physics-informed neural networks.

Prerequisites: Programming Basics (EECE 230/231), Calculus and Differential Equations (MATH 202), Probability and Statistics (STAT 230 or MATH 218/219), Linear Algebra (MATH 218), or equivalent experience.

Topics

Overview

Machine learning is transforming science and engineering.
The three pillars of artificial intelligence: modeling, learning and inference.
Scientific Python: SciPy, Matplotlib, NumPy, pandas, etc.
What is scientific modeling: laws, differential equations, linear systems, networks, etc.
Noise, probability, uncertainty quantification, nonlinearity, and chaos.
The problem of scales and high-dimensionality.

Introduction to Machine Learning

Introduction to linear regression.
Machine learning theory: the bias-variance trade-off, hyperparameters, feature engineering, optimization, regularization.
Types of ML models: parametric, non-parametric, supervised, unsupervised, etc.
Application to empirical laws: Galileo’s pendulum.

Time-Series Analysis

Types of time-series: one dimensional, multi-dimensional, discrete, continuous, etc.
Statistical methods and signal processing techniques.
Spectral methods: Fourier transform.
Auto-regressive models.

Modeling with Differential Equations

Linear and nonlinear systems of ordinary differential equations (ODEs), with examples.
Analytical and numerical solvers.
System identification: finding coefficients of differential equations from data.
Sparse models for discovering ODEs and PDEs from data.
Genetic algorithms and symbolic regression.
Gaussian processes.

Unsupervised Learning

Dimensionality reduction: singular value decomposition.
Clustering algorithms and their applications in science and engineering.
Proper orthogonal decomposition (POD).
Dynamic mode decomposition (DMD).
Discovering fundamental physical variables from data.

Deep Learning

Introduction to fully connected and other types of deep networks.
PyTorch tutorial and examples.
Scientific applications.
Surrogate modeling for continuous systems.
RNNs for time-series analysis.
Physics-informed neural networks (PINNs) and DeepONets.
Latent variable discovery with autoencoders.
Graph neural networks.

Project Description

You will be provided with a list of suggested datasets to which you will apply concepts learned in the course. A progress report is due in the 9th week. Graduate students are encouraged to use data collected from or related to their own research projects (i.e., BYOD).

Assessment

Short Quizzes (10%)
Assignments (25%)
Late Midterm Exam (25%)
Group Project (40%)
- Proposal (5%)
- Progress Report (5%)
- Final Report (30%)

Course Learning Outcomes

This course provides a brief overview of machine learning methods, with a focus on applications in engineering and science. By the end of the course, students will be able to:

Collect, visualize, and clean data.
Master the basics of machine learning theory and implement the corresponding methods in Python.
Apply machine learning concepts to problems in science and engineering.
Propose a data-driven hypothesis given a scientific modeling question.
Identify when a data-driven approach is needed and which learning algorithms are appropriate for a given dataset.

Course References

Murphy, K. P. (2022). Probabilistic machine learning: An introduction. MIT Press. Free online version.
Brunton, S. L., & Kutz, J. N. (2022). Data-driven science and engineering: Machine learning, dynamical systems, and control. Cambridge University Press. Free online version.
Géron, A. (2022). Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow. O'Reilly Media.
Kutz, J. N. (2013). Data-driven modeling & scientific computation: methods for complex systems & big data. Oxford University Press.