A little about me

Welcome to the Data-Driven Modeling in Science and Engineering course! My name is Joseph Bakarji and I’m an assistant professor at AUB, jointly appointed in the Department of Mechanical Engineering and the AI, Data Science and Computing Hub.

In this first lecture I will tell you a little about the course and its logistics. I graduated from the department of mechanical engineering at AUB in 2013, spent a decade along the west coast of the U.S., and then came back to AUB.

I’ve explored research in a wide range of topics, including fluid dynamics, mechatronics, materials science, artificial intelligence, and social dynamics, among others. I’d say that my main research interest is to mathematically model complex systems that are not well understood. One way to put it is that I’m interested in:

Understanding how the world works, using both brains and machines, and
Understanding how intelligence works, both in brains and machines.

This course reflects those interests. The intersection between machine learning and physical modeling is not very new, but it has been gaining a lot of attention recently thanks to increased computational power and data availability. The field has many names, including physics-informed machine learning, data-driven modeling, and scientific machine learning; the most generic buzzword for it is ‘AI for science’. The field is also closely related to scientific computing, computational science, data assimilation, and system identification. Regardless of the name, the big question we’re trying to answer is: how can we use machine learning to understand how the world works?

So while most of the techniques I will cover in this course are not new, the recent hype around deep learning has made it a hot topic. Research in science and engineering is increasingly relying on machine learning to solve problems.

What is the course about?

If you’re taking this course, you’ve probably heard of artificial intelligence and machine learning. And you’ve heard that machines are taking over our lives, and that there will soon be an AI dictator.

Well, people like to panic and you should be careful not to panic with them, but there is some truth to the claim that AI is radically transforming how we do our work, and there are dangers involved. If you’ve used ChatGPT, you know what I’m talking about: chatbots that can answer your questions and write code and essays for you. All this is thanks to machine learning: a set of algorithms that can learn from data.

The course will introduce you to the basic concepts of machine learning, and by the end of the course you will be able to apply these concepts, and with a little more research, come up with your own apocalyptic AI dictators.

But the course is not only about machine learning (that’s why it has this long name). It’s about how we’re increasingly understanding the world using algorithms that learn models from data.

And unlike a normal machine learning course, it will mainly focus on scientific modeling where we already have a lot of modeling techniques, and our standards for accuracy are pretty high.

Also, the course will expose you to the methods of building mathematical models in various fields of science and engineering, which many of you are probably already familiar with. So you will see many of the things you already know (linear algebra, differential equations, numerical methods, algorithms, etc.) in a new, more applied context.
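To make "learning a model from data" concrete, here is a minimal sketch. It assumes an exponential decay law x(t) = x0 exp(-k t), synthetic noisy measurements, and NumPy. The function name fit_decay_rate and all numerical values are made up for illustration; this is not taken from the course assignments.

```python
import numpy as np


def fit_decay_rate(t, x):
    """Estimate k and x0 in x(t) = x0 * exp(-k t) by least squares on log(x)."""
    # Taking logs gives log x = log x0 - k t, which is linear in (log x0, k).
    A = np.column_stack([np.ones_like(t), -t])
    coeffs, *_ = np.linalg.lstsq(A, np.log(x), rcond=None)
    log_x0, k = coeffs
    return k, np.exp(log_x0)


# Synthetic "measurements" (hypothetical values, for illustration only).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 5.0, 50)
true_k, true_x0 = 0.7, 2.0
noise = 1.0 + 0.01 * rng.standard_normal(t.size)  # 1% multiplicative noise
x = true_x0 * np.exp(-true_k * t) * noise

k_hat, x0_hat = fit_decay_rate(t, x)
print(f"estimated k = {k_hat:.3f}, estimated x0 = {x0_hat:.3f}")
```

The point of the sketch is that a differential equation (dx/dt = -k x), a bit of linear algebra (least squares), and data combine into one small workflow: the structure of the model comes from physics, and its parameters come from measurements.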

The goal of the course is to put you in the mindset of a modern scientist/engineer who knows how to take advantage of modern modeling techniques (particularly machine learning) to solve a wide variety of problems. This is why the course puts much more weight on the project (40%) and the assignments (25%).

Reasons not to take this course

This course might not be for you if:

You’re uncomfortable with linear algebra, differential equations, numerical methods, probability and statistics, and programming. If any of these subjects irritates you, you might want to reconsider taking this course.
You’re uncomfortable with Python, or you’re not willing to put in the time to learn it.
You’re uncomfortable with the basics of machine learning, or you’re not willing to put in the time to learn them quickly.
You’re not willing to do lots of reading, writing, and coding. If you’re expecting an easy A, this is not the course for you.
You don’t like math. You will see a lot of new and old theoretical concepts in a new context.
You don’t like coding. Machine Learning and Scientific Computing are all about implementing solutions in code.
You’re uncomfortable with exploration and research beyond the lecture material. You should be ready to do a lot of research on your own. I will not spoon-feed you all the material in class. My job is to give you the tools you need to learn things on your own.

General Learning Guidelines

The main purpose of this course is to get you into the mindset of a modern scientist/engineer who has to either use or create datasets, and either use or create algorithms to deal with those datasets. There are very reliable ways to get better at that:

Code, code, code: get comfortable with programming. Explore libraries or languages that can help you solve your problem. Machine learning is a quickly evolving field, and no matter how much theory you know, you will need to keep up with new tools. For that I’ll give you coding assignments, and I’ll try to make them as fun as possible. They might be challenging, but I expect you to know how to use online resources to solve them when needed. This is part of being a computational scientist in this age (see footnote).
Read and write: the best way to continue learning after you complete this course is to keep up with the literature through reading, and to write about what you learn and build. Write down your project ideas and don’t settle on the first idea that comes to mind. This is how you develop your creativity and your ability to communicate your ideas: these skills are always under-emphasized in STEM education, and you’ll find that much of your job will turn into reading and writing. This is the purpose of the progress and final project reports.
Communicate and share: the best way to learn is to teach. I encourage you to discuss the course material with your colleagues, and to ask questions. There will be a short final presentation of your project. This is a good opportunity to practice your presentation skills, and to learn from your colleagues.

Students will be provided with a list of suggested datasets to which they have to apply concepts learned in the course. A progress report listing completed milestones is required in the 9th week of the course. Graduate students are encouraged to apply machine learning tools to data collected from their own research projects.

Online Resources Policy: Large Language Models (such as ChatGPT) are not banned, but I recommend using them with extreme caution. I believe that you can only learn by getting exposed to as many problems as possible and deeply thinking about them. When you solve problems, your brain tries multiple routes, failing and learning through trial and error until it becomes good at connecting and building complex ideas. This is how your mind becomes both sharper and more creative.

If you always use a solution manual or an LLM (as a smart solution manual), you basically learn to become obsolete. So every time you use an LLM, I want you to notice who is serving whom. Are you learning to become an assistant to the AI, or is the AI helping you become more intelligent? If it doesn’t feel like you’re putting in the effort, and there’s no sweat involved, then it’s probably the former.

So, my recommendation would be: read the question, try to solve it on your own as much as you can, and write down your solution, ideas and questions. If you can’t solve it, discuss it with friends. LLMs, search engines and good old books are actually great tools for exploration, as they can give you great ideas and tips that will not be covered in class (we’re only covering 1% of what’s out there). But nothing can replace the effort of solving problems on your own.

Assessment

Quizzes (10%)
Assignments (25%)
Group project (40%)
- Proposal (5%)
- Progress Report (5%)
- Final Presentation (10%)
- Final Report (20%)
Late Midterm Exam (25%)

Course plan and logistics

Course website: I will use the website www.ml4science.com to post the course materials, assignments, and other resources. The purpose is to make it a reference for you that you can come back to even after the course is over.
Slack: we will use Slack for communication and discussions. Some assignments will be posted there.
Course materials: I will post the course materials on the reference page.
Office hours: Thursday 1:30pm - 3:30pm.

Project

The project will be done in groups of 2 to 3, and you will be asked to apply the concepts learned in the course to a dataset of your choice. If you want to propose an idea, you should try to work on something where you’re answering a question about a complex physical system, typically one that changes in space and time. Unlike a typical machine learning course where you’re only interested in increasing the accuracy of your model (often by trial and error), here you’re trying to understand how the system works, what the results mean, why and when the model fails, and how it can be used to make predictions or build engineering solutions. So you should be able to explain your model, and what it means in the context of the system you’re studying.

The project will be in 4 stages:

Proposal: you will propose an idea for a project, and you will be asked to justify why it’s a good idea.
Progress Report: you will report on your progress.
Final Presentation: you will present your project to the class.
Final Report: you will write a report (6 pages long) on your project.