
Keeping an Eye on AI

source link: https://devm.io/machine-learning/machine-learning-monitoring

Monitoring for machine learning systems


07. Sep 2022


Our machine learning model is trained and finally running in production. But that was the easy part. Now, the real challenge is reliably running our machine learning system in production. For this, monitoring systems are essential. But while monitoring machine learning models, we must consider some challenges that go beyond classic DevOps metrics.

We've all been there: we've spent weeks or even months working on our ML model. We collected and processed data, tested different model architectures, and spent a lot of time fine-tuning our model's hyperparameters. Our model is ready! Maybe it needs a few more tweaks to improve performance further, but it's ready for the real world. Finally, we put our model into production and sit back and relax. Three weeks later, we get an angry call from our customer because our model makes predictions that have nothing to do with reality. A look at the logs reveals no errors. In fact, everything still looks fine.

However, since we haven't established continuous monitoring for our model, we don't know if and when our model's predictions change. We can only hope that they'll stay as good as they were on day one. But if our infrastructure is held together by nothing but tape and hope, we'll only discover many of these errors once they hit production.
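What could such continuous monitoring look like in practice? The following is a minimal sketch, assuming a Python inference service and the prometheus_client library; the metric names, buckets, and the model.predict call are illustrative assumptions, not a prescribed API.

from prometheus_client import Counter, Histogram, start_http_server

# Histogram of raw model outputs: a visible shift in this distribution is
# often the first symptom of drift, even while CPU, memory, and error
# rates still look perfectly healthy.
PREDICTION_VALUE = Histogram(
    "model_prediction_value",
    "Distribution of raw model outputs",
    buckets=(0.1, 0.25, 0.5, 0.75, 0.9, 1.0),
)
PREDICTION_COUNT = Counter(
    "model_prediction_total",
    "Total number of predictions served",
)

def predict_and_record(model, features):
    """Run the model and record its output for monitoring."""
    prediction = model.predict(features)  # hypothetical model interface
    PREDICTION_VALUE.observe(float(prediction))
    PREDICTION_COUNT.inc()
    return prediction

if __name__ == "__main__":
    # Expose a /metrics endpoint that a Prometheus server can scrape.
    start_http_server(8000)

Dashboards and alerts built on top of metrics like these at least give us a chance of noticing that the predictions themselves have changed, not just whether the service is up.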

What is MLOps, anyway?

We want to create infrastructures and processes that combine the development of machine learning systems with their operation. That is the goal MLOps pursues, and it is closely interwoven with the goals and definition of DevOps. The difference is that we're not just developing and operating software systems, but machine learning systems.

As a data scientist, when I'm confronted with these questions, I have to ask myself why I should worry about them at all. Strictly speaking, my goal is just to train the best possible ML model. While this is a worthy goal, I can't lose sight of the system's overall context and the company I'm working in. About 90% of all ML models are never deployed in a production environment [1], [2]. This means they never reach a user or customer. Provocatively speaking, 90% of machine learning projects are useless for our business. In the end, a model is only useful if it generates added value for our users or processes.

Besides, when developing a machine learning system, I always keep the following quote from Andrew Ng in mind: “You deployed your model in production? Congratulations. You’re halfway done with your project.” [3]

Challenges in production

Our machine learning project doesn't end when we bring our first model into a production environment. Even a model that delivered excellent results in training and testing faces many challenges once it arrives in production. Perhaps the best-known challenge is a change in the data, or the data distribution, that our model receives as input. A whole range of events can trigger this, and they're often grouped together under the term "concept drift". The data set used to train a model represents only an excerpt of the reality our model operates in. This is one of the reasons why collecting as much data as we can is so critical for a well-functioning, robust model: the more data available to the model, the more complete the slice of reality it can represent. But the training data is always just a snapshot, while the world around it keeps changing. If our model isn't retrained with new data, it has no way to update its outdated assumptions about reality, and its performance declines.
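One simple way to make such distribution changes visible is to compare the feature values arriving in production against a reference sample from the training data. The sketch below uses a two-sample Kolmogorov-Smirnov test from SciPy; the threshold and the synthetic data are assumptions chosen purely for illustration.

import numpy as np
from scipy.stats import ks_2samp

def detect_feature_drift(reference: np.ndarray,
                         live: np.ndarray,
                         p_threshold: float = 0.01) -> bool:
    """Return True if the live values likely come from a different
    distribution than the reference (training) values."""
    result = ks_2samp(reference, live)
    return result.pvalue < p_threshold

# Usage example with synthetic data: the live sample is shifted,
# so the check should report drift.
reference = np.random.normal(loc=0.0, scale=1.0, size=5_000)
live = np.random.normal(loc=0.5, scale=1.0, size=1_000)
if detect_feature_drift(reference, live):
    print("Input drift detected - investigate or consider retraining.")

In practice we would run such a check per feature and on a schedule, and feed the result into the same alerting pipeline as our other metrics.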

How does concept drift happen? Data can change gradually. For example, sensors become less accurate over time due to wear and tear and show increasing deviations from the actual measured value. Slow changes in customer preferences are another example. One striking case is a model that recommends suitable products in a fashion shop: if it isn't updated and keeps recommending products from the previous season, customer satisfaction with our shop will drop significantly. Recurring events such as seasons or holidays can also have an impact if we want to use our model to predict sales figures. But concept drift can also happen abruptly: if COVID-19 brings global air traffic to a standstill, then our carefully trained models for predicting daily passenger traffic will produce poor results. Or if the sales department launches an Instagram promotion without prior notice and doubles the sales of our vitamin supplement, that's a great result, but not something our model is good at predicting.
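Once the ground truth eventually arrives (actual sales, actual passenger numbers), a rolling comparison of recent prediction error against the error we saw during validation can catch such abrupt shifts. A minimal sketch, assuming the window size and alert factor are tuned per use case:

from collections import deque

class RollingErrorMonitor:
    """Flags when the recent average absolute error drifts far above
    the error level observed during validation."""

    def __init__(self, baseline_error: float, window: int = 500,
                 alert_factor: float = 2.0):
        self.baseline_error = baseline_error  # e.g. the validation MAE
        self.alert_factor = alert_factor
        self.errors = deque(maxlen=window)

    def record(self, prediction: float, actual: float) -> bool:
        """Record one prediction/ground-truth pair and return True
        if the rolling error exceeds the alert threshold."""
        self.errors.append(abs(prediction - actual))
        rolling_error = sum(self.errors) / len(self.errors)
        return rolling_error > self.alert_factor * self.baseline_error

# Hypothetical usage: baseline_error is the MAE measured on the validation set.
monitor = RollingErrorMonitor(baseline_error=12.5)
if monitor.record(prediction=340.0, actual=780.0):
    print("Prediction error has jumped - possible concept drift.")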

Another challenge is both technical and organizational. In many companies and projects, there is an organizational and personnel separation between the team developing a machine learning model (Data Scientists) and the team bringing the models into production and supporting them (Software Engineers/DevOps Engineers). The data science team spends a lot of time conceptualizing and selecting model architectures, feature engineering, and training the model. When...

