
Keeping an Eye on AI

source link: https://devm.io/machine-learning/machine-learning-monitoring

Monitoring for machine learning systems


07. Sep 2022


Our machine learning model is trained and finally running in production. But that was the easy part. Now, the real challenge is reliably running our machine learning system in production. For this, monitoring systems are essential. But while monitoring machine learning models, we must consider some challenges that go beyond classic DevOps metrics.

We've all been there: we've spent weeks or even months working on our ML model. We collected and processed data, tested different model architectures, and spent a lot of time fine-tuning our model's hyperparameters. Our model is ready! Maybe it needs a few more tweaks to improve performance further, but it's ready for the real world. Finally, we put our model into production and sit back and relax. Three weeks later, we get an angry call from our customer because our model makes predictions that have nothing to do with reality. A look at the logs reveals no errors. In fact, everything still looks fine.

However, since we haven't established continuous monitoring for our model, we don't know if and when our model's predictions change. We can only hope that they'll stay as good as they were on day one. But if our infrastructure is held together by nothing but tape and hope, we'll only discover many of these errors once they hit production.
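What could such continuous monitoring look like in practice? The following is a minimal sketch, assuming a Python inference service and the prometheus_client library; the metric names, buckets, and the model.predict call are illustrative assumptions, not a prescribed API.

from prometheus_client import Counter, Histogram, start_http_server

# Histogram of raw model outputs: a visible shift in this distribution is
# often the first symptom of drift, even while CPU, memory, and error
# rates still look perfectly healthy.
PREDICTION_VALUE = Histogram(
    "model_prediction_value",
    "Distribution of raw model outputs",
    buckets=(0.1, 0.25, 0.5, 0.75, 0.9, 1.0),
)
PREDICTION_COUNT = Counter(
    "model_prediction_total",
    "Total number of predictions served",
)

def predict_and_record(model, features):
    """Run the model and record its output for monitoring."""
    prediction = model.predict(features)  # hypothetical model interface
    PREDICTION_VALUE.observe(float(prediction))
    PREDICTION_COUNT.inc()
    return prediction

if __name__ == "__main__":
    # Expose a /metrics endpoint that a Prometheus server can scrape.
    start_http_server(8000)

Dashboards and alerts built on top of metrics like these at least give us a chance of noticing that the predictions themselves have changed, not just whether the service is up.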

What is MLOps, anyway?

We want to create infrastructures and processes that combine the development of machine learning systems with their operation. That is the goal MLOps pursues, and it is closely interwoven with the goals and definition of DevOps. The difference is that we're not just developing and operating software systems, but machine learning systems.

As a data scientist, when I'm confronted with these questions, I have to ask myself why I should worry about them at all. Strictly speaking, my goal is just to train the best possible ML model. While this is a worthy goal, I can't lose sight of the system's overall context and the company I'm working in. About 90% of all ML models are never deployed in a production environment [1], [2]. This means they never reach a user or customer. Provocatively speaking, 90% of machine learning projects are useless for our business. In the end, a model is only useful if it generates added value for our users or processes.

Besides, when developing a machine learning system, I always keep the following quote from Andrew Ng in mind: “You deployed your model in production? Congratulations. You’re halfway done with your project.” [3]

Challenges in production

Our machine learning project doesn't end when we bring our first model into a production environment. Even a model that delivered excellent results in training and testing faces many challenges once it arrives in production. Perhaps the best-known challenge is a change in the data, or the data distribution, that our model receives as input. A whole range of events can trigger this, and they're often grouped together under the term "concept drift". The data set used to train a model represents only an excerpt of the reality our model operates in. This is one of the reasons why collecting as much data as we can is so critical for a well-functioning, robust model: the more data available to the model, the more complete the slice of reality it can represent. But the training data is always just a snapshot, while the world around it keeps changing. If our model isn't retrained with new data, it has no way to update its outdated assumptions about reality, and its performance declines.
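One simple way to make such distribution changes visible is to compare the feature values arriving in production against a reference sample from the training data. The sketch below uses a two-sample Kolmogorov-Smirnov test from SciPy; the threshold and the synthetic data are assumptions chosen purely for illustration.

import numpy as np
from scipy.stats import ks_2samp

def detect_feature_drift(reference: np.ndarray,
                         live: np.ndarray,
                         p_threshold: float = 0.01) -> bool:
    """Return True if the live values likely come from a different
    distribution than the reference (training) values."""
    result = ks_2samp(reference, live)
    return result.pvalue < p_threshold

# Usage example with synthetic data: the live sample is shifted,
# so the check should report drift.
reference = np.random.normal(loc=0.0, scale=1.0, size=5_000)
live = np.random.normal(loc=0.5, scale=1.0, size=1_000)
if detect_feature_drift(reference, live):
    print("Input drift detected - investigate or consider retraining.")

In practice we would run such a check per feature and on a schedule, and feed the result into the same alerting pipeline as our other metrics.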

How does concept drift happen? Data can change gradually. For example, sensors become less accurate over time due to wear and tear and show increasing deviations from the actual measured value. Slow changes in customer preferences are another example. One striking case is a model that recommends suitable products in a fashion shop: if it isn't updated and keeps recommending products from the previous season, customer satisfaction with our shop will drop significantly. Recurring events such as seasons or holidays can also have an impact if we want to use our model to predict sales figures. But concept drift can also happen abruptly: if COVID-19 brings global air traffic to a standstill, then our carefully trained models for predicting daily passenger traffic will produce poor results. Or if the sales department launches an Instagram promotion without prior notice and doubles the sales of our vitamin supplement, that's a great result, but not something our model is good at predicting.
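Once the ground truth eventually arrives (actual sales, actual passenger numbers), a rolling comparison of recent prediction error against the error we saw during validation can catch such abrupt shifts. A minimal sketch, assuming the window size and alert factor are tuned per use case:

from collections import deque

class RollingErrorMonitor:
    """Flags when the recent average absolute error drifts far above
    the error level observed during validation."""

    def __init__(self, baseline_error: float, window: int = 500,
                 alert_factor: float = 2.0):
        self.baseline_error = baseline_error  # e.g. the validation MAE
        self.alert_factor = alert_factor
        self.errors = deque(maxlen=window)

    def record(self, prediction: float, actual: float) -> bool:
        """Record one prediction/ground-truth pair and return True
        if the rolling error exceeds the alert threshold."""
        self.errors.append(abs(prediction - actual))
        rolling_error = sum(self.errors) / len(self.errors)
        return rolling_error > self.alert_factor * self.baseline_error

# Hypothetical usage: baseline_error is the MAE measured on the validation set.
monitor = RollingErrorMonitor(baseline_error=12.5)
if monitor.record(prediction=340.0, actual=780.0):
    print("Prediction error has jumped - possible concept drift.")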

Another challenge is both technical and organizational. In many companies and projects, there is an organizational and personnel separation between the team developing a machine learning model (Data Scientists) and the team bringing the models into production and supporting them (Software Engineers/DevOps Engineers). The data science team spends a lot of time conceptualizing and selecting model architectures, feature engineering, and training the model. When...

