
Tracking ML Experiments using MLflow


A demonstration of how MLflow can improve your ML modelling experience

Introduction

If you're familiar with building machine learning models, either at work or as a hobby, you've probably come across this situation: you've built tons of different models with different code bases, and you have tons of graphs, notebooks, and metrics to keep track of as you optimize your code and tune your model to climb the accuracy ladder.

You’re not alone. [1]

Over the years of practicing data science, I've come across a lot of ways that people deal with this problem. My go-to method has usually been an Excel sheet. It's easy to use, easy to visualize, and easy to share with external parties.

[Image: Tracking experiments on a spreadsheet]

I've also seen people try to programmatically store the metrics in database tables. But those can get quite messy (not that Excel sheets aren't messy in and of themselves, but if I need to run a few commands just to add a new metric, I'd say it's rather unsustainable in the long run).

Things get a little trickier once you have multiple people working on the same machine learning problem. Some ground rules need to be set before the project starts, or else tracking becomes cumbersome. Mature AI teams have most likely sorted this problem out, but I'd imagine early-stage startups would have some trouble with this coordination work (if they can afford to have multiple data scientists working on the same thing, that is).

It is such a big problem that it has spawned a few companies dedicated to solving JUST this particular issue. Some of the ones that I know of are:

1. Neptune ML: https://neptune.ml/
2. Weights and Biases: https://www.wandb.com/
3. Comet ML: https://www.comet.ml/

For the sake of brevity, I won't be covering what features they actually have, how they work, or their pricing models. Some focus more on deep learning experiments, while others cover a wider range. The point I'm trying to highlight here is that the problem is big enough that people actually PAY other people for some sort of a solution.

A notable mention is TensorBoard, but it seems to be built primarily for TensorFlow and deep learning (I'm not a heavy TensorBoard user myself, so feel free to share what you know in the comment section for the benefit of everyone).

MLflow

Another option is MLflow. Started by Databricks, it is basically an open source platform for the entire machine learning lifecycle (taken verbatim from their website).

[Image: What they mean by covering the entire ML lifecycle [2]]

Which brings us to the focus of this article — tracking metrics.

My MLflow Environment

In my current workplace, one of the most heavily used tools at our disposal is Databricks. In recent months, they've added MLflow as a feature. As someone who's been dying to learn how to better keep track of things, it was the perfect opportunity to try it out. This walkthrough of MLflow tracking is therefore written from the perspective of a Databricks user.

For non-Databricks users, MLflow can be set up separately and connected to your environment. The same functionality should exist. Check out their documentation for more details.

The Experiment

To demonstrate how I'm currently using MLflow at work, and why I find it useful, I'll be using it on a Kaggle kernel [3] where the author does some data analysis and forecasting experiments on a cryptocurrency dataset. I made some minor changes to port the original code from [3] to Databricks (mostly to be able to visualize the graphs), but it's essentially the same.

Note: I've only started using MLflow recently, so it's quite likely that I'll miss some things.

The notebook that I'm using for this post can be found here. Two versions of the notebook are available: "before" denotes the original Kaggle kernel ported to Databricks, while "after" contains the additional code used for tracking.

To start an experiment with MLflow, you first need to call mlflow.set_experiment, passing the path where the experiment will be stored.
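
A minimal sketch of this call (the workspace path below is made up; substitute your own):

```python
import mlflow

# On Databricks the experiment is identified by a workspace path.
# This particular path is hypothetical.
mlflow.set_experiment("/Users/your.name@example.com/crypto-forecasting")
```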

Next, you can start to think about what you want to keep track of in your analysis/experiment. MLflow categorizes these into 3 main categories:

mlflow.log_param()
mlflow.log_metric()
mlflow.log_artifact()
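
Roughly speaking, parameters are inputs you chose, metrics are numbers the run produced, and artifacts are files you want to keep. A minimal sketch of the three calls inside a single run (the names and values here are made up for illustration):

```python
import mlflow

with mlflow.start_run():
    mlflow.log_param("model_type", "ARIMA")  # a parameter: an input you chose
    mlflow.log_metric("rmse", 42.0)          # a metric: a number the run produced

    # An artifact is any file; write a small placeholder file and log it.
    with open("/tmp/notes.txt", "w") as f:
        f.write("example artifact")
    mlflow.log_artifact("/tmp/notes.txt")
```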

MLflow Tracking is organized around the concept of runs, which are executions of some piece of data science code [4]. (Moving forward, we'll denote this concept with an italicized run.)

Other things that are (or can be) tracked for every run include the code version, the start and end time, and the source file. Refer to the documentation for more details on this.

Tracking Parameters and Metrics

In cell 37 of the notebook, the author iteratively explores different combinations of parameters to get the best fit of an ARIMA model to our data. Let's use this section of the code and track the parameters being used and the metrics being generated.
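
A minimal sketch of such a loop, assuming statsmodels' SARIMAX, a placeholder train_series, and a fixed differencing order (all assumptions here; the notebook's exact code may differ):

```python
import itertools

import mlflow
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Placeholder series; in the notebook this would be the resampled Bitcoin prices.
train_series = pd.Series(
    np.random.randn(60).cumsum(),
    index=pd.date_range("2015-01-01", periods=60, freq="M"),
)

ps, qs = range(0, 3), range(0, 3)

for p, q in itertools.product(ps, qs):
    with mlflow.start_run():
        # Track which parameter combination this run used...
        mlflow.log_param("p", p)
        mlflow.log_param("q", q)
        # ...fit the candidate model, and record its AIC.
        model = sm.tsa.statespace.SARIMAX(train_series, order=(p, 1, q)).fit(disp=False)
        mlflow.log_metric("aic", model.aic)
```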

To start our run, we use mlflow.start_run(). We then log the qs and ps parameters and the model.aic metric to MLflow. Running this kind of code gives us the following result in the MLflow UI.

[Image: One row of results is generated for each run]

If we click into one of those runs, we'll be taken to the following window.

[Image: Detailed page for each MLflow run]

As can be seen above, we've only tracked a limited number of parameters and metrics per run. No artifacts or tags were recorded. We can also add our own notes if we have any. Other things that are recorded are the Date, the User (in my case, my Databricks ID), the Run ID, the Duration, and the Source.

Runs Are Committed To Git

The latter is something I find really useful, since it actually commits the notebook being run to Git. I'm currently not sure how this translates to a self-hosted MLflow setup, but in my case, clicking on the source link brings me back to the committed version of the notebook in Databricks (below).

[Image: Opening the Revision History reveals that this notebook was committed by MLflow (i.e. "Taken by MLflow")]

Now let’s try something else.

Storing Plots and Other Files

Following the earlier example, we've built multiple models to determine the best parameters for our ARIMA model. In our next example, we're going to store a few artifacts related to the model, namely the model summary and some plots.

Let’s do this for our SARIMAX model.
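
A rough sketch of what this looks like, assuming the train_series from the earlier search, a (1, 1, 1) order, and matplotlib for the plots (all assumptions for illustration):

```python
import matplotlib.pyplot as plt
import mlflow
import statsmodels.api as sm

# A fitted results object; the order here is an assumption.
results = sm.tsa.statespace.SARIMAX(train_series, order=(1, 1, 1)).fit(disp=False)

with mlflow.start_run():
    # Artifacts have to exist on disk first, so write the summary to a file...
    summary_path = "/tmp/sarimax_summary.txt"
    with open(summary_path, "w") as f:
        f.write(results.summary().as_text())

    # ...and save a diagnostics plot the same way.
    fig = results.plot_diagnostics(figsize=(12, 8))
    plot_path = "/tmp/sarimax_diagnostics.png"
    fig.savefig(plot_path)
    plt.close(fig)

    # Only then can the files be logged; MLflow copies them to its artifact store.
    mlflow.log_artifact(summary_path)
    mlflow.log_artifact(plot_path)
```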

As the code illustrates, logging artifacts requires a bit more work than usual. Instead of saving the plots directly into MLflow, the file first has to be stored somewhere before it can be sent to MLflow's S3 bucket.

In my case, the files are first saved to the local Databricks cluster, and later copied over to MLflow's S3 bucket (which is also within Databricks in this case; alternatively, one can define an external S3 bucket for each experiment and store all related files there).

For reference, below is how the stored artifacts are displayed in the UI.

[Images: Artifacts 1, 2, and 3 as displayed in the MLflow UI]

For me, this simplifies and standardizes the collection of plots and summaries that one would normally just store in a notebook. While having them live in a notebook is OK, the fact that notebooks can be run cell by cell rather than in their entirety from top to bottom means that certain plots can in fact be generated by an operation that happens much later in time (or one that could have been removed altogether during development). Since the plots stored in MLflow are tied back to committed code in Git, we now have a much better way to ensure that experiments and their results are documented in a reproducible manner.

Saving the Model

MLflow also allows you to save the model being trained. This can be done either by logging it as an artifact to MLflow (via log_model()) or by writing it directly to a local file system (via save_model()).
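
A minimal sketch of both options, using a scikit-learn model purely for illustration (MLflow ships similar "flavor" modules for other libraries):

```python
import mlflow
import mlflow.sklearn
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy model purely for illustration.
X = np.arange(20).reshape(-1, 1)
y = 2 * X.ravel() + 1
model = LinearRegression().fit(X, y)

with mlflow.start_run():
    # Log the model as an artifact of this run...
    mlflow.sklearn.log_model(model, "model")

# ...or write it straight to a local path instead.
mlflow.sklearn.save_model(model, "/tmp/linear_model")
```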

Backtesting

One of my favourite features of MLflow is its metrics visualization. To best highlight it, I'll be using the final bit of the kernel [3] and changing it to also monitor the performance of the best model when deployed to run over a period of time (i.e. sliding window validation).
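
A rough sketch of such a sliding-window loop, assuming a monthly price series called monthly_series and a (1, 1, 1) SARIMAX model (both assumptions; the actual kernel uses its own best-fit parameters):

```python
import math

import matplotlib.pyplot as plt
import mlflow
import statsmodels.api as sm
from sklearn.metrics import mean_squared_error

# monthly_series: a pandas Series of monthly prices, assumed to be defined earlier.
window = 3    # evaluate on a 3-month window
n_steps = 10  # slide the window forward 10 times, one month per step
cutoff = len(monthly_series) - window - n_steps  # initial training cut-off

with mlflow.start_run():
    for step in range(n_steps):
        train = monthly_series[: cutoff + step]
        test = monthly_series[cutoff + step : cutoff + step + window]

        fit = sm.tsa.statespace.SARIMAX(train, order=(1, 1, 1)).fit(disp=False)
        forecast = fit.forecast(steps=window)

        rmse = math.sqrt(mean_squared_error(test, forecast))
        # The step argument lets the MLflow UI plot RMSE across the 10 windows.
        mlflow.log_metric("rmse", rmse, step=step)

        # Log the actual-vs-forecast comparison plot for this window.
        fig, ax = plt.subplots()
        ax.plot(test.index, test.values, label="actual")
        ax.plot(test.index, forecast.values, label="forecast")
        ax.legend()
        plot_path = f"/tmp/window_{step}.png"
        fig.savefig(plot_path)
        plt.close(fig)
        mlflow.log_artifact(plot_path)
```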

In the above example, we establish a time window of 3 months and evaluate the RMSE of the model over 10 sliding windows, where each window moves forward by 1 month. At each evaluation, we also log the comparison plot to MLflow for future reference and verification.

[Image: Plots are captured across multiple sliding windows]

Diving into the RMSE metric, we can see how it performed over time as the window slides across the time horizon over 10 steps. The model is generally stable for some time until it spikes at the 5th iteration. This is rather expected if we refer back to how volatile the Bitcoin price is in the plots above.

