
Why we deploy machine learning models with Go — not Python

source link: https://towardsdatascience.com/why-we-deploy-machine-learning-models-with-go-not-python-a4e35ec16deb?gi=328d3d5daf3d


There’s more to production machine learning than Python scripts

Apr 14 · 5 min read


Source: Free Gopher Pack

At this point, it should be a surprise to no one that Python is the most popular language for machine learning. While ML frameworks use languages like CUDA C/C++ for actual computation, they all offer a Python interface. As a result, most ML practitioners work in Python.

So, naturally, the codebase for Cortex — our ML infrastructure — is 88.3% Go.


Source: Cortex GitHub

Deploying models at scale is different from writing Python scripts that call PyTorch and TensorFlow functions. To actually run a production machine learning API at scale, we need our infrastructure to do things like:

  • Autoscaling, so that traffic fluctuations don’t break our APIs (and our AWS bill stays manageable).
  • API management, to handle multiple deployments.
  • Rolling updates, so that we can update models while still serving requests.

We built Cortex to provide this functionality, and we decided to write it in Go for a few reasons:

1. The infrastructure community has already embraced Go

We are software engineers, not data scientists, by background. We got into ML because we wanted to build features like Gmail’s Smart Compose, not because we were fascinated by backpropagation (although it is admittedly cool). We wanted a simple tool that would take a trained model and automatically implement all the infra features needed to deploy it as an API: reproducible deployments, scalable request handling, automated monitoring, and so on.

And while that all-in-one model-to-microservice platform didn’t exist yet, we’d implemented each of those features in normal software before. We knew what tools were right for the job—and what language they were written in.

There’s a reason the teams that build tools like Kubernetes, Docker, and Terraform use Go. It’s fast. It handles concurrency well. It compiles down to a single binary. In this way, choosing Go was relatively low-risk for us. Other teams had already used it to solve similar challenges.

Additionally, being written in Go makes contributing easier for infrastructure engineers, who are likely already familiar with the language.

2. Go solves our problems related to concurrency and scheduling

Managing a deployment requires many services to be running both concurrently and on a precise schedule. Thankfully, Goroutines, channels, and Go’s built-in timers and tickers provide an elegant solution for concurrency and scheduling.

At a high level, a Goroutine is an ordinary function that Go runs concurrently on a lightweight, runtime-managed thread. Many Goroutines can be multiplexed onto a single OS thread. Channels allow Goroutines to share data safely, while timers and tickers let us schedule when Goroutines run.
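As a minimal illustration (not Cortex code), here is how goroutines and a channel compose: each worker runs concurrently, and the channel both transfers results and synchronizes the collector with the workers.

```go
package main

import (
	"fmt"
	"time"
)

// runWorkers launches n goroutines that work concurrently and collects
// one result from each over a shared channel.
func runWorkers(n int) []string {
	results := make(chan string)

	// Each goroutine is multiplexed onto OS threads by the Go runtime,
	// so launching a few (or a few thousand) is cheap.
	for i := 1; i <= n; i++ {
		go func(id int) {
			time.Sleep(10 * time.Millisecond) // simulate work
			results <- fmt.Sprintf("worker %d done", id)
		}(i)
	}

	// Receiving blocks until a worker sends, so this loop also acts
	// as synchronization: it returns only after all n have finished.
	out := make([]string, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, <-results)
	}
	return out
}

func main() {
	for _, msg := range runWorkers(3) {
		fmt.Println(msg)
	}
}
```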

We use Goroutines to implement concurrency when needed — like when Cortex needs to upload multiple files to S3 and running them in parallel will save time — or to keep a potentially long-running function, like streaming logs from CloudWatch, from blocking the main thread.
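That fan-out pattern can be sketched as follows. This is illustrative only: the `upload` function here is a stand-in for an AWS SDK call, and the whole snippet is an assumption about the shape of the code, not Cortex's actual implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// upload is a placeholder for a real S3 upload (which would use the
// AWS SDK); here it just records that the file was "uploaded".
func upload(path string) string {
	return "uploaded " + path
}

func main() {
	files := []string{"model.onnx", "config.yaml", "weights.bin"}

	var wg sync.WaitGroup
	results := make([]string, len(files))

	// One goroutine per file: the uploads run in parallel instead of
	// sequentially. Writing to distinct slice indices is safe without
	// a mutex, and WaitGroup blocks until every upload finishes.
	for i, f := range files {
		wg.Add(1)
		go func(i int, f string) {
			defer wg.Done()
			results[i] = upload(f)
		}(i, f)
	}
	wg.Wait()

	for _, r := range results {
		fmt.Println(r)
	}
}
```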

Additionally, we use timers and tickers within Goroutines to run Cortex’s autoscaler. I’ve written up a full report on how we implement replica-level autoscaling in Cortex, but the short version is that Cortex counts the number of queued and inflight requests, calculates how many concurrent requests each replica should be handling, and scales appropriately.
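The core of that calculation is simple arithmetic. The sketch below is a hypothetical rendering of it; the function name, signature, and minimum-replica behavior are illustrative assumptions, not Cortex's actual implementation.

```go
package main

import "fmt"

// desiredReplicas divides the total outstanding work (in-flight plus
// queued requests) by the target concurrency per replica, rounding up.
// Illustrative sketch only; not Cortex's real autoscaling code.
func desiredReplicas(inFlight, queued, targetPerReplica int) int {
	total := inFlight + queued
	if total == 0 {
		return 1 // assumed floor: keep at least one replica warm
	}
	// Ceiling division: e.g. 25 requests at 10 per replica -> 3 replicas.
	return (total + targetPerReplica - 1) / targetPerReplica
}

func main() {
	fmt.Println(desiredReplicas(20, 5, 10)) // 25 requests, 10 each -> 3
}
```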

To do this, Cortex’s monitoring functions need to execute at consistent intervals. Go’s scheduler makes sure monitoring happens when it is supposed to, and Goroutines allow each monitoring function to execute concurrently and independently for each API.

Implementing all of this functionality in Python may be doable with tools like asyncio, but the fact that Go makes it so easy is a boon for us.

3. Building a cross-platform CLI is easier in Go

Our CLI deploys models and manages APIs:


Source: Cortex GitHub

We want the CLI to work on both Linux and Mac. Originally, we tried writing it in Python, but users consistently had trouble getting it to work in different environments. When we rebuilt the CLI in Go, we were able to compile it down to a single binary, which could be distributed across platforms without much extra engineering effort on our part.

The performance benefits of a compiled Go binary over an interpreted language are also significant. According to the Computer Language Benchmarks Game, Go is dramatically faster than Python.

It’s not coincidental that many other infrastructure CLIs — eksctl, kops, and the Helm client, to name a few — are written in Go.

4. Go lends itself to reliable infrastructure

As a final point, Go helps with Cortex’s most important feature: reliability.

Reliability is obviously important in all software, but it is absolutely critical with inference infrastructure. A bug in Cortex could seriously run up the inference bill.

While we apply a thorough testing process to every release, Go’s static typing and compilation step provide an initial defense against errors. If there’s a serious bug, there’s a good chance it will be caught at compile time. With a small team, this is very helpful.

Go’s unforgiving nature may make it a bit more painful to get started with than Python, but these internal guardrails act as a sort of first line of defense for us, helping us avoid silly type errors.

Python for scripting, Go for infrastructure

We still love Python, and it has its place within Cortex, specifically around model inference.

Cortex supports Python for model serving scripts. We write Python that loads models into memory, conducts pre/post inference processing, and serves requests. However, even that Python code is packaged up into Docker containers, which are orchestrated by code that is written in Go.

Python will (and should) remain the most popular language for data science and machine learning engineering. However, when it comes to machine learning infrastructure, we’re happy with Go.

Are you an engineer interested in Go and machine learning? If so, consider contributing to Cortex!

