
Tips for a cost-effective machine learning project

 4 years ago
source link: https://www.tuicool.com/articles/ZfaMRbM

Spoiler: you don’t need a VM running 24/7 to handle 16 requests a day.


Street art by Mike Mozart

You just released a machine learning project. It can be a new product at your start-up, a proof of concept for a client demo, or a personal project that enriches your portfolio. You are not looking for a production-grade site; you want to get the job done. For cheap. So that a few users can test your product.

How do you make your project available cost-effectively?

This post is a follow-up and an update to this previous post, where I introduced raplyrics.eu, a text-generation web app that uses ML to generate rap lyrics.

This project has been serving punchlines for a year now. Here I share the updated architecture that let us cut our cloud provider bill from $50/month to less than $1/month.

I use this project as an example, but the proposals discussed here apply to every similar project with a flexible latency requirement.

What to expect?

First, I describe the architecture of the service and what we want to deliver. Then, I define possible ways to achieve our objective. Finally, I zoom in on how we drastically reduced our compute cost using serverless functions.

Service anatomy


First the user fetches the static assets, then locally executes the JS that calls the prediction server to generate lyrics.
  1. First, the user fetches the static files.
  2. Then, the user locally calls the prediction server.
  3. Finally, the server returns the prediction.
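Steps 2 and 3 amount to a small JSON exchange. The sketch below shows one way it could look, written in Python for readability even though the real client is JavaScript running in the browser. The endpoint URL and the `seed` and `lyrics` field names are assumptions made up for this illustration; the article does not describe the actual API contract.

```python
import json
import urllib.request

# Hypothetical endpoint and field names, used only for illustration.
PREDICTION_URL = "https://example.invalid/predict"


def build_prediction_request(seed_text: str) -> urllib.request.Request:
    """Build the POST request the client-side code would send in step 2."""
    body = json.dumps({"seed": seed_text}).encode("utf-8")
    return urllib.request.Request(
        PREDICTION_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_prediction_response(raw_body: bytes) -> str:
    """Extract the generated lyrics from the JSON reply of step 3."""
    return json.loads(raw_body.decode("utf-8"))["lyrics"]


# The request is only built here, never sent.
request = build_prediction_request("I wake up in the morning")
print(request.get_method(), request.full_url)
print(parse_prediction_response(b'{"lyrics": "example output"}'))
```

Because the client only needs the URL of the prediction endpoint, swapping the backend technology does not require touching the rest of the front end.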

The logical separation of concerns remains the same between the initial solution and the new one introduced below. We only update the underlying technology.

Initial solution design

What paying $600 a year for a portfolio project feels like. Photo by Jp Valery on Unsplash

When we developed raplyrics, we wanted it to be dead simple. It started as a project built and tested on our machines. Then, we pushed it to the cloud.

Several ways to serve a machine learning model exist today. Back in the day, we wanted to get our hands dirty, so we implemented our own serving strategy.

Advice: Don’t develop your own machine learning model serving framework. Mature solutions such as TensorFlow Serving exist. TF Serving has exhaustive documentation, and it will be far more efficient than doing it yourself.

That being said, let’s get back to our project. We separated the front end from the back end:

  1. The client tier was an Apache HTTP server.
  2. The server tier was a Python Flask app running the lyric generation model.

We bought a domain name, deployed our code to an EC2 instance, and we were ready to serve users.

The problem

It’s all fun and games until the free trial expires. After the initial 12 months, the monthly bill rocketed to roughly $45 to $50 a month for this single project, while it served only 32 users in September 2019.

The truth is, we had a virtual machine with 2 GB of RAM running 24/7 to serve a few dozen users.
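To make the mismatch concrete, here is a quick back-of-the-envelope calculation. It uses only figures quoted in this article (the upper end of the monthly bill and the September 2019 user count); the 30-day month is a simplifying assumption.

```python
# Figures taken from the article; the per-user numbers are derived from them.
MONTHLY_BILL_USD = 50       # upper end of the 45-50 $/month bill
USERS_IN_MONTH = 32         # users served in September 2019
HOURS_IN_MONTH = 24 * 30    # assumption: 30-day month, VM billed around the clock

cost_per_user = MONTHLY_BILL_USD / USERS_IN_MONTH
yearly_bill = MONTHLY_BILL_USD * 12
vm_hours_per_user = HOURS_IN_MONTH / USERS_IN_MONTH

print(f"Cost per user:          ${cost_per_user:.2f}")
print(f"Yearly bill:            ${yearly_bill}")
print(f"VM hours paid per user: {vm_hours_per_user:.1f}")
```

That works out to about $1.56 and 22.5 hours of paid VM time per user, for a visit that lasts a few minutes.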

Updated solution design

After the free trial, it became clear that something was wrong in the way we approached this project.

The typical user of our service knows this website is a personal project; it sets the level of expectation.

The typical user generates a dozen lyrics and then leaves.

We know what we want to achieve: a two-tier architecture in which a front end handles user input and calls a service to generate lyrics. The front and back ends are loosely coupled (the front end only holds a reference to the backend endpoint).

What are the possible ways to do this?

Listing the options

  • A: Deploy the same project to another cloud provider that offers free credits. Repeat.

That’s possible. For example, if you come from AWS, the $300 in GCP credits can keep you running for a while. Maybe you only need this portfolio project or proof of concept for a client for a limited time.

We want to keep our project for some time; option A is not a great fit.

  • B: Use a static website for the client tier, and serve requests through API calls to serverless computing.

What Is Serverless Computing?

Serverless computing is a method of providing backend services on an as-used basis. Servers are still used, but a company that gets backend services from a serverless vendor is charged based on usage, not a fixed amount of bandwidth or number of servers. (From Cloudflare)

We chose option B, using a static website and exposing our API on a serverless compute service. The drawback of option B is the added latency at cold starts. A cold start is the first run of a serverless function after a period of inactivity; it usually takes longer than a warm start.

Static website and serverless compute in action

Now that we have defined how we want to do it, we can focus on the choice of technology.

Hosting the static page

Multiple static hosting solutions exist. We chose Netlify because it gets the job done in the least amount of time. Basic hosting, a custom domain name, and an SSL certificate are all free on Netlify.

Serving the API with serverless computing

Each cloud provider offers a serverless computing service; we chose Google Cloud and its Cloud Functions.

Google Cloud has a tutorial on how to serve machine learning models through Cloud Functions. With this tutorial as a baseline, we were able to serve our model with a little refactoring.
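As a rough idea of what such a function can look like, here is a minimal sketch of an HTTP handler in the style Cloud Functions uses for Python, where the entry point receives a Flask-style request object. This is not the actual raplyrics code: `generate_lyrics`, the `seed` and `lyrics` field names, and the permissive CORS policy are all assumptions for illustration. CORS headers are included because the static site and the function live on different domains.

```python
import json


def generate_lyrics(seed_text: str) -> str:
    """Hypothetical stand-in for the real lyric model; it only echoes the seed."""
    return f"{seed_text} ... (generated lyrics would go here)"


# Assumption: a wide-open CORS policy, acceptable for a demo but not for production.
CORS_HEADERS = {
    "Access-Control-Allow-Origin": "*",
    "Access-Control-Allow-Methods": "POST, OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type",
}


def predict(request):
    """HTTP entry point: takes a Flask-style request, returns (body, status, headers)."""
    if request.method == "OPTIONS":
        # Browser preflight check sent before the cross-origin POST.
        return ("", 204, CORS_HEADERS)

    payload = request.get_json(silent=True) or {}
    seed_text = payload.get("seed")
    if not seed_text:
        return (json.dumps({"error": "missing 'seed' field"}), 400, CORS_HEADERS)

    body = json.dumps({"lyrics": generate_lyrics(seed_text)})
    return (body, 200, {**CORS_HEADERS, "Content-Type": "application/json"})
```

The handler itself does not import Flask, so it can be exercised locally with any object that exposes `method` and `get_json`.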

Each cloud provider handles serverless functions in a slightly different way. Google Cloud also offers Cloud Run, a serverless compute service that runs containers built from a Dockerfile. Using Dockerfiles makes it easier to move the project from one cloud provider to another.

On the cold start latency

For cold starts, the function first has to download the model weights (150 MB) from a storage bucket, and then the Python app loads them into memory. In those cases, the response time can reach up to 40 s. For warm starts, the response time is usually below 2 s. For a portfolio project, we are OK with this cost/latency tradeoff.

We added some UI elements to our front end to make it explicit that the first prediction may take some time.
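The gap between cold and warm starts comes from where the loaded model lives. A common pattern, sketched below under assumptions, is to keep the model in a module-level variable so that it is loaded once per function instance and reused by later invocations. `load_model_from_bucket` is a hypothetical placeholder (a short sleep stands in for the download), not the project's real loading code.

```python
import time

_model = None     # module-level cache, kept alive between warm invocations
load_count = 0    # only here to make the caching behaviour visible


def load_model_from_bucket():
    """Hypothetical placeholder for downloading and deserialising the weights."""
    global load_count
    load_count += 1
    time.sleep(0.05)  # stands in for the slow download of the weights
    return {"weights": "..."}


def get_model():
    """Load the model on first use only; later calls reuse the cached object."""
    global _model
    if _model is None:
        _model = load_model_from_bucket()
    return _model


def timed_call() -> float:
    """Return how long one call to get_model() takes, in seconds."""
    start = time.perf_counter()
    get_model()
    return time.perf_counter() - start


cold = timed_call()   # first call pays the loading cost
warm = timed_call()   # second call hits the cache
print(f"cold: {cold:.3f}s, warm: {warm:.6f}s, loads: {load_count}")
```

Only the first call on a fresh instance pays the loading cost, which is why a user who generates a dozen lyrics in a row mostly sees warm-start latency.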

Takeaways

You don’t need a full production-scale setup to serve your small project. Aim for the most cost-effective solution.

  • Latency requirements for portfolio projects are not the same as those of production services.
  • A static website combined with an API running on serverless computing is a cost-effective way to serve your project.

Some tricks are required to handle state and to load resources over the network efficiently, but the savings on the bill are worth it.

Thanks to Cyril for his thoughtful feedback on the article.

Resources

Raplyrics source code is available on GitHub.

Hosting a static website

Serverless compute services

