Tips for a cost-effective machine learning project

Spoiler: you don’t need a VM running 24/7 to handle 16 requests a day.


Street art by Mike Mozart

You just released a machine learning project. It can be a new product at your start-up, a proof of concept for a client demo, or a personal project that enriches your portfolio. You are not looking for a production-grade site; you want to get the job done. For cheap. So that a few users can test your product.

How can you make your project available cost-effectively?

This post is a follow-up and an update to this previous post, where I introduced raplyrics.eu, a text-generation web app that uses ML to generate rap lyrics.

This project has been serving punchlines for a year now. Here I share the updated architecture that reduced our cloud provider bill from $50/month to less than $1/month.

I use this project as an example, but the proposals discussed here apply to any similar project with flexible latency requirements.

What to expect?

First, I describe the architecture of the service and what we want to deliver. Then, I define possible ways to achieve our objective. Finally, I zoom in on how we drastically reduced our compute cost using serverless functions.

Service anatomy


First, the user fetches the static assets; then their browser executes the JS that calls the prediction server to generate lyrics.

First, the user fetches the static files.
Then, the user's browser calls the prediction server.
Finally, the server returns the prediction.
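In the real project, the browser side of step 2 is JavaScript. To show the same request/response contract, here is a Python sketch of the round trip that uses only the standard library. The endpoint URL and the `seed`/`lyrics` JSON field names are hypothetical placeholders, not raplyrics' actual API.

```python
import json
import urllib.request

# Hypothetical endpoint: the only thing the static front end needs to know.
PREDICT_URL = "https://example.com/predict"


def build_request(seed: str, url: str = PREDICT_URL) -> urllib.request.Request:
    """Step 2: the POST request the client sends to the prediction server."""
    body = json.dumps({"seed": seed}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_response(raw: bytes) -> str:
    """Step 3: extract the generated lyrics from the server's JSON reply."""
    return json.loads(raw)["lyrics"]


def generate(seed: str, timeout: float = 60.0) -> str:
    """Full round trip; the generous timeout tolerates slow first requests."""
    with urllib.request.urlopen(build_request(seed), timeout=timeout) as resp:
        return parse_response(resp.read())
```

The long `timeout` is deliberate. A client that gives up after a few seconds would fail on exactly the slow first requests that a pay-per-use backend tends to produce.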

The logical separation of concerns remains the same between the initial solution and the new one introduced below. We only update the underlying technology.

Initial solution design


What paying $600 a year for a portfolio project feels like. Photo by Jp Valery on Unsplash

When we developed raplyrics, we wanted it to be dead simple. It started as a project built and tested on our machines. Then, we pushed it to the cloud.

There are several ways to serve a machine learning model today. Back then, we wanted to get our hands dirty, so we implemented our own serving strategy.

Advice: don't develop your own machine learning model serving framework. Mature solutions such as TensorFlow Serving exist; TF Serving has exhaustive documentation, and it will be far more efficient than anything you build yourself.
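For reference, TensorFlow Serving exposes a REST predict endpoint, on port 8501 by default. It accepts a JSON body containing an `instances` list and replies with a `predictions` list. The sketch below only builds the request and parses the reply. The model name `lyrics_model` and the contents of each instance are hypothetical; both depend on how the model was exported.

```python
import json
import urllib.request


def tf_serving_predict_request(host: str, model_name: str, instances: list,
                               port: int = 8501) -> urllib.request.Request:
    """Build a row-format request for TF Serving's REST predict endpoint."""
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def read_predictions(raw: bytes) -> list:
    """Row-format results come back under the "predictions" key."""
    return json.loads(raw)["predictions"]
```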

That being said, let's get back to our project. We separated the front end from the back end:

The client tier was an Apache HTTP server.
The server tier was a Python Flask app running the lyric generation model.
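Flask apps are WSGI applications, so the shape of such a server tier can be sketched with the standard library alone. This is a hypothetical minimal endpoint, not the project's code. The `/predict` route, the JSON fields, and the `generate_lyrics` stub are placeholders for the real model call.

```python
import json


def generate_lyrics(seed: str) -> str:
    """Hypothetical stand-in for the lyric generation model."""
    return f"{seed}... (generated verse)"


def _respond(start_response, status: str, payload: dict) -> list:
    """Serialize a JSON reply and send the WSGI status line and headers."""
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def app(environ, start_response):
    """Minimal WSGI prediction endpoint: POST /predict with {"seed": "..."}."""
    if environ.get("PATH_INFO") != "/predict" or environ.get("REQUEST_METHOD") != "POST":
        return _respond(start_response, "404 Not Found", {"error": "not found"})
    length = int(environ.get("CONTENT_LENGTH") or 0)
    try:
        payload = json.loads(environ["wsgi.input"].read(length) or b"{}")
    except ValueError:  # covers malformed JSON and undecodable bytes
        payload = None
    if not isinstance(payload, dict):
        return _respond(start_response, "400 Bad Request", {"error": "expected a JSON object"})
    return _respond(start_response, "200 OK", {"lyrics": generate_lyrics(payload.get("seed", ""))})
```

Under Flask, the routing and JSON parsing would shrink to a decorated view function. The point here is only the request/response shape that the front end depends on.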

We bought a domain name, deployed our code to an EC2 instance, and we were ready to serve users.

The problem

It's all fun and games until the free trial expires. After the initial 12 months, the bill rocketed to around $45 to $50 a month for this single project, which served 32 users in September 2019.

The truth is, we had a virtual machine with 2 GB of RAM running 24/7 to serve a few dozen users.
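To put the bill in perspective, we can turn the figures above ($45 to $50 a month, 32 users in September 2019) into a cost per user and per request. The figure of a dozen requests per user is an assumption, borrowed from the typical usage pattern described in the next section.

```python
def cost_breakdown(monthly_cost: float, users: int, requests_per_user: int) -> tuple:
    """Return (cost per user, cost per request) for a fixed monthly bill."""
    per_user = monthly_cost / users
    per_request = monthly_cost / (users * requests_per_user)
    return per_user, per_request


# Figures from this post; 12 requests per user is an assumption ("a dozen lyrics").
for bill in (45.0, 50.0):
    per_user, per_request = cost_breakdown(bill, users=32, requests_per_user=12)
    print(f"${bill:.0f}/month: ${per_user:.2f} per user, ${per_request:.3f} per request")
```

That works out to roughly $1.41 to $1.56 per user and $0.12 to $0.13 per request. Most of that money pays for a machine sitting idle.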

Updated solution design

After the free trial, it became clear that something was wrong in the way we approached this project.

The typical user of our service knows this website is a personal project; this sets their expectations.

The typical user generates a dozen lyrics and then leaves.

We know what we want to achieve: a two-tier architecture with a front end that handles user input and calls a service to generate lyrics. The front end and back end are loosely coupled; the front end only holds a reference to the back-end endpoint.

What are the possible ways to get there?

Listing the options

A — Deploy the same project to another cloud provider that offers free credits. Repeat.

That's possible. For example, if you come from AWS, GCP's $300 in free credits can keep you running for a while. Maybe you only need this portfolio project or client proof of concept for a limited time.

We want to keep our project online for some time, so option A is not a great fit.

B — Use a static website for the client tier, and serve requests through API calls to a serverless compute service.

What Is Serverless Computing?

Serverless computing is a method of providing backend services on an as-used basis. Servers are still used, but a company that gets backend services from a serverless vendor is charged based on usage, not a fixed amount of bandwidth or number of servers. (Source: Cloudflare)
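The difference between the two billing models fits in two lines of arithmetic. The sketch is deliberately simplified. Real serverless bills also charge for compute time and usually include a free tier. `PRICE_PER_REQUEST` is a made-up placeholder, not any provider's actual rate.

```python
def pay_per_use_monthly_cost(requests: int, price_per_request: float) -> float:
    """Usage-based bill: cost scales linearly with traffic."""
    return requests * price_per_request


def break_even_requests(fixed_monthly_cost: float, price_per_request: float) -> float:
    """Monthly volume above which an always-on VM becomes the cheaper option."""
    return fixed_monthly_cost / price_per_request


PRICE_PER_REQUEST = 0.0001   # hypothetical all-in price, for illustration only
VM_MONTHLY_COST = 50.0       # the always-on VM bill from this post
traffic = 16 * 30            # ~16 requests a day, as in the introduction

print(pay_per_use_monthly_cost(traffic, PRICE_PER_REQUEST))
print(break_even_requests(VM_MONTHLY_COST, PRICE_PER_REQUEST))
```

With these placeholder numbers, the VM only pays off beyond roughly 500,000 requests a month. That is about a thousand times the traffic this project sees.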

We chose option B: a static website, with our API exposed on a serverless compute service. The drawback of option B is the added latency of cold starts. A cold start is the first invocation of a serverless function after it has been idle for a while; it usually takes longer than a warm start.

Static website and serverless compute in action

Now that we defined how we want to do it, we can focus on the choice of technology.

Hosting the static page

Multiple static hosting solutions exist. We chose Netlify because it gets the job done in the least amount of time. Basic hosting, a custom domain name, and an SSL certificate are all free on Netlify.

Serving the API with serverless computing

Each major cloud provider offers a serverless compute service; we chose Google Cloud and its Cloud Functions.

Google Cloud has a tutorial on how to serve machine learning models through Cloud Functions. Using this tutorial as a baseline, we were able to serve our model with a little refactoring.
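Python Cloud Functions with an HTTP trigger receive a Flask `request` object and may return a `(body, status, headers)` tuple. A static front end adds one detail: the site and the function live on different origins, so the browser enforces CORS and first sends a preflight `OPTIONS` request. The sketch below handles both cases. `ALLOWED_ORIGIN`, the `seed`/`lyrics` fields, and `generate_lyrics` are hypothetical, and this is not the tutorial's code.

```python
import json

# Hypothetical origin of the static front end (e.g. its Netlify domain).
ALLOWED_ORIGIN = "https://example.netlify.app"


def generate_lyrics(seed: str) -> str:
    """Hypothetical stand-in for the real model call."""
    return f"{seed}... (generated verse)"


def predict(request):
    """HTTP Cloud Function entry point; `request` is a Flask request object."""
    cors = {"Access-Control-Allow-Origin": ALLOWED_ORIGIN}
    if request.method == "OPTIONS":
        # Answer the browser's CORS preflight before it sends the actual POST.
        preflight = {
            **cors,
            "Access-Control-Allow-Methods": "POST",
            "Access-Control-Allow-Headers": "Content-Type",
            "Access-Control-Max-Age": "3600",
        }
        return ("", 204, preflight)
    payload = request.get_json(silent=True)
    if not isinstance(payload, dict):
        return (json.dumps({"error": "expected a JSON object"}), 400, cors)
    body = json.dumps({"lyrics": generate_lyrics(payload.get("seed", ""))})
    return (body, 200, {**cors, "Content-Type": "application/json"})
```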

Each cloud provider handles cloud functions in a slightly different way. Google Cloud also offers Cloud Run, a serverless compute service that runs containers built from a Dockerfile. Using a Dockerfile makes it easier to move the project from one cloud provider to another.
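Part of that portability comes from keeping the container contract small. Cloud Run, for instance, tells the container which port to listen on through the `PORT` environment variable (8080 by default). Below is a minimal standard-library sketch of an entry point that respects it. The placeholder `app` stands in for the real prediction endpoint, and `wsgiref` is only suitable for illustration, not for real traffic.

```python
import os
from wsgiref.simple_server import make_server


def app(environ, start_response):
    """Placeholder WSGI app; the real prediction endpoint would go here."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


def get_port(default: int = 8080) -> int:
    """Read the port the platform assigned via $PORT, with a local default."""
    return int(os.environ.get("PORT", default))


def serve() -> None:
    """Container entry point; wsgiref is for illustration, not production traffic."""
    with make_server("", get_port(), app) as server:
        server.serve_forever()
```

The container's start command would call `serve()`. Everything provider-specific then stays in the Dockerfile and the deployment settings.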

On the cold start latency

On a cold start, we have to download the model weights (150 MB) from a bucket, and then the Python app loads them. In those cases, the response time can reach 40 s. On a warm start, the response time is usually below 2 s. For a portfolio project, we are fine with this cost/latency tradeoff.

We added some UI elements to our front end to make it explicit that the first prediction may take some time.

Takeaways

You don't need a full production-scale setup to serve your small project. Aim for the most cost-effective solution.

Latency requirements for portfolio projects are not the same as the ones of production services.
A static website calling an API hosted on serverless compute is a cost-effective way to serve your project.

Some tricks are required to handle state and to load resources from the network efficiently, but the savings on the bill are worth it.
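One way to load resources from the network efficiently is to stream the weights to local disk in chunks. If an earlier invocation on the same instance already fetched them, the cached file is reused. The sketch assumes a hypothetical `open_remote` callable that returns a readable binary stream, for instance a download stream from your storage client library. Keep in mind that on some serverless platforms the writable temp directory is backed by memory.

```python
import os
import shutil
import tempfile
from typing import Optional


def fetch_weights(open_remote, filename: str, cache_dir: Optional[str] = None) -> str:
    """Return a local path to the weights, downloading them only if not cached.

    `open_remote` is a hypothetical callable returning a readable binary stream.
    """
    cache_dir = cache_dir or tempfile.gettempdir()
    path = os.path.join(cache_dir, filename)
    if os.path.exists(path):
        return path  # fetched by an earlier invocation on this instance
    partial = path + ".part"
    with open_remote() as src, open(partial, "wb") as dst:
        shutil.copyfileobj(src, dst, 1024 * 1024)  # stream in 1 MiB chunks
    os.replace(partial, path)  # rename only once complete: no half-written cache hits
    return path
```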

Thanks to Cyril for his thoughtful feedback on the article.

Resources

Raplyrics source code is available on GitHub.

Hosting a static website

Hosting a Static Website on Amazon S3, AWS documentation
Websites for you and your projects, GitHub Pages
Hosting a static website, Google Cloud documentation
Deploy your site in seconds, Netlify

Serverless compute services

AWS: Run code without thinking about servers, AWS Lambda
Google Cloud: Event-driven serverless compute platform, Cloud Functions
Azure: Event-driven serverless compute, Azure Functions
Alibaba Cloud: A fully hosted and serverless running environment, Function Compute
