
If you can call a function, you can use machine learning

source link: https://towardsdatascience.com/if-you-can-call-a-function-you-can-use-machine-learning-dccaa2938e81?gi=9b30070d4d39

At least, you can build software with machine learning


Mar 11 · 5 min read


Every few months, a new benchmark seems to be set by machine learning research teams. Looking at natural language processing, for example:

  • In February 2019, OpenAI announced GPT-2, a state-of-the-art NLP model with 1.5 billion parameters.
  • By September, Salesforce had released an even bigger NLP model, CTRL, with 1.63 billion parameters.
  • In January 2020, Google released an even bigger model still, Meena, which it claimed was the best conversational agent yet.
  • One month after Meena set new records, Microsoft released Turing-NLG, a staggering 17-billion-parameter NLP model.

That’s how quickly the state-of-the-art moves in machine learning research. This frenetic pace, however, has not neatly translated to a surge in new ML applications.

The reason for this is that while much energy has been spent over the last decade on machine learning research, less has gone into building tools and abstractions that make it easy for engineers to build ML applications.

In order for a new generation of ML applications to be unlocked, state-of-the-art machine learning models need to become as accessible to software engineers as any other library—and fortunately, we’re starting to see that happen.

Abstraction is how software gets built

There is hardly a web app that can exist without authentication, and almost all authentication schemes involve the hashing of passwords. Most web developers (ideally) understand why password hashing is necessary from a security perspective, and have experience implementing it.

But how many of those web developers write their own hashing functions?

Virtually none is the answer. Instead, they use a hashing library to abstract away the underlying cryptography, and focus on building their app. For example, instead of writing hundreds of lines of code to hash a password, an engineer writes:

import bcrypt

hashed_pass = bcrypt.hashpw(password.encode("utf-8"), bcrypt.gensalt())
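Verifying a login attempt later is just as terse. A quick illustration (the literal password below is only for demonstration):

import bcrypt

# bcrypt embeds the salt inside the hash itself, so checking a candidate
# password is a single call with no extra bookkeeping.
hashed_pass = bcrypt.hashpw(b"hunter2", bcrypt.gensalt())
print(bcrypt.checkpw(b"hunter2", hashed_pass))      # True
print(bcrypt.checkpw(b"wrong-guess", hashed_pass))  # False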

All of this probably feels obvious—of course, you don’t have to write a hashing function from scratch—but as the production machine learning ecosystem is young, this layer of abstraction is still being defined.

As a result, there is still a disconnect between advances in machine learning research and the ability of software engineers to turn those advances into products.

But that is changing.

The ML community is bridging the abstraction gap

Returning to the hashing example, what libraries like bcrypt did was give software engineers an interface that allowed them to treat complex hashing operations as simple, high-level functions.

The ML community is starting to do the same with prediction serving. We are seeing more projects dedicated to building an interface such that software engineers can treat a trained model as a predict() function. Instead of treating GPT-2, for example, as a highly complex transformer model, engineers can conceptualize it as just a GPT2_predict() function that takes an input string and returns an output string.
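To make that concrete, here is a minimal sketch of such an interface, using the open source Hugging Face transformers library as a stand-in for a deployed GPT-2 (the library choice and the function name are illustrative, not something any particular project prescribes):

from transformers import pipeline

# Load the trained model once; after this, it behaves like any other function.
generator = pipeline("text-generation", model="gpt2")

def gpt2_predict(prompt: str) -> str:
    # Takes an input string and returns an output string, hiding the
    # transformer architecture behind a plain function call.
    outputs = generator(prompt, max_length=50, num_return_sequences=1)
    return outputs[0]["generated_text"]

print(gpt2_predict("Machine learning is"))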

One of the most popular ways to build this interface is to deploy a trained model as a microservice. The predict() function that engineers interface with then becomes a simple wrapper around the model’s API.
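In that setup, the client-side code can be a few lines. Here is a sketch, where the endpoint URL and JSON schema are hypothetical stand-ins for whatever the deployed service actually exposes:

import requests

# Hypothetical endpoint for a model deployed as a microservice; the URL and
# payload format are placeholders, not any specific platform's API.
MODEL_API_URL = "https://api.example.com/gpt2/predict"

def predict(text: str) -> str:
    # The predict() function engineers call is just a thin wrapper around
    # an HTTP request to the model's API.
    response = requests.post(MODEL_API_URL, json={"text": text}, timeout=30)
    response.raise_for_status()
    return response.json()["prediction"]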

There are a few popular open source platforms like TF Serving and ONNX Runtime that provide an easy interface for generating predictions from a model, but deploying a model to the cloud still presents particular infrastructure challenges:

  • Models can be huge. GPT-2, OpenAI’s popular NLP model, is over 5 GB.
  • Predictions are computationally expensive. Even with GPUs, many models take seconds, or even minutes, to generate a prediction.
  • Concurrency is a pain. A single prediction can fully utilize an instance, meaning instances need to autoscale aggressively to handle increases in traffic.

To handle these challenges, an engineer would need to wrangle tools like Flask, Docker, Kubernetes, and whatever APIs their cloud platform provides. In other words, they would have to become versed in an ML-specific flavor of DevOps.
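For a sense of scale, here is roughly what just the Flask piece might look like. This is a deliberately bare-bones sketch: it says nothing about containerizing the service, provisioning GPUs, or autoscaling it to handle the challenges listed above, which is where most of the real work lives.

from flask import Flask, request, jsonify
from transformers import pipeline

app = Flask(__name__)

# Load the model once at startup; for something GPT-2 sized, this alone
# costs gigabytes of memory per process.
generator = pipeline("text-generation", model="gpt2")

@app.route("/predict", methods=["POST"])
def predict():
    # Each request runs a full forward pass, which can take seconds and
    # can saturate the instance on its own.
    text = request.get_json()["text"]
    outputs = generator(text, max_length=50, num_return_sequences=1)
    return jsonify({"prediction": outputs[0]["generated_text"]})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)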

There are several projects working on providing a layer of abstraction over this infrastructure. For example, Cortex, a project I contribute to, is focused on abstracting all of this away by converting trained models into scalable APIs with a CLI and a config file:

[Demo GIF omitted. Source: Cortex GitHub]
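The general shape of that abstraction is that the engineer supplies a small amount of Python plus a config file, and the platform takes care of containers, GPUs, and autoscaling. As a rough illustration, the class and method names below follow the general pattern of Cortex-style predictors around 2020, but the exact interface differs across projects and versions:

from transformers import pipeline

class PythonPredictor:
    def __init__(self, config):
        # Runs once per replica when the API starts up: load the trained model.
        self.generator = pipeline("text-generation", model="gpt2")

    def predict(self, payload):
        # Runs once per request; payload is the parsed request body.
        outputs = self.generator(payload["text"], max_length=50)
        return {"prediction": outputs[0]["generated_text"]}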

As projects like these mature, machine learning comes closer to just being a library engineers import to build their application, and as a consequence, we come closer to seeing a flood of new ML-powered software.

This trend isn’t pure speculation, either. It’s something we’re already seeing.

ML-native apps are machine learning’s CRUD apps

In understanding the impact that closing the abstraction gap will have, it’s useful to look at how abstractions unlocked a new generation of web apps.

When you look at web applications, how many of them are, at their core, “just” simple CRUD apps? How many primarily store and modify user data (operations abstracted away by ORMs) and display that data to authorized users (a process abstracted away by hashing and authentication libraries)?

A similar dynamic seems to be happening within machine learning. More and more, startups are launching products whose core functionality—or at least, a major part of their core functionality—is to serve predictions from a trained model.

We refer to these products as ML-native, and in a previous article, I put together a list of ML-native startups by looking just at computer vision products.

Take computer vision models:

Ezra, Zebra Medical, and Arterys are all startups that use computer vision models to analyze MRIs for anomalies.

SkinVision, SkinIQ, and TroveSkin all use your phone’s camera and a computer vision model to analyze your skin for everything from acne to melanoma.

Comma.ai, Pony.ai, and Phantom.ai all use computer vision models to help cars navigate autonomously.

Actuate (formerly Aegis AI), Athena Security, and Synapse Technology all use computer vision models to detect weapons in video footage.

As the abstraction gap continues to close in machine learning, there will be an explosion of ML-native products, similar to the wave of CRUD apps that hit the market as web frameworks made it easier for engineers to build them.

Machine learning’s future relies on both researchers and engineers

Going back to the hashing parallel one last time (I promise), it’s worth noting how the efforts of researchers and software engineers interact to push the field forward.

In the web development world, most of the popular abstractions for authentication, whether built-in functionality of frameworks like Django and Rails or dedicated projects like Passport.js, are built on bcrypt. The people who designed bcrypt, Niels Provos and David Mazières, are both security researchers by profession.

In this example, the work done by dedicated researchers pushes the state-of-the-art forward, and is then wrapped up in a layer of abstraction that makes this new frontier available to engineers, unlocking a new wave of software.

The same dynamic has emerged within machine learning. Every time OpenAI, Google, Microsoft, or some other ML research team releases a new model, they’re really releasing new functionality that, given the right abstractions, engineers can use to build new products.

In other words, data scientists and researchers are focused on the fundamentals of machine learning, conducting experiments to train models to do things we’ve never seen before—and to engineers, these trained models will become just another library they import to build new products.

