
ML Proofs of Concept Are Hard

 2 years ago
source link: https://knowing.net/posts/2021/05/ml-proof-of-concept-hard/


Cartoon showing fragile tower of dependencies

One reason it is difficult to build a business case for a Machine Learning project is that virtually any non-trivial task needs, from day one of the proof of concept, a fairly elaborate data-preparation pipeline and, in most cases, multiple models.

For instance, a project that I’m considering pursuing needs three ML models in a pipeline. Each of the models is a known quantity: what remains is the considerable work of building the pipeline and training the models. And, to really evaluate whether the project is worth pursuing, I need an end-to-end proof of concept. It doesn’t have to deal with any corner cases, but it does have to go from input to output.
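To make "from input to output" concrete, here is a minimal sketch of what such an end-to-end skeleton might look like. Everything in it is hypothetical: `prepare`, `model_a`, `model_b`, and `model_c` are trivial stand-ins, not the actual models of the project described above. The only point is that every stage exists and is wired to the next before any one of them is made good.

```python
from typing import Callable, Sequence

# A stage is any callable that takes the previous stage's output.
Stage = Callable[[object], object]


def run_pipeline(stages: Sequence[Stage], raw_input):
    """Feed the output of each stage into the next, in order."""
    value = raw_input
    for stage in stages:
        value = stage(value)
    return value


# Hypothetical placeholder stages. In a real POC each one would wrap
# data preparation or a trained model; here they are trivial functions.
def prepare(text):
    return text.lower().split()


def model_a(tokens):
    return [len(token) for token in tokens]


def model_b(lengths):
    return [n for n in lengths if n > 2]


def model_c(values):
    return sum(values) / len(values) if values else 0.0


result = run_pipeline(
    [prepare, model_a, model_b, model_c],
    "An end to end proof of concept",
)
print(result)
```

Each stub can later be replaced by the real component without touching `run_pipeline`, which is what lets the proof of concept stay end-to-end while it is still crude.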

I just spent the entire weekend yak-shaving my way to the very first elements of the pipeline. Why? Because the biggest lie in Machine Learning is “it’s all Python.” Virtually every framework and non-trivial library depends on a bunch of C/C++ extensions, and building them is a #$@&%! pain.
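One rough way to see how much of an environment is not Python is to ask the standard library's `importlib.metadata` which installed packages ship compiled files. The sketch below is a heuristic under stated assumptions: the suffix list in `NATIVE_SUFFIXES` and the helper names are my own choices, and it only sees files that a package records in its installation metadata, so it can undercount.

```python
from importlib import metadata

# File suffixes that usually indicate compiled, non-Python code.
# This list is an assumption, not an exhaustive definition.
NATIVE_SUFFIXES = (".so", ".pyd", ".dylib", ".dll")


def is_native(file_name):
    """Return True if the file name looks like a compiled library."""
    name = file_name.lower()
    # ".so." catches versioned shared objects such as libfoo.so.1
    return name.endswith(NATIVE_SUFFIXES) or ".so." in name


def native_distributions():
    """Map each installed distribution to its count of compiled files."""
    found = {}
    for dist in metadata.distributions():
        files = dist.files or []  # files is None when no file list was recorded
        count = sum(1 for f in files if is_native(str(f)))
        if count:
            name = dist.metadata.get("Name") or "<unknown>"
            found[name] = count
    return found


if __name__ == "__main__":
    for name, count in sorted(native_distributions().items()):
        print(f"{name}: {count} compiled file(s)")
```

Run inside a typical ML environment, a listing like this tends to include the major numerical and deep-learning packages, which is the gap between "it's all Python" and what actually has to be built or installed as binaries.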

Now, when I’m all done, I should be able to write a Dockerfile that captures the state of my machine, but (a) that’s a manual, error-prone process and (b) it doesn’t make the POC happen any quicker.

