ML Proofs of Concept Are Hard

Cartoon showing fragile tower of dependencies

One reason it’s hard to build a business case for a Machine Learning project is that virtually any non-trivial task needs, from day one of the proof of concept, a fairly elaborate data-preparation pipeline and, in most cases, multiple models.

For instance, a project I’m considering needs three ML models in a pipeline. Each model is a known quantity: what remains is the considerable work of building the pipeline and training the models. And to really evaluate whether the project is worth pursuing, I need an end-to-end proof of concept. It doesn’t have to handle any corner cases, but it does have to go from input to output.
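To make "end-to-end but no corner cases" concrete, here is a minimal sketch of that shape in Python. Everything in it is hypothetical: `prepare`, `model_a`, `model_b`, and `model_c` are trivial stand-ins, not the actual models from my project. The point is only that the skeleton has to carry data all the way from raw input to final output before any single stage is worth polishing.

```python
from typing import Any, Callable, List, Sequence

# A stage is anything that takes the previous stage's output and returns its own.
Stage = Callable[[Any], Any]


def run_pipeline(stages: Sequence[Stage], data: Any) -> Any:
    """Feed data through each stage in order and return the final output."""
    for stage in stages:
        data = stage(data)
    return data


# Hypothetical stand-ins: each would be replaced by real preprocessing or a trained model.
def prepare(raw: str) -> List[str]:
    """Data preparation: normalize case and split into tokens."""
    return raw.lower().split()


def model_a(tokens: List[str]) -> List[int]:
    """First 'model': turn each token into a single numeric feature."""
    return [len(token) for token in tokens]


def model_b(features: List[int]) -> float:
    """Second 'model': reduce the features to one score.

    Deliberately ignores the empty-input corner case; a POC can skip it.
    """
    return sum(features) / len(features)


def model_c(score: float) -> str:
    """Third 'model': map the score to a final label."""
    return "long" if score > 4 else "short"


if __name__ == "__main__":
    label = run_pipeline(
        [prepare, model_a, model_b, model_c],
        "Proofs of concept are hard",
    )
    print(label)
```

Swapping a stub for a real model changes one function and leaves the rest of the path intact, which is what makes the early end-to-end version useful for evaluation.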

I just spent the entire weekend yak-shaving my way to the very first elements of the pipeline. Why? Because the biggest lie in Machine Learning is “it’s all Python.” Virtually every framework and non-trivial library depends on a bunch of C/C++ extensions, and building them is a #$@&%! pain.
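If you want a rough sense of how much of an environment is not actually Python, something like the following sketch can help. It uses the standard library's `importlib.metadata` to list installed distributions that ship compiled binaries. The helper names and the suffix list are my own assumptions, and the check is only a heuristic: it sees what is already installed, not what a source build would have to compile.

```python
from importlib import metadata
from pathlib import PurePosixPath
from typing import Iterable, List

# Assumed set of suffixes that indicate compiled code; not exhaustive.
COMPILED_SUFFIXES = {".so", ".pyd", ".dylib", ".dll"}


def has_compiled_files(file_names: Iterable[str]) -> bool:
    """Return True if any file name carries a compiled-binary suffix.

    Checks all suffixes so versioned names such as libfoo.so.1 also match.
    """
    for name in file_names:
        if any(s in COMPILED_SUFFIXES for s in PurePosixPath(name).suffixes):
            return True
    return False


def packages_with_native_code() -> List[str]:
    """List installed distributions that ship at least one compiled file."""
    found = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        files = dist.files or []  # dist.files is None when no file list was recorded
        if name and has_compiled_files(str(f) for f in files):
            found.add(name)
    return sorted(found)


if __name__ == "__main__":
    for package in packages_with_native_code():
        print(package)
```

Run in a typical ML environment, the list tends to be long, which is the point of the complaint above.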

Now, when I’m all done, I should be able to write a Dockerfile that captures the state of my machine, but (a) that’s a manual, error-prone process and (b) it doesn’t make the POC happen any quicker.
