SageMaker Clarify is the most important announcement of re:Invent 2020

Dec 11, 2020 4 Minute Read

We are in the second week of AWS re:Invent 2020, and this is the first year there’s been a separate keynote for machine learning! As an AWS Machine Learning Hero, I could not wait to watch the keynote to learn more about the new machine learning services being rolled out.

And I literally cheered when Swami Sivasubramanian announced Amazon SageMaker Clarify, which expands the capabilities of SageMaker by detecting and mitigating bias in datasets and models. I believe Amazon SageMaker Clarify is the most important announcement of AWS re:Invent this year, given the widespread societal impact of artificial intelligence and, more specifically, machine learning.

Why is bias detection so important in machine learning?

SageMaker has done well at helping machine learning practitioners and data scientists prepare datasets, build and train custom models, and deploy and monitor models in production. However, until now there has been no easy way to detect bias.

Bias in machine learning has been at the forefront of much discussion lately. We’ve seen cutting-edge technology like facial recognition (which has machine learning at its core) banned because of bias: facial recognition systems that identify most people accurately have proven markedly less accurate for people of color. For society to fully realize the benefits of machine learning, bias must be detected and mitigated earlier in the process, before a machine learning model makes it to production.

I’ve always been vigilant about testing my models for bias, but it’s super exciting to now have a scalable and repeatable way to detect and mitigate it. Amazon SageMaker Clarify allows you to evaluate bias at every stage of a machine learning model’s development: during data analysis, after training, and during inference.

Detecting bias prior to training

Sometimes when working with a new dataset, it takes time to build the domain-level knowledge needed to detect anomalies and imbalances that could lead to bias. Amazon SageMaker Clarify promises to help by detecting bias in datasets prior to training. This is super exciting because a heavily biased dataset should never be used to train a model in the first place! Don’t forget: when training machine learning models, it’s “garbage in, garbage out”!
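To make “imbalance” a bit more concrete, here is a minimal plain-Python sketch of two pre-training metrics as I understand them from the Clarify documentation: Class Imbalance (CI), which compares how many rows each group (facet) has, and Difference in Proportions of Labels (DPL), which compares how often each facet receives the positive label. This is not the Clarify API; the function names, the toy dataset, and the assumption that the positive label is encoded as 1 are all mine.

```python
from typing import Sequence


def class_imbalance(facets: Sequence[str], a: str, d: str) -> float:
    """Class Imbalance (CI) = (n_a - n_d) / (n_a + n_d), in [-1, 1]; 0 means balanced."""
    n_a = sum(1 for f in facets if f == a)
    n_d = sum(1 for f in facets if f == d)
    return (n_a - n_d) / (n_a + n_d)


def positive_rate(facets: Sequence[str], labels: Sequence[int], group: str) -> float:
    """Share of rows in `group` whose label is positive (assumed to be encoded as 1)."""
    group_labels = [y for f, y in zip(facets, labels) if f == group]
    return sum(group_labels) / len(group_labels)


def difference_in_proportions_of_labels(
    facets: Sequence[str], labels: Sequence[int], a: str, d: str
) -> float:
    """DPL = q_a - q_d; positive values mean facet a receives the positive label more often."""
    return positive_rate(facets, labels, a) - positive_rate(facets, labels, d)


# Hypothetical toy dataset: a facet column ("a" or "d") and a binary label column.
facets = ["a", "a", "a", "a", "a", "a", "d", "d"]
labels = [1, 1, 1, 0, 1, 1, 0, 1]

print(class_imbalance(facets, "a", "d"))  # (6 - 2) / (6 + 2) = 0.5
print(difference_in_proportions_of_labels(facets, labels, "a", "d"))  # 5/6 - 1/2, about 0.33
```

Both values sitting well above 0 would be a hint to collect more (or more representative) data for facet d before training anything on it.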

For Amazon SageMaker Clarify to detect bias in your dataset, you’ll need to upload data that is already pre-processed and cleaned (using a tool like Amazon SageMaker Data Wrangler).

Amazon SageMaker Clarify also promises to detect bias in models after training. The post-training metric I consider most important, Accuracy Difference (AD), measures whether the model is more accurate for one group than for another! This metric should be checked for every facial recognition model.
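Here is a similarly minimal sketch of the arithmetic behind Accuracy Difference, assuming AD is defined as facet a’s accuracy minus facet d’s accuracy. The helper functions and the data are hypothetical; Clarify computes its metrics for you, so this only illustrates the idea.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy_difference(
    facets: Sequence[str], y_true: Sequence[int], y_pred: Sequence[int], a: str, d: str
) -> float:
    """AD = ACC_a - ACC_d; positive values mean the model is more accurate for facet a."""

    def facet_accuracy(group: str) -> float:
        # Keep only the (label, prediction) pairs that belong to this facet.
        rows = [(t, p) for f, t, p in zip(facets, y_true, y_pred) if f == group]
        return accuracy([t for t, _ in rows], [p for _, p in rows])

    return facet_accuracy(a) - facet_accuracy(d)


# Hypothetical labels and model predictions for two facets of four rows each.
facets = ["a", "a", "a", "a", "d", "d", "d", "d"]
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]

print(accuracy(y_true, y_pred))  # 6 of 8 correct overall: 0.75
print(accuracy_difference(facets, y_true, y_pred, "a", "d"))  # 1.0 - 0.5 = 0.5
```

An overall accuracy of 0.75 looks respectable, yet this toy model is right for every row in facet a and only half the rows in facet d. That gap is exactly what a per-group metric like AD surfaces and a single headline accuracy hides.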

Transparency about predictions

But Amazon SageMaker Clarify doesn’t stop there! In the past, it’s been hard for me to explain why my models make a certain prediction during inference. Amazon SageMaker Clarify promises to solve this by helping me to “explain how feature values contribute to the predicted outcome, both for the model overall and for individual predictions”. This level of transparency is a true game changer and helps to build trust with those who use my models.
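Clarify’s explanations are based on Shapley values (SHAP). The sketch below computes exact Shapley values by brute force for a tiny, hypothetical scoring function, treating an “absent” feature as if it took a baseline value. That is a simplification I chose for illustration: as far as I know, Clarify uses the Kernel SHAP approximation, which scales to real models where enumerating every feature subset would be far too slow.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set


def shapley_values(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values by enumerating every coalition (feasible only for a few features)."""
    n = len(x)

    def value(coalition: Set[int]) -> float:
        # Features in the coalition keep the instance's values; the rest use the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


# Hypothetical toy scoring model with one interaction term (income x age).
def toy_model(z: Sequence[float]) -> float:
    income, debt, age = z
    return 0.5 * income - 0.8 * debt + 0.1 * income * age


x = [4.0, 1.0, 2.0]
baseline = [0.0, 0.0, 0.0]
phis = shapley_values(toy_model, x, baseline)
print(phis)
# Efficiency: the attributions add up to f(x) - f(baseline).
print(sum(phis), toy_model(x) - toy_model(baseline))
```

For this toy model the attributions work out to about 2.4 for income, -0.8 for debt and 0.4 for age, which sum to the model’s output of 2.0 (the baseline scores 0), so one individual prediction can be broken down feature by feature.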

Overall, I’m super excited about the future of machine learning now that bias detection is at the forefront! I can’t wait to re-train my existing models using Amazon SageMaker Clarify and share my lessons learned with you! If you’re interested in exploring other AWS re:Invent machine learning sessions (outside of Swami’s keynote), check out my Machine Learning Hero Guide.
