
How to interpret the Area Under the Curve (AUC) stat | Andrew Wheeler

source link: https://andrewpwheeler.com/2021/11/19/how-to-interpret-the-area-under-the-curve-auc-stat/

One of the questions I often ask in data science interviews is ‘How would you explain the area under the curve statistic to a business person?’. Maybe it is too easy a question even for juniors, as I can’t remember anyone getting it wrong. While there is no single correct answer, the most logical response is to focus on true positives and false positives, and how the predictive model can be tuned to capture more true positives at the expense of generating more false positives. Only after that do you even bother to show the ROC curve, and say we calculate the area under the curve (AUC) as a measure of how well the model can discriminate between the two classes.
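As a minimal sketch of that explanation in code (using scikit-learn on synthetic data; the dataset and model here are stand-ins, not anything from a real project), each threshold on the predicted risk gives one point on the ROC curve:

```python
# Fit a classifier, draw the ROC curve, and compute the AUC.
# Every threshold on the predicted risk is one point on the curve:
# lowering the threshold captures more true positives at the
# expense of more false positives.
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

# synthetic imbalanced binary data as a placeholder
X, y = make_classification(n_samples=5000, weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]  # predicted risk of the positive class

fpr, tpr, thresholds = roc_curve(y_test, scores)
print(f"AUC: {roc_auc_score(y_test, scores):.3f}")

plt.plot(fpr, tpr, label="model")
plt.plot([0, 1], [0, 1], "k--", label="chance (AUC = 0.5)")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```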

The most recent time I remember this happening in real life, I told the business rep that the AUC does not directly translate to revenue, but it is a good indication that a model is good in an absolute sense (we know others typically get AUCs around 0.7 to 0.8 for this problem, and over 0.5 is better than random). And it is often good in a relative sense – a model with an AUC of 0.8 is typically better than a model with an AUC of 0.75 (although not always; you need to draw the ROC curves and make sure the larger-AUC curve dominates the other and that they do not cross, as in the sketch below). So while I try to do my best explaining technical statistical content, I often punt to simpler ‘here are the end outcomes we care about’ explanations (which don’t technically answer the question) as opposed to ‘here is how the sausage is made’ explanations.
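To illustrate that caveat, here is a hedged sketch of the relative comparison, again on made-up data with two arbitrary model choices: overlay both ROC curves and eyeball whether the higher-AUC curve dominates everywhere.

```python
# Overlay two models' ROC curves; if they cross, the higher-AUC model
# is not uniformly better: which one wins depends on the false positive
# rate you can tolerate.
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

for name, model in [("logit", LogisticRegression(max_iter=1000)),
                    ("forest", RandomForestClassifier(random_state=1))]:
    scores = model.fit(X_train, y_train).predict_proba(X_test)[:, 1]
    fpr, tpr, _ = roc_curve(y_test, scores)
    plt.plot(fpr, tpr, label=f"{name}, AUC = {roc_auc_score(y_test, scores):.3f}")

plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```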

One alternative, simple explanation of the AUC for binary models is to use the Harrell’s C index interpretation, which for binary outcomes is equivalent to the AUC statistic. With this statistic you could say something like ‘If I randomly sample a negative case and a positive case, the positive case will have a higher predicted risk {AUC} percent of the time.’ I do like this interpretation, which illustrates that the AUC is all about rank ordering the predictions, and a more discriminating model will have a higher AUC (although it says nothing about calibration).
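That pairwise claim is easy to check by simulation. A toy sketch (the labels and scores below are invented, and the two numbers agree only up to sampling noise):

```python
# Check that the AUC equals the fraction of (positive, negative) pairs
# where the positive case gets the higher predicted risk.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=2000)        # fake binary labels
scores = rng.normal(loc=y, scale=1.5)    # fake risks: positives shifted up

pos, neg = scores[y == 1], scores[y == 0]

# sample random positive/negative pairs and count how often
# the positive case outranks the negative case
draws = 100_000
p = pos[rng.integers(0, len(pos), size=draws)]
n = neg[rng.integers(0, len(neg), size=draws)]

print(f"AUC:                       {roc_auc_score(y, scores):.3f}")
print(f"P(positive outranks neg):  {np.mean(p > n):.3f}")
```

Since only the ranks of the scores enter that comparison, this also makes clear why the AUC is unchanged by any monotonic transformation of the predictions.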

