source link: https://andrewpwheeler.com/2023/07/17/too-relaxed-naive-bayes-does-not-improve-recidivism-forecasting-in-the-nij-challenge/

Too relaxed? Naive Bayes does not improve recidivism forecasting in the NIJ challenge

So the paper Improving Recidivism Forecasting With a Relaxed Naïve Bayes Classifier (Lee et al., 2023), recently published in Crime & Delinquency, has incorrect results. Note I am not sandbagging the authors; I reviewed this paper for JQC and the Journal of Criminal Justice, so I have already given them this same feedback (multiple times!). The authors, however, did not correct their results; they just journal shopped and published the wrong findings.

I have replication code here to review. (Note I initially made a mistake in my replication: I reversed the conditional and calculated p(y|x) instead of p(x|y) by accident (see this older code I shared in my prior reviews), but I was still correct in my assertion that Lee's results were wrong.)
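To make the distinction concrete, here is a minimal sketch (toy data, not the paper's code or my replication code) of the two quantities: the class-conditional p(x|y) that naive Bayes actually needs, versus the p(y|x) I computed by accident at first.

```python
import pandas as pd

# toy data: one categorical predictor x and a binary outcome y (recidivate or not)
dat = pd.DataFrame({
    "x": ["A", "A", "B", "B", "B", "C", "C", "A"],
    "y": [1,   0,   1,   1,   0,   0,   0,   1],
})

# naive Bayes uses the class-conditional p(x|y): within each outcome class,
# how often does each category of x occur
p_x_given_y = dat.groupby("y")["x"].value_counts(normalize=True)

# what I originally computed by mistake, p(y|x): within each category of x,
# how often is the outcome equal to 1
p_y_given_x = dat.groupby("x")["y"].mean()

print(p_x_given_y)
print(p_y_given_x)
```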

So the main thing that made me go to this effort: the authors report unbelievable results. They report Brier scores for females (Round 1) of 0.104 and for males of 0.159 – these scores blow the competition out of the water. The leaderboard was 0.15 for females and 0.19 for males. Note how I don't list those to the third decimal – the differences between teams were so small you needed to go down that far to separate them. Lee also reports unbelievably low Brier scores for the alternative logit and random forest models – on their face the results are simply not believable.
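For scale, the Brier score is just the mean squared difference between the predicted probabilities and the observed 0/1 outcomes. A quick sketch (made-up numbers, not the NIJ base rates): a calibrated constant prediction at a 30% base rate comes out around 0.3*0.7 = 0.21 on average, which gives a sense of how far below that the leaderboard scores already were.

```python
import numpy as np

def brier_score(y_true, p_pred):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    y_true = np.asarray(y_true, dtype=float)
    p_pred = np.asarray(p_pred, dtype=float)
    return np.mean((p_pred - y_true) ** 2)

# toy example: a constant 0.3 prediction against a simulated 30% base rate
y = np.random.binomial(1, 0.3, size=1000)
print(brier_score(y, np.full(1000, 0.3)))  # around 0.21
```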

If the authors really believe their results, it kind of sucks for them that they did not participate in the NIJ challenge; they would have won more than $150,000! But I am pretty sure they are miscalculating their Brier scores somewhere. My replication code shows them in the same ballpark as everyone else, but they would not have made the leaderboard. Here are my estimates of what their Brier scores should be (the Brier column in the two tables below):

[Image: two tables of replicated results for the female and male models, with my estimated Brier scores in the Brier column]

Folks can go and look at their paper and their set of spreadsheets in the supplemental material – I have posted a little over 50 lines of (non-comment) Python code that replicates their regression model coefficients and shows their Brier scores are wrong. (And consequently any points Lee et al. (2023) make about fairness are wrong as well.)
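The basic pattern really is only a few lines. Here is a hedged sketch of what that replication looks like (hypothetical file and column names, not the actual NIJ variables or my exact script in the linked repo): fit the logit, compare coefficients to the paper, then score the holdout predictions.

```python
import pandas as pd
import statsmodels.formula.api as smf

# hypothetical file and column names, stand-ins for the NIJ challenge data
train = pd.read_csv("nij_train.csv")
test = pd.read_csv("nij_test.csv")

# fit the logit model and check the coefficients against those reported in the paper
mod = smf.logit("recid_year1 ~ C(age_cat) + prior_arrests + C(gang_affiliated)",
                data=train).fit()
print(mod.params)

# out-of-sample predicted probabilities and the resulting Brier score
p = mod.predict(test)
brier = ((p - test["recid_year1"]) ** 2).mean()
print(round(brier, 4))
```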

NIJ probably released papers from the challenge at some point, but if you want to see other folks' discussion, there is Circo & Wheeler (2022) (for mine and Gio's results for team MCHawks), and Mohler & Porter (2021) for team PASDA.

I may put it in the slate sometime to discuss naive Bayes (and other categorical encoding schemes). It is not a bad idea for data with many categories, but for this NIJ data there just isn't that much signal to squeeze out. So any future work is unlikely to dramatically improve upon the competition results (it is difficult to overfit this data). Again, given my analysis here, I am pretty sure a valid data analysis (no peeking) will at best "beat" the competition results in the third decimal place (if it can improve at all).
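As a concrete example of what I mean by treating naive Bayes as a categorical encoding scheme: each category of a high-cardinality feature can be mapped to a smoothed log ratio of its class-conditional frequencies, which is the per-feature contribution a naive Bayes model sums up. A minimal sketch (made-up data, not the NIJ variables):

```python
import numpy as np
import pandas as pd

def nb_log_ratio_encode(x, y, alpha=1.0):
    """Encode a categorical feature as log[p(x|y=1) / p(x|y=0)],
    with add-alpha smoothing so rare categories do not blow up."""
    df = pd.DataFrame({"x": x, "y": y})
    counts = pd.crosstab(df["x"], df["y"])  # rows: categories, columns: 0/1
    p1 = (counts[1] + alpha) / (counts[1].sum() + alpha * len(counts))
    p0 = (counts[0] + alpha) / (counts[0].sum() + alpha * len(counts))
    return np.log(p1 / p0)

# toy high-cardinality feature (think offense type) against a binary outcome
rng = np.random.default_rng(0)
x = rng.integers(0, 50, size=5000)    # 50 categories
y = rng.binomial(1, 0.3, size=5000)   # outcome unrelated to x here
print(nb_log_ratio_encode(x, y).head())
```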

Now, part of the authors' argument is that this method (relaxed naive Bayes) results in simpler interpretations. Typically people interpret "simple" models in terms of the end results, e.g. having a simple checklist of integer weights. The more I deal with predictive models though, the more I think this is maybe misguided. You could also interpret "simple" in terms of the code used to derive the weights (and to evaluate the final metrics). This matters when auditing code that others have written, as you will ultimately take that code and apply it to your own data.

I think this "simpler to estimate the same results" notion is probably more important for scientists and outside groups wanting to verify the integrity of any particular machine learning model than "simple end-result weights". Otherwise scientists can just make up results and say "my method is better." Which is simpler, I suppose, but misses the boat a bit in terms of why we want simple models to begin with.


