
Averting Algorithm Aversion through Explainability

Source: https://mc.ai/averting-algorithm-aversion-through-explainability/

Imagine you’re applying to grad school and the admissions committee at your dream university decides that admission decisions this year will be made by Machine Learning (ML) algorithms instead of human reviewers. Would you be comfortable with ML algorithms evaluating you and making a decision? Some of us probably wouldn’t want that. But why?

Research shows that evidence-based algorithms (ML algorithms) predict the future more accurately than human forecasters. Algorithmic forecasts have been shown to be superior to human forecasts in an array of applications, be it stock market forecasting or game-move prediction (as in AlphaGo). Admission decisions can also be seen as a forecasting task: they are essentially a prediction of how good a fit the candidate is for a particular program, or of how successful the candidate will be. Yet why do some of us still want a human to evaluate us?

If algorithms are better forecasters than humans, then people should choose algorithmic forecasts over human forecasts. However, they often don’t. This phenomenon, which we call algorithm aversion, is costly, and it is important to understand its causes (Dietvorst, Simmons, and Massey, 2014).

We know very little about when and why people exhibit algorithm aversion. There is no agreed-upon explanation of when people rely on human forecasters instead of superior algorithms, or why they fail to use algorithms for forecasting at all. Now that the amount of data we produce daily has reached a point where almost every forecasting task involves an algorithm of some kind, it is important to tackle algorithm aversion so that more of us can rely on better-performing algorithms to forecast the future for us.

Dietvorst, Simmons, and Massey (2014 and 2016) carried out several studies to find the causes of algorithm aversion. They found that:

  • People lose confidence in algorithmic forecasters more quickly than in human forecasters after seeing both make the same mistake.
  • People will use imperfect algorithms if they can (even slightly) modify the results. Hence, giving control can be a way of overcoming algorithm aversion.

However, we know that providing control may not be possible in many cases. Therefore, we need to look at other options for overcoming or averting algorithm aversion.

Figure: What is a black box algorithm? (Source)

Modern forecasting algorithms are mostly seen as black boxes by the majority of the population, since they involve complex machine learning models that very few understand. Add to this the fact that a model’s complexity and performance are often seen as inversely proportional to its explainability. For example, a linear regression model might be easy to interpret but perform poorly, while a neural network could perform very well yet be difficult to interpret. So, can explaining the model’s predictions, or understanding what the model has learned, help overcome algorithm aversion? Let’s find out!

I conducted an online experiment combining these two areas, model explainability and algorithm aversion, to probe the possible mechanism behind algorithm aversion. In particular, I wanted to explore the question: what role does model explainability play in algorithm aversion, and can explanations help overcome aversion towards algorithms? I operationalized the question by observing whether people choose the same algorithm more frequently (or rate it higher) over human forecasters (themselves) when it comes with explanations.

Dataset

Before beginning my experiment, I needed to choose a machine learning algorithm that would act as my forecaster/predictor. Training any machine learning algorithm requires data, in our case labeled data. For this purpose, I used an open dataset from Kaggle that mirrors the admissions scenario discussed above.

To make sure that participants weren’t overwhelmed by numbers, I gave special importance to keeping the number of features/predictors under ten when selecting the dataset. The Graduate Admissions dataset has a ‘chance of admit’ measure ranging from 0 to 1 for each student profile, along with the following 7 parameters:

  • GRE Score (out of 340)
  • TOEFL Score (out of 120)
  • University Rating (out of 5) of the school where the applicant completed undergrad, 1 being the lowest and 5 the highest rating.
  • Statement of Purpose Strength (out of 5): 1 being the lowest and 5 the highest.
  • Letter of Recommendation Strength (out of 5): 1 being the lowest and 5 the highest.
  • Undergraduate GPA (out of 10)
  • Research Experience (either 0 or 1): 0 indicating no previous research experience and 1 indicating at least some research experience.

I converted the ‘chance of admit’ measure into an ‘Admission Score’ by multiplying it by 100 so that it is easier for participants to work with, i.e., they can enter whole numbers as predictions. The ‘Admission Score’ can be thought of as a prediction of the student’s success, or of profile strength. The score ranges from 0 to 100, and a higher score indicates a higher chance of admission (a stronger profile). The dataset had 500 entries in total, was clean, and didn’t require any other major preprocessing or data-wrangling steps.
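
A minimal pandas sketch of this preprocessing step could look like the following. The file name and column names (e.g. “Admission_Predict_Ver1.1.csv”, “Chance of Admit ”, “Serial No.”) are assumptions based on the public Kaggle dataset and may need adjusting:

    import pandas as pd

    # Load the Kaggle Graduate Admissions data.
    # File/column names are assumed from the public dataset version.
    df = pd.read_csv("Admission_Predict_Ver1.1.csv")

    # Drop the serial-number column if present -- it carries no predictive signal.
    df = df.drop(columns=["Serial No."], errors="ignore")

    # Some versions of the file have a trailing space in the target column name.
    target_col = "Chance of Admit " if "Chance of Admit " in df.columns else "Chance of Admit"

    # Convert the 0-1 'chance of admit' into a 0-100 'Admission Score'
    # so participants can enter whole numbers as predictions.
    df["Admission Score"] = (df[target_col] * 100).round().astype(int)

    features = df.drop(columns=[target_col, "Admission Score"])
    target = df["Admission Score"]
    print(features.shape, target.head())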

Model and Explainer

I trained several models on the dataset, and XGBoost was one of the best-performing ones. I decided to stick with XGBoost as the graduate admission predictor, as it gave good enough results even with minimal parameter tuning and preprocessing. With the machine learning model ready, I had to choose a library to generate explanations for the algorithm’s predictions. Thankfully, the machine learning community has been receptive to the problem of model explainability and has developed several methodologies for explaining machine learning models.
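
As a rough illustration, training such a predictor might look like the sketch below, continuing from the preprocessing snippet above. The 80/20 split and the hyperparameter values are illustrative assumptions, not the exact settings used in the experiment:

    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_absolute_error
    from xgboost import XGBRegressor

    # Hold out a test split to sanity-check the predictor.
    X_train, X_test, y_train, y_test = train_test_split(
        features, target, test_size=0.2, random_state=42
    )

    # Mostly-default XGBoost; only a few common knobs are set (illustrative values).
    model = XGBRegressor(n_estimators=300, learning_rate=0.05, max_depth=3, random_state=42)
    model.fit(X_train, y_train)

    preds = model.predict(X_test)
    print(f"MAE on held-out profiles: {mean_absolute_error(y_test, preds):.2f}")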

One such explainer is SHAP (SHapley Additive exPlanations), a game-theoretic approach to explaining the output of any machine learning model. SHAP can produce explanations for individual rows, showing how each feature contributes to pushing the model output away from the base value. Summary plots and dependence plots indicating overall model behaviour can also be produced with the SHAP library. The SHAP documentation and the tutorials built around it were very helpful for understanding model explainability, and I recommend checking them out.
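
To give a feel for what these explanations look like, here is a minimal SHAP sketch built on the XGBoost model from the previous snippet (the exact plots shown to participants may have differed):

    import shap

    # TreeExplainer is the fast, exact SHAP path for tree ensembles like XGBoost.
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X_test)

    # Per-applicant explanation: how each feature pushes this prediction
    # away from the base value (the average Admission Score).
    shap.force_plot(explainer.expected_value, shap_values[0, :], X_test.iloc[0, :],
                    matplotlib=True)

    # Overall view: which features matter most across all applicants.
    shap.summary_plot(shap_values, X_test)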

Experiment Design

The experiment was a simple randomized controlled experiment to check whether adding explanations had any effect on the choices people made or on how they perceived/rated the algorithm. It was built on the Qualtrics survey platform, and participants were recruited on Amazon MTurk as well as outside crowdsourcing platforms through friends and connections. The total number of participants was 445: 350 from MTurk and the remaining 95 from various other sources. The participants had an average age of 30.76 years, with around 43% of them female and 57% male.

The survey (please take a look to get a better idea of the flow) started by asking participants their age, gender, and familiarity with probability, computer science, and machine learning on a scale of 1–4, so that the heterogeneity of treatment effects could be explored once the experiment was completed. Then all participants were familiarized with the imaginary scenario of a Canadian university using a machine learning algorithm on its admissions committee. Following that, participants were shown the feature values for 10 applicants and asked to predict an Admission Score for each. After a participant scored an applicant, their prediction was displayed alongside the algorithm’s prediction if they belonged to the control group. For the treatment group, an explanation of which features were driving the model’s prediction was displayed in addition to both predictions.

