How can I trust you?

An intuition and tutorial on trust score

Sep 28

Many efforts to improve the performance of deep learning models have been made over the years, but only a few works have focused on better understanding the models and their predictions, and on whether those predictions should be trusted.

In this article, we shall lightly probe the trustworthiness of a model in terms of its predictions. The term trust, however, might seem vague and can carry a wide range of denotations and connotations. So, for the sake of our discussion, we limit the term trust to denote a “fail-safe” feature for a model’s predictions, that is, a secondary or supporting opinion on the model’s predictions.

If you are more interested in the practical stuff, you may skip to the Trust Score section.


Image from Chapter 1 slides of “Learn TensorFlow and deep learning, without a Ph.D.” by Martin Görner. Cartoon images copyright: alexpokusay / 123RF stock photos. We tend to heavily rely on deep learning models for several tasks, even for the simplest problems, but are we sure that we are given the right answers?

Since deep neural networks re-emerged in 2012 by famously winning the ImageNet Challenge (Krizhevsky et al., 2012), we have employed deep learning models in a variety of real-world applications, to the point where we resort to deep learning to solve even the simplest problems. Such applications range from recommendation systems (Cheng et al., 2016) to medical diagnosis (Gulshan et al., 2016). However, despite their state-of-the-art performance in these specialized tasks, deep learning models are not immune to mistakes, and the seriousness of such mistakes varies by application domain. So, the call for AI safety and trust is not surprising (Lee & See, 2004; Varshney & Alemzadeh, 2017; Saria & Subbaswamy, 2019). For years, much of the effort went into improving the performance of models, while investigating their limitations has not received equal attention.

Despite receiving relatively less attention, there are some excellent works on better understanding model predictions, including but not limited to the following: (a) using confidence calibration, where the outputs of a classifier are transformed into values that can be interpreted as probabilities (Platt, 1999; Zadrozny & Elkan, 2002; Guo et al., 2017), (b) using ensemble networks to obtain confidence estimates (Lakshminarayanan, Pritzel, & Blundell, 2017), and (c) using the softmax probabilities of a model to identify misclassifications (Hendrycks & Gimpel, 2016).

Note, however, that the aforementioned methods rely on the score reported by the model itself, which raises the question of whether that score can be trusted in the first place. Enter: Trust Score. Instead of merely extending these methods, Jiang et al. (2018) developed an approach based on topological data analysis that provides a single score for a model's prediction, called the trust score.

Jibber-jabber aside, the trust score simply measures the agreement between a trained classifier f(x) and a modified nearest-neighbor classifier g(x) on their predictions for a test example x.

Trust Score

The agreement between a trained classifier f(x) and a modified nearest-neighbor classifier g(x) on their prediction for a test example x is measured as the ratio of the distance from x to the nearest class different from the predicted class (let’s denote this class as ĥ) to the distance from x to the predicted class (let’s denote this class as h).

A score of 1 would mean that the predicted class h and the “closest not predicted class” ĥ are equidistant from the test example x. A score higher than 1 therefore suggests that the prediction is trustworthy, since x is farther from ĥ than from h, i.e. trust score = d(ĥ, x) / d(h, x).
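To make the ratio concrete, here is a minimal NumPy sketch on toy 2D data. It is deliberately simpler than the full method of Jiang et al. (2018): there is no outlier filtering and no k-d tree, and the distance from x to a class is taken as the distance to that class's nearest point. The function name `toy_trust_score` and the small `eps` guard against division by zero are assumptions made for this illustration.

```python
import numpy as np


def toy_trust_score(x, class_points, predicted_class, eps=1e-12):
    """Return (trust score, closest non-predicted class) for one example x.

    class_points maps each class label to an (n_i, d) array of training points.
    The distance from x to a class is the distance to that class's nearest point.
    """
    distances = {
        label: float(np.min(np.linalg.norm(points - x, axis=1)))
        for label, points in class_points.items()
    }
    d_predicted = distances[predicted_class]
    # h-hat: the closest class other than the predicted one.
    others = {label: d for label, d in distances.items() if label != predicted_class}
    closest_other = min(others, key=others.get)
    # eps avoids division by zero if x coincides with a predicted-class point.
    score = (others[closest_other] + eps) / (d_predicted + eps)
    return score, closest_other


class_points = {
    0: np.array([[0.0, 0.0], [0.0, 1.0]]),
    1: np.array([[3.0, 0.0], [4.0, 0.0]]),
}
x = np.array([1.0, 0.0])
score, closest = toy_trust_score(x, class_points, predicted_class=0)
print(score, closest)  # score is about 2.0, closest is 1
```

Predicting class 1 for the same x instead gives a score of about 0.5: x is twice as far from the predicted class as from class 0, so that prediction would be flagged as suspicious.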

The explanation above was meant to be the intuition behind the trust score. But we shall further inspect how the trust score is computed.


Algorithm 1 and Algorithm 2 from Jiang et al. (2018). Algorithm 1 filters out probable outliers in a dataset, while Algorithm 2 computes the trust score from the filtered dataset and the model predictions.

To compute the trust score, we first need an α-high-density-set, which can be obtained through Algorithm 1: it filters out the α-fraction of the training data points with the lowest empirical density, i.e. data points that do not cluster closely with their neighbors. Put simply, the α-high-density-set consists of the training data points that remain after the α-fraction of probable outliers is eliminated, hence the purpose of the α parameter. Jiang et al. (2018) defined this procedure as their modification to the nearest-neighbor classifier.

The resulting α-high-density-set serves as the dataset for our nearest-neighbor classifier. Since a brute-force (k)NN search takes O(n) time per query, a k-d tree is used to speed up the search, reducing the average query time to O(log(n)) for low-dimensional data. A k-d tree is constructed for each class in the set.

To do this in code, let’s borrow Seldon’s extended implementation of the open source code from Jiang et al. (2018). We can define the instances of a class to keep by using np.percentile(knn_radius, (1 - alpha) * 100.0) as a cutoff on each point's k-nearest-neighbor radius, and then repeat this process for the other classes.

This implementation is from Seldon’s extended version of the open source code from Jiang et al. (2018). The α-high-density-set may be obtained by filtering out the α-fraction of training data points that may be considered outliers.

After this filtering, we should expect the remaining training data points of each class to be more tightly clustered together; this is the α-high-density-set.
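The filtering step can be sketched as follows, using SciPy's `scipy.spatial.KDTree` as the per-class tree (an assumption; this is not Seldon's code). The function `fit_high_density_trees` and its `k_filter` parameter are hypothetical names for illustration. The cutoff mirrors the `np.percentile(knn_radius, (1 - alpha) * 100.0)` expression above, with each point's k-nearest-neighbor radius serving as an inverse proxy for its empirical density.

```python
import numpy as np
from scipy.spatial import KDTree


def fit_high_density_trees(features, labels, num_classes, alpha=0.05, k_filter=10):
    """Build one k-d tree per class on that class's alpha-high-density-set.

    For every training point, the distance to its k-th nearest neighbor within
    its own class (the k-NN radius) is an inverse density estimate; roughly the
    alpha-fraction of points with the largest radius is dropped.
    """
    trees = []
    for c in range(num_classes):
        class_points = features[labels == c]
        tree = KDTree(class_points)
        # Ask for k_filter + 1 neighbors because each point is its own nearest neighbor.
        k = min(k_filter + 1, len(class_points))
        dists, _ = tree.query(class_points, k=k)
        knn_radius = dists[:, -1] if dists.ndim == 2 else dists
        cutoff = np.percentile(knn_radius, (1 - alpha) * 100.0)
        kept = class_points[knn_radius <= cutoff]
        # The filtered set becomes the search structure for this class.
        trees.append(KDTree(kept))
    return trees
```

Querying `k_filter + 1` neighbors is a design choice of this sketch so that a point does not count itself as its own first neighbor.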

After obtaining the α-high-density-set, we can now compute the trust score of a trained model prediction — detailed in Algorithm 2.

When a trained classifier makes a prediction on a test example x, the distance between x and each of the per-class trees is measured. The trust score is then calculated as the ratio of (a) the distance between x and ĥ, the closest class other than the predicted class (hence the argmin function), to (b) the distance between x and h.

This implementation is from Seldon’s extended version of the open source code from Jiang et al. (2018). The trust score is the ratio of (a) the distance from test example x to the closest class other than the predicted class, ĥ, to (b) the distance from x to the predicted class, h.
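Written out as a formula, with H_l denoting the α-high-density-set of class l and d(x, H_l) the distance from x to that set (notation introduced here for convenience), Algorithm 2 amounts to:

```latex
\hat{h} = \arg\min_{l \neq h} d(x, H_l),
\qquad
\text{trust score}(x) = \frac{d(x, H_{\hat{h}})}{d(x, H_h)} = \frac{d(\hat{h}, x)}{d(h, x)}
```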

We can measure the distance to each class either as the distance to the k-th nearest neighbor in its tree (dist_type = "point") or as the average distance from the first to the k-th nearest neighbor (dist_type = "mean"); in this article, we use dist_type = "point". As for the distance metric, we use the Euclidean distance (although any distance metric may be used).
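Both distance options can be sketched in plain NumPy with a brute-force neighbor search; a k-d tree would return the same distances, only faster. The helper names `knn_class_distance` and `trust_scores`, the default `k=2`, and the small `eps` in the ratio are assumptions for this sketch, not the Seldon API.

```python
import numpy as np


def knn_class_distance(x, class_points, k=2, dist_type="point"):
    """Distance from x to one class, based on its k nearest points in that class."""
    nearest = np.sort(np.linalg.norm(class_points - x, axis=1))[:k]
    if dist_type == "point":
        return nearest[-1]      # distance to the k-th nearest neighbor
    if dist_type == "mean":
        return nearest.mean()   # mean distance over the 1st..k-th neighbors
    raise ValueError(f"unknown dist_type: {dist_type!r}")


def trust_scores(test_features, predictions, class_sets, k=2, dist_type="point", eps=1e-12):
    """Return trust scores and the closest non-predicted class for each example.

    class_sets[c] holds the (filtered) training points of class c.
    """
    scores = np.empty(len(test_features))
    closest_other = np.empty(len(test_features), dtype=int)
    for i, (x, predicted) in enumerate(zip(test_features, predictions)):
        dists = np.array([knn_class_distance(x, pts, k, dist_type) for pts in class_sets])
        d_predicted = dists[predicted]
        dists[predicted] = np.inf   # exclude the predicted class from the argmin
        closest_other[i] = int(np.argmin(dists))
        scores[i] = (dists[closest_other[i]] + eps) / (d_predicted + eps)
    return scores, closest_other
```

With "point", a single close neighbor of the wrong class cannot dominate as long as k > 1; "mean" smooths the distance over all k neighbors instead.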

However, for visualization purposes, the score function used in this article was modified to return not only the trust score and the closest not predicted class, but also the indices of the predicted class and of the nearest class other than the predicted class. The modified implementation is available here.

Despite the relatively long explanation of how to compute the trust score for a model prediction, all we need to do is write three lines (excluding the import line),

First, we define a TrustScore object with the α parameter for filtering out data points with low empirical density. Then, we fit it on the training dataset to obtain the α-high-density-set. Finally, we use the fitted TrustScore object to compute the trust scores for model predictions.

Since computing the trust score relies on kNN, and kNN suffers from the curse of dimensionality, we can encode the training features to a lower dimension instead of using them as they are; Jiang et al. (2018) found that the trust score works best in low- and medium-dimensional feature spaces.

The encoded_train_features in the code snippet above was encoded using principal component analysis (PCA) with 64 principal components.
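PCA itself fits in a few lines of NumPy via the singular value decomposition. This is a sketch, not necessarily the encoder used in the article's notebook; `fit_pca` and `pca_encode` are hypothetical helper names, and `n_components=64` follows the text.

```python
import numpy as np


def fit_pca(train_features, n_components=64):
    """Return the training mean and the top principal directions (as rows)."""
    mean = train_features.mean(axis=0)
    centered = train_features - mean
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]


def pca_encode(features, mean, components):
    """Project features onto the principal directions fitted on the training set."""
    return (features - mean) @ components.T
```

The same `mean` and `components` fitted on the training features should be reused for the test features, so that both live in the same 64-dimensional space.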

After fitting a TrustScore object, we can use it to compute the trust scores of model predictions. The code snippet above lays out the canonical code for computing the trust scores and obtaining ĥ, which also uses the PCA-encoded test features.

Summary of computing the trust score: First, we get the α-high-density-set (where probable outliers are filtered out) by using Algorithm 1. Second, we compute the ratio of the distance between ĥ and x (let’s denote this as d(ĥ, x)) to the distance between h and x (let’s denote this as d(h, x)) using Algorithm 2.

Now that we know how to compute the trust score for a model prediction, let’s put it into practice and use it to compare three deep learning models.

Models

For this article, we go through three deep neural networks: a feed-forward neural network, a LeNet convolutional neural network (LeCun et al., 1998), and a miniature VGG convolutional neural network.

We shall implement our deep learning models using TensorFlow 2.0; specifically, the TF version used in the prepared experiments was 2.0.0-beta1. It is recommended to install it inside a virtual environment,

pip install tensorflow==2.0.0-beta1

or if you have a GPU in your system,

pip install tensorflow-gpu==2.0.0-beta1

There are more details on installation in this guide from tensorflow.org.

Feed-Forward Neural Network

We shall implement a feed-forward neural network (FFNN) with two hidden layers, each with 512 neurons and ReLU activation, with a dropout rate of 0.2, and followed by a softmax classification layer. An abbreviated version of this model is illustrated in Figure 1.


Figure 1. Illustrated using NN-SVG. A feed-forward neural network with two hidden layers. It learns to approximate the target label y by learning the appropriate θ parameters with the criterion of minimizing the difference between its output label f(x; θ) and the target label y.

Our model implementation using the TensorFlow 2.0 Subclassing API is as follows,

A 2-layer feed-forward neural network model written in TensorFlow 2.0 subclassing API.
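A minimal sketch of such a model, based only on the description above, might look like the following. The class and attribute names are our own, and placing a dropout layer after each hidden layer is an assumption about where the 0.2 dropout goes.

```python
import tensorflow as tf


class FeedForwardNetwork(tf.keras.Model):
    """Two 512-unit ReLU hidden layers with dropout, then a softmax classifier."""

    def __init__(self, num_classes=10, units=512, dropout_rate=0.2):
        super().__init__()
        self.flatten_layer = tf.keras.layers.Flatten()
        self.hidden_1 = tf.keras.layers.Dense(units, activation="relu")
        self.dropout_1 = tf.keras.layers.Dropout(dropout_rate)
        self.hidden_2 = tf.keras.layers.Dense(units, activation="relu")
        self.dropout_2 = tf.keras.layers.Dropout(dropout_rate)
        self.classifier = tf.keras.layers.Dense(num_classes, activation="softmax")

    def call(self, inputs, training=False):
        x = self.flatten_layer(inputs)  # 28x28(x1) images -> 784 features
        x = self.dropout_1(self.hidden_1(x), training=training)
        x = self.dropout_2(self.hidden_2(x), training=training)
        return self.classifier(x)
```

On 28x28 inputs this sketch has 669,706 parameters: (784 × 512 + 512) + (512 × 512 + 512) + (512 × 10 + 10).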

LeNet Convolutional Neural Network

For our second model, we shall implement a LeNet-5 convolutional neural network (CNN). To implement this model, we will have two convolutional layers, each followed by their own activation and max pooling layers. The first convolutional layer has 6 filters and a kernel size of 5x5 while the second convolutional layer has 16 filters and a kernel size of 5x5 as well — each convolutional layer is followed by a ReLU activation instead of a hyperbolic tangent. Then, they are followed by two fully connected layers with 120 neurons and 84 neurons each, both of which also use ReLU activation, and finally, followed by a softmax classification layer. The model is illustrated in Figure 2.


Figure 2. Illustrated using NN-SVG. The LeNet architecture consists of two convolutional layers, each with their own pooling layers, followed by fully-connected layers and a classification layer.

Our model implementation using the TensorFlow 2.0 Subclassing API is as follows,

A LeNet-5 CNN model written in TensorFlow 2.0 subclassing API.
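A corresponding sketch of the LeNet-5 variant described above follows. The names are our own; Keras' default 'valid' padding, 2x2 max pooling, and 28x28x1 inputs are assumptions, though with them the model has exactly 44,426 parameters, the count reported for LeNet in the Results section.

```python
import tensorflow as tf


class LeNet5(tf.keras.Model):
    """LeNet-5 variant: two conv/pool stages, then 120- and 84-unit dense layers."""

    def __init__(self, num_classes=10):
        super().__init__()
        # ReLU instead of the original hyperbolic tangent, as described above.
        self.conv_1 = tf.keras.layers.Conv2D(6, kernel_size=5, activation="relu")
        self.pool_1 = tf.keras.layers.MaxPooling2D(pool_size=2)
        self.conv_2 = tf.keras.layers.Conv2D(16, kernel_size=5, activation="relu")
        self.pool_2 = tf.keras.layers.MaxPooling2D(pool_size=2)
        self.flatten_layer = tf.keras.layers.Flatten()
        self.fc_1 = tf.keras.layers.Dense(120, activation="relu")
        self.fc_2 = tf.keras.layers.Dense(84, activation="relu")
        self.classifier = tf.keras.layers.Dense(num_classes, activation="softmax")

    def call(self, inputs):
        x = self.pool_1(self.conv_1(inputs))  # 28x28x1 -> 24x24x6 -> 12x12x6
        x = self.pool_2(self.conv_2(x))       # -> 8x8x16 -> 4x4x16
        x = self.flatten_layer(x)             # -> 256 features
        x = self.fc_2(self.fc_1(x))
        return self.classifier(x)
```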

Miniature VGG Convolutional Neural Network

Finally, our third model is an abbreviated version of the VGG model (Simonyan & Zisserman, 2014). The deepest configuration of the original VGG model has 19 weight layers (16 convolutional and 3 fully connected), but since our purpose is just to compare deep learning models using the trust score on our dataset, we shall use a smaller version that has 4 convolutional layers followed by 2 fully connected layers. The first two convolutional layers have 32 filters each, while the second two have 64 filters each, and each pair is followed by a max pooling layer and dropout with a rate of 0.25. These are followed by a fully connected layer with 256 neurons, ReLU activation, and a dropout rate of 0.5, and finally by a softmax classification layer. This modified architecture is illustrated in Figure 3.


Figure 3. Illustrated using NN-SVG. A miniature VGG architecture that consists of four convolutional layers, with a pooling layer after each two convolutional layers, then followed by a fully-connected layer and a classification layer.

Our model implementation using the TensorFlow 2.0 Subclassing API is as follows,

A mini VGG CNN model written in TensorFlow 2.0 subclassing API.
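The mini-VGG can be sketched the same way. The description above does not give the kernel size or padding, so 3x3 kernels with Keras' default 'valid' padding are assumptions here, as are the names and the grouping into `tf.keras.Sequential` blocks; placing the 0.25 dropout right after each pooling layer is our reading of the text.

```python
import tensorflow as tf


def conv_block(filters):
    """Two 3x3 ReLU convolutions, 2x2 max pooling, then dropout of 0.25."""
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(filters, kernel_size=3, activation="relu"),
        tf.keras.layers.Conv2D(filters, kernel_size=3, activation="relu"),
        tf.keras.layers.MaxPooling2D(pool_size=2),
        tf.keras.layers.Dropout(0.25),
    ])


class MiniVGG(tf.keras.Model):
    def __init__(self, num_classes=10):
        super().__init__()
        self.block_1 = conv_block(32)  # 28x28x1 -> 26 -> 24 -> pooled to 12x12x32
        self.block_2 = conv_block(64)  # -> 10 -> 8 -> pooled to 4x4x64
        self.head = tf.keras.Sequential([
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(256, activation="relu"),
            tf.keras.layers.Dropout(0.5),
            tf.keras.layers.Dense(num_classes, activation="softmax"),
        ])

    def call(self, inputs, training=False):
        x = self.block_1(inputs, training=training)
        x = self.block_2(x, training=training)
        return self.head(x, training=training)
```

Under these assumptions the model has 329,962 parameters, fewer than the 2-layer FFNN sketch, which is consistent with the FFNN having the highest parameter count among the three models.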

Now that we have defined the classes for our models, we can instantiate them and train them on a dataset. Since this is just a test for the trust score, we shall use the never-ending MNIST dataset (LeCun et al., 1998), but of course, you can choose any dataset you want, say, Fashion-MNIST, EMNIST (bunch of MNISTs, yes), or even CIFAR-10.

To use the MNIST dataset, let’s load it, and create a tf.data.Dataset object for it as follows,

Loading the MNIST dataset and creating a tf.data.Dataset object for it in TensorFlow 2.0.
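A sketch of this step follows. Scaling pixels to [0, 1], adding a channel axis, and the helper names `to_dataset` and `load_mnist_datasets` are assumptions; the batch size of 512 matches the one reported in the Results section. Note that `tf.keras.datasets.mnist.load_data()` downloads the data on first use.

```python
import numpy as np
import tensorflow as tf


def to_dataset(features, labels, batch_size=512, shuffle=True):
    """Scale 28x28 uint8 images to [0, 1], add a channel axis, and batch them."""
    images = np.asarray(features, dtype="float32").reshape(-1, 28, 28, 1) / 255.0
    dataset = tf.data.Dataset.from_tensor_slices((images, np.asarray(labels, dtype="int64")))
    if shuffle:
        dataset = dataset.shuffle(buffer_size=len(images))
    return dataset.batch(batch_size).prefetch(1)


def load_mnist_datasets(batch_size=512):
    """Download MNIST (on first use) and return (train, test) tf.data pipelines."""
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    train = to_dataset(x_train, y_train, batch_size)
    test = to_dataset(x_test, y_test, batch_size, shuffle=False)
    return train, test
```

The channel axis makes the same pipeline usable by all three models, since the FFNN flattens its input anyway.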

To train our models written in TensorFlow 2.0 Subclassing API, we have two methods to choose from.

First, we can compile them as if they were a tf.keras.Sequential model, i.e. using model.compile(), and then train them using the model.fit() function,

Compiling a model written using TensorFlow 2.0 Subclassing API, and training it using model.fit() function.
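The first approach might look like the following sketch, using the SGD settings reported in the Results section. The helper name `compile_and_fit` is hypothetical, and expressing the 1e-6 learning rate decay as an `InverseTimeDecay` schedule is our substitution: it computes lr / (1 + decay × step), the same formula as the legacy `decay` argument, which recent Keras optimizers no longer accept.

```python
import tensorflow as tf


def compile_and_fit(model, train_dataset, epochs=60, learning_rate=0.1,
                    momentum=0.9, decay=1e-6):
    """Compile a (subclassed) Keras model with SGD + momentum and train it."""
    # lr_t = learning_rate / (1 + decay * step), i.e. the legacy `decay` behaviour.
    schedule = tf.keras.optimizers.schedules.InverseTimeDecay(
        initial_learning_rate=learning_rate, decay_steps=1, decay_rate=decay)
    optimizer = tf.keras.optimizers.SGD(learning_rate=schedule, momentum=momentum)
    model.compile(optimizer=optimizer,
                  loss="sparse_categorical_crossentropy",  # integer labels, softmax outputs
                  metrics=["accuracy"])
    return model.fit(train_dataset, epochs=epochs, verbose=0)
```

For the mini-VGG, the Results section instead reports Adam with a learning rate of 0.01, which would mean compiling with `tf.keras.optimizers.Adam(learning_rate=0.01)`.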

Second, we can define a custom training loop for it,

Custom training loop for a model written in TensorFlow 2.0 Subclassing API.
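A minimal custom loop written with `tf.GradientTape` could look like this. The function name `train_custom` is hypothetical; the loss matches the softmax outputs of the models above, and wrapping the step in `tf.function` is an optional speed-up.

```python
import tensorflow as tf


def train_custom(model, dataset, optimizer, epochs=1):
    """Train with an explicit GradientTape loop; return the mean loss per epoch."""
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

    @tf.function
    def train_step(features, labels):
        with tf.GradientTape() as tape:
            predictions = model(features, training=True)
            loss = loss_fn(labels, predictions)
        # This is the point where gradients could be manipulated before applying them.
        gradients = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(gradients, model.trainable_variables))
        return loss

    epoch_losses = []
    for _ in range(epochs):
        mean_loss = tf.keras.metrics.Mean()
        for features, labels in dataset:
            mean_loss.update_state(train_step(features, labels))
        epoch_losses.append(float(mean_loss.result()))
    return epoch_losses
```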

The first approach is easier, more convenient, and faster to implement while the second approach provides more control over the training loop for a model.

You will see in the notebook experiments for this article that the models were trained using the first approach, i.e. using model.compile() and model.fit(). This is because I was too lazy to write a custom training loop at the time of writing this article. I’m just kidding. It is because, for this particular experiment, I thought there would be no need to manipulate the gradient computations (like we do when using gradient noise addition, yes, a shameless plug indeed) or to apply any other tricks during training.

Now that we have defined our trust score object, model classes, and training function, we can look at the results after training the models.

Results

Each model was trained for 60 epochs with a mini-batch size of 512 on the MNIST dataset (LeCun et al., 1998), resulting in 7,020 training steps. Both the 2-layer FFNN and the LeNet-5 CNN were trained using SGD with momentum (learning rate = 0.1, momentum = 0.9) and a learning rate decay of 1e-6. As for the mini-VGG CNN, it was trained using Adam (Kingma & Ba, 2014) with a learning rate of 0.01.

The test accuracy of each model, together with its number of parameters, is listed in Table 1.


Table 1. Test accuracy of the deep learning models on the MNIST handwritten digits classification dataset.

The number of parameters was obtained using the model.summary() function of the tf.keras API. We can see that despite having the highest number of parameters, the 2-layer FFNN fell 0.42% short of the mini-VGG CNN in test accuracy. In addition, even though LeNet has only 44,426 parameters, it is on par with the 2-layer FFNN and the mini-VGG CNN in terms of test accuracy.

However, we are not here to merely check the test accuracy, we are here to see if these trained models are trustworthy!

To compute the trust scores for our deep learning models, we used their model predictions as the input; in the original paper, however, Jiang et al. (2018) also used different learned representations of the features rather than the model predictions alone. In Figure 4, we have the Trust Score and Model Confidence curves for correctly classified examples (Figure 4 (a-c), Detect Trustworthy) and misclassified examples (Figure 4 (d-f), Detect Suspicious). These plots depict the performance (i.e. the y-axis) of the trained classifiers at a given percentile level (i.e. the x-axis).


Figure 4. Trust score results using 2-layer FFNN, LeNet-5, and Mini-VGG on MNIST dataset. Top row is detecting trustworthy; bottom row is detecting suspicious.

The vertical black lines in Figure 4 (a-c) denote the error level of the trained classifiers, while the vertical black lines in Figure 4 (d-f) denote their accuracy level. From both performance metrics, we can see that the trained classifier with both high test accuracy and a high trust score is the 2-layer FFNN. This is unsurprising in that it had the highest number of parameters, yet surprising in that, despite its simpler architecture, it outperformed both of our CNN-based models in this test. To be fair, our CNN-based models were trained from scratch, whereas the original paper used a pre-trained VGG classifier and smaller pre-trained CNN classifiers.
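Curves like those in Figure 4 can be approximated with a short NumPy function. This is a sketch under an assumption about the evaluation protocol: at each percentile level p, it keeps the test examples whose score is at or above the p-th percentile and reports the fraction of them that are positives. For Detect Trustworthy, the positives are correct predictions ranked by trust score; for Detect Suspicious, they are misclassifications ranked by the negated trust score. The name `precision_curve` is our own.

```python
import numpy as np


def precision_curve(scores, is_positive, percentiles):
    """Precision among examples scoring at or above each percentile level."""
    precisions = []
    for p in percentiles:
        threshold = np.percentile(scores, p)
        kept = scores >= threshold  # never empty: the maximum always passes
        precisions.append(is_positive[kept].mean())
    return np.array(precisions)


# Detect Trustworthy: a high trust score should mean a correct prediction, e.g.
#   precision_curve(trust, correct, levels)
# Detect Suspicious: a low trust score should mean a misclassification, e.g.
#   precision_curve(-trust, ~correct, levels)
```

The same function applied to the model's softmax confidence instead of the trust score gives the Model Confidence baseline to compare against.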

To further inspect and appreciate trust score, let’s take a look at using it for a single prediction. This will further enhance our intuition and understanding of trust score.

In Figure 5, we can see the position of the test example x in the 3D feature space, together with the predicted class h and the closest not predicted class ĥ (left side of the figure). We can also see the numerical distance between x and h , and the numerical distance between x and ĥ . In addition, we can also see the test image x , the predicted class h (along with the likelihood and trust score), and the closest not predicted class ĥ at the right side of the figure.


Figure 5. Left side: the data points x (test example), ĥ (closest not predicted class), and h (predicted class) in a 3D feature space. Right side, top-to-bottom: image representation of data point x , h , and ĥ .

From Figure 5, we can confirm visually and numerically the distances among the points x, ĥ, and h. With the distance d(ĥ, x) (i.e. 4.22289) being higher than the distance d(h, x) (i.e. 2.73336), we can confirm the trust score given at the right side of the figure, 1.54494. Can the model prediction be trusted? Visually? Yes. The plotted points show that x and h are much closer together than x and ĥ are, and the plotted images at the right of Figure 5 support the class prediction. Numerically? Yes. We can see the numerical distances among the points and compute the ratio between them.
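The arithmetic behind the score reported in Figure 5 is simply:

```latex
\text{trust score} = \frac{d(\hat{h}, x)}{d(h, x)} = \frac{4.22289}{2.73336} \approx 1.54494
```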

Closing Remarks

In this article, we discussed the trust score, a simple yet effective way of judging whether a prediction from a trained classifier can be trusted or not. The trust score takes advantage of the relative positions of data points (which may be lost in common approaches like model confidence) to provide interpretable information about the prediction of a trained classifier.

In this article, we used PCA-encoded features to compute the trust scores for the predictions of the trained classifiers. But what if we used encoded features from an autoencoder? Can we also compute trust scores for text classification?

I hope we have covered the trust score in an intuitive yet sufficient manner, enough to make you wonder about and explore the question: should we blindly trust the results of our deep learning models?

If you want to use the trust score in your projects, you can find the full code here. If you have any feedback, you may reach me through Twitter. We can also connect through LinkedIn!
