
Harnessing The Power Of Uncertainty


Gradient Boosting Trees, Logistic Regression, and Deep Learning: you can extract uncertainty from whichever ML model you use, and get a lot more out of it.


Photo by Santiago Lacarta on Unsplash

Let's start by imagining that, given several pictures of dog breeds as training data, we have a trained model. When a user uploads a photo of his dog, the hypothetical website should return a prediction with rather high confidence.

But what should happen if a user uploads a photo of a cat and asks the website to decide on a dog breed?

The model has been trained on photos of dogs of different breeds, and has (hopefully) learned to distinguish between them well. But the model has never seen a cat before, and a photo of a cat would lie outside of the data distribution the model was trained on. This illustrative example can be extended to more serious settings, such as MRI scans with structures a diagnostics system has never observed before, or scenes an autonomous car steering system has never been trained on.

A possible desired behavior of a model in such cases would be to return a prediction (attempting to extrapolate far away from our observed data), but return the answer with the added information that the point lies outside of the data distribution. That is, we want our model to possess some quantity conveying a high level of uncertainty for such inputs (alternatively, conveying low confidence).

In this post, I will present methods for extracting your model's uncertainty.

I will show it on three different kinds of models: Logistic Regression, Gradient Boosting Trees, and Deep Learning. This post is for people who are already familiar with those algorithms and architectures.

Ready? Let’s start!

Harnessing the Power of Uncertainty

To harness the power of your model's uncertainty, you will need to know the distribution of your model's output (most of the time it is reasonable to assume it is close to a normal distribution).

When the distribution is known, you can calculate your model's mean and standard deviation.

The standard deviation is the uncertainty of the model, and the mean is the model's result. You can now use them for a simple UCB score:

score = mean + 3 × std
3 is just a factor for the sigma; it can be any number. The larger the factor, the more weight is put on exploration.

This is one example; you can also use the same mean and standard deviation for Thompson Sampling, assuming your model's output is normally distributed.
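A minimal sketch of both strategies, assuming you already have `mean` and `std` arrays produced by one of the methods described below (the factor 3 and the example numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng()

def ucb_score(mean, std, factor=3.0):
    # Upper Confidence Bound: rank candidates by mean plus a multiple
    # of the uncertainty; a larger factor means more exploration.
    return mean + factor * std

def thompson_sample(mean, std):
    # Thompson Sampling under a normality assumption: draw one sample
    # per candidate from N(mean, std) and rank by the draw.
    return rng.normal(mean, std)

# Example usage with made-up model outputs for three candidates.
mean = np.array([0.2, 0.5, 0.4])
std = np.array([0.30, 0.05, 0.10])
print(ucb_score(mean, std))       # exploration-weighted scores
print(thompson_sample(mean, std)) # one random draw per candidate
```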

Gradient Boosting Trees

Gradient Boosting Trees are a powerful family of algorithms, perhaps the most widely used today.

I think the most common implementations are XGBoost, LightGBM, and CatBoost.

In this post, I will show the example with CatBoost, but the same idea can be implemented with all of them, and also with other tree ensembles such as Random Forest.


To get the uncertainty of your tree model, you need to collect the prediction of each tree separately. If you are doing classification, take each tree's predicted probability; if you are doing regression, take each tree's predicted value.

When you have, for each prediction, a list of probabilities (one from each tree), you can calculate the mean and standard deviation of that list.

Here is an example with CatBoost of how to get per-tree predictions and, from those predictions, calculate the mean and the std of each prediction in order to get and use the uncertainty.
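A minimal sketch, assuming a binary CatBoostClassifier and synthetic data. Because a boosted ensemble is built additively, I use `staged_predict_proba`, which yields the model's probabilities after each added tree, as the per-tree signal; the spread across those stages serves as the uncertainty here:

```python
import numpy as np
from catboost import CatBoostClassifier
from sklearn.datasets import make_classification

# Synthetic data, for illustration only.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, y_train, X_test = X[:800], y[:800], X[800:]

model = CatBoostClassifier(n_estimators=200, verbose=False, random_seed=0)
model.fit(X_train, y_train)

# staged_predict_proba yields the ensemble's probabilities after each
# added tree, so the spread across stages reflects how much the trees
# move the prediction around.
staged = np.stack([p[:, 1] for p in model.staged_predict_proba(X_test)])

mean = staged.mean(axis=0)  # the model's result
std = staged.std(axis=0)    # the uncertainty
```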

Deep Learning


The reason I started looking at uncertainty in models to improve accuracy was deep learning. I think the person who put the “un” in uncertainty for deep learning was Yarin Gal, with his thesis “Uncertainty in Deep Learning”. His main idea is to use dropout, which is more commonly used during training to avoid overfitting, at prediction time as well. At prediction time we keep dropout on and run the same prediction X times; because the dropout mask is chosen randomly on each pass, we get somewhat different results for the same input. From those different results we can easily calculate the mean and std.
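A minimal sketch of this MC Dropout idea in Keras; the layer sizes, dropout rate, and number of passes are illustrative choices, and in practice you would call this with your trained model:

```python
import numpy as np
import tensorflow as tf

# A toy model containing a Dropout layer.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

def mc_dropout_predict(model, x, n_passes=50):
    # training=True keeps dropout active at prediction time, so each
    # pass applies a different random dropout mask and gives a
    # slightly different prediction for the same input.
    preds = np.stack([model(x, training=True).numpy() for _ in range(n_passes)])
    return preds.mean(axis=0), preds.std(axis=0)

# Example usage on random inputs (use your own trained model and data).
x = np.random.rand(5, 20).astype("float32")
mean, std = mc_dropout_predict(model, x)
```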

When you have the mean and the standard deviation, it is the same solution as shown before.

Logistic Regression

The sigmoid function at the heart of logistic regression: σ(x) = 1 / (1 + e^(−x))

I will start by saying that here we are not getting the uncertainty of the model itself; rather, we are capturing the uncertainty coming from our training data.

The idea here is to train the same model on different cuts of the training set. Here is what you need to do to get the uncertainty of your model (a sketch follows the list):

  1. Split your training set into N cuts.
  2. Each cut will have 70%–80% of your training set's records, randomly selected.
  3. On each cut, train a logistic regression model (at the end you will have N models).
  4. Run each prediction through all N trained models.
  5. Calculate the mean and the std of all the predictions.
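A minimal sketch of this procedure with scikit-learn; N, the 75% subsample fraction, and the synthetic dataset are illustrative choices:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data, for illustration only.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, y_train, X_test = X[:800], y[:800], X[800:]

N = 10
rng = np.random.default_rng(0)
models = []
for _ in range(N):
    # Each "cut" is ~75% of the training records, randomly selected.
    idx = rng.choice(len(X_train), size=int(0.75 * len(X_train)), replace=False)
    models.append(LogisticRegression(max_iter=1000).fit(X_train[idx], y_train[idx]))

# Run every test point through all N trained models.
probs = np.stack([m.predict_proba(X_test)[:, 1] for m in models])

mean = probs.mean(axis=0)  # the prediction
std = probs.std(axis=0)    # the uncertainty
```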

Wrapping up

In this post, we went over ways to get your model's uncertainty.

The most common use is in deep learning, but you also saw ways to do it with Gradient Boosting Trees and Logistic Regression models.

As I said, Model Uncertainty can be used to open the model up to more exploration, but it can also improve accuracy. If your data keeps receiving new items that you know little about, Model Uncertainty can be very helpful.

Contact me on LinkedIn if you have any questions or need a better understanding of how this can help you, and how to do it technically.

