What’s Linear About Logistic Regression
There are already a bunch of amazing articles and videos on Logistic Regression, but it was a struggle for me to understand the connection between the probabilities and the linearity of Logistic Regression, so I figured I would document it here for myself and for anyone going through the same thing.
This will also shed some light on where the ‘Logistic’ part of Logistic Regression comes from!
The focus of this blog will be on building an intuitive understanding of the relationship between the logistic model and the linear model, so I’m just going to do an overview of what Logistic Regression is and dive into that relationship. For a more complete explanation of this awesome algorithm, here are some of my favorite resources:
- https://www.youtube.com/watch?v=-la3q9d7AKQ
- https://towardsdatascience.com/logistic-regression-detailed-overview-46c4da4303bc
- https://ml-cheatsheet.readthedocs.io/en/latest/logistic_regression.html
- https://christophm.github.io/interpretable-ml-book/logistic.html
Now let’s get to the gist of Logistic Regression.
What is Logistic Regression?
Like Linear Regression, Logistic Regression is used to model the relationship between a set of independent variables and a dependent variable.
Unlike Linear Regression, the dependent variable is categorical, which is why it’s considered a classification algorithm.
Logistic Regression could be used to predict whether:
- An email is spam or not spam
- A tumor is malignant or not
- A student will pass or fail an exam
- I will regret snacking on cookies at 12 am
The applications listed above are examples of Binomial/Binary Logistic Regression where the target is dichotomous (2 possible values), but you could have more than 2 classes (Multinomial Logistic Regression).
These classifications are made based on the probabilities produced by the model and some threshold (typically 0.5). For example, a student is predicted to pass if her probability of passing is greater than 0.5.
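As a minimal sketch of that decision rule (the probability here is made up; in practice it would come from a fitted model):

# Hypothetical model output for one student
p_pass = 0.73
threshold = 0.5

predicted_class = int(p_pass > threshold)  # 1 = pass, 0 = fail
print(predicted_class)  # 1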
Let’s start digging into how these probabilities are calculated.
The Sigmoid Function
If we visualize a dataset with a binary target variable, the observations stack up in two horizontal bands, one at 0 and one at 1.
There are a couple of reasons why fitting a line might not be a good idea here:
- In Linear Regression, the dependent variable can range from negative infinity to positive infinity, but we’re trying to predict probabilities, which must fall between 0 and 1 (see the sketch after this list).
- Even if we created some rules to map those out-of-bound values to a label, the classifier would be very sensitive to outliers, which would hurt its performance.
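Here’s a quick toy illustration of that first problem. The data below is made up purely for demonstration:

import numpy as np
from sklearn.linear_model import LinearRegression

# Hours studied vs. a 0/1 pass flag -- hypothetical data
x = np.array([[1], [2], [3], [6], [7], [8]])
y = np.array([0, 0, 0, 1, 1, 1])

# A straight line fit to binary labels produces "probabilities"
# that escape the [0, 1] range for extreme inputs
ols = LinearRegression().fit(x, y)
print(ols.predict([[12]]))  # about 1.86 -- not a valid probability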
So, instead of a straight line, we model the data with an S-shaped curve that flattens out near 0 and 1.
This is called the sigmoid function, and it has this form:

p(x) = 1 / (1 + e^(−βX))

where βX is shorthand for the linear combination β₀ + β₁x₁ + … + βₙxₙ.
This function returns the probability that an observation belongs to a class based on some combination of factors.
And if we solve for the linear function, we get the log of the odds, or the logit:

log(p(x) / (1 − p(x))) = βX

Notice how when p(x) ≥ 0.5, βX ≥ 0.
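A small numerical sanity check of that relationship (just a sketch; any values of βX would do):

import numpy as np

bx = np.array([-2.0, 0.0, 3.0])   # some outputs of the linear model βX
p = 1 / (1 + np.exp(-bx))         # sigmoid: βX -> probability
logit = np.log(p / (1 - p))       # log-odds: probability -> βX

print(np.allclose(logit, bx))     # True: the two functions are inverses
print((p >= 0.5) == (bx >= 0))    # [ True  True  True]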
But wait a minute, where did this magical function come from and how did the linear model get in there? To answer that, we’ll take a look at how Logistic Regression forms its decision boundary.
Decision Boundary
Behind every great Logistic Regression model is an unobservable (latent) linear regression model, because the question it’s really trying to answer is:
“What is the probability an observation belongs to class 1 given some characteristics x?”
Let’s take a look at an example.
Suppose we want to predict whether a student will pass an exam based on how much time she spent studying and sleeping.
Let’s understand our data better by plotting Studied against Slept, color-coding the classes to visualize the split:
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt

exams = pd.read_csv('data_classification.csv', names=['Studied', 'Slept', 'Passed'])

fig = plt.figure()
ax = fig.add_subplot(111)

# Red = failed (0), blue = passed (1)
colors = ['red', 'blue']
ax.scatter(exams.Studied, exams.Slept, s=25, marker="o",
           c=exams['Passed'], cmap=matplotlib.colors.ListedColormap(colors))
Looking at this plot, we can hypothesize a few relationships:
- Students who spend enough time studying and get lots of sleep are likely to pass
- Students who sleep less than 2 hours but spend 8+ hours studying will probably still pass (I was for sure in this group)
- Students who slack on studying and forego sleep have probably accepted their fate of not passing
The idea here is there’s a clear line separating these two classes, and we’re hoping Logistic Regression is going to find that for us. Let’s fit a Logistic Regression model and overlay this plot with the model’s decision boundary.
from sklearn.linear_model import LogisticRegression

# Separate the features (Studied, Slept) from the target (Passed)
features = exams.drop(['Passed'], axis=1)
target = exams['Passed']

logmodel = LogisticRegression()
logmodel.fit(features, target)
predictions = logmodel.predict(features)
You can print out the parameter estimates:
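The fitted estimates live in the model’s intercept_ and coef_ attributes (the exact values will depend on the data):

print(logmodel.intercept_)  # β₀, the intercept
print(logmodel.coef_)       # β₁ and β₂, the coefficients for Studied and Slept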
Using those estimates, we can calculate the boundary. Since our threshold is set at 0.5, I’m holding the logit at 0, which makes the boundary the set of points where

β₀ + β₁·Studied + β₂·Slept = 0, i.e. Slept = (−β₀ − β₁·Studied) / β₂

This also allows us to view the boundary in 2D:
exams['boundary'] = (-logmodel.intercept_[0] - (logmodel.coef_[0][0] * features['Studied'])) / logmodel.coef_[0][1]
Here’s what it looks like on our scatter plot:
plt.scatter(exams['Studied'],exams['Slept'], s=25, marker="o", c=exams['Passed'], cmap=matplotlib.colors.ListedColormap(colors))
plt.plot(exams['Studied'], exams['boundary'])
plt.show()
That looks reasonable! So how does Logistic Regression use this line to assign class labels? It looks at the distance between each individual observation and the line: points above the line are labeled 1, and points below it are labeled 0. A point exactly on the line could belong to either class (probability 0.5), so to classify an observation as 1, we’re interested in the probability that its distance from the line is greater than 0.
As it turns out, in Logistic Regression, this distance is assumed to follow the logistic distribution.
In other words, the error term of the latent linear regression model in Logistic Regression is assumed to follow the logistic distribution.
This means when we ask:

P(y = 1 | x)

we’re really asking:

P(βX + ε > 0), i.e. P(ε > −βX)

where ε is the error term of the latent linear model. To calculate this probability, we take the integral of the logistic density to get its cumulative distribution function; by the symmetry of the logistic distribution, P(ε > −βX) = P(ε ≤ βX), which gives:

P(y = 1 | x) = 1 / (1 + e^(−βX))
Oh hey! It’s the sigmoid function :).
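You can sanity-check both claims numerically. This sketch assumes scipy is installed and reuses the features and logmodel objects from above:

import numpy as np
from scipy.stats import logistic

# The CDF of the standard logistic distribution is exactly the sigmoid
z = np.linspace(-5, 5, 101)
print(np.allclose(logistic.cdf(z), 1 / (1 + np.exp(-z))))  # True

# sklearn's predicted probabilities are the sigmoid of the latent linear model βX
bx = logmodel.intercept_[0] + features.values @ logmodel.coef_[0]
print(np.allclose(1 / (1 + np.exp(-bx)), logmodel.predict_proba(features)[:, 1]))  # True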
Tada! You should now be able to walk back and forth between the sigmoid function and the linear regression function more intuitively. I hope understanding this connection gives you a deeper appreciation for Logistic Regression, as it did for me.