Logistic Regression from Bayes’ Theorem

In this post we'll take a helpful look at the relationship between Bayes' theorem and logistic regression. Despite logistic regression being a very commonly used tool in statistics, machine learning, and data science, I've found people frequently get confused about the details of how it actually works. By seeing how you can derive logistic regression from Bayes' theorem, you should have a much easier time remembering exactly how this useful tool works. Ultimately we'll see that logistic regression is a way that we can learn the prior and likelihood in Bayes' theorem from our data. This will be the first in a series of posts that take a deeper look at logistic regression.

The key parts of this post are going to use some very familiar and relatively straightforward mathematical tools. We're going to use Bayes' theorem

$$P(H|D) = \frac{P(D|H)P(H)}{P(D)}$$

which you can refresh in this post on Bayes' Theorem with Lego, and the basic linear model

$$y = \beta x + \beta_0$$

which just says that some target variable \(y\) can be understood as a linear combination of \(x\) multiplied by coefficients \(\beta\), plus some constant \(\beta_0\).
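To make the linear model concrete, here's a minimal sketch in Python; the feature values and coefficients below are invented purely for illustration:

```python
import numpy as np

# A minimal sketch of the basic linear model y = beta * x + beta_0.
# These feature values and coefficients are made up for illustration.
x = np.array([1.5, 2.0, 0.5])      # one example with three features
beta = np.array([0.8, -0.3, 1.1])  # one coefficient per feature
beta_0 = 0.25                      # the constant (intercept) term

# y is the linear combination of x and beta, plus the constant
y = beta @ x + beta_0
print(y)  # approximately 1.4
```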

Logistic Regression Basics

As a quick refresher, logistic regression is a common method of using data to predict the probability of some hypothesis. More mathematically speaking, we have some input \(x\), which could be a single value like someone's height or a vector like the pixels in an image, and some \(y\) which represents an outcome such as "can slam dunk a basketball" or "is a picture of a cat". Our goal in logistic regression is to learn the probability of \(y\) given \(x\), or \(p(y|x)\). The model is trained on examples where \(y\) is a binary outcome, 1 meaning success and 0 meaning failure, and \(x\) is the corresponding data that resulted in the outcome \(y\). When we train the model we have a vector of outcomes \(y\) and a matrix \(X\), whose rows represent training examples and whose columns represent features.
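To make this concrete, here's a minimal sketch of logistic regression in Python. The toy feature matrix \(X\) and outcome vector \(y\) are assumptions invented for illustration, not data from this post. The sketch passes the linear model through the logistic (sigmoid) function to get \(p(y|x)\), and fits \(\beta\) and \(\beta_0\) by plain gradient ascent on the log-likelihood:

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Toy training data, invented for illustration: each row of X is a
# training example (columns are features), y holds the binary outcomes.
X = np.array([[0.5, 1.2],
              [1.5, 0.3],
              [2.0, 2.5],
              [3.0, 1.0]])
y = np.array([0, 0, 1, 1])

# Fit beta and beta_0 by gradient ascent on the log-likelihood.
beta = np.zeros(X.shape[1])
beta_0 = 0.0
learning_rate = 0.1

for _ in range(1000):
    p = sigmoid(X @ beta + beta_0)  # current estimate of p(y|x)
    error = y - p                   # drives the log-likelihood gradient
    beta += learning_rate * (X.T @ error)
    beta_0 += learning_rate * error.sum()

print(sigmoid(X @ beta + beta_0))   # predicted p(y|x) for each row of X
```

In practice you'd likely reach for a library implementation such as scikit-learn's LogisticRegression, but the hand-rolled loop makes it clear that the model is nothing more than the linear model squeezed through the sigmoid.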

