
The five discrete distributions every Statistician should know



Photo by Alex Chambers on Unsplash


The story, Proofs, and Intuition


Distributions play an essential role in the life of every Statistician.

Coming from a non-statistical background, I have always found distributions somewhat mystical.

And the fact is that there are a lot of them.

So which ones should I know? And how do I know and understand them?

This post covers some of the most commonly used discrete distributions, along with the intuition and proofs behind them.

1. Bernoulli Distribution


This is perhaps the simplest discrete distribution of all, and maybe the most useful as well.

Story: A coin is tossed with probability p of heads.

Where to Use?: We can think of a binary classification target as a Bernoulli random variable (RV).

PMF of Bernoulli Distribution is given by:

P(X = k) = p^k (1 - p)^{1 - k}, \quad k \in \{0, 1\}

CDF of Bernoulli Distribution is given by:

F(x) = \begin{cases} 0, & x < 0 \\ 1 - p, & 0 \le x < 1 \\ 1, & x \ge 1 \end{cases}

Expected Value:

E[X] = 1 \cdot p + 0 \cdot (1 - p) = p

Variance:

Var(X) = E[X^2] - (E[X])^2 = p - p^2 = p(1 - p)
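As a quick sanity check, here is a minimal sketch using scipy.stats.bernoulli that confirms the PMF, mean, and variance formulas above (p = 0.3 is just an illustrative choice, not something from the derivation):

# Bernoulli sanity check (illustrative p = 0.3):
from scipy.stats import bernoulli
p = 0.3
rv = bernoulli(p)
print(rv.pmf([0, 1]))  # [1 - p, p] -> [0.7, 0.3]
print(rv.mean())       # p -> 0.3
print(rv.var())        # p * (1 - p) -> 0.21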

Bernoulli Distribution is closely associated with a lot of distributions as we will see below.

2. Binomial Distribution:


One of the most basic distributions in the statistician's toolkit. Its parameters are n (the number of trials) and p (the probability of success).

Story: The probability of getting exactly k successes in n trials.

Where to Use?: Suppose we have n eggs in a basket, and each egg is broken with probability p. The number of broken eggs in the basket is then Binomially distributed.

PMF of Binomial Distribution is given by:

P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}, \quad k = 0, 1, \dots, n

In terms of the egg example, this is the probability that exactly k eggs are broken.

CDF of Binomial Distribution is given by:

F(x) = P(X \le x) = \sum_{k=0}^{\lfloor x \rfloor} \binom{n}{k} p^k (1 - p)^{n - k}

Expected Value:

First Solution:

E[X] = \sum_{k=0}^{n} k \binom{n}{k} p^k (1 - p)^{n - k} = np \sum_{k=1}^{n} \binom{n - 1}{k - 1} p^{k - 1} (1 - p)^{n - k} = np

(using k\binom{n}{k} = n\binom{n - 1}{k - 1} and the binomial theorem)

A better way to solve this:

X is the sum of n indicator random variables, X = I_1 + I_2 + \dots + I_n, where each I_i is a Bernoulli(p) random variable.

E[X] = E[I_1] + E[I_2] + \dots + E[I_n] = np

Variance:

We can use the indicator random variables for the variance too: since the indicators are independent, their variances add.

Var(X) = Var(I_1) + Var(I_2) + \dots + Var(I_n) = np(1 - p)
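If you want to verify these identities numerically, a rough simulation sketch could look like this (n = 30, p = 0.5, and the sample size are illustrative choices):

# Simulation check of E[X] = np and Var(X) = np(1 - p) (illustrative parameters):
import numpy as np
rng = np.random.default_rng(0)
n, p = 30, 0.5
samples = rng.binomial(n, p, size=100_000)
print(samples.mean(), n * p)           # ~15.0 vs 15.0
print(samples.var(), n * p * (1 - p))  # ~7.5 vs 7.5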

3. Geometric Distribution:


The parameter of this distribution is p (the probability of success).

Story: The number of failures before the first success (heads) when a coin with probability p of heads is tossed repeatedly.

Where to Use: Suppose you are taking an exam, and the probability of passing on any given attempt is p. The number of failed attempts before you clear the exam is Geometrically distributed.

PMF of Geometric Distribution is given by:

P(X = k) = (1 - p)^k \, p, \quad k = 0, 1, 2, \dots

CDF of Geometric Distribution is given by:

F(k) = P(X \le k) = 1 - (1 - p)^{k + 1}, \quad k = 0, 1, 2, \dots

Expected Value:

E[X] = \sum_{k=0}^{\infty} k q^k p = \frac{pq}{(1 - q)^2} = \frac{q}{p}, \quad \text{where } q = 1 - p

Variance:

E[X^2] = \sum_{k=0}^{\infty} k^2 q^k p = \frac{q(1 + q)}{p^2}

Thus,

Var(X) = E[X^2] - (E[X])^2 = \frac{q(1 + q)}{p^2} - \frac{q^2}{p^2} = \frac{q}{p^2}

Example:

A doctor is seeking an anti-depressant for a newly diagnosed patient. Suppose that, of the available anti-depressant drugs, the probability that any particular drug will be effective for a particular patient is p=0.6. What is the probability that the first drug found to be effective for this patient is the first drug tried, the second drug tried, and so on? What is the expected number of drugs that will be tried to find one that is effective?

The probability that the kth drug tried is the first effective one is (0.4)^{k-1}(0.6): 0.6 for the first drug, 0.24 for the second, 0.096 for the third, and so on.

Expected number of ineffective drugs tried before finding an effective one = q/p = 0.4/0.6 ≈ 0.67, so on average about 1.67 drugs are tried in total.
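The same numbers can be computed with scipy; here is a minimal sketch (note that scipy's geom counts the trial on which the first success occurs, not the failures before it):

# Geometric drug example (scipy's geom counts trials, not failures):
from scipy.stats import geom
p = 0.6
for k in [1, 2, 3]:
    print(k, geom.pmf(k, p))  # 0.6, 0.24, 0.096
print(geom.mean(p) - 1)       # expected ineffective drugs before success: q/p ~ 0.67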

4. Negative Binomial Distribution:


The parameters of this distribution are p (the probability of success) and r (the number of successes).

Story: The number of failures in independent Bernoulli(p) trials before the rth success.

Where to Use: You need to sell r candy bars to different houses. The probability that you will sell a candy bar is given by p. The number of failures you will have to endure before getting r successes is distributed as Negative Binomial.

PMF of Negative Binomial Distribution is given by:

With k failures before the rth success, the last trial must be a success and the remaining r - 1 successes can fall anywhere among the first k + r - 1 trials:

P(X = k) = \binom{k + r - 1}{r - 1} p^r (1 - p)^k, \quad k = 0, 1, 2, \dots

Expected Value:

The Negative Binomial RV can be written as the sum of r independent Geometric(p) RVs, since a Geometric RV is just the number of failures before the next success.

Thus,

E[X] = E[X_1] + E[X_2] + \dots + E[X_r] = \frac{rq}{p}

Variance:

Since the r Geometric RVs are independent, their variances add:

Var(X) = Var(X_1) + Var(X_2) + \dots + Var(X_r) = \frac{rq}{p^2}

Example:

Pat is required to sell candy bars to raise money for the 6th-grade field trip. There are thirty houses in the neighborhood, and Pat is not supposed to return home until five candy bars have been sold. So the child goes door to door, selling candy bars. At each house, there is a 0.4 probability of selling one candy bar and a 0.6 probability of selling nothing. What’s the probability of selling the last candy bar at the nth house?

Here, r = 5 and k = n - r.

Probability of selling the last candy bar at the nth house:

P(\text{5th sale at house } n) = \binom{n - 1}{4} (0.4)^5 (0.6)^{n - 5}
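A minimal sketch of the same computation with scipy's nbinom, which uses the same failures-before-the-rth-success convention (n = 10 is just an illustrative house number):

# Candy-bar example with scipy's nbinom (n = 10 is illustrative):
from math import comb
from scipy.stats import nbinom
r, p = 5, 0.4
n = 10
k = n - r  # failures before the 5th success
print(nbinom.pmf(k, r, p))                     # scipy version
print(comb(n - 1, 4) * 0.4**5 * 0.6**(n - 5))  # the formula above, same value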

5. Poisson Distribution:


The parameter of this distribution is λ, the rate parameter.

Motivation: There is no single story behind this distribution, but there is a strong motivation for using it. The Poisson distribution is often used when we count the successes of a large number of trials, each with a small per-trial success probability.

For example, the Poisson distribution is a good starting point for modelling the number of people who will email you in a given hour: you have a large number of people in your address book, and the probability that any particular one of them emails you in that hour is pretty small.

PMF of Poisson Distribution is given by:

P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}, \quad k = 0, 1, 2, \dots

Expected Value:

E[X] = \sum_{k=0}^{\infty} k \, \frac{e^{-\lambda} \lambda^k}{k!} = \lambda e^{-\lambda} \sum_{k=1}^{\infty} \frac{\lambda^{k - 1}}{(k - 1)!} = \lambda e^{-\lambda} e^{\lambda} = \lambda

Variance:

E[X(X - 1)] = \lambda^2, \quad \text{so} \quad Var(X) = E[X^2] - (E[X])^2 = \lambda^2 + \lambda - \lambda^2 = \lambda

Example:

If electricity power failures occur according to a Poisson distribution with an average of 3 failures every twenty weeks, calculate the probability that there will not be more than one failure during a particular week?

The rate per week is \lambda = 3/20 = 0.15, so

P(X \le 1) = P(X = 0) + P(X = 1) = e^{-0.15} + 0.15 \, e^{-0.15} \approx 0.99
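The same value drops out of scipy's poisson; here is a minimal sketch with the weekly rate above:

# Power-failure example: P(X <= 1) with a weekly rate of 3/20:
from scipy.stats import poisson
lamb = 3 / 20
print(poisson.pmf(0, lamb) + poisson.pmf(1, lamb))  # ~0.99
print(poisson.cdf(1, lamb))                         # same value via the CDF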

Seaborn Graphs and Functions

And here I will generate the PMFs of the discrete distributions discussed above using scipy.stats. For details on the chart_creator plotting helper used below, please see my previous post, Create basic graph visualizations with SeaBorn. Also, take a look at the scipy documentation for the functions used below.
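In case you do not want to dig up that post, here is a minimal stand-in for chart_creator. I am assuming it simply draws a bar chart of the PMF values with a title; the real helper is defined in the linked post and may differ.

# Minimal stand-in for chart_creator (assumed behaviour, see note above):
import seaborn as sns
import matplotlib.pyplot as plt

def chart_creator(k, pmf, title):
    # a simple seaborn bar chart of P(X = k) against k
    sns.barplot(x=list(k), y=list(pmf), color="steelblue")
    plt.title(title)
    plt.xlabel("k")
    plt.ylabel("P(X = k)")
    plt.show()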

# Binomial :
from scipy.stats import binom
n = 30
p = 0.5
k = range(0, n + 1)  # evaluate the PMF over the full support 0..n
pmf = binom.pmf(k, n, p)
chart_creator(k, pmf, "Binomial PMF")


# Geometric :
from scipy.stats import geom
n = 30
p = 0.5
k = range(0, n)
# loc=-1 shifts scipy's geom from "number of trials until the first success"
# to "number of failures before the first success", the convention used above.
pmf = geom.pmf(k, p, -1)
chart_creator(k, pmf, "Geometric PMF")


# Negative Binomial :
from scipy.stats import nbinom
r = 5             # number of successes
p = 0.5           # probability of success
k = range(0, 25)  # number of failures
# scipy's nbinom already counts failures before the r-th success,
# so no location shift is needed here.
pmf = nbinom.pmf(k, r, p)
chart_creator(k, pmf, "Nbinom PMF")


#Poisson
from scipy.stats import poisson
lamb = .3 # Rate
k = range(0,5)
pmf = poisson.pmf(k, lamb)
chart_creator(k,pmf,"Poisson PMF")


You can also try to visualize distributions with different parameters than I have used.

Conclusion:


Understanding distributions is vital for any statistician.

They occur very frequently in life, and understanding them makes life easier for you as you can get to a solution pretty fast just by using a simple equation.

In this article, I talked about some of the essential discrete distributions along with a story to support them.

The formatting of this post might look a little awkward, but Medium doesn't support LaTeX, so there isn't much I can do about it.

I still hope this helps you to get a better understanding.

One of the most helpful ways to learn more about them is the Stat110 course by Joe Blitzstein and his book.

You can check out this Coursera course too.

Thanks for the read. I am going to be writing more beginner-friendly posts in the future too. Follow me on Medium or subscribe to my blog to be informed about them. As always, I welcome feedback and constructive criticism and can be reached on Twitter @mlwhiz.

