
Thinking Probabilistically — Fundamentals

 3 years ago
source link: https://towardsdatascience.com/thinking-probabilistically-fundamentals-da956e5ca077?gi=7dcc98b292c9

Science and engineering have seen amazing progress over the last few centuries. We are now able to launch a spacecraft from Earth and predict it will arrive on Mars at a certain time and location. However, it looks like not everything is as easy to predict as the trajectory of a spacecraft.

Take tossing a coin, for instance — as ridiculous as it may sound, we’re not able to predict with certainty whether the coin is going to land on heads or tails. And that’s because a coin toss is a very complex phenomenon. The outcome depends on multiple factors — the strength and the angle of the toss, the landing angle, the surface the coin lands on, etc.


Although we can’t tell beforehand the outcome of a coin toss, we’re able to at least estimate the probability (the chances) of a coin landing on heads or tails. This may sound like a limitation, and in a way it is, but estimating probabilities is an extremely powerful technique that can enable us to build non-trivial applications, including:

  • Image recognition systems (used for self-driving cars, medical diagnosis, etc.)
  • Spam filters for inboxes.
  • Statistical hypothesis tests.

Terminology

Whenever we can’t predict outcomes with certainty, we’re dealing with a random experiment.

The toss of a coin is a random experiment, just like drawing lottery numbers or rolling a die. The term “experiment” might make you think about science, but the term here has a wider meaning — a random experiment is any process for which we can’t predict outcomes with certainty.

An outcome is any possible result of a random experiment. For instance, the possible outcomes of rolling a six-sided die are 1, 2, 3, 4, 5, and 6.

Although we can’t predict the outcome of a random experiment, we can at least estimate the probability (the chances) associated with its outcomes. A coin toss has two possible outcomes, and we can estimate the probability associated with the coin landing on heads or tails.

Generally, for any event E (like a coin landing heads up), we can find its probability by using the following formula:

P(E) = (number of times event E happened) / (number of times we repeated the experiment)

When we calculate the probability of an event by performing an experiment one or more times, we calculate the experimental (or empirical) probability of the event.

Suppose we tossed a coin 300 times and found that P(H) = 46%. Then we tossed the coin 5,000 times and found that P(H) = 51%. If different numbers of tosses give different probability values, what is the true value of P(H)?

To answer this question, we’re going to do a thought experiment where we assume we already know that the true value of P(H) is 50%. We’ll also assume the chances are the same for getting tails, so P(T) = 50%.

Using these assumptions, we’re going to use Python to simulate a coin toss 10,000 times and watch how P(H) evolves as the number of tosses increases.

[Figure: P(H) plotted against the number of tosses, over 10,000 simulated coin tosses]

Above, we see that for the first 1,000 tosses or so, the value of P(H) varies a lot, with a maximum of 1.0 and a minimum of approximately 0.45. However, as the number of tosses increases, the value of P(H) tends to stabilize.

Interestingly enough, P(H) stabilizes around the true value of P(H), which we assumed to be P(H) = 50% = 0.50. This suggests that the greater the number of coin tosses, the closer P(H) gets to the true value.
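The simulation described above could be sketched roughly as follows. This is not the original article's code: the function name running_heads_probability, the p_heads parameter, and the seed value are assumptions made for illustration, and the standard-library random module stands in for whatever tooling produced the plot.

```python
import random


def running_heads_probability(n_tosses, p_heads=0.5, seed=None):
    """Return the empirical P(H) measured after each simulated toss.

    Hypothetical helper: p_heads is the assumed true probability of heads.
    """
    rng = random.Random(seed)
    heads = 0
    estimates = []
    for toss_number in range(1, n_tosses + 1):
        # rng.random() is uniform on [0, 1), so this is a head
        # with probability p_heads.
        if rng.random() < p_heads:
            heads += 1
        # Empirical probability so far: heads seen / tosses made.
        estimates.append(heads / toss_number)
    return estimates


estimates = running_heads_probability(10_000, seed=1)
for n in (10, 100, 1_000, 10_000):
    print(f"after {n:>6} tosses: P(H) = {estimates[n - 1]:.3f}")
```

The exact numbers depend on the seed, but the early estimates should jump around while the later ones settle near 0.50.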

Now we understand that properly calculating empirical probabilities requires us to perform a random experiment many times, which may not always be feasible in practice. An easier way to estimate probabilities is to start with the assumption that the outcomes of a random experiment have equal chances of occurring. This allows us to use the following formula to calculate the probability of an event E:

P(E) = 1 / (total number of possible outcomes)

When we calculate the probability of an event under the assumption that the outcomes have equal chances of occurring, we say that we’re calculating the theoretical probability of an event.

For instance, the total number of possible outcomes for a coin toss is two: heads or tails. Let H be the event that a coin lands on heads, and T the event that a coin lands on tails. We can use the formula above to find P(H) and P(T):

P(H)=1/2=0.5

P(T)=1/2=0.5

Theoretical probabilities are much easier to calculate, but in practice, it doesn’t always make sense to assume the outcomes of a random experiment have equal chances of occurring. For example, during a cricket match, we are shown each team’s probability of winning. This is rarely exactly 50–50, and it keeps changing based on match conditions, past results, and so on.

The formula has another limitation: it only works for events that consist of a single outcome. Take the event that we’ll get any number between 1 and 6 (both included) when rolling a six-sided die. The outcomes are 1, 2, 3, 4, 5, and 6, so there is a 100% chance we’ll get some number between 1 and 6. Using our formula, however, we find the probability is only about 17%:

P(number between 1 and 6) = 1/6 ≈ 0.17 = 17%

To fix this problem, we need to update the formula above to:

P(E) = (number of successful outcomes) / (total number of possible outcomes)
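A minimal Python sketch of this counting approach, which divides the number of outcomes that belong to the event by the total number of possible outcomes, might look like the following. The helper name theoretical_probability is hypothetical, and the sketch assumes every outcome in the sample space is equally likely.

```python
from fractions import Fraction


def theoretical_probability(event, sample_space):
    """P(E) = outcomes in E / total outcomes (equally likely outcomes assumed)."""
    sample_space = set(sample_space)
    # Only outcomes that can actually occur count as successful.
    successful = set(event) & sample_space
    return Fraction(len(successful), len(sample_space))


die = {1, 2, 3, 4, 5, 6}

print(theoretical_probability({1, 2, 3, 4, 5, 6}, die))  # 1
print(theoretical_probability({2, 4, 6}, die))           # 1/2
print(theoretical_probability({3}, die))                 # 1/6
```

With this version, the event "any number between 1 and 6" has six successful outcomes out of six, so its probability comes out as 1 (100%) as expected.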

Permutations and Combinations

In everyday English, we use the word “combination” loosely, without thinking about whether order matters.

Example 1: My fruit salad is a combination of mangoes, bananas, and apples.

Example 2: The combination of my safe code is 472.

In Example 1, the order is immaterial. Whether the banana or the apple goes in first, it is the same fruit salad. In Example 2, however, the order is crucial: 742 would be the wrong code for my safe.

When order matters, it is a permutation.

When order doesn’t matter, it is a combination.
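The two examples can be illustrated with Python’s standard itertools module. This is only a sketch: the variable names digits, fruits, codes, and salads are made up for illustration.

```python
from itertools import combinations, permutations

digits = ["4", "7", "2"]
fruits = ["mango", "banana", "apple"]

# Order matters for the safe code: each ordering is a different code.
codes = ["".join(p) for p in permutations(digits)]
print(codes)       # ['472', '427', '742', '724', '247', '274']
print(len(codes))  # 6

# Order doesn't matter for the salad: all three fruits form one selection.
salads = list(combinations(fruits, 3))
print(salads)       # [('mango', 'banana', 'apple')]
print(len(salads))  # 1
```

The same three items give six permutations but only one combination, which is why 742 opens nothing while a banana-first salad is still the same salad.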

When we have a group of n objects, but we’re taking only k objects,

