Bayes' Theorem

In this article, let us try to understand Bayes' theorem.
This article was inspired by the two videos listed in the References section. The illustrations used here are my own.

Bayes' theorem helps us draw inferences from data. It also challenges our beliefs, which are often biased.
Let us say we come across a group of athletes from many countries, drawn from the two or three most popular sports in each country.
To keep things simple, we will consider a group of athletes who play either soccer or basketball.

One of the athletes is more than 7 feet tall. Which sport do you think this athlete plays: soccer or basketball?

Our intuition says that he must be a basketball player.
Now, let us do the math.

The most popular sport played across the globe is soccer (no offense to basketball!). Let us say we have a total of 100 athletes, of whom 90% play soccer and 10% play basketball.
That means 90 athletes play soccer and 10 play basketball.

Now let us see how many of the soccer players are more than 7 feet tall. Suppose 20% of the soccer players are that tall, which is 18 of the 90.

Now let us see how many of the basketball players are more than 7 feet tall. Given the nature of the game, suppose 80% of the basketball players are that tall, which is 8 of the 10.
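
To make the numbers concrete, here is a small Python sketch that turns these percentages into head counts. The variable names are my own, and the percentages are the assumed values from this example, not real statistics.

```python
# Counts implied by the example's percentages (variable names are illustrative).
total_athletes = 100
soccer_pct, basketball_pct = 90, 10            # share of the group, in percent
tall_soccer_pct, tall_basketball_pct = 20, 80  # share taller than 7 feet, in percent

soccer_players = total_athletes * soccer_pct // 100                # 90
basketball_players = total_athletes * basketball_pct // 100        # 10
tall_soccer = soccer_players * tall_soccer_pct // 100              # 18
tall_basketball = basketball_players * tall_basketball_pct // 100  # 8

print(f"Tall soccer players: {tall_soccer}")
print(f"Tall basketball players: {tall_basketball}")
```

Integer arithmetic is used so that the counts come out as whole athletes without any floating-point rounding.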

The distribution is shown in the diagram below.

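Clearly, we are trying to find out whether the player in question lies in Area 1 (the basketball players who are more than 7 feet tall) or in Area 2 (the soccer players who are more than 7 feet tall).
If we compare the areas, Area 2 (18 players) is larger than Area 1 (8 players), which means the player is more likely to be a soccer player.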

Why did our intuition go wrong? It went wrong because we ignored the fact that there are many more soccer players than basketball players in the group.
Before we know anything about height, the chance of running into a soccer player is 9 times the chance of running into a basketball player.
This concept is fundamental to Bayes' theorem.

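Putting this in the form of an equation:

P(Soccer | more than 7 feet) = P(more than 7 feet | Soccer) × P(Soccer) / P(more than 7 feet)

So the probability that the player is a soccer player, given that he is more than 7 feet tall, is calculated by multiplying the probability that a soccer player is more than 7 feet tall by the probability that a player is a soccer player (this product is Area 2), and dividing by the total probability that a player is more than 7 feet tall (Area 1 plus Area 2).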

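On similar lines, for basketball:

P(Basketball | more than 7 feet) = P(more than 7 feet | Basketball) × P(Basketball) / P(more than 7 feet)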
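
Plugging the example's numbers into both cases gives the worked calculation below. The symbols S, B, and T are shorthand introduced here (soccer, basketball, taller than 7 feet), and the denominator expands the total probability of being taller than 7 feet over the two sports.

```latex
\begin{align*}
P(S \mid T) &= \frac{P(T \mid S)\,P(S)}{P(T \mid S)\,P(S) + P(T \mid B)\,P(B)}
             = \frac{0.2 \times 0.9}{0.2 \times 0.9 + 0.8 \times 0.1}
             = \frac{0.18}{0.26} \approx 0.69 \\
P(B \mid T) &= \frac{P(T \mid B)\,P(B)}{P(T \mid S)\,P(S) + P(T \mid B)\,P(B)}
             = \frac{0.8 \times 0.1}{0.2 \times 0.9 + 0.8 \times 0.1}
             = \frac{0.08}{0.26} \approx 0.31
\end{align*}
```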

We can see that a player who is more than 7 feet tall is more likely to be a soccer player than a basketball player. The formula above, whose intuition we now understand, is known as Bayes' theorem. It is used heavily in machine learning and robotics.

Another aspect of Bayes' theorem is the distinction between prior and posterior probabilities. The prior probability is the probability we assume before we measure anything. The posterior probability is the updated probability after we take a measurement into account.

In the above example, the probability that a player is a soccer player (90%) is the prior probability, and the probability that a player is a soccer player given that he is more than 7 feet tall is the posterior probability.
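
The step from prior to posterior can be sketched in a few lines of Python. This is only an illustration under the example's assumed numbers. The function name `posterior` and the dictionary layout are my own choices, not part of any library.

```python
def posterior(priors, likelihoods):
    """Apply Bayes' theorem to each hypothesis.

    priors: P(hypothesis) for each hypothesis.
    likelihoods: P(evidence | hypothesis) for each hypothesis.
    Returns P(hypothesis | evidence) for each hypothesis.
    """
    # Numerator of Bayes' theorem: P(evidence | hypothesis) * P(hypothesis).
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    # Denominator: total probability of the evidence over all hypotheses.
    evidence = sum(joint.values())
    return {h: joint[h] / evidence for h in joint}


# Assumed values from the example: the evidence is "more than 7 feet tall".
priors = {"soccer": 0.9, "basketball": 0.1}
likelihoods = {"soccer": 0.2, "basketball": 0.8}

post = posterior(priors, likelihoods)
for sport, p in post.items():
    print(f"{sport}: prior={priors[sport]:.2f}, posterior={p:.2f}")
# soccer: prior=0.90, posterior=0.69
# basketball: prior=0.10, posterior=0.31
```

Measuring the height lowers our belief that the player plays soccer from 90% to about 69%, but soccer still remains the more likely answer.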



