Correlation is not Causation
Association vs Causation
"Correlation does not equal causation" and "Correlation is not causation" are phrases used quite often in the field of AI.
Correlation: A statistical measure of the size and direction of a relationship between two variables. It does not tell us whether a change in one variable would cause a change in the other. Correlation only states that a change in one variable can help predict a change in the other.
“A bunch of variables that vary together for a very long time, or vary cohesively is correlation”- Judea Pearl
Example: If X increases, Y increases (positive correlation), or if X increases, Y decreases (negative correlation).
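To make positive and negative correlation concrete, here is a minimal sketch that computes the Pearson correlation coefficient, one common way to measure correlation. The helper name pearson and the toy data are assumptions made for illustration only.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Toy data (illustrative only).
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]    # rises as x rises
y_down = [10, 8, 6, 4, 2]  # falls as x rises

r_up = pearson(x, y_up)      # +1.0: perfect positive correlation
r_down = pearson(x, y_down)  # -1.0: perfect negative correlation
print(r_up, r_down)
```

The coefficient ranges from -1 to +1. Its sign gives the direction of the relationship and its magnitude gives the strength, but nothing in the calculation says which variable, if either, causes the other.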
Causation: A cause-and-effect relationship between two variables. A change in one variable causes a change in the other; one variable makes the other happen.
Example: X is causing a change in Y (X -> Y) or Y is causing a change in X (Y -> X)
Some other helpful examples that make the difference clear:
Smoking causes an increase in the risk of developing lung cancer, so the two are causally related. Smoking can also be correlated with alcoholism, but smoking does not cause alcoholism. Establishing cause and effect is difficult, whereas finding a correlation is not.
In 2012, Messerli showed a strong positive relationship between chocolate consumption and the number of Nobel laureates per country. The two are highly correlated, but the correlation should not be mistaken for causation. "Eating chocolates does not produce Nobel Prize winners"
Correlation doesn't tell us the why and how behind a relationship; it only says that the relationship exists. The relationship could be a coincidence, and two variables moving together does not mean that one is causing the other. Example: Ice cream sales are correlated with homicides in New York (Study). Although there might be a correlation, it doesn't mean one causes the other. Such spurious correlations can arise very easily within data.
Spurious Correlations by Tyler Vigen is a very good website that uses public data to point out some unexpected and funny correlations, along with their plots.
Example from Uber:
In the picture above, the number of Eats orders is a common cause of both "Experiences of delayed delivery" and "Customer Engagement", which can lead to a spurious correlation between the two.
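The common-cause pattern described above can be reproduced with simulated data. The sketch below is not Uber's data or method: the variable names, the unit effect sizes, and the Gaussian noise are all assumptions chosen for illustration. Delays and engagement each depend on the number of orders but never on each other, and yet they end up correlated. Removing the linear effect of orders from both makes the correlation vanish.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def residuals(ys, xs):
    """What is left of ys after subtracting its least-squares linear fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - my - slope * (x - mx) for x, y in zip(xs, ys)]

random.seed(0)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical standardized quantities: orders is the common cause.
orders = [random.gauss(0, 1) for _ in range(n)]
delays = [o + random.gauss(0, 1) for o in orders]      # depends on orders only
engagement = [o + random.gauss(0, 1) for o in orders]  # depends on orders only

# Correlated even though neither variable influences the other.
r_raw = pearson(delays, engagement)

# After accounting for orders, the association disappears.
r_adjusted = pearson(residuals(delays, orders), residuals(engagement, orders))

print(f"raw correlation:      {r_raw:.2f}")
print(f"adjusted for orders:  {r_adjusted:.2f}")
```

Under these assumptions the raw correlation comes out near 0.5 and the adjusted one near 0. The adjustment only works here because the common cause is known and measured, which is often not the case in practice.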
Note: We are sometimes unaware of such spurious correlations. Researchers have often drawn conclusions from a correlation alone, presenting a spurious correlation as a causal relationship. Such a claim is then believed to be true until a counter-argument or counter-evidence disproves it. This happens quite often, but it is not how logic works: a causal claim needs trustworthy evidence behind it. A randomized controlled trial (RCT) is one method of obtaining such evidence.
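A small simulation can suggest why randomization helps. This is a sketch under stated assumptions, not a real study: the hidden factor health, the sample size, and the noise model are all invented, and the treatment is defined to have no effect at all. When people choose the treatment themselves, the hidden factor makes it look effective. When a coin flip assigns it, the apparent effect disappears.

```python
import random

def mean(values):
    return sum(values) / len(values)

def diff_in_means(treated, outcome):
    """Average outcome of the treated group minus that of the untreated group."""
    t = [y for d, y in zip(treated, outcome) if d]
    c = [y for d, y in zip(treated, outcome) if not d]
    return mean(t) - mean(c)

random.seed(1)  # fixed seed so the run is repeatable
n = 20000

# Hypothetical hidden common cause (never observed by the analyst).
health = [random.gauss(0, 1) for _ in range(n)]

# The outcome depends on health only: the true treatment effect is zero.
outcome = [h + random.gauss(0, 1) for h in health]

# Observational setting: healthier people are more likely to opt in.
obs_treated = [h + random.gauss(0, 1) > 0 for h in health]

# RCT setting: a coin flip decides, independent of health.
rct_treated = [random.random() < 0.5 for _ in range(n)]

obs_effect = diff_in_means(obs_treated, outcome)  # clearly positive, but spurious
rct_effect = diff_in_means(rct_treated, outcome)  # close to the true value, zero

print(f"observational estimate: {obs_effect:.2f}")
print(f"randomized estimate:    {rct_effect:.2f}")
```

Random assignment breaks the link between the hidden factor and who receives the treatment, so the two groups are comparable on average and the difference in outcomes reflects only the treatment itself.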
Sometimes we believe something simply because there is no proof to the contrary. For a long time it was believed that the earth is flat, until technology became more advanced and we could prove that it's spherical!
Even well-trained scientists have struggled with the distinction between correlation and causation. In the 1950s, some statisticians disputed the claim that tobacco causes cancer. They argued that it could not be established without a randomized experiment comparing smokers with nonsmokers. Eventually, the causal relationship between tobacco and cancer was established. The point is that such problems exist.
Correlation cannot be taken as causation, because many other factors could be at play.
When there is causation there is generally a correlation, but when there is a correlation there may or may not be causation.
There can be one or more common causes that drive both X and Y, so that the two change together without either one causing the other.