How Proving Causation Rather Than Correlation Is Becoming Increasingly Difficult

Following up on the post ‘How the Van der Waerden Theorem Shows the Limits of Big Data’: since Big Data will produce an increasing number of spurious correlations, telling causation apart from correlation will become increasingly important.
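To make the point about spurious correlations concrete, here is a minimal sketch of my own (it is not taken from either article). It generates purely random, independent series and counts how many pairs still look correlated. The function names, the sample size of 30, and the 0.5 threshold are arbitrary assumptions chosen for illustration.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def count_spurious(n_vars, n_samples=30, threshold=0.5, seed=0):
    """Count pairs of independent random series with |r| above threshold.

    Every series is independent Gaussian noise, so any pair that passes
    the threshold is a spurious correlation by construction.
    """
    rng = random.Random(seed)
    series = [
        [rng.gauss(0, 1) for _ in range(n_samples)]
        for _ in range(n_vars)
    ]
    return sum(
        1
        for x, y in combinations(series, 2)
        if abs(pearson(x, y)) > threshold
    )


if __name__ == "__main__":
    for n_vars in (20, 50, 100, 200):
        n_pairs = n_vars * (n_vars - 1) // 2
        print(n_vars, "variables,", n_pairs, "pairs,",
              count_spurious(n_vars), "spurious correlations")
```

The number of pairs grows roughly with the square of the number of variables, so the count of chance correlations grows with it, even though nothing here causes anything else.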

This Medium article, ‘Understanding Causality and Big Data: Complexities, Challenges, and Tradeoffs’, does a good job of explaining the issues at stake. It also explains clearly when causation is really needed and when correlation is sufficient.

The most important point, in my view, is that with the increasing complexity of our world (a direct consequence of our increasing interconnection), proving causation will become increasingly difficult. It does not help that we are increasingly trying to derive causation from smaller effects, which sit on the border of statistical significance. The causal chain can include some very indirect links, which makes it difficult to determine what is causing what. I believe the current debates about the effects of certain chemicals used in the natural environment (such as pesticides) demonstrate exactly this issue: in a complex ecosystem, proving a causal link is very difficult even when there is correlation.

Substantial theoretical and practical progress in the methods used to determine causation is an important need for the world today. I hope that enough focus and effort are dedicated to this problem.
