How the Van der Waerden Theorem Shows the Limits of Big Data

The Van der Waerden theorem states, roughly, that if a string of data built from a fixed number of symbols is long enough, it must contain the same symbol at evenly spaced positions, however many repetitions you ask for. This means that when there is enough data, there will always be regularities, and they will not be meaningful: they are simply a mathematical necessity.
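The smallest interesting case of the theorem can be checked by brute force. The sketch below is an illustration only, and the function names are made up for this post. It tries every string of two symbols and looks for three equally spaced positions carrying the same symbol. A string of length 8 can still avoid the pattern, but no string of length 9 can.

```python
from itertools import product


def has_monochromatic_ap(coloring, length=3):
    """Return True if `coloring` has `length` equally spaced positions with one symbol."""
    n = len(coloring)
    for start in range(n):
        for step in range(1, n):
            if start + (length - 1) * step >= n:
                break  # the progression would run past the end of the string
            if all(coloring[start + i * step] == coloring[start] for i in range(length)):
                return True
    return False


def every_coloring_has_ap(n, colors=2, length=3):
    """Brute-force check over all strings of length n built from `colors` symbols."""
    return all(
        has_monochromatic_ap(c, length) for c in product(range(colors), repeat=n)
    )


print(every_coloring_has_ap(8))  # False: 0,0,1,1,0,0,1,1 avoids the pattern
print(every_coloring_has_ap(9))  # True: no string of length 9 can avoid it
```

The regularity found at length 9 says nothing about where the data came from. It is forced by the length of the string alone.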

The theorem suggests that in a big enough heap of data, we will find correlations that do not in fact have any meaning: these are spurious correlations.

Hence we can expect that with big enough data, Big Data analysis will throw up heaps of correlations that have no meaning at all.
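A small simulation gives a feel for this. It is a sketch under stated assumptions, not an application of the theorem itself: the "data" is pure independent Gaussian noise, and the threshold of 0.5, the series length of 20 and the function names are arbitrary choices made for illustration. Any strong correlation it reports is spurious by construction.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_strong_pairs(num_series, length=20, threshold=0.5, seed=0):
    """Count pairs of unrelated random series whose correlation looks strong."""
    rng = random.Random(seed)
    # Every series is independent noise, so no pair has a real relationship.
    series = [[rng.gauss(0, 1) for _ in range(length)] for _ in range(num_series)]
    strong = 0
    for i in range(num_series):
        for j in range(i + 1, num_series):
            if abs(pearson(series[i], series[j])) > threshold:
                strong += 1
    return strong


for num_series in (10, 50, 100):
    pairs = num_series * (num_series - 1) // 2
    print(num_series, "series,", pairs, "pairs,", count_strong_pairs(num_series), "strong")
```

The number of pairs grows with the square of the number of variables, so the number of chance "findings" grows quickly as more data is thrown into the analysis.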

But we can also expect that some people and organizations will take action based on those correlations, and that doing so may sometimes be deeply counterproductive.

Those who will have success in the world of big data are those who will be able to sift the few real insights that can be gained from analysis out of the many spurious correlations. This will not be easy, because intuition may not be of great help. A thorough scientific analysis will be required, involving reproducibility of experiments in various independent data sources, and that will be difficult to do fully.
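The reproducibility check can be sketched in the same spirit. The example below is hypothetical: it assumes two independent data sources that measure the same variables, both simulated as pure noise, and it uses an arbitrary threshold of 0.5. It "discovers" strong correlations in the first source and then counts how many of the same pairs still look strong in the second.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def discover_and_replicate(num_series=100, length=20, threshold=0.5, seed=0):
    """Return (pairs found in source one, pairs that also hold in source two)."""
    rng = random.Random(seed)
    # Two independent "data sources": same variables, fresh noise in each.
    first = [[rng.gauss(0, 1) for _ in range(length)] for _ in range(num_series)]
    second = [[rng.gauss(0, 1) for _ in range(length)] for _ in range(num_series)]
    discovered = [
        (i, j)
        for i in range(num_series)
        for j in range(i + 1, num_series)
        if abs(pearson(first[i], first[j])) > threshold
    ]
    replicated = [
        (i, j)
        for i, j in discovered
        if abs(pearson(second[i], second[j])) > threshold
    ]
    return len(discovered), len(replicated)


found, confirmed = discover_and_replicate()
print(found, "pairs discovered,", confirmed, "confirmed in the second source")
```

Because nothing real links the variables, only a small share of the discovered pairs should survive the second source. Real data is harder: truly independent sources are rare, which is part of why the check is difficult to do fully.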

Let’s thus brace for many spurious correlations to be announced as discoveries only to be disproved some years later!
