In this podcast, Seth discussed the trap of false correlation that we often fall into without realizing it.
We fall into the trap of false correlation when we look at two correlated events but fail to ask whether there is an underlying cause that explains the correlation.
One of Abraham Wald’s research projects taught us an important lesson about survivorship bias. When we look only at the data available to us, we can miss the bigger picture by overlooking the data that is not visible to us. The survivors do not have the problem, so we need to look at the ones that do.
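The survivorship effect can be sketched with a toy simulation. Everything here is hypothetical: the damage areas, the loss probabilities, and the counts are made-up numbers chosen only to show how the observed sample diverges from the full population.

```python
import random

random.seed(0)

# Hypothetical model: each plane is hit in one of three areas.
# Hits to the "engine" are far more likely to bring a plane down,
# so engine hits are underrepresented among the survivors we observe.
AREAS = ["fuselage", "wings", "engine"]
LOSS_PROB = {"fuselage": 0.1, "wings": 0.1, "engine": 0.8}

survivor_hits = {a: 0 for a in AREAS}
all_hits = {a: 0 for a in AREAS}

for _ in range(10_000):
    area = random.choice(AREAS)   # hits are actually uniform across areas
    all_hits[area] += 1
    if random.random() > LOSS_PROB[area]:  # only survivors are observed
        survivor_hits[area] += 1

# Among survivors, engine damage looks rare; the naive conclusion is
# that engines need no attention. The full (normally unobservable)
# data shows the engine hits are simply missing from our sample.
print("survivors:", survivor_hits)
print("all planes:", all_hits)
```

The naive analyst who only sees `survivor_hits` draws the opposite of the right conclusion, which is exactly the point: the signal lives in the planes that never came back.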
If we base our decisions purely on correlations in the data, we risk being misled by spurious correlations. Attempting to understand causation is important because we cannot make good choices going forward if we are simply modeling past data. This distinction becomes critically important as we scale up artificial intelligence and machine learning.
AI has made great strides in solving many problems, such as image recognition and process automation. Those problems lend themselves well to computer processing because there is a definitive correct answer. If we feed the system enough data with the right answers, the computer can learn to reproduce the right answers over time.
Many other problems, such as predicting social outcomes, remain difficult for AI. For those problems, the best AI can do is process the data and offer a probability as its prediction.
But we fall into a trap if we treat statistics and probability as truth. Statistics only tells us the range of what we can expect to happen, not why or how it will happen in any given moment. What we need is true understanding.
After Abraham Wald died, Ronald Fisher, another of the great statisticians of the 20th century, attacked his work. Fisher criticized Wald’s work on the design of experiments, alleging ignorance of the basic ideas of the subject. Other scholars subsequently defended Wald’s work.
Toward the latter part of his life, in 1950, Ronald Fisher made a tragic error: he spoke out against a UNESCO study showing that people of all races and backgrounds had the potential to do any sort of work. Fisher was confused because he was looking at false correlation, not at understanding why. He believed that evidence and everyday experience showed that human groups differ profoundly “in their innate capacity for intellectual and emotional development.”
Correlation is not causation, and numerous factors influence whether something is going to happen. The hard work of statistics is not running a test or getting the right sample size. The hard work of statistics is understanding the truth.
If we cannot understand, we will be seduced by the seemingly accurate predictions of artificial intelligence. The data might lead us to believe that the future will look like the past. We might write off entire populations of people simply because, in the past, other factors prevented them from doing the work. We can do better than this.