Following his successful US Presidential and Congressional election predictions and the success of his book, The Signal and the Noise: Why So Many Predictions Fail—but Some Don’t, Nate Silver has gained significant admiration and popularity in the last few years. Silver’s 2012 bestseller highlighted important and disruptive implications for anybody trying to understand markets, customers, and brands, especially for anybody who is looking to make predictions based on data.
A simplistic interpretation of Silver’s work would be to say that:
- Big data can solve everything.
- There is no need to collect research data, just use what’s out there.
- The future is predictable, if we have enough data.
Many accept these interpretations of Silver’s work, but how accurate are they? The best way of assessing them is to listen to Silver himself. In the 544 pages of his book (and as I outline in the Vision Critical University white paper The Signal and the Noise: Lessons for Marketers, Insight Professionals and Users of Big Data), Silver says that in fact all three of these messages are wrong!
So, what is Silver actually saying to insight professionals in The Signal and the Noise? Here are three key messages:
- Consider the limitations of Big Data.
Nobody, including Silver, is saying that Big Data has no uses. Indeed, Silver’s election forecasts are an application of multiple data sources, as are his predictions of baseball teams’ successes and failures. But he also adds a note of caution, saying “our predictions may be more prone to failure in the era of Big Data.”
Silver’s first concern is that in the era of Big Data, noise (defined as unhelpful, irrelevant, and possibly misleading information) is growing faster than the signal. If there is “an exponential increase in the amount of available information, there is likewise an exponential increase in the number of hypotheses to investigate.” With the noise growing faster than the signal, messages will become harder to find, not easier.
Silver’s second concern is that the sheer scale of Big Data will make people think that the old rules, the rules of ordinary data, no longer apply. However, Silver points out that when there are an almost infinite number of possible connections, the keys to success are principles based on prior knowledge, theory, and experiments. Consider the case of climate change. While simple models of global warming have been largely accurate, attempts to predict the impact of climate change on specific regions, countries, and particularly cities, have been much less successful. More data should not, on its own, justify more complex models.
- Do not expect to predict everything using Big Data.
Many things simply cannot be accurately and reliably forecast. Silver highlights four key sources of unpredictability:
- Chaotic systems are ones where a tiny change in the input data can result in a massive change in the outcome, the so-called ‘butterfly effect’.
- Missing data are the things that are not being measured. For example, a project might collect a respondent’s location through every moment of the day, their online connections, their purchases, and their exposure to advertising. But doing so may not accurately predict their actual behaviour, since that also depends on factors not measured, such as their childhood experiences, their genes, overheard conversations, observed behaviour, and so on.
- Extrapolation is when data is collected for one range or area and the results are then forecast for some other range or area, for example assuming that a pattern seen among existing customers will hold in an entirely new market.
- Feedback loops happen when cause and outcome become correlated with each other, blurring clear cause and effect and removing researchers’ ability to find enduring ‘laws’ that govern the market.
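The sensitivity behind the ‘butterfly effect’ is easy to demonstrate. Here is a minimal sketch in Python (my own illustration, not code from Silver’s book) using the logistic map, a textbook chaotic system:

```python
def logistic_trajectory(x0, r=4.0, steps=60):
    """Iterate x -> r * x * (1 - x); with r = 4 this map is chaotic."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

a = logistic_trajectory(0.2)          # baseline starting point
b = logistic_trajectory(0.2 + 1e-10)  # perturbed by one part in ten billion

# The tiny initial difference eventually grows to the full scale of the system.
divergence = max(abs(x - y) for x, y in zip(a, b))
```

The two trajectories are indistinguishable for dozens of steps and then diverge completely, which is why long-range forecasts of chaotic systems fail no matter how precisely the starting conditions are measured.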
While many Big Data fans may take offence at the claim that not everything can be predicted, the sources of unpredictability that Silver outlined in his book (and which are explored in our white paper) underline the difficulty of accurately forecasting things based exclusively on data that’s ‘already out there’.
- Combine insights from humans and machines.
While Silver highlights many cases where models and computers do a really good job of predicting the future, he also shows that when people are combined with models, the result can be even better. For example, Silver looks at the 1997 chess match between Garry Kasparov and IBM’s Deep Blue computer. The big story is that this was the first time a machine beat a reigning world champion, but an interesting footnote is that the software team worked on the program between games, adding what they had learned to what the machine could determine on its own. Kasparov was beaten by a combination of machine and people.
In marketing, people who are combining market research with Big Data appear to be doing better than those who are just relying on Big Data. At the 2013 MRS Conference in the UK, Lucien Bowater, BskyB’s Director of Strategy and Insight, talked about how he uses market research to show where to dig, and the Big Data scientists then do the digging.
Bowater’s insight stresses that for Big Data projects to be successful, they need to add consumer insights into the mix. What your customers tell you in your insight communities (where you can do longitudinal studies to find patterns, build and test hypotheses, and have discussions with members to explore the whys) can be a good starting point to figure out where Big Data can help connect the dots.
To conclude, The Signal and the Noise shows how in a world of Big Data, there will be millions of meaningless patterns in the data, the results of pure chance. Silver shows how Big Data will, in many cases, make it harder to determine what is really going on, and what is causing what.
I encourage you to read The Signal and the Noise, but if you’d like a quicker introduction to the key points, including the application of the Bayesian approach, from a marketing and insights point of view, you can check out my VCU white paper.
Photo credit: Randy Stewart