Ever seen a Twitter activity map? Then you must consider big-data bias...

Written Apr 4, 2013

Over the weekend of Apple’s April 3 release of the iPad, 73% of circulated tweets were favorable toward the iPad, but 26% expressed disappointment that the iPad could not replace the iPhone, according to a study.

If you’re not careful, you could conclude from the above quote that general sentiment toward the iPad was largely favorable. But that conclusion would probably be biased.

Twitter Map

This is the point that Kate Crawford makes in a recent article, “The Hidden Biases in Big Data” (Harvard Business Review)*. With any data sample, it is critical to ask whether the sample is representative of the target population.

Thus, for the iPad sentiment example, a key question is: are the people who tweeted about Apple’s iPad over that weekend (the sample) representative of all the people who have, or even could have, interacted with the iPad during that time (the target population)?
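To make the question concrete, here is a minimal sketch of how an unrepresentative sample can skew a sentiment figure. All numbers and names below are hypothetical and chosen only for illustration; they do not come from the study or from Crawford's article. The sketch assumes people who like the product are more likely to tweet about it than people who do not.

```python
def favorable_share_among_tweeters(pop_favorable, p_tweet_if_favorable, p_tweet_if_unfavorable):
    """Return the share of tweeters who are favorable.

    pop_favorable: true share of the target population that is favorable.
    p_tweet_if_favorable / p_tweet_if_unfavorable: probability that a person
    in each group tweets at all (hypothetical rates, not measured values).
    """
    favorable_tweeters = pop_favorable * p_tweet_if_favorable
    unfavorable_tweeters = (1 - pop_favorable) * p_tweet_if_unfavorable
    return favorable_tweeters / (favorable_tweeters + unfavorable_tweeters)


# Hypothetical scenario: the population is split evenly, but fans are
# three times as likely to tweet as everyone else.
observed = favorable_share_among_tweeters(
    pop_favorable=0.50,
    p_tweet_if_favorable=0.09,
    p_tweet_if_unfavorable=0.03,
)
print(f"Favorable share in the population: {0.50:.0%}")
print(f"Favorable share among tweeters:    {observed:.0%}")
```

Under these assumed rates, an evenly split population yields a sample in which about three quarters of tweeters are favorable. The sample would match the population only if both groups were equally likely to tweet.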

Some excerpts from the article:

  • Hidden biases in both the collection and analysis stages present considerable risks, and are as important to the big-data equation as the numbers themselves.

  • Data and data sets are not objective; they are creations of human design.

  • We get a much richer sense of the world when we ask people the why and the how, not just the “how many”.

*Reference

Kate Crawford, “The Hidden Biases in Big Data” (Harvard Business Review)