How Much Data is Too Much Data To Mine?
Consumers want to know when they’re being watched, and have changing expectations about privacy and data mining.
Competing With Data & Analytics

“Eye” by artist Tony Tasset in Laumeier Sculpture Park, St. Louis, MO. Image courtesy of Flickr user warmestregards.
A recent article in PC Advisor, “Equifax Eyes Are Watching You — Big Data Means Big Brother,” discusses the staggering amount of data credit reporting giant Equifax gathers on individuals.
Equifax collects details on 500 million consumers and 81 million businesses in 17 countries. Included are magazine subscriptions, rental history, real estate assets, investment wealth, retail purchasing habits, criminal records, debt-to-income ratios, DMV files, post office boxes, and more. The company slices, dices, analyzes and indexes 800 billion records into 26 petabytes of data, according to PC Advisor.
While impressive, the sheer amount of information Equifax and other big data purveyors collect and utilize leads to the question: How much data is too much data to mine?
In other words, where is the line in the sand on consumer privacy that organizations need to be aware of when pursuing big data analytics projects?
Equifax, financial services companies and healthcare organizations all fit into a unique category when it comes to data mining: they’re regulated. A patchwork of existing laws, including the Gramm-Leach-Bliley Act, the Fair Credit Reporting Act and HIPAA, governs how these industries use, share and sell information.
But if your business is outside the realm of regulation, it’s the Wild West. Anything goes, because data collected and disseminated for marketing purposes is essentially unregulated.
One thing that’s clear is that old methods of anonymizing data are no longer adequate. According to a recent Stanford Law Review post, “Privacy in the Age of Big Data: A Time For Big Decisions,” anonymized data can, in fact, be re-identified and attributed to individuals. “The implications for government and businesses can be stark, given that de-identification has become a key component of numerous business models, most notably in the contexts of health data (regarding clinical trials, for example), online behavioral advertising, and cloud computing,” it notes.
So what is a big data executive to do? Boris Segalis, a partner with