When Good Data Goes Bad, Part I

Posted March 2, 2021 in Data Analytics & Digital Technologies
Barry Devlin

Data doesn’t really “go” bad, of course. At least, not in the way that leftovers in the refrigerator do. Sometimes data just starts off bad and gets worse. Other times, it’s people or processes that do bad things to data or with it. In contrast to all the articles that celebrate data goodness, in this series we’ll point out what can go wrong with data — and what to do about it.

“Don’t be evil” was Google’s original motto, now deprecated like some past feature of a previous version of corporate software. As the quintessential data company, Google offers a good starting point for an exploration of data at its worst and, indeed, its best. Like many quotable mottoes, its origins are disputed but its most memorable moment came in Google’s founders’ letter for its 2004 IPO, “Don’t be evil. We believe strongly that in the long term, we will be better served — as shareholders and in all other ways — by a company that does good things for the world even if we forgo some short-term gains.”

The problem for Google and, since 2015, parent Alphabet is that, in the eyes of many observers, its use of data strays far from good. Larry Page and Sergey Brin’s lofty IPO ideals — “[our search results] are unbiased and objective, and we do not accept payment for them or for inclusion or more frequent updating” — may be well met in principle. However, they hide a deeper philosophical dilemma for a company whose major source of income is advertising and whose unique selling proposition (USP) is that it knows its audience far better than any other company. That USP comes courtesy of the treasure trove of behavioral data it has amassed on the vast majority of Internet users.

The Extractive Imperative

As big data proliferated, the phrase “data is the new oil” became wildly popular with marketing executives. Bernard Marr explains the many ways in which the phrase is “lazy and inaccurate” in his 2018 article, while admitting it’s also a good analogy: “it’s easy to draw parallels due to the way information (data) is used to power much of the transformative technology we see today.” However, as the climate emergency has grown and oil has come to be seen as one of the main culprits, a darker side of the analogy has emerged. The drive to squeeze every last viable drop of oil from every reservoir and tar sand has resulted in enormous climate and environmental impacts. This profit-driven extractive imperative means that natural resource extraction must continue and expand regardless of changing circumstances, ultimately undermining civilization and the planet.

Now, in the 21st century, this same extractive imperative has been applied to data, with Google as its first and major proponent, followed by Facebook and others such as Amazon, Oracle, Apple, and Microsoft. Behavioral data, mined from every online and, increasingly, offline interaction and transaction, has become the main, mandatory raw resource for such businesses.

The old saw that “if you’re not the paying customer, you’re the product” is, in fact, too generous. You are, in reality, the pay dirt from which is extracted the raw resource that drives industrial-scale prediction and prescription of your own future behavior for the real customers of Google and its cohorts: advertisers. Scary.

Google’s original rationale for collecting and analyzing search behaviors was to benefit the users of its search product by delivering intuitively correct results. Its subsequent expansion of collection efforts emerged from the realization that its main and, perhaps, only business was as an advertisement broker, enabling it to launch itself as a very profitable public company. What better illustration of good data gone bad?

And yet, good versus bad is not always so clear. In the outrage — of some of us — about the alleged use of Facebook data (via Cambridge Analytica) by Donald Trump and the Brexit campaigns in 2016, it is often forgotten that analytics of vast troves of personal data, supported by Google CEO Eric Schmidt, was credited as a major contribution to Barack Obama’s success in 2008 and 2012.

Surveillance Capitalism

Nuance aside, the disturbing implications of the extractive imperative as applied to data are summed up in Shoshana Zuboff’s The Age of Surveillance Capitalism:

Just as industrial civilization flourished at the expense of nature and now threatens to cost us the Earth, an information civilization shaped by surveillance capitalism and its new instrumentarian power will thrive at the expense of human nature and will threaten to cost us our humanity. The industrial legacy of climate chaos fills us with dismay, remorse, and fear. As surveillance capitalism becomes the dominant form of information capitalism in our time, what fresh legacy of damage and regret will be mourned by future generations?... This mobilization [of surveillance] and the resistance it engenders will define a key battleground upon which the possibility of a human future at the new frontier of power will be contested.

Surveillance capitalism is founded on a belief in an unfettered and unregulated freedom to collect data at will and analyze it in breadth and depth, far beyond the understanding and (real) consent of the people who provide it, mostly unwittingly. Such pervasive data collection, analytics, and prediction drive this new form of capitalism, which cannot be understood within previous models. The insidious danger is that it transforms the market and, indeed, society into a fully asymmetrical information environment, where data gatherers know everything about the individual, who knows little or nothing about them. This offers a small elite, both political and commercial, unprecedented power to direct public attention and action with near-total certainty of the outcomes. Zuboff asserts that we are seeing worldwide, independent attempts to achieve these ends.

When Good Data Goes Bad: Conclusion I

The data didn’t go bad, but the use to which it was put changed dramatically, and that new use drove the collection of even more categories and volumes of personal behavior data. In any digital transformation program, the lesson is to define the purpose of your data collection early on, consider and approve its ethics, and then strongly resist any temptation to expand the scope of use without a comprehensive ethical review.

About The Author
Barry Devlin
Dr. Barry Devlin is a Senior Consultant with Cutter’s Data Analytics and Digital Technologies practice and an expert in all aspects of data architecture, including data warehousing, data preparation, analytics, and information management. He is dedicated to moving business beyond mere intelligence toward real insight and innovation, complementing technological acumen from informational, operational, and collaborative fields with a deep focus on…