The Critical Need for Data Governance — An Introduction
Until perhaps a decade ago, the two concerns IT managers had about information (or data — let’s treat the two terms as synonymous for now) were its exponentially growing volume and the need to secure it against attackers. So we designed some solutions to address these problems: cloud storage to ensure scalability, along with firewalls and intrusion detection systems to enhance security (or, more accurately, to allow us to barely stay ahead of the hackers). We practiced data management (or, even more narrowly, database management), focusing on normalization techniques and master data management to avoid redundancy and inconsistency. But rarely was there an enterprise strategy addressing the risks posed by the sensitivity of the data, and there was little pressure from the public or from authorities to demand it.
The situation today is very different, and concerns around data have escalated. E-commerce, mobile devices, and social networks have led a couple of billion people to share an increasing amount of personal data online. Consumer-oriented Internet of Things (IoT) technology allows much more information to be captured about what we do and where we do it. You know the rest of this still-evolving story of growing volumes of personal data and the related privacy issues: the revelations about US National Security Agency surveillance; the Target and Equifax (and many more) data breaches; the collapse of the EU-US Safe Harbor provision;¹ the emergence of the General Data Protection Regulation (GDPR), stricter and more punitive than the Data Protection Directive it replaced; the evidence of attempts to manipulate US elections through well-targeted, fake political advertising; Facebook CEO Mark Zuckerberg’s uneasy testimonies in front of US and European legislative bodies. And this is surely not the end.
On the corporate side, the Industrial IoT is also fostering the collection of massive amounts of data, with resulting opportunities and risks. Digital transformation relies on using this data, at minimum to improve processes and at best to disrupt and reinvent an enterprise’s business model around selling the data, or services based on it. The risk is that what used to be disconnected silos of automation are now mainstream systems connected to the enterprise network and to the Internet, open to attack by all sorts of ill-intentioned actors. There are myriad potential misuses of IoT data, from industrial spying to electrical grid disruption to analyzing your smart meter data to determine that you are away from home. The control systems teams responsible for this “operational technology” (OT) in the lines of business or in manufacturing should be working hand in hand with the IT organization, but the two groups have ignored and mistrusted each other for decades. Consequently, major organizational changes will be required to extend data governance practices to the stewardship of IoT data.
If that weren’t enough, we have also started to see the serious impact of data residency laws and regulations on the movement and storage of data. For example, Microsoft was ordered by US authorities to hand over data located on a server overseas (and refused until a recent resolution); LinkedIn is banned in Russia because it does not store its user data on Russian territory (Telegram has also been blocked there, for refusing to hand over its encryption keys); and companies like Amazon and Microsoft have had to open more and more regional data centers to assure international clients that their data would not leave their countries. Prohibitions against the international movement of data (independent of any security and privacy concerns) are expensive and can stifle business growth. Moreover, companies that ignore these laws risk serious penalties. In past presentations, I’ve summarized the data residency issue by paraphrasing the opening line of a nightly TV news program from the 1970s: “It’s 10 p.m. Do you know where your data is?”
With the convergence of all these concerns (security, privacy, data residency), the need for information governance becomes clear and increasingly urgent. The big questions for companies (and some of these apply to regulators as well) are:
Does the organization even know what data it has, how sensitive it is, where it is located (especially when cloud services or managed hosting services are used), and whether the organization actually owns the data or is simply its custodian?
Who is in charge of the data, and who should be? It can’t just be “a cross-functional team of people from IT, engineering, and legal,” as some companies answered in a 2017 survey on data residency conducted by the Object Management Group (OMG): if everyone is in charge, no one really is. One of these days, corporations will face the same issue with all their data as they did when financial reporting regulations were strengthened: who goes to jail if a damaging data breach results from negligence? Organizations must decide who is responsible, accountable, consulted, and informed (RACI) with respect to data content, location, and access rights.
What should the policies, processes, and best practices for data governance be? In other words, what does “data governance” mean, and how do you put it in place?
Which of the hundreds of national, regional, or supranational laws, regulations, trade treaties, and so on apply to a given set of data?
Is governance limited to putting in place a good defense against security, privacy, and residency risks, or should it also cover what the organization can do with its data on the offensive side? Does governance include finding new uses for the data to modernize or transform the organization? And again, who is responsible for leading the charge?
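To make the RACI question concrete, here is a minimal sketch in Python of a responsibility matrix for data assets. It is hypothetical throughout: the `DataAsset` class, the `validate` function, and the sample asset and party names are illustrative inventions, not part of any standard or tool. It encodes the common RACI convention that each item has exactly one Accountable party, which is one way of capturing the point that if everyone is in charge, no one really is.

```python
from dataclasses import dataclass, field

# The four RACI letters: Responsible, Accountable, Consulted, Informed.
RACI_ROLES = {"R", "A", "C", "I"}


@dataclass
class DataAsset:
    """Hypothetical record of one data asset and who holds which RACI role for it."""
    name: str
    assignments: dict[str, set[str]] = field(default_factory=dict)  # party -> roles

    def assign(self, party: str, *roles: str) -> None:
        # Reject anything that is not one of the four RACI letters.
        unknown = set(roles) - RACI_ROLES
        if unknown:
            raise ValueError(f"unknown RACI role(s): {sorted(unknown)}")
        self.assignments.setdefault(party, set()).update(roles)

    def holders(self, role: str) -> list[str]:
        # Parties holding the given role, sorted for stable output.
        return sorted(p for p, r in self.assignments.items() if role in r)


def validate(asset: DataAsset) -> list[str]:
    """Return governance gaps, assuming the convention of exactly one Accountable party."""
    problems = []
    accountable = asset.holders("A")
    if len(accountable) != 1:
        problems.append(
            f"{asset.name}: expected exactly one Accountable, found {len(accountable)}"
        )
    if not asset.holders("R"):
        problems.append(f"{asset.name}: no one is Responsible")
    return problems


# A "cross-functional team" with nobody accountable fails the check.
customer_records = DataAsset("customer_records")
customer_records.assign("IT", "R", "C")
customer_records.assign("Engineering", "R")
customer_records.assign("Legal", "C")
print(validate(customer_records))
# ['customer_records: expected exactly one Accountable, found 0']

# Naming a single accountable owner (a hypothetical "Data owner") closes the gap.
customer_records.assign("Data owner", "A")
print(validate(customer_records))  # []
```

The check only flags missing or shared accountability; deciding who that single owner should be, whether the CIO, a CDO, or a business executive, remains the organizational question debated in this issue.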
Some have argued (and they deserve credit for their early focus on the importance of data as an asset) that these new realities require a new role: the chief data officer (CDO). But this raises the question: if chief information officers are not responsible for the data, then what are CIOs for? My colleague, Cutter Consortium Fellow Steve Andriole, gave the most common, and very sad, answer to this question when he said that many organizations have diminished the role of their CIO to that of a “cheap infrastructure operator,” mostly because of the relentless demand to reduce IT costs. This has left the CIO effectively unable to put in place an information strategy, let alone one that encompasses governance. Are we naively trying to compensate for that failure by assigning the newly identified responsibility for data governance to a new role (the CDO)? And if the CIO failed at data protection, or was not given the resources to really govern data as a corporate asset, why should someone else, with a virtually identical title and an overlapping mission that is likely to create organizational confusion, succeed?
While issues around data and information governance are starting to get the attention they deserve, business and technology leaders still need help finding their way through all the conflicting demands. We invited several authors to present their perspectives and recommendations on this complex web of issues. We hoped for a wide range of ideas, and we were not disappointed.
In This Issue
We lead off with an article from Nick Stavros, Ian Stavros, and Bryan Turek, who give us the complete context of what governance means, considering the data lifecycle (create, store, use, etc.) and the cognitive hierarchy of data, information, knowledge, understanding, and wisdom. The authors also look at the elements contained in a formal data model and what these elements tell us about the governance actions that need to be taken when data is accessed, modified, or deleted. In their discussion of temporal data governance, they consider what happens to data over time and look at how one aspect of temporal relationships, the preservation of data history, can be addressed using distributed ledger technology such as blockchain. After touching on data residency (i.e., geographic data governance), they look into the future, including the need for data governance standards and ontologies.
Next, Steven Woodward explores in depth the issues of data residency, using the term “geo-jurisdictions” to describe the intersection of geographical and legal boundaries that constrain the handling of data. Woodward begins by warning organizations that mistakenly believe data residency is not a concern for them, pointing out, for example, that the GDPR applies to non-European entities that hold data pertaining to EU citizens. From this warning, the author moves on to concrete recommendations about the policies that should be put in place for various service and deployment models, as well as the need for a thorough geo-jurisdiction analysis. Woodward also discusses applicable standards, drawing on his own extensive participation in International Organization for Standardization (ISO) committees.
James Denford, Kevin Desouza, and Gregory Dawson focus their article on a fundamental organizational question: in a medium-to-large organization, should data governance be centralized or decentralized (or, possibly, federated)? There are pros and cons for both centralization and decentralization. The overall business strategy needs to be considered: in some conglomerates of disparate business lines, there may be little commonality to the information being managed by the various divisions. However, decentralization still causes duplication of effort and risks inconsistencies across the enterprise. The authors give concrete examples that link the IT governance modality — centralized or decentralized — with performance outcomes. They generally favor a centralized model and provide the reader with specific recommendations on how to centralize data governance in organizations and how to implement this model successfully.
In the next article, Michael Atkin also takes on the organizational aspect, from the perspective of the CDO’s role in the current context. He specifically points out the need to play offense as well as defense; that is, to use the data to support innovation rather than focusing only on controlling it and protecting it from mishandling. Atkin depicts the conflicting demands on the CDO, who must cover “operational data management” as well as “data management for analytical insight.” He shows how ensuring data quality, understanding the data’s provenance and pedigree, minimizing transformations, and adding semantic understanding of the data are part of the new responsibilities of the CDO 2.0.
Finally, Yves Vanderbeken, Tim Huygh, Anant Joshi, and Steven De Haes describe what good governance means for public sector institutions that are embracing open data initiatives. While these organizations make data accessible to increase government transparency and promote economic empowerment, they face additional responsibilities in terms of data quality, privacy compliance, security, and more. Often, publishing data is a new task for which their staff is not prepared, and it increases the number of users by orders of magnitude. The authors explain how to define a strategy, set a baseline, create a data governance model, and evaluate its effectiveness. They conclude with a specific set of recommendations, most of which can certainly apply outside the public sector.
These five articles will certainly not close the debate; we seem to be only at the beginning of an awakening to the responsibilities implied by the term “data governance.” What is starting to become clear is that executives and managers who ignore this subject do so at their own peril. As with every IT-related subject, a tool is not a magic wand, regardless of what a vendor might tell you. Instead, placing your data under good stewardship, and making it do new and good things for you, requires hard work and attention to four pieces of the puzzle: people, process, content, and (finally) technology. This issue’s authors will help guide you along the path.
This Cutter Business Technology Journal issue, Guest Edited by Claude Baudoin, is available in the Cutter Bookstore.
1 In 1995, the EU adopted the Data Protection Directive (now superseded by the GDPR), which required EU member states to transpose its principles into their own national laws. To alleviate the burden on US companies that stored data pertaining to EU citizens, the July 2000 Safe Harbor decision allowed those companies to self-certify their compliance with the directive. In October 2015, the EU Court of Justice invalidated that decision, explicitly citing the fact that US authorities, such as the NSA, had given themselves broad rights to access data, incompatible with the directive. In 2016, a new EU-US Privacy Shield framework replaced the defunct Safe Harbor agreement.