Data Architecture Is Really About People — Opening Statement
CUTTER BUSINESS TECHNOLOGY JOURNAL VOL. 32, NO. 12
International Organization for Standardization (ISO) standard 42010 defines architecture as “fundamental concepts or properties of a system in its environment embodied in its elements, relationships, and in the principles of its design and evolution.”1 The many forms of architecture that are present today — whether digital, information, enterprise, business, or data — each focus on a specific aspect of these fundamental concepts. Most architectures are design-oriented and, if you are lucky, they will guide you in evolving the IT landscape to keep up with changes in the business that IT supports.
But what exactly is a data architecture? I have asked myself this question many times over the past 20 years. What I observe is that data architects are busy designing systems that replace existing systems. The users perceive all kinds of problems with the information they get or have to record. They ask the data architect to rectify the problem, and the architect ends up replacing the existing solution with something new, in accordance with the latest fashion — or should I say fad — in data land. The end result is that the users’ problems are not solved, probably because the solution does not address the main cause of the problems in the first place.
Why is it so hard to design adequate data solutions and have the solutions evolve instead of replacing them over and over again? Over the years, I have started to see patterns across the projects I have worked on and the businesses I have worked for. One pattern is reliance on technology to solve data use problems. You could see this either as techno-optimism or as a way of avoiding the real problems. The root cause of these problems lies in a lack of knowledge of how humans use information delivered by IT systems in their decision-making processes, as Cutter Consortium Senior Consultant Barry Devlin addresses in his article in this issue of Cutter Business Technology Journal (CBTJ).
The technology industry, which has been promising technology solutions for organizational problems, has jumped into this gap. One of the rationales for the focus on technology is actual technological progress. We have new data sources available, and the industry has grown tremendously with techniques to produce, aggregate, and analyze different types of data on a massive scale — something that was beyond our imagination 20 years ago. Despite these capabilities, we haven’t progressed much in understanding how we humans deal with information, both in supporting business processes at an operational level and in using data to create massive automated aggregation and analytical flows to guide participants in the business process.
Today, we use data to digitize more and more business processes, and with digitization we produce even more data that we try to leverage to our advantage. As Adrian Jones points out in his article, this increase in volume, veracity, and velocity leads us to vulnerability. Yet current data architecture practice fails to guide us in where we run risks and where we need to step up and organize ourselves to mitigate vulnerability. The disconnect between what a data architecture is in most organizations (a technology-focused roadmap) and what an organization needs from a data architecture (guidance on how to create a valuable and sustainable data landscape) has puzzled me for a long time.
Ever since the famous cover of The Economist a few years ago with the title “The World’s Most Valuable Resource” — depicting drilling rigs with the logos of Google, Facebook, and Amazon — people have misinterpreted that statement as saying that data is a resource, like oil.2 But data is not a resource. Data is the breadcrumb trail of human activities. Like a true breadcrumb trail, it indicates the activity but never fully describes it. Data doesn’t fall out of the sky like manna from heaven; it cannot be mined like cobalt, either. Data is a residue of activity, and when you think about data in this way, you can envision a data management practice where you focus fully on the activities that bring you value and, therefore, on the data you need to collect. All other data can be ignored; it costs you money to acquire and store, but it won’t bring you added value.
Data architecture should help you decide which activities are worth pursuing. The real challenge is that these activities form a dynamic, loosely coupled system whose volatility varies over time. Thus, a data architecture should be your guide on how to navigate these changes, which demands a lot from the interaction between data architects and the users of information. How should a data architecture support you?
For a data architecture to be an effective guide to managing data, it must be aware of the way humans use data. Even if we automate data aggregation, we always have to do so with respect for how the data will be used and for its fallibility, and we have to guard against our own lack of understanding. Relying on data in decision making is not without consequences. Virginia Eubanks has studied the use of data in social domains and has written a clear warning of how we fail people by deploying data solutions and algorithms.3 Though the examples she gives are related to injustices introduced into social support systems through relying on data and algorithms, the underlying mechanism in our business decisions is the same. We trust data-based decisions more than human knowledge-based decisions, even when we know that the data is flawed. I have yet to encounter a data architecture that accounts for the consequences of data use or sets strict requirements regarding them. If mentioned at all, consequences are often viewed as a risk to be mitigated and not regarded as a fundamental property of a data architecture.
Creating an all-encompassing data architecture is not easy. A single architect is unable to understand all possible aspects and consequences. If you accept this and you are aware that data is the result of human activity — reflecting all the quirks we humans have in our behaviors — the answer is to create a framework where the value of using information is continuously evaluated against requirements derived from intended use. This evaluation leads to a continuous adjustment of both the way you organize the use of data and the solutions that support that use. The technology roadmap is the result, with new technological capabilities serving as an input for evaluating whether you should add better or new solutions to the landscape.
I call this framework the connected architecture (see Figure 1). The word “connected” emphasizes the necessary connection between different people, with different skills, who need to collaborate in acquiring, sorting, storing, modeling, processing, analyzing, and interpreting data, and then in drawing conclusions and taking action on them. You need to organize these connections; they won’t happen by themselves. Organizing them is a continuous fight against data entropy, the scattering of copies of unmaintained data that result from local, one-time analysis and use. Data entropy is one of the plagues that visit many organizations in managing their data collections and in deriving value from them.
If architecture is indeed, as ISO standard 42010 defines it, the “fundamental concepts or properties of a system in its environment embodied in its elements, relationships, and in the principles of its design and evolution,” then connected architecture puts the environment in which data is recorded and used at the front of the stage — an environment consisting of people creating and using data.
In This Issue
Devlin sets up the theme of this issue of CBTJ by first taking us on a journey to help us understand how context plays a big role in using data. Known for creating the first data warehouse architecture, he proposed in 2013 a new standard for data architecture suited to today’s world.4 Devlin puts context-setting information at the heart of all data architectures, and for good reason. In the drive to digitize more business processes, the intricacies of how all stakeholders interact with data have been underexposed. Getting a grip on technology and reorganizing your business are understandably hard enough, but it is precisely this interaction that will determine your success. If you turn your perspective around, as he argues, your data architecture will be of more value.
So how do you put context-setting information at the center? In his article, Jones illustrates how to do this by putting Devlin’s architecture into practice. He stresses the importance of context-setting information by pointing out the increased vulnerability to which we are exposing ourselves. We produce and use more and more data. In the back of our minds, we know that data governance is of growing importance, but we don’t act in the right way on this knowledge. The problem with data governance is that it is never part of a data architecture but is instead addressed as a separate process. If our data architectures are not aware of the vulnerability being introduced, accidents are just waiting to happen. Jones hands you the recipe for avoiding these accidents.
You could, however, go even further. In the corners of the Internet, a debate about the need for a “data-centric” paradigm has been ongoing among data professionals. Every day, working data professionals face the risks and hurdles of data spreading around and being duplicated in different states of maintenance within an organization. Consequently, most data architects are fully occupied fighting the entropy of data and are desperately trying to make users understand that they are dancing on a rumbling volcano. In the next article, Christian Kaul and Lars Rönnbäck explore what it means to adopt a data-centric paradigm. It certainly isn’t enough to have a data-centric data architecture; the implications are much more fundamental. The ultimate consequence is that you need to create a model-driven organization. In such an organization, data architecture determines the shape of the organization, not the other way around. It’s a thought-provoking article, best appreciated with the context of the two previous articles in mind.
So do you now feel invigorated to rethink your data architecture? If so, you need to prepare yourself for doing what is within reach of the capabilities of your organization now and what will result in a more people-oriented data architecture going forward. In our final article, Sagar Gole and Vidyasagar Uddagiri help you understand which fundamental concepts — specifically, the six elements of an enterprise-wide data architecture — you should address today in order to “overcome challenges and leverage the opportunities and benefits of digital transformation.” They describe the “secret sauce” that prepares your organization to thrive during a digital transformation journey.
I hope that this issue of CBTJ will help you understand that a data architecture should be much more than merely a technology roadmap. To be of any value to the people in an organization, the architecture should guide them to an understanding of how to organize for ever-changing information requirements.
1 “ISO/IEC/IEEE 42010:2011: Systems and Software Engineering — Architecture Description.” International Organization for Standardization (ISO), 2017.
2 “The World’s Most Valuable Resource Is No Longer Oil, But Data.” The Economist, 6 May 2017.
3 Eubanks, Virginia. Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor. St. Martin’s Press, 2018.
4 Devlin, Barry. Business unIntelligence. Technics Publications, LLC, 2013.