International Organization for Standardization (ISO) standard 42010 defines architecture as “fundamental concepts or properties of a system in its environment embodied in its elements, relationships, and in the principles of its design and evolution.” The many forms of architecture that are present today — whether digital, information, enterprise, business, or data — focus on a specific aspect of these fundamental concepts. Most architectures are design-oriented and, if you are lucky, they will guide you in evolving the IT landscape to keep up with the changes in the business IT supports.
But what exactly is a data architecture? I have asked myself this question many times over the past 20 years. What I observe is that data architects are busy designing systems that replace existing systems. The users perceive all kinds of problems with the information they get or have to record. They ask the data architect to rectify the problem, and the architect ends up replacing the existing solution with something new, in accordance with the latest fashion — or should I say fad — in data land. The end result is that the users’ problems are not solved, probably because the solution does not address the main cause of the problems in the first place.
Why is it so hard to design adequate data solutions and have them evolve instead of replacing them over and over again? Over the years, I have started to see patterns across the projects I have worked on and the businesses I have worked for. One pattern is reliance on technology to solve problems with the use of data. You could see this either as techno-optimism or as a way of avoiding the real problems. The root cause of these problems is a lack of knowledge of how humans use information delivered by IT systems in their decision-making processes, as Cutter Consortium Senior Consultant Barry Devlin addressed in his article in a recent issue of Cutter Business Technology Journal (CBTJ).
The technology industry, which has been promising technology solutions for organizational problems, has jumped into this gap. One of the rationales for the focus on technology is actual technological progress. We have new data sources available, and the industry has grown tremendously with techniques to produce, aggregate, and analyze different types of data on a massive scale — what was beyond our imagination 20 years ago. Despite these capabilities, we haven’t progressed much in understanding how we humans deal with information, both in supporting business processes at an operational level and in using that data to create massive automated aggregation and analytical flows to guide participants in the business process.
Today, we use data to digitize more and more business processes, and with digitization we produce even more data that we try to leverage to our advantage. As Adrian Jones points out in another CBTJ article, this increase in volume, veracity, and velocity leads us to vulnerability. Yet current data architecture practice fails to show us where we run risks and where we need to step up and organize ourselves to mitigate vulnerability. The disconnect between what a data architecture is in most organizations (a technology-focused roadmap) and what an organization needs from a data architecture (guidance on how to create a valuable and sustainable data landscape) has puzzled me for a long time.
Ever since The Economist ran its famous cover a few years ago with the title “The World’s Most Valuable Resource,” depicting drilling rigs with the logos of Google, Facebook, and Amazon, people have misinterpreted that statement as saying that data is a resource, like oil. But data is not a resource. Data is the breadcrumb trail of human activities. Like a true breadcrumb trail, it indicates the activity but never fully describes it. Data doesn’t fall out of the sky like manna from heaven; it cannot be mined like cobalt, either. Data is a residue of activity, and when you think about data in this way, you can envision a data management practice that focuses fully on the activities that bring you value and, therefore, on the data you need to collect. All other data can be ignored; it costs you money to acquire and store, but it won’t bring you added value.
Data architecture should help you decide which activities are worth pursuing. The real challenge is that this is a dynamic system of loosely coupled activities, and the rate of change in such a system varies over time. Thus, a data architecture should be your guide on how to navigate these changes, which demands a lot from the interaction between data architects and the users of information. How should a data architecture support you?
For a data architecture to be an effective guide to managing data, it must account for the way humans use data. Even if we automate data aggregation, we always have to do so with respect for the intended use and for the fallibility of data, and we have to guard against our own lack of understanding. Relying on data in decision making is not without consequences. Virginia Eubanks has studied the use of data in social domains and has written a clear warning of how we fail people by deploying data solutions and algorithms. Though the examples she gives are related to injustices introduced into social support systems through relying on data and algorithms, the underlying mechanism in our business decisions is the same. We trust data-based decisions more than human knowledge-based decisions, even when we know that the data is flawed. I have yet to encounter a data architecture that accounts for the consequences of the use of data or sets strict requirements regarding those consequences. If mentioned at all, consequences are often viewed as a risk to be mitigated and not regarded as a fundamental property of a data architecture.
Creating an all-encompassing data architecture is not easy. A single architect is unable to understand all possible aspects and consequences. If you accept this and you are aware that data is the result of human activity, reflecting all the quirks we humans have in our behaviors, the answer is to create a framework where the value of using information is continuously evaluated against requirements derived from intended use. This evaluation leads to a continuous adjustment of both the way you organize the use of data and the solutions that support that use. The technology roadmap is the result, with new technological capabilities serving as an input for evaluating whether you should add better or new solutions to the landscape.
I call this framework the connected architecture (see Figure 1). The word “connected” emphasizes the necessary connection between different people, with different skills, who need to collaborate in acquiring, sorting, storing, modeling, processing, analyzing, interpreting, drawing conclusions, and taking action on the conclusions. You need to organize these connections; it won’t happen by itself. It is a continuous fight against data entropy, the scattering of copies of unmaintained data that result from local, one-time analysis and use. Data entropy is one of the plagues that visit many organizations in managing their data collections and in deriving value from them.
If architecture is indeed, as ISO standard 42010 defines it, the “fundamental concepts or properties of a system in its environment embodied in its elements, relationships, and in the principles of its design and evolution,” then connected architecture puts the environment in which data is recorded and used at the front of the stage — an environment consisting of people creating and using data.
[For more from the author and others on this topic, see the CBTJ issue, “Data Architecture Is Really About People.”]