
Closing the Business-IT Gap with a Model-Driven Architecture

Posted January 17, 2020 | Leadership | Technology | Amplify
In this issue:

CUTTER BUSINESS TECHNOLOGY JOURNAL  VOL. 32, NO. 12

Every day, working data professionals face the risks and hurdles of data spreading around and being duplicated in different states of maintenance within an organization. Consequently, most data architects are fully occupied fighting the entropy of data and are desperately trying to make users understand that they are dancing on a rumbling volcano. Here, Christian Kaul and Lars Rönnbäck explore what it means to adopt a data-centric paradigm. It certainly isn’t enough to have a data-centric data architecture; the implications are much more fundamental. The ultimate consequence is that you need to create a model-driven organization. By doing so, data architecture determines the shape of the organization, not the other way around. It’s a thought-provoking article, and best appreciated when keeping the context of the two previous articles in mind.

You never change things by fighting the existing reality. To change something, build a new model that makes the existing model obsolete.

— R. Buckminster Fuller, 1982

Many organizations today struggle with a strong disconnect between their business model and their IT systems, data distributed over a large number of nonintegrated IT systems, manual interfaces between incompatible applications, or difficulties with European Union (EU) General Data Protection Regulation (GDPR) compliance. We can trace most, if not all, of these issues back to an abundance of unspecific, inflexible, and nonaligned data models underlying the applications these organizations use to conduct their business. Often, these data models have been developed with insufficient business involvement and in isolation from each other — or have been purchased from vendors with little concern for the actual needs of the organization.

In this article, we briefly describe some of these issues and their origins. Next, we introduce a practical team-oriented data modeling approach that allows you both to document how your business actually works now and to build a solid, easily extensible foundation for the processes and applications that will make it work the way you need it to in the future. Finally, we outline how this approach could be at the heart of a new kind of organization that eliminates the traditional business-IT divide. Instead of creating models that mirror the existing organization and its dysfunctions, we suggest first creating the model and then deriving the organizational structure from it.

Silos Upon Silos

Organizations usually have some understanding of the concepts that drive their business. But the entities used in their IT systems often don’t align with these concepts. Furthermore, their employees are divided into departments and teams in a way that doesn’t match the concepts either.

This disconnect leads to a vicious cycle of people silos creating data silos creating more people silos.1 Leaders of different business units don’t talk to each other and create data fiefdoms out of organizational fiefdoms. They fail to recognize that most of the undertakings they see as IT projects actually are business transformation programs. It’s not enough to throw money and some requirements over the business-IT fence and then wait for the magic to happen. All stakeholders, business and IT alike, must be aligned to successfully complete the transformation.

Silos enforce the idea that data behaves much like a traditional manufacturing resource that needs to be physically present at the location of use. But data is different. At least in theory, it is a nonconsumable resource, instantaneously available everywhere and perpetually reusable. Unfortunately, organizations are often unable to leverage the unique characteristics of data. The process of continuous siloization found in many organizations, and in almost all organizations above a certain size, means that the data that employees need to do their work is distributed over a large number of nonintegrated IT systems within different intracompany jurisdictions.

Often, it is hard for people to find the data they think they need or to even know that the data is already there — in some system, somewhere — in another part of the organization. In other cases, multiple contradictory sources exist for a certain data element, without any clear indication as to which one holds the correct values. The result is that data about a single topic has to be stitched together manually from different systems with different data structures or with the help of large quantities of what is euphemistically called “business logic.” With all these silos, compliance with the GDPR and similar data privacy regulations is close to impossible.

While it is difficult to quantify precisely the impact of the time people spend seeking, compiling, and harmonizing data in their daily work, it’s obvious to most organizations that the cost, duration, and overall complexity of what they see as IT projects continue to rise. While it’s easy to blame the IT department’s general incompetence for these issues, siloization is usually the root cause of the problem.

However, IT people are not totally innocent bystanders. Too often, they seem reluctant to work with the business side to determine the actual challenges the organization faces and instead keep to themselves, building system after system after system without really knowing whether these systems are what the users need. Sometimes, they even try to impose their preferred tool, or preferred vendor, regardless of the actual requirements, just because that’s what they know and like.

By building all these siloed systems, organizations accumulate what Dave McComb calls “integration debt”:

Integration debt occurs when we take on a new project that, by its existence, is likely to lead someone at some later point to incur additional work to integrate it with the rest of the enterprise.2

This kind of debt is what tends to make each IT project more complex and more expensive than the last. The number of nonaligned IT systems inside an organization keeps increasing, making it more and more difficult to stitch them all together without the whole thing falling apart.

Modeling Your Reality, Together

How can an organization escape this vicious cycle? After spending years on various kinds of data-related projects and having countless, and often depressingly similar, discussions with practitioners, we’ve come to the realization that the solution could be to put data modeling front and center. It just has to be done a bit differently than it has been done in the past. The new way we’re about to describe won’t have you bottlenecked by building complex enterprise data models in an ivory-tower fashion before implementing anything; far from it.

First, we have to realize that data models only become really useful when they also work as communication tools, documenting with sufficient detail how an organization works now and how it will work in the future. When a model captures both the current reality and the desired reality, it can serve as the core of a model-driven architecture, with all kinds of technical and nontechnical artifacts generated from it. In the process of creating such a model, the people in the organization develop what Eric Evans calls a “ubiquitous language,”3 a common vocabulary that makes sure that everyone understands what everyone else in the organization is talking about.

This common language is the first prerequisite for escaping the vicious cycle of siloization. With its help, an organization can overcome the Tower of Babel–like confusion caused by silo-specific dialects that use different words for the same thing or, even worse, the same word for different things. The second prerequisite is to stop seeing a data model as purely technical and specific to an application, such as something that describes the database of a particular system. Instead, think of a data model as a description of what actually happens in the organization, a model that is shared between applications.

How can you build such a model? Do not fixate on generic concepts early on, a trap people within IT often fall into; first get a grasp of the individual cogs that make your organization tick. Start with interesting instances that exemplify your daily business, whatever it is. Don’t talk about parties involved in a transaction at a location; talk about “Dominique buying a red T-shirt from Alberto at the Poughkeepsie store.” Flesh out the instances with details such as age, marital status, or physical characteristics; think about the identifiers you might use to recognize the people involved. Think persona, not generic concept. Only after you have collected enough instances to cover the relevant parts of your business should you start thinking about how to classify them.
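
The instances-first approach can be sketched in a few lines. Everything below is an illustrative assumption, not the authors’ prescribed format: concrete instances are recorded as plain records first, and classification comes last, as a grouping derived from what the instances have in common.

```python
# Record concrete, fully fleshed-out examples first ("Dominique buying a
# red T-shirt from Alberto at the Poughkeepsie store"), then derive
# candidate classifications from them. Field names are hypothetical.

instances = [
    {"who": "Dominique", "role": "buyer",  "item": "red T-shirt", "place": "Poughkeepsie store"},
    {"who": "Alberto",   "role": "seller", "item": "red T-shirt", "place": "Poughkeepsie store"},
]

def propose_concepts(instances):
    """Classification comes last: group the people in the instances by
    role to propose candidate concepts for discussion."""
    concepts = {}
    for inst in instances:
        concepts.setdefault(inst["role"], []).append(inst["who"])
    return concepts

print(propose_concepts(instances))  # {'buyer': ['Dominique'], 'seller': ['Alberto']}
```

The point of the sketch is the ordering: the `instances` list is written down and discussed before any function like `propose_concepts` abstracts over it.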

Finding instances and determining classifications isn’t something enterprise architects or database developers should do on their own. On the contrary, it is a collaborative exercise involving stakeholders from both sides of the business-IT divide — a divide that we hope gets smaller in the process. You need to involve all relevant actors to make sure that the agreed-upon classifications really are common classifications, acceptable to everyone who works with whatever has been classified. Only then will you arrive at a data model that is specific to your organization’s needs and supported by a broad consensus within the organization, not just a model that serves as the foundation for yet another data silo.

Keep in mind that these common classifications may change over time. Your organization and the way it conducts its business is evolving, and instances and classifications must evolve, too. By regularly working on instances and classifications together, the business and IT sides of your organization will make sure your ubiquitous language stays up to date and, in the process, will gain a better understanding of each other and the state of the organization.

Generating the Next Levels

Many data initiatives put too much effort into creativity at the technical implementation level. Instead, the focus should be firmly on the business-minded discussion — first, of instances and, then, of classifications, as described above. All technical artifacts should be generated with minimal human involvement, freeing up time and energy for whatever is the organization’s core business. Think data warehouse automation, but for all kinds of physical data stores instead of only for data warehousing.

Remember the three levels of data modeling: conceptual model, logical design, and physical implementation? The interesting instances and common classifications found in the collaborative exercise described above form a conceptual model from which a highly normalized logical design can be generated automatically, one that includes all the relevant:

  • Concepts — the things that are important for the people involved

  • Connections — the relationships between these things

  • Details — the attributes of these things

This logical design holds everything together. It serves as the foundation for all the physical data stores that might be needed, making sure all of them speak the same ubiquitous language and are therefore interoperable with each other. Again, these physical data stores don’t have to be painstakingly modeled by hand; they can be generated without much effort from the logical design using mathematical representations of standard modeling patterns, such as fifth normal form, data vault, or anchor.4
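Generation of this kind can be sketched mechanically. The model format and naming rules below are our own illustrative assumptions, loosely following an anchor-style pattern (one table per concept, one per detail, one per connection), not a real generator:

```python
# A conceptual model as plain data: concepts with their details, plus
# connections between concepts. All names are hypothetical.
model = {
    "concepts": {"Customer": ["Name"], "Sale": ["Date"]},
    "connections": [("Customer", "Sale")],
}

def generate_ddl(model):
    """Generate highly normalized DDL: one table per concept, one table
    per detail (keeping history via a timestamp), one table per connection."""
    ddl = []
    for concept, details in model["concepts"].items():
        ddl.append(f"CREATE TABLE {concept} ({concept}_id INT PRIMARY KEY);")
        for detail in details:
            ddl.append(
                f"CREATE TABLE {concept}_{detail} ("
                f"{concept}_id INT REFERENCES {concept}, "
                f"{detail.lower()} TEXT, changed_at TIMESTAMP);"
            )
    for a, b in model["connections"]:
        ddl.append(
            f"CREATE TABLE {a}_{b} ("
            f"{a}_id INT REFERENCES {a}, {b}_id INT REFERENCES {b});"
        )
    return ddl
```

Because the generator is deterministic, regenerating after a model change is cheap, which is what makes “built for change” realistic in practice.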

Deriving data stores from an organization-wide logical design contrasts with the traditional method of creating a dedicated model for each and every application. Most existing organizations are, in that respect, application-centric. McComb, however, defines the data-centric organization as “one in which most application functionality is implemented on a single, simple, update-able, shared, federate-able data model.”5 Given that definition, data-centric actually might be a bit of a misnomer. More than merely being centered on data, the organization is driven by a single model. Just how far we can take this somewhat radical thinking will be easier to understand if we first take a stroll down memory lane.

From Many Models to One

A quick IT history lesson will help us understand the current situation and the related paradigm shift about to happen. When computers were largely unconnected, it was necessary for data, interfaces, and logic to reside alongside each other. The purpose of the interfaces and the logic was to fetch, display, modify, and create data, usually with a human involved in the process. This meant that even within a single computer running one program, it made sense to separate concerns among these three elements. In the late 1970s, such ideas were even formalized in the Smalltalk environment under the acronym MVC (Model-View-Controller), a design pattern that lives on in most modern application frameworks.

Given the widespread use of this pattern, it is somewhat perplexing that, at a time when it is hard to say where one computer ends and another begins, or whether they are local or in the cloud, the way we think about applications has changed very little. Applications are mostly the same monoliths they were back in the 1970s, keeping the same single-computer architecture, but now with a virtual machine on top of any number of elastically assigned physical units. We have already gone from the computer as a physical asset to compute as a virtual resource, and this journey is now beginning for data as well. Thanks to data being able to flow freely, applications can be built so that it’s also difficult to say where one application ends and another begins, and applications may seamlessly run from the edge to the cloud.

Centralizing data is, of course, not an entirely new thought. Centralization has been conceptualized through master data management (MDM), in which the responsibility for a few important organizational concepts is taken away from the zoo of applications where it usually resides. While many MDM initiatives have begun over the last decade, most of them haven’t added much value. We believe that from both a technological and a theoretical perspective, MDM may have been released into the wild prematurely. It requires proprietary languages to communicate, is slow to adapt to changes, and demands complete consensus on how to define the data for which it is responsible. Since initiatives centralizing just one concept have had little success, centralizing all of them might sound impossible. But this is one of the problems where holistic thinking actually will turn the odds in your favor.

Because no organization can function with only a single application, good standards on how to exchange information electronically now exist, largely thanks to so-called service-oriented architectures. It is no longer necessary to build data models to last; they can be built for change and can be malleable with respect to differences in opinion about definitions.6 Finally, the meaning of “master” has been made operationalizable: a master is responsible for the identification process, via which it is possible to determine, given some pieces of information and through smart comparisons with stored data, whether something has been seen before or whether it is something new.
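The “master as identification process” idea can be sketched as a matching function: given some pieces of information, decide whether they refer to something already known or to something new. The matching rule here (exact match on a strong identifier, falling back to fuzzy name comparison) is a deliberately simplistic assumption, and all names are hypothetical:

```python
import difflib

# Stored data the master is responsible for (illustrative).
known = {1: {"name": "Dominique Dupont", "email": "d.dupont@example.com"}}

def identify(candidate, known, threshold=0.9):
    """Return the key of an existing entity the candidate matches,
    or None if the candidate appears to be something new."""
    for key, record in known.items():
        if candidate.get("email") and candidate["email"] == record["email"]:
            return key  # strong identifier matches: same entity
        similarity = difflib.SequenceMatcher(
            None, candidate.get("name", ""), record["name"]
        ).ratio()
        if similarity >= threshold:
            return key  # names are near-identical: treat as the same entity
    return None  # nothing matched: this is something new
```

A real identification process would weigh several identifiers and keep an audit trail of its decisions; the point is that “master” names a process, not a privileged copy of the data.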

We are, therefore, now better equipped than ever before to go from application-centric to model-driven. There are no longer technical limitations preventing applications and the people working with them from speaking a common, ubiquitous language throughout an organization. When things change, the terminology used and the model with which people work can change with it. Probably the biggest challenge will be to get application vendors to release control of the data they have internalized. For many, this will involve major rewrites of their software, which doesn’t come easily or cheaply. Will vendors that embrace a model-driven architecture gain enough traction to force everyone to follow suit? Only if the buyers — you and the organizations you work for — start to demand it and enforce it once in place.

From Model-Driven Architecture to Model-Driven Organizations

Even when an organization works with centrally identified pieces of information that fit into an organization-wide logical design, one important misalignment remains: the one between the general understanding of the concepts that drive the business model and the departments and teams into which the employees have been divided. To bridge this final divide, the organization has to make the leap from a model-driven data architecture to a model-driven organization; it has to derive its own structure from the logical design.

In a truly model-driven organization, the organizational chart is just another physical implementation of the common logical design. People will work together in small, cross-functional teams that are one-to-one with the concepts that are important to the organization right now. If, for example, your important concepts are Customer, Employee, Product, and Sale, then you’ll have teams called Customer, Employee, Product, and Sale that are responsible for the respective concept, its details, and the physical data store(s) associated with it. The common business-IT divide will slowly become obsolete because, to fulfill all its responsibilities, each team will have to include both more business-minded and more technical-minded people. At the same time, the one-to-one relationship between concepts and models will prevent the reemergence of different understandings of the same concept in different parts of the organization.

Of course, none of these teams would or should be an island. There will be defined interfaces between the teams that are one-to-one with the connections from the logical design. Teams are jointly responsible for their common connections and the physical data store(s) associated with them, usually with one team in the lead. In our example, there will be a connection between Customer, Employee, and Sale, and another connection between the Sale and the Products that have been sold. In both cases, it makes sense that the Sale team takes the lead because Sale is the concept that ties all these other concepts together. These institutionalized connections will make sure that no team can isolate itself from the others and degrade into one of the people silos of old.
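Treating the org chart as just another artifact generated from the logical design can be sketched the same way as the data stores. The model format, team names, and lead-selection rule below are illustrative assumptions based on the Customer/Employee/Product/Sale example above:

```python
# Concepts become teams; connections become institutionalized interfaces
# between teams, each with a designated lead team.
model = {
    "concepts": ["Customer", "Employee", "Product", "Sale"],
    "connections": [
        {"between": ["Customer", "Employee", "Sale"], "lead": "Sale"},
        {"between": ["Sale", "Product"], "lead": "Sale"},
    ],
}

def derive_org(model):
    """Derive an organizational chart one-to-one from the logical design:
    one cross-functional team per concept, one interface per connection."""
    teams = {c: {"owns": c, "interfaces": []} for c in model["concepts"]}
    for conn in model["connections"]:
        for concept in conn["between"]:
            teams[concept]["interfaces"].append({
                "with": [c for c in conn["between"] if c != concept],
                "lead": conn["lead"],
            })
    return teams
```

Because every team record carries its interfaces, no team can be generated without its obligations to the others, which is exactly the property that prevents the old people silos from reemerging.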

Putting everything together, the modus operandi of such a model-driven organization would consist of these four steps repeated over and over again:

  1. Regularly document representative examples of what happens in your organization (the interesting instances) to see whether your model still aligns with how you actually conduct (or want to conduct) your business.

  2. If there are significant discrepancies between the model and the reality you want and you have to make structural changes to the organization, build consensus to alter the relevant classifications accordingly.

  3. Generate the new logical design of your organization from the altered conceptual model (the current combination of interesting instances and common classifications).

  4. Generate the new organizational chart and changes in the physical data stores from the altered logical design.

An organization that follows this process will have close alignment between the general understanding of the concepts that drive its business, the entities used in its IT systems, and the teams into which its employees are currently divided. Because physical data stores and organizational structure are generated from a common model, they can’t stray too far from the business concepts. If discrepancies evolve over time, they are detected early, and the model, data stores, and organizational structure can be adapted easily.

While a traditional organization’s overarching purpose may be well known to its workforce, that purpose is usually hard to translate into the rationales behind individuals’ work and the particulars of daily operations. In a model-driven organization, everyone has a crystal-clear purpose; people’s actions serve the purpose of fetching, modifying, and creating data about a specific concept. If their actions do not leave digital traces in the right places, they are not doing what they are supposed to be doing. In such an organization, it’s clear where to find customer data because all customer data can be found in one place, shepherded by the Customer team. While there isn’t one authority for everything, the team that is responsible for a concept is the one authority for everything related to that concept.

An organization will no longer have the single source of the truth, which it probably never had anyway, but it will have a single source of the truth for data about each concept that is relevant to the organization, and it will be obvious what this source is. This architecture not only helps the people working inside the organization to find the data they need and update it in the right and only place, it also helps people from the outside interacting with them. For example, regulatory authorities may require data from the organization, or customers may want to see, and sometimes change, the data the organization has about them (GDPR data subject requests may well make this type of request commonplace). In a model-driven organization, employees will be able to fulfill such requests faster, more easily, and more accurately.

Conclusion

Whatever you may think of its feasibility with respect to your organization, model-driven architecture is now something every architect must consider. It has been done in practice at companies such as Hitachi, Google, and Bitly, with very promising results, and can be done at other organizations. Ravi Ganesan, a proponent of data-centricity, explains that “one powerful way to become a high-performing organization is to bend the power of technology to support organizational performance.”7 Since every organization now uses technology to deal with data, just having lots of technology for handling data is no longer a differentiator. What is a differentiator is to what extent the organization leverages this data to advance its business. Having a model-driven architecture and becoming a model-driven organization are powerful approaches to successful data leverage. If your organization won’t take the leap, it risks being overtaken by organizations that will.

References

1Silverston, Len. “Zen and the Art of Data Maintenance: People Silos Cause Data Silos.” The Data Administration Newsletter, 6 November 2019.

2McComb, Dave. “The Data-Centric Revolution: Integration Debt.” The Data Administration Newsletter, 1 March 2017.

3Evans, Eric. Domain-Driven Design: Tackling Complexity in the Heart of Software. Addison-Wesley Professional, 2003.

4Kaul, Christian. “From E-R to Data Vault: Transforming Models with Shape Functions.” LinkedIn, 5 August 2018.  

5McComb, Dave. “Data-Centric: Models and Architectures.” Data Architecture Summit, 15 October 2019.

6Rönnbäck, Lars. “Modeling Conflicting, Uncertain, and Varying Information.” ResearchGate, 2018.

7Ganesan, Ravi. “Utilizing Technology to Build a High-Performance Organization in a Value-Based Environment.” The Open Minds Performance Management Institute, 14 February 2019.

About the Authors
Christian Kaul
Christian Kaul is a data modeler, writer, and event organizer based in Munich, Germany, who focuses on designing, implementing, and maintaining data warehouses. He has several years’ business intelligence experience in various industries, including healthcare, insurance, tourism, and telecommunications. Mr. Kaul’s project roles have included data modeler, data warehouse developer, project manager, and support team lead. With a keen interest in…
Lars Rönnbäck
Lars Rönnbäck is a researcher in divergent information at Stockholm University, Sweden, and a consultant in information modeling. He has been working with some of the largest organizations in Sweden and their challenges as far back as Y2K. Over the last decade, Mr. Rönnbäck’s research has focused on how to manage information that is uncertain, imprecise, conflicting, and varying over time. This research has resulted in new modeling techniques…