A General Recipe for Creating Data Architectures

Posted February 24, 2021 | Technology

When compared with software architecture, data architecture is a relatively new discipline. It has come into focus in recent years with the rise of big data and machine learning. A robust data architecture that allows for change also allows for future-proofing: an initial set of use cases limited to reporting and auditing can later be expanded into predictive analytics, recommendation engines, and assistive artificial intelligence (AI). These use cases typically require much richer and wider data to power the algorithms, including access to metadata, transactional data, streaming data, and so on. A well-designed data architecture can evolve to deliver that.

The role of the data architect is often vaguely defined and tends to fall on the shoulders of senior business analysts, data scientists, or database and ETL specialists. As with any kind of architecture, designing for uncertainty is a key requirement: an organization's data sources and data requirements will always be in flux because organizations are continually undergoing significant changes such as acquisitions, digital transformation programs, or the development of new products and services.

Conceptual-Level Data Architecture Design

To start, you will want to build a data blueprint at the enterprise level by designing the data entities and taxonomies that represent each business domain, as well as the data flow underneath the business process. Ensure that you capture:

  • Core data entities and data elements about products, clients, and services

  • Source data you have, both internal and external, that you can leverage to create outputs

  • Output data needed by the organization (and whether it can be created using the source data available)

  • Relationships between different data entities (referential integrity, business rules, execution sequence)

  • Security policies to be applied to each data entity
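The blueprint items above can be captured as a lightweight, machine-readable model. The sketch below is a hypothetical illustration in Python: the entity names, elements, and security-policy labels are invented for the example, not part of any standard.

```python
from dataclasses import dataclass

# Hypothetical sketch of a conceptual data blueprint: entities, their core
# elements, where they come from, and the security policy attached to each.
@dataclass
class DataEntity:
    name: str              # e.g., "Client", "Product"
    elements: list[str]    # core data elements
    source: str            # "internal" or "external"
    security_policy: str   # e.g., "pii-restricted", "public"

@dataclass
class Relationship:
    from_entity: str
    to_entity: str
    rule: str              # business rule or referential constraint

# A tiny blueprint for an illustrative retail domain
blueprint = {
    "entities": [
        DataEntity("Client", ["client_id", "name", "email"], "internal", "pii-restricted"),
        DataEntity("Product", ["product_id", "description", "price"], "internal", "public"),
    ],
    "relationships": [
        Relationship("Order", "Client", "every order references an existing client"),
    ],
}
```

Keeping the blueprint in a structured form like this makes it easy to review relationships and security policies alongside the entities they govern.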

Logical-Level Data Architecture Design

This is the data modeling aspect of data architecture, and it should bridge the business requirements to the underlying data management systems (data stores, data pipelines). The goal of architecture here is again not to impose strict rules, but rather to create strong guardrails that allow for efficient use of data and managed change.

Our experience shows that six key areas make or break data projects:

  1. Naming conventions. Naming conventions are a key — and often misunderstood — element in data modeling efforts. There is power in names: good, clear, and consistent names make complex data far easier for humans to understand, and they should be applied uniformly across all data sets.
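One way to make a naming convention enforceable rather than aspirational is a simple automated check. The sketch below is a minimal, assumed convention (snake_case plus a deny-list of vague names); the specific pattern and deny-list entries are illustrative choices, not a standard.

```python
import re

# Illustrative convention: snake_case names only, and a deny-list of
# vague names that hide meaning. Both rules are assumptions for the sketch.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")
DENYLIST = {"tmp", "data", "misc"}

def is_valid_name(name: str) -> bool:
    """Return True if a table/column name follows the convention."""
    return bool(SNAKE_CASE.match(name)) and name not in DENYLIST

print(is_valid_name("client_order_total"))  # consistent snake_case -> True
print(is_valid_name("ClientOrderTotal"))    # camel case -> False
print(is_valid_name("tmp"))                 # vague name -> False
```

A check like this can run in a CI pipeline against schema definitions, so inconsistent names are caught before they spread.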

  2. Data integrity. Integrity rules need to apply consistently across all of the data. This is of special importance if the same data resides in multiple data sets.
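Both flavors of integrity mentioned above — referential integrity and consistency of the same data across multiple data sets — can be checked programmatically. The sketch below uses invented, in-memory data purely to illustrate the two checks.

```python
# Minimal sketch of two integrity checks over illustrative in-memory data.
clients = {"C1": {"email": "a@example.com"}, "C2": {"email": "b@example.com"}}
orders = [{"order_id": "O1", "client_id": "C1"}]

def check_references(orders, clients):
    """Referential integrity: return orders pointing at unknown clients."""
    return [o["order_id"] for o in orders if o["client_id"] not in clients]

# Cross-data-set consistency: a client replicated into a reporting copy
# must carry identical values to the primary copy.
reporting_clients = {"C1": {"email": "a@example.com"}}

def check_consistency(primary, replica):
    """Return keys whose replicated values diverge from the primary."""
    return [k for k, v in replica.items() if primary.get(k) != v]

print(check_references(orders, clients))              # [] -> no dangling references
print(check_consistency(clients, reporting_clients))  # [] -> copies agree
```

In a real system the first check is usually delegated to database constraints, but the second often has to live in application or pipeline code, which is why consistent rules matter when data resides in multiple places.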

  3. Security and privacy. These are now key aspects of all database design not only due to the risk and costs of data leaks, but also due to the strong regulatory environment organizations have to operate in (GDPR, CCPA, HIPAA, etc.).

  4. Data pipelines. Data movement and transformations between applications, systems, and databases should be clearly defined at this level.
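"Clearly defined at this level" can mean as little as a declarative registry of movements, each naming its source, target, and transformation. The sketch below is one hypothetical way to express that; the system names and schedule values are invented for illustration.

```python
# Hypothetical declarative registry of data movements: each pipeline names
# its source, target, and transformation explicitly, so data flows are
# documented rather than implied by scattered job code.
pipelines = [
    {
        "name": "orders_to_warehouse",
        "source": "app_db.orders",
        "target": "warehouse.fact_orders",
        # Transformation applied to each row in flight
        "transform": lambda row: {**row, "total_cents": int(row["total"] * 100)},
        "schedule": "hourly",
    },
]

row = {"order_id": "O1", "total": 12.5}
step = pipelines[0]
print(step["transform"](row)["total_cents"])  # 1250
```

Orchestration tools can then be driven from such a registry, keeping the logical definition of the flow separate from the engine that executes it.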

  5. Data replication. With the constant performance gains of storage and its ever-decreasing costs, data replication is used to solve three key challenges: high availability, performance (avoiding network data transfer), and decoupling of downstream workloads that make use of the data. However, too much data replication will lead to poor data quality and inefficiencies. Consider these tradeoffs carefully and make sure to apply your guiding principles.

  6. Data archival and retention policies. It’s important to define these during this stage as well. We have seen numerous projects where archival and retention policies were afterthoughts well into production. This led to wasted resources troubleshooting the “unexpected” problems, inconsistent data across different data stores, and poor query performance.
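Defining retention up front can be as simple as a per-store policy table that pipelines and cleanup jobs consult. The sketch below is illustrative only: the store names and retention periods are assumptions, not regulatory guidance.

```python
from datetime import date, timedelta

# Illustrative retention policy defined per data store, up front.
# The periods are assumptions for the sketch, not legal advice.
RETENTION = {
    "transactions": timedelta(days=7 * 365),  # keep 7 years, then archive
    "clickstream":  timedelta(days=90),       # keep 90 days, then delete
}

def is_expired(store: str, record_date: date, today: date) -> bool:
    """Return True if a record has outlived its store's retention period."""
    return today - record_date > RETENTION[store]

print(is_expired("clickstream", date(2021, 1, 1), date(2021, 6, 1)))   # True
print(is_expired("transactions", date(2021, 1, 1), date(2021, 6, 1)))  # False
```

Because the policy lives in one place, archival jobs, reporting stores, and backups can all apply the same rules, avoiding the cross-store inconsistencies described above.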

[For more from the authors on this topic, see “Designing Emerging, Adaptive Digital & Data Architectures.”]

About The Author
Olivier Pilot is a Senior Consultant with Cutter Consortium and a Senior Architect with Arthur D. Little’s UK Digital Problem Solving practice. He has broad experience across a range of projects involving enterprise and solution architecture. Mr. Pilot's focus areas include digital strategy, Agile digital solution delivery, design and architecture, and design thinking innovation. His recent sample engagements include the design and delivery of a…
Michael Papadopoulos is a Senior Consultant with Cutter Consortium, Chief Architect of Arthur D. Little’s (ADL's) UK Digital Problem Solving practice, and a member of ADL's AMP open consulting network. He is passionate about designing the right solutions using smart-stitching approaches, even when elegance and architectural purity are overshadowed by practicality. Mr. Papadopoulos leads the scaling of multidisciplinary organizations by focusing…
Michael Eiden is a Senior Consultant with Cutter Consortium. Dr. Eiden, who serves as Head of AI at Arthur D. Little, is an expert in machine learning (ML) and artificial intelligence (AI) with more than 15 years' experience across different industrial sectors. He has designed, implemented, and productionized ML/AI solutions for applications in medical diagnostics, pharma, biodefense, and consumer electronics. Dr. Eiden brings along deep…