What Is a Data Mesh?
A data mesh is an architectural concept in data engineering that gives business domains (divisions/departments) within a large organization ownership of the data they produce. The centralized data management team then becomes the organization’s data governance team.
In a traditional data architecture, the centralized data management team is responsible for all data-related activities. The problem is that when every data-related concern requires input from the same people, it creates a bottleneck.
Distributing the operational responsibilities for data management to individual domains within the organization eliminates this bottleneck and allows the centralized data team to focus their attention on governance, risk, and compliance (GRC).
The idea of decentralizing data ownership and responsibilities and distributing them across domain teams within an organization is credited to Zhamak Dehghani.
Just as edge computing processes data as close to the source as possible, Dehghani saw the need to bring data management as close to the source as possible. Since its introduction, the concept of data mesh architectures has steadily gained traction and is continually evolving to take the practical challenges of its implementation into account.
How a Data Mesh Architecture Works
In this architecture, data-producing domains within the organization are responsible for documenting semantic definitions, cataloging metadata, and setting usage policies. Each domain is required to set up its own data pipeline to clean, filter, and load its own data products, and each domain is responsible for the quality, discoverability, security, and privacy of those products.
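As a rough illustration of the clean, filter, and load steps a domain owns, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `Order` record, the field names, the validity rules, and the in-memory dictionary that stands in for a real storage layer. An actual domain pipeline would use whatever tooling the organization's platform provides.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Order:
    """Hypothetical record published by a sales domain."""
    order_id: str
    customer_email: str
    amount: float


def clean(raw_rows: Iterable[dict]) -> list[dict]:
    """Normalize whitespace and casing in the string fields."""
    cleaned = []
    for row in raw_rows:
        cleaned.append({
            "order_id": str(row.get("order_id") or "").strip(),
            "customer_email": str(row.get("customer_email") or "").strip().lower(),
            "amount": row.get("amount"),
        })
    return cleaned


def filter_valid(rows: list[dict]) -> list[dict]:
    """Keep only rows with an ID and a non-negative numeric amount."""
    valid = []
    for row in rows:
        amount = row["amount"]
        if row["order_id"] and isinstance(amount, (int, float)) and amount >= 0:
            valid.append(row)
    return valid


def load(rows: list[dict], store: dict) -> int:
    """Write rows to an in-memory store keyed by order ID; return rows written."""
    for row in rows:
        store[row["order_id"]] = Order(
            row["order_id"], row["customer_email"], float(row["amount"])
        )
    return len(rows)


def run_pipeline(raw_rows: Iterable[dict], store: dict) -> int:
    """Run the three steps in order: clean, then filter, then load."""
    return load(filter_valid(clean(raw_rows)), store)
```

Keeping the three steps as separate functions lets the domain team test and change each rule independently, which fits the idea that the domain, not a central team, owns the quality of its data.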
The centralized data governance team is responsible for defining, maintaining, and enforcing the organization-wide standards for how data is created, stored, accessed, and used.
Who Should Implement a Data Mesh Architecture?
Data mesh architectures are generally adopted by large and complex organizations that generate and consume a significant amount of big data from diverse business domains such as sales, marketing, customer service, and finance.
The ultimate goal of moving to a data mesh architecture is to allow the organization to establish a self-serve big data platform that supports continuous change, provides scalability, and autonomously meets the business needs of all stakeholders.
Candidates for data mesh adoption include:
- Enterprise-level technology companies;
- Banks, insurance companies, and other financial institutions;
- Large healthcare providers;
- Large retail e-commerce vendors;
- Telecommunications companies;
- Government agencies.
Implementing a data mesh is a significant undertaking that involves changes to technology, as well as disruptive changes to organizational culture and workflow processes.
It may not be the right choice for smaller organizations with less complex data management needs.
Why Data Meshes Are Disruptive
The concept of a data mesh requires a sea change within the organization. For a data mesh implementation to be successful, every business division needs to think about the data they create as the organization’s actual product – and not just a by-product of each domain’s work.
The other domains within the organization are the customers for that product, and best practices for creating a good customer experience include ensuring that each data product:
- Follows standard naming conventions, syntax, and semantics;
- Registers with the organization’s data catalog for easy discoverability;
- Has a unique address to help consumers access the data product programmatically;
- Allows internal data consumers to seek data access directly from domain data owners;
- Has a service-level agreement (SLA) that supports the organization’s data usage needs.
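The checklist above can be sketched as a small data product descriptor and catalog. This is only an illustration under stated assumptions: the `mesh://` address scheme, the snake_case naming rule, the `freshness_sla_hours` field, and the `DataCatalog` class are all hypothetical and do not correspond to any particular product or standard.

```python
import re
from dataclasses import dataclass

# Hypothetical naming convention: lowercase snake_case identifiers.
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


@dataclass(frozen=True)
class DataProduct:
    domain: str
    name: str
    owner_email: str          # who consumers contact to request access
    freshness_sla_hours: int  # assumed SLA: maximum age of the data

    @property
    def address(self) -> str:
        """Unique address consumers can use programmatically (assumed scheme)."""
        return f"mesh://{self.domain}/{self.name}"


class DataCatalog:
    """In-memory stand-in for an organization-wide data catalog."""

    def __init__(self) -> None:
        self._products = {}  # maps address -> DataProduct

    def register(self, product: DataProduct) -> str:
        """Validate the product against the conventions, then list it."""
        for label in (product.domain, product.name):
            if not NAME_PATTERN.fullmatch(label):
                raise ValueError(f"'{label}' violates the naming convention")
        if product.freshness_sla_hours <= 0:
            raise ValueError("SLA must be a positive number of hours")
        if product.address in self._products:
            raise ValueError(f"{product.address} is already registered")
        self._products[product.address] = product
        return product.address

    def find(self, keyword: str) -> list[str]:
        """Return the addresses that contain the keyword, sorted."""
        needle = keyword.lower()
        return sorted(addr for addr in self._products if needle in addr)

    def owner_of(self, address: str) -> str:
        """Return the domain owner to contact for access to a product."""
        return self._products[address].owner_email
```

A consumer in another domain could call `find("orders")` to discover a product and then `owner_of(...)` to learn which domain owner to ask for access. Rejecting a non-conforming name at registration time is one simple way the shared standards could be enforced without the central team handling every request.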
Challenges and Considerations
Implementing a data mesh architecture is not without challenges, however. Moving to a data mesh architecture requires careful planning, a clear understanding of the organizational needs, a good communication plan, and a willingness on the part of all employees to deal with the organizational change management issues that are bound to occur.
The move requires adopting a new language and a set of values that prioritize data discovery and usage, real-time data processing, and distributed data ownership.
Although this architecture is meant to improve data quality by giving data ownership to the domains that actually create the data and understand its context, it can do the opposite if the centralized data governance team's GRC standards and policies are not clearly defined or are not enforced reliably and consistently.
The Role of AI in Data Mesh Architectures
Artificial intelligence (AI) plays an important role in the execution and operation of a data mesh architecture. It is used to automate many tasks that would be difficult or impossible to manage manually, and it helps ensure that the data the mesh contains can be used effectively to generate insights and make data-driven business decisions. Common uses include:
- Automating data quality checks and identifying potential data integrity issues;
- Automating the process of cataloging data and managing metadata to keep track of what data is available, where it resides, who owns it, and how it can be accessed;
- Recommending relevant datasets to data consumers based on their past usage, the tasks they are trying to accomplish, or the semantic similarities between datasets;
- Automating the data preparation process by identifying relevant features, carrying out necessary transformations, and dealing with missing or inconsistent data;
- Assisting the centralized data governance team by monitoring domain compliance with data privacy regulations and security policies;
- Helping domain teams make sense of the large amounts of data they generate by finding patterns, generating insights, and making predictions.
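To make the dataset-recommendation item more concrete, the sketch below ranks datasets by how much their metadata tags overlap with the tags of datasets a consumer has already used. Jaccard similarity on tags is a deliberately simple stand-in chosen for this example; it is not a learned model, and a production recommender would likely rely on richer signals. The dataset names and tags are made up.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two tag sets: |intersection| / |union| (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(
    used_tags: set[str],
    catalog_tags: dict[str, set[str]],
    top_n: int = 3,
) -> list[tuple[str, float]]:
    """Return up to top_n (dataset, score) pairs with a non-zero score.

    Results are ordered by descending score, with ties broken by name.
    """
    scored = [
        (name, jaccard(used_tags, tags)) for name, tags in catalog_tags.items()
    ]
    scored = [(name, score) for name, score in scored if score > 0]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]


# Hypothetical catalog metadata: dataset name -> descriptive tags.
CATALOG_TAGS = {
    "sales.daily_orders": {"orders", "revenue", "customer"},
    "marketing.campaign_clicks": {"campaign", "customer", "clicks"},
    "finance.ledger": {"accounts", "revenue"},
    "hr.headcount": {"employees"},
}

if __name__ == "__main__":
    # Tags drawn from datasets this consumer has used in the past.
    for name, score in recommend({"orders", "customer"}, CATALOG_TAGS):
        print(f"{name}: {score:.2f}")
```

Because the scoring function is isolated in `jaccard`, it could later be swapped for an embedding-based or usage-based similarity measure without changing how recommendations are ranked and returned.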