We live in the age of data. Companies collect it in nearly every aspect of their business, and use it for business intelligence, logistics, monitoring, R&D, and many other activities. Used well, data drives innovation. Used poorly, it can be a distraction, if not a liability. One way to ensure that your company takes full advantage of its data is via effective management.
Data mesh organizes data based on decentralized domains. Groups “own” their data, and are responsible for making it available to the rest of the company. It treats data as a product, with teams providing it to internal clients with a self-service model.
Let’s look at how data mesh can help you organize your most valuable asset.
What is Data Mesh?
Data mesh is a platform architecture that manages the abundance of data in an enterprise with a self-service, domain-focused design. It was originally defined by Zhamak Dehghani and applies domain-driven design to data architecture.
Data mesh is based on four design principles. Let’s go over each one.
Domain Ownership
Instead of warehousing data in centralized stores that are managed by shared resources, data mesh leaves data with the teams best suited to manage it. This is where data mesh’s extension of domain-driven design begins.
Data mesh organizes data around the business units that assemble or create it. Companies already organize themselves by domain. So, domain makes sense as a method for deciding who is responsible for managing data, because domain owners understand their data sets best.
Data As a Product
Data has quickly become a valuable asset, and there’s a thriving market for it. This data mesh principle brings this paradigm into the enterprise. It makes the domain team responsible for applying a product mindset to making their data available to themselves and the rest of the organization.
A product mindset makes data easier to find, understand, trust, and use. Dehghani’s original description of data mesh specifies how teams make this happen. They make their data discoverable, addressable, trustworthy, self-describing, interoperable, and secure.
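One way to make those six qualities concrete is to treat them as fields that every data product has to declare. The sketch below is only an illustration: the `DataProduct` class, its field names, the checks, and the `orders.example.internal` address are assumptions made up for this example, not part of Dehghani's description or of any particular tool.

```python
from dataclasses import dataclass


@dataclass
class DataProduct:
    """Hypothetical descriptor a domain team publishes alongside its data."""

    name: str                     # discoverable: a name other teams can search for
    owner_domain: str             # the team accountable for the data
    address: str                  # addressable: one stable location for consumers
    schema: dict[str, str]        # self-describing: column name -> type
    freshness_hours: int          # trustworthy: the staleness the team commits to
    output_format: str = "json"   # interoperable: a format shared across domains
    allowed_roles: frozenset[str] = frozenset()  # secure: roles allowed to read

    def missing_qualities(self) -> list[str]:
        """Return the qualities this descriptor does not yet declare."""
        checks = {
            "discoverable": bool(self.name),
            "addressable": bool(self.address),
            "trustworthy": self.freshness_hours > 0,
            "self-describing": bool(self.schema),
            "interoperable": bool(self.output_format),
            "secure": bool(self.allowed_roles),
        }
        return [quality for quality, ok in checks.items() if not ok]


# Illustrative values only.
orders = DataProduct(
    name="orders",
    owner_domain="sales",
    address="https://orders.example.internal/v1",
    schema={"order_id": "string", "total": "decimal"},
    freshness_hours=24,
    allowed_roles=frozenset({"analyst"}),
)
print(orders.missing_qualities())
```

A descriptor like this only records what a team promises. Whether the data actually meets those promises still depends on the domain team that owns it.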
Self-Serve Data Platform
Having each team build an API is a great way to share data inside the enterprise, but you still want cohesion between these services. That’s where the self-serve platform comes in. Data mesh has a platform team that builds tools and infrastructure that sit above the domains in the stack and make the services work well together. So, instead of building a centralized team that needs to be familiar with every data set, you build a team that focuses on building and maintaining generic tools.
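As a rough sketch of what a "generic tool" might look like, here is a tiny catalog that a platform team could offer. It knows where each data product lives, but nothing about what the data means. The `DataCatalog` class, its methods, and the example URLs are hypothetical and exist only for illustration.

```python
class DataCatalog:
    """Hypothetical platform-level catalog shared by every domain team."""

    def __init__(self) -> None:
        # product name -> {"domain": ..., "url": ...}
        self._products: dict[str, dict[str, str]] = {}

    def register(self, name: str, domain: str, url: str) -> None:
        """Called by a domain team to publish one of its data products."""
        if name in self._products:
            raise ValueError(f"data product {name!r} is already registered")
        self._products[name] = {"domain": domain, "url": url}

    def find_by_domain(self, domain: str) -> list[str]:
        """List the products a given domain has published, in name order."""
        return sorted(
            name for name, info in self._products.items()
            if info["domain"] == domain
        )

    def resolve(self, name: str) -> str:
        """Return the address consumers should use for a product."""
        return self._products[name]["url"]


# Illustrative registrations; the names and URLs are made up.
catalog = DataCatalog()
catalog.register("orders", "sales", "https://orders.example.internal/v1")
catalog.register("refunds", "sales", "https://refunds.example.internal/v1")
catalog.register("shipments", "logistics", "https://shipments.example.internal/v1")
print(catalog.find_by_domain("sales"))
```

The platform team owns the catalog code, while each domain team owns the entries it registers.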
Federated Governance
Federated governance is the final principle. It maintains the delicate balance between data silos and a monolithic corporate architecture. Without a notion of governance, the individual data products won’t work well together, and internal developers will struggle to write unified applications.
With federated governance, the platform embraces standards, rules, and industry regulations without centralized control.
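One possible reading of "standards without centralized control" is that the rules are agreed on once, and each domain team then checks its own products against them. The sketch below assumes that reading. The rule names, the metadata keys (`owner`, `contains_pii`, `columns`), and the `customer_id` convention are invented for this example.

```python
from typing import Callable

# A rule takes a product's metadata and says whether the product complies.
Rule = Callable[[dict], bool]

# Hypothetical organization-wide rules, agreed on by all domains.
GLOBAL_RULES: dict[str, Rule] = {
    "has_owner": lambda product: bool(product.get("owner")),
    "pii_is_flagged": lambda product: "contains_pii" in product,
    "uses_shared_customer_id": lambda product: "customer_id" in product.get("columns", []),
}


def violations(product: dict, rules: dict[str, Rule] = GLOBAL_RULES) -> list[str]:
    """Return the names of the shared rules this product breaks."""
    return [name for name, rule in rules.items() if not rule(product)]


# Each domain team runs the same check against its own product metadata.
orders = {"owner": "sales", "contains_pii": True, "columns": ["customer_id", "total"]}
telemetry = {"owner": "", "columns": ["device_id", "reading"]}

print(violations(orders))
print(violations(telemetry))
```

The rules live in one shared place, but no central team has to inspect each data set. The check can run wherever the domain team builds and deploys its product.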
Data Mesh vs Data Fabric
Data mesh and data fabric both aim to solve the same problem, managing data in a diversified environment, but they approach it in very different ways.
A data fabric approaches the issue by building a unifying virtual management layer atop the various services. It attempts to unify management with a metadata-driven approach.
A data mesh takes an API-driven approach, with a set of federated services that operate independently, but handle requests with common rules.
What Is the Difference between Data Lake and Data Mesh?
Data lakes and data meshes are different architectures, and they’re designed to solve distinct problems. A data lake is a centralized store for assorted types of data.
While data lakes hold both structured and unstructured data, they lack the overall structure of a data mesh.
So, a data lake is a type of data store, while a data mesh is a platform for organizing more than one data service. They aren’t mutually exclusive. A data mesh may contain one or more data lakes.
Why Use Data Mesh?
Different Data Management Approaches
As your data sources multiply, you need a way to manage and distribute them. There are several approaches that you could take. Let’s look at three of them.
- Centralized control – bring all the data sources under a “single roof” like a data lake or data warehouse.
- Centralized model – keep the data sources separate, but unify them with a virtual layer, as described above with a data fabric.
- Federated model (data mesh) – keep data sources separate and under the control of the groups that created them. Unify them under shared standards.
Each of these approaches has its advantages and disadvantages.
Different Approaches Compared
Centralized control gives you a single point of access with a single owner. Depending on the approach, it has a single access paradigm and consistent security controls. Some companies prefer this model, but it has several significant disadvantages.
Is it the best approach for all your data sets? Will some suffer under a one-size-fits-all data model? Ingesting the data sets and normalizing them for the centralized system may slow your systems down. You’ll also need to ensure that your warehouse staff have the expertise to handle every data set. Who’s responsible for supplying the funding and staff when a department wants to add a new data set? What will the lead time be?
A centralized model gives your teams a unified data interface, but leaves control over the services with the teams that created them. So, this could be the best of both worlds. Or it could be the worst. While individual teams manage their services, access is still routed through a centralized system, leaving you with many of the disadvantages of centralized control.
Data mesh is the federated model. As we covered before, this means leaving data sets with their creators, and making them responsible for providing access to other teams. So, the experts who created your services continue to run them. They keep them running, keep the data fresh, and design the APIs clients use to access them. So, the data sets are maintained by the domain experts that need them most.
If a team needs to build an additional source, the funding, systems, and staff are their responsibility. Instead of waiting for a centralized team to raise the funds or staff up for a new project, the team that needs the new data can either make the investment themselves, or not.
Why Not Use Data Mesh?
While data mesh is a powerful solution for managing large data platforms, it’s not the right choice for every organization.
Your Domain Teams Aren’t Ready
Data mesh shifts responsibility for managing data from central groups to domain teams. If those teams aren’t ready, your platform shift will fail.
Acquiring data, designing an API, and building a service are significant projects that require considerable domain knowledge and technical skills, but the responsibility doesn’t stop there. Domain teams have to handle ongoing maintenance, too. Data changes, so building a platform with data mesh architecture isn’t a one-time project. It’s an ongoing effort. Domain teams are also responsible for security, privacy, and compliance.
So, are your domain teams ready to tackle these tasks? Do they have the personnel they’ll need?
Not Enough Data to Scale
Data mesh architecture reduces or eliminates bottlenecks associated with delivering data via centralized teams. But you’ll only realize that goal if you have enough data sources and domain teams to make moving to a mesh worthwhile.
You Lack the Organizational Structure
Data mesh requires a commitment to data governance. So, while the enterprise defines the overall structure of data, domain teams need to make tactical decisions about their data sources and services. If they can’t, the platform won’t scale.
Is your organization prepared to cede this authority over the data to the teams? Is it willing and able to provide the teams with the resources and the authority they need?
Data Mesh Use Cases
Data mesh works well when data producers need access to their data quickly and also need to share it with other teams.
Some common situations are:
- Real-time analytics – analytics often span multiple sources where ingesting and normalizing them into a central database would slow the process down. With a data mesh, you can store the data in individual sources and perform analytics via APIs.
- IoT analytics – similar to real-time, IoT data spans a variety of devices with differing data sets.
- Business/Customer intelligence – customer data often spans multiple markets with varying sources and dictionaries.
- Logistics – like customer data, logistics data often spans more than one domain.
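To illustrate analytics that span more than one domain without a central database, the sketch below combines data from two domain-owned sources at query time. The `fetch_orders` and `fetch_shipments` functions are hypothetical stand-ins for API calls to a sales team and a logistics team, and the records are made-up sample data.

```python
from collections import defaultdict


def fetch_orders() -> list[dict]:
    """Stand-in for a call to the sales domain's API (sample data)."""
    return [
        {"customer_id": "c1", "total": 40.0},
        {"customer_id": "c2", "total": 15.5},
        {"customer_id": "c1", "total": 10.0},
    ]


def fetch_shipments() -> list[dict]:
    """Stand-in for a call to the logistics domain's API (sample data)."""
    return [
        {"customer_id": "c1", "status": "delayed"},
        {"customer_id": "c2", "status": "delivered"},
    ]


def revenue_by_shipment_status(orders: list[dict], shipments: list[dict]) -> dict[str, float]:
    """Join the two sources on customer_id and total the revenue per status."""
    status_by_customer = {s["customer_id"]: s["status"] for s in shipments}
    revenue: defaultdict[str, float] = defaultdict(float)
    for order in orders:
        # Orders with no matching shipment are grouped under "unknown".
        status = status_by_customer.get(order["customer_id"], "unknown")
        revenue[status] += order["total"]
    return dict(revenue)


result = revenue_by_shipment_status(fetch_orders(), fetch_shipments())
print(result)
```

The join works only because both domains expose the same `customer_id` key, which is the kind of shared standard that federated governance is meant to provide.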
Data Democracy with Data Mesh
In this post, we covered what you need to know about data mesh. We covered its core principles, and how those principles help you create a federated enterprise data architecture. With data mesh, data sets remain in the hands of the experts, and they’re shared with APIs designed by the same people.
Data mesh is a popular data architecture because of the way it democratizes data and leaves it in the hands of the people who need it most.
This post was written by Eric Goebelbecker. Eric has worked in the financial markets in New York City for 25 years, developing infrastructure for market data and financial information exchange (FIX) protocol networks. He loves to talk about what makes teams effective (or not so effective!).