Often, when a new term is being thrown about, we assume it refers to a piece of technology. But “soft” issues like culture and organizational structure are just as important as, and sometimes more important than, the technology underpinning an engineering team’s workflow.
Data mesh may sound like “service mesh,” but the two are very different. Data mesh is not a type of technology, and one does not buy a data mesh or hire a data mesh provider. It is a way to describe an organizational structure.
“Data mesh is just an organizational principle,” said Arsalan Tavakoli, senior vice president of field engineering at data management systems provider Databricks. In the past, organizations have generally relied on a centralized data engineering and data science team to collect and clean data and then make it available to the business. Data mesh is a way to decentralize not necessarily the technology used to store data, but the teams that work with it.
“The people who have the most context on what data sets are valuable and what you actually need are the business units,” Tavakoli said. “Why wouldn’t we ultimately say, hey, sales, supply chain or any business group, you should be responsible yourself for cleaning the data, making your data assets available to others and powering your business use cases.”
Connecting Data with Domain Experts
“The first pillar is business domain-oriented ownership of the data,” explained Zhamak Dehghani, director of emerging technologies at IT consultancy Thoughtworks. This can have implications for the data architecture, but the key idea is that data is structured and stored in a way that supports embedding data engineers and data scientists in business units, so that those business units can use their domain knowledge to get value out of the data.
Historically, enterprises stored their data in enterprise data warehouses or data lakes, and the people who used those technologies were highly specialized in the technology — but they often didn’t understand the domain data. In addition, the centralized data lakes were designed for general use, and often didn’t meet the needs of all teams.
“If you have a central data engineering group, how well do they really understand what are the data sets that finance needs? Or the data sets that any of the business units needs?” Tavakoli said. “The closer you are to somebody who understands the business problems and the requirements and has the domain knowledge, the better prepared they are to build the right set of data assets to power the right kind of use cases.”
The centralized model also created a bottleneck in connecting data with the people who knew how to unlock its value. The data engineers and scientists didn’t understand why the head of supply chain wanted particular data sets or what they hoped to learn from them; the head of supply chain didn’t have the technical skills to manipulate the data or decide how to clean it. In the end, the business owner kept coming back to the data engineering group to complain that the data product wasn’t built correctly.
If data mesh is all about giving business units control over their own data science projects, it’s also about putting them in the driver’s seat when it comes to building analytics capabilities and sharing those analytics with other teams.
“A product team might have been building a trading desk, to use an example from my experience,” explained Andrew Stevenson, chief technology officer at DataOps platform lenses.io. “The analysis of the trades would then typically be handed off to a second team that would typically be trying to manage lots of tenants at the same time. There would be a bottleneck.” Data mesh involves allowing that product team to self-serve and create its own analytics, which are not only for use inside the team but can also be shared with other teams.
The ability to share data between business units, even when each unit makes its own data engineering decisions internally, is critical to data mesh structures. That means central governance policies are still important, Dehghani said. “Interoperability, pieces of metadata being interoperable, that needs to be enforced into the platform as an automated capability.”
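As a rough illustration of what enforcing interoperable metadata “as an automated capability” could look like, here is a minimal Python sketch. It assumes a shared catalog that refuses to publish a data product unless it carries a fixed set of metadata fields. The field names, the `DataProduct` class and the `publish` function are all hypothetical; none of them come from Dehghani or from any specific platform.

```python
from dataclasses import dataclass, field

# Hypothetical metadata fields a central governance policy might require
# of every data product, regardless of which business unit owns it.
REQUIRED_METADATA = ("owner_domain", "description", "schema_version", "update_frequency")


@dataclass
class DataProduct:
    """A data set that a business unit offers to the rest of the company."""
    name: str
    metadata: dict = field(default_factory=dict)


def missing_metadata(product: DataProduct) -> list:
    """Return the required metadata fields that are absent or empty."""
    return [key for key in REQUIRED_METADATA if not product.metadata.get(key)]


def publish(product: DataProduct, catalog: dict) -> None:
    """Add a product to the shared catalog only if it passes the metadata check."""
    missing = missing_metadata(product)
    if missing:
        raise ValueError(f"{product.name} is missing metadata: {', '.join(missing)}")
    catalog[product.name] = product
```

In this sketch each business unit still decides how to build its data product, while the check in `publish` is the one rule applied to everyone who wants to share it.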
So, if we’re going to embed data engineers inside business units, does that mean that the underlying technology needs to change? Not necessarily.
“There’s not necessarily a technology change as such,” Stevenson said of moving to a data mesh organizational structure. Of course, data technology is constantly evolving, with a continuing move toward more real-time data streaming, among other things, but Stevenson says the technology is fundamentally still the same. “You’ll still have an enterprise data warehouse, you may still have a data lake in there, you may still have relational databases as well,” he said.
According to Dehghani, there are some technology changes, mostly related to facilitating the kind of self-service that will allow business owners to directly interact with their data assets. “How can we technically make it possible for autonomous teams to provide data as a product, to run queries across a mesh of interconnected data products, and let them build data products with minimum friction,” she said.
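To make the idea of querying across “a mesh of interconnected data products” a little more concrete, here is a small Python sketch. It assumes each business unit registers a reader function for its own data product, and that a consumer can join two products on a shared key. The `DataMesh` class, the product names and the `sku` key are invented for illustration; a real platform would sit on top of warehouses, lakes or streams instead of in-memory lists.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


class DataMesh:
    """Hypothetical registry of data products owned by different business units."""

    def __init__(self) -> None:
        self._readers: Dict[str, Callable[[], Iterable[Row]]] = {}

    def register(self, name: str, reader: Callable[[], Iterable[Row]]) -> None:
        """A business unit exposes its data product through a reader function."""
        self._readers[name] = reader

    def read(self, name: str) -> List[Row]:
        """Fetch all rows of one data product."""
        return list(self._readers[name]())

    def join(self, left: str, right: str, key: str) -> List[Row]:
        """Inner-join two data products on a shared key."""
        right_by_key: Dict[object, List[Row]] = {}
        for row in self.read(right):
            right_by_key.setdefault(row[key], []).append(row)
        return [
            {**left_row, **right_row}
            for left_row in self.read(left)
            for right_row in right_by_key.get(left_row[key], [])
        ]


mesh = DataMesh()
# Each domain team registers its own product; the data here is made up.
mesh.register("sales.orders", lambda: [
    {"sku": "A1", "units_sold": 10},
    {"sku": "B2", "units_sold": 3},
])
mesh.register("supply_chain.inventory", lambda: [
    {"sku": "A1", "in_stock": 5},
])

combined = mesh.join("sales.orders", "supply_chain.inventory", "sku")
```

The point of the sketch is the division of labor: the sales and supply chain teams each own their reader, and the consumer only needs the product names and a common key.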
Data mesh isn’t going to be possible for everyone. Even in organizations that adopt it, it’s likely that some high-priority business units will get their own data engineers, while others have to make do with the centralized team. “The reason most people have a central data team is that some of these talents like data engineers and data scientists are incredibly scarce,” Tavakoli said. “For smaller organizations, or any organization that struggles to hire for these roles, a data mesh will likely lead to incomplete data and reverting back to a more centralized solution.”
It’s possible, at some time in the future, that some data engineers won’t just be “data engineers” but “finance data engineers.” Tavakoli said that we’re still very early on this journey, but it does seem to be the direction data mesh is taking.