Data, often compared to the “oil of the 21st century”, defines our digital age. The emergence of big data in the 2010s, combined with the development of the cloud and artificial intelligence, has transformed our technological landscape. However, centralized structures have not been able to fully exploit this potential. Born around 2018, the concept of Data Mesh promises to revolutionize the way we manage and exploit data. Discover this essential paradigm through the exploration of its birth, its principles, its advantages and disadvantages, and concrete examples such as Netflix, BP, and Vistaprint. Essential reading for anyone wishing to understand the future of data management.
- What is Data Mesh?
- Reasons for Data Mesh
- Definition of Data Mesh
- History of Data Mesh
- Principles of Data Mesh
- Advantages of the Data Mesh
- Disadvantages of the Data Mesh
- When Data Mesh is not for you
- Data Mesh in the real world
- Data Mesh Architecture
- Going Further
What is Data Mesh?
Reasons for Data Mesh
Over the past few decades, companies have faced an exponential increase in the variety, volume, and velocity of data (the 3Vs). In response, we have moved from the data warehouse to the data lake. This transition was primarily a technical answer to the challenges posed by the 3Vs, and data governance has remained centralized throughout.
Take the example of the United States, a federal country where the decentralization of powers seems to have favored great economic prosperity. Moreover, the fact that the USA has one of the oldest constitutions in the world tends to demonstrate the stability of the federal system. Conversely, the USSR, where all decisions were made in Moscow, did not experience the same boom. This contrast favors decentralization.
Moreover, imagine if the Internet had been built around a central computer. CompuServe, a pioneer in online services, followed this model, but it was supplanted by the decentralized architecture of the Internet, which was essential for its global success.
Similarly, in business, this centralization of data is limiting. Indeed, it can hinder innovation, limit access to data, and slow responses to the constantly evolving needs of the different functional areas of the company.
To solve these problems, a new approach was born: the Data Mesh. This initiative decentralizes data management while strengthening collaboration among different actors. The word “mesh”, evoking the link of a decentralized but interconnected network, symbolizes this new philosophy.
Definition of Data Mesh
Data Mesh is a decentralized method of data management. Each business domain of the company is responsible for its data, integrating multidisciplinary teams combining business and technical skills. Data is considered a product, accessible in self-service by other domains.
Data governance is federal. Each domain has functional and technical freedom, framed by common standards. These standards promote collaboration and overall consistency, while remaining minimalist to stimulate agility and innovation.
History of Data Mesh
Although decentralized data management may have existed before Big Data, the formulation of the Data Mesh concept is attributed to computer scientist Zhamak Dehghani.
Birth of Data Mesh at Thoughtworks in 2018
It was around 2018, while she was Director of New Technology Incubation at Thoughtworks, an American digital services company, that Zhamak Dehghani formulated the concept of Data Mesh.
She has over 20 years of experience in software architecture and is a graduate of Shahid Beheshti University in Iran and the University of Sydney. She has also written books on Data Mesh, published by O’Reilly.
A renowned speaker, she eventually left Thoughtworks to create Nextdata, her own company.
Popularization on Martin Fowler’s Blog in 2019
Zhamak Dehghani presented the concept of Data Mesh in 2019 in her article How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh, published on Martin Fowler’s site. Fowler is himself a key figure at Thoughtworks.
Martin Fowler is a legend in software development, an author and speaker specialized in object-oriented design, UML and agile development. He has written nine books on software development.
Inspired by Eric Evans’ Domain-Driven Design (DDD)
The concept is mainly inspired by Eric Evans’ domain-driven design.
Evans is the author of Domain-Driven Design: Tackling Complexity in the Heart of Software, published in 2004. He has worked on large enterprise system development projects since the 1990s and currently leads the consulting group Domain Language. DDD has strongly influenced the philosophy of Data Mesh.
Data Mesh also draws ideas from Team Topologies, the theory developed by Manuel Pais and Matthew Skelton. This theory aligns with the decentralized approach of Data Mesh and reinforces its emphasis on team responsibility, interoperability, and collaboration in the way data work is organized.
Principles of Data Mesh
Data Mesh is based on four principles, which we present below using the example of a supermarket where the products on the shelf are data:
Decentralized Domain Ownership of Data
In Data Mesh, each functional domain is accountable (data ownership) and responsible (data stewardship) for its own data.
This can be illustrated by comparing it to a supermarket where each section represents a different domain:
|Section|Attribute|Examples|
|---|---|---|
|Food|Label|organic, fair trade, non-GMO|
|Food|Expiration Date|each product has an expiration date|
|Food|Origin|place of harvest or manufacture|
|Hardware|Type|tools, materials, hardware|
|Hardware|Brand|each tool has a brand|
|Hardware|Technical Specifications|size, weight, material|
Thus, each section is the most competent to manage its own products: each domain is best suited to handle its own data.
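As a minimal sketch of this ownership model, the snippet below records which domain is accountable and responsible for each dataset. The domain names, role names, and dataset names are all hypothetical, chosen to match the supermarket example rather than taken from any real catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    """A business domain, e.g. one supermarket section (hypothetical model)."""
    name: str
    owner: str    # accountable for the data (data ownership)
    steward: str  # responsible for the data day to day (data stewardship)


# Hypothetical domains and the datasets each one produces.
FOOD = Domain(name="food", owner="head_of_food", steward="food_data_steward")
HARDWARE = Domain(
    name="hardware", owner="head_of_hardware", steward="hardware_data_steward"
)

DATASET_OWNERSHIP = {
    "product_labels": FOOD,
    "expiration_dates": FOOD,
    "tool_specifications": HARDWARE,
}


def who_to_contact(dataset: str) -> str:
    """Return the steward of the domain that owns the given dataset."""
    domain = DATASET_OWNERSHIP[dataset]
    return f"{domain.steward} ({domain.name} domain)"


print(who_to_contact("expiration_dates"))  # food_data_steward (food domain)
```

The point of the sketch is that every dataset maps to exactly one domain, so questions about a dataset always have a clear recipient.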
Data as a Product
In the Data Mesh, data is considered as a product. In our supermarket, each piece of data would therefore be a product on a shelf, ready to be discovered, accessed, and used by consumers. To facilitate this, data products must at least have the following properties:
- Discoverable: each data product must be listed in a centralized data catalog, much like large hypermarkets where touch screen terminals allow customers to easily locate the aisle of a product they are looking for.
- Addressable: each data product must have a unique address, like a barcode or a unique location for each product in a supermarket.
- Trustworthy: data products must meet strict quality standards, just like a product in a supermarket must provide accurate and verifiable information about its traceability.
- Self-describing: every data product must have a clear description within the catalog, like every product in a supermarket would have a label clearly describing it.
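The four properties above can be sketched as a small descriptor plus a catalog. This is an illustrative model only: the class names `DataProduct` and `Catalog`, the field names, and the example address are assumptions, not part of any specific Data Mesh tool.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DataProduct:
    address: str            # addressable: one unique address per product
    domain: str             # the domain that owns the product
    description: str        # self-describing: human-readable summary
    schema: Dict[str, str]  # self-describing: column name -> type
    quality_checked: bool   # trustworthy: passed the domain's quality checks


class Catalog:
    """Central catalog listing every data product (discoverable)."""

    def __init__(self) -> None:
        self._products: Dict[str, DataProduct] = {}

    def register(self, product: DataProduct) -> None:
        # Refuse products that would break one of the four properties.
        if product.address in self._products:
            raise ValueError(f"address already in use: {product.address}")
        if not product.description.strip():
            raise ValueError("a data product needs a description")
        if not product.quality_checked:
            raise ValueError("a data product must pass its quality checks")
        self._products[product.address] = product

    def search(self, keyword: str) -> List[str]:
        """Return the addresses of products whose description matches."""
        kw = keyword.lower()
        return sorted(
            address
            for address, product in self._products.items()
            if kw in product.description.lower()
        )

    def get(self, address: str) -> DataProduct:
        return self._products[address]


catalog = Catalog()
catalog.register(DataProduct(
    address="food/expiration-dates",
    domain="food",
    description="Expiration date of every food item on the shelves",
    schema={"product_id": "string", "expires_on": "date"},
    quality_checked=True,
))
print(catalog.search("expiration"))  # ['food/expiration-dates']
```

Enforcing the properties at registration time is one possible design choice. A real platform might instead publish quality metrics alongside the product and let consumers decide.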
Do not confuse with Data Product
It is important not to confuse data as a product with the data product. A data product is created from data and can take the form of a report or analysis, as you can read in our article on BI consultants.
In the context of the Data Mesh, data as a product refers to how data sets are governed and managed.
Self-Serve Data Infrastructure as a Platform
The Data Mesh promotes a self-serve data infrastructure. It’s as if supermarket customers could serve themselves without having to ask a member of staff for help.
This approach promotes greater agility and greater efficiency, as users can get the data they need more quickly.
Federated Computational Data Governance
Federated data governance involves establishing standards and rules for data management across the organization.
In our supermarket, although each section can manage its products as it sees fit, all must follow certain standards, such as using the same currency or respecting the same opening hours.
Similarly, in an organization that uses the Data Mesh, each domain can manage its data as it sees fit. However, it must follow certain standards and rules that ensure data quality, accuracy, interoperability, and consistency, and that support collaboration between domains.
This federated structure allows for uniform governance while promoting flexibility and domain-specific expertise.
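One way to picture the "computational" side of federated governance is to express the shared rules as code that every domain runs against its own metadata, with room for local rules on top. The sketch below assumes a hypothetical metadata dictionary and hypothetical policies (a mandatory owner, a common currency, and a food-specific column). It is not a reference implementation.

```python
from typing import Callable, Dict, Iterable, List, Optional

# A policy inspects a product's metadata and returns an error message,
# or None when the product complies.
Policy = Callable[[Dict], Optional[str]]


def has_owner(meta: Dict) -> Optional[str]:
    return None if meta.get("owner") else "missing owner"


def uses_common_currency(meta: Dict) -> Optional[str]:
    # Hypothetical company-wide standard: all prices are expressed in EUR.
    return None if meta.get("currency") == "EUR" else "prices must be in EUR"


# Global standards, agreed once and applied to every domain.
GLOBAL_POLICIES: List[Policy] = [has_owner, uses_common_currency]


def has_expiration_column(meta: Dict) -> Optional[str]:
    # Local rule that only the (hypothetical) food domain chooses to enforce.
    if "expires_on" in meta.get("columns", []):
        return None
    return "missing expires_on column"


def violations(meta: Dict, domain_policies: Iterable[Policy] = ()) -> List[str]:
    """Run the global policies first, then the domain's own policies."""
    results = (policy(meta) for policy in [*GLOBAL_POLICIES, *domain_policies])
    return [message for message in results if message is not None]


meta = {"owner": "food_team", "currency": "USD", "columns": ["product_id"]}
print(violations(meta, [has_expiration_column]))
# ['prices must be in EUR', 'missing expires_on column']
```

The global list stays short on purpose, matching the idea of minimal common standards. Each domain remains free to add stricter checks of its own.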
Advantages of the Data Mesh
Reduction of bottlenecks and better resilience
By decentralizing data management, the Data Mesh contributes to the reduction of bottlenecks in workflows and strengthens resilience.
In centralized governance, the risk is that the team managing the data is too slow, unreliable, or fails, penalizing the entire organization.
On the contrary, in a Data Mesh model, each domain independently manages its data, thus reducing blockages and improving the reliability of the whole.
Improvement of data quality and customer satisfaction
Adopting the Data Mesh often leads to better data quality. Indeed, each functional domain is best suited to understand and guarantee the integrity of its own data. Consequently, customer satisfaction increases.
Increased accessibility and new opportunities
The Data Mesh encourages wider access to data across the company. This approach allows for discovering new opportunities from data coming from different departments of the company.
Team autonomy and agility
The Data Mesh promotes autonomy at the team level, increasing the agility of the company. Indeed, each team can adjust its data management practices according to its specific needs, allowing for a more flexible and adaptable system.
Stimulation of innovation
By unleashing the potential of each team, the Data Mesh stimulates innovation. Having direct control over data gives teams the opportunity to test new ideas and approaches. The most innovative teams can inspire others, and internal innovation contests can create a healthy competitive environment.
Transparency, compliance, and interoperability
The Data Mesh promotes transparency by making data available to stakeholders across the organization. It supports compliance by providing a global framework for data governance, and interoperability by establishing common data standards. Finally, it is politically easier to audit several domains than a single central team on which everything depends.
Disadvantages of the Data Mesh
The disadvantages of Data Mesh are essentially due to the lack of consideration and willingness on the part of the management team, who do not allocate enough time and resources to properly apply the theory. This negligence then limits the benefits outlined above:
Complexity of coordination and risk of silos
The decentralization of data can increase the difficulty of coordination between teams and risk creating data silos where some parts are inaccessible to other teams.
This complexity is particularly evident when the company encounters difficulties in applying the method globally. This is often a problem of resources and management.
Challenges of federated data governance
The implementation of federated data governance can represent a major challenge, especially for large companies with several teams and stakeholders.
Companies with multiple subsidiaries, or in the process of merger-acquisition, with diverging political interests or cultures, must consider these issues before they begin. These elements can generate additional pitfalls that require special attention.
Costs and challenges of cultural change
The implementation of Data Mesh often involves a major cultural change, which can be expensive and difficult to achieve. It requires changes in work habits and investments in training and support. Engaging a consulting firm, such as Data Éclosion, is highly recommended to manage this change successfully.
Inclusion of technical profiles in business teams
Data Mesh requires close collaboration between technical and business teams. This is not always easy, especially in large companies. Business teams may lack the technical skills to manage the data themselves. In addition, the different mindsets between technical and business teams can create obstacles. It is vital to work together, align objectives and understand each other’s needs to succeed with Data Mesh.
When Data Mesh is not for you
Your company is not big enough
In the case of a small or medium-sized business, Data Mesh can be more complicated than beneficial. The decentralization it implies can create data silos and coordination costs that make no sense on a small scale.
In other words, if you are not a large company, a centralized data team might better meet your needs without the complexity of Data Mesh.
On the other hand, if your company is set to grow, you can split data management into two parts: generic and specific to your business.
Thus, you can later replace the specific part with the different domains to achieve your Data Mesh in the long run.
Data culture is insufficient at the management level, not facilitating bottom-up decisions
To succeed with Data Mesh, you need a culture that values trust, autonomy and bottom-up decision making.
If these elements are not part of your company culture, implementing a Data Mesh will be a challenge.
You must have clear roles, well-defined responsibilities and a structure that encourages experimentation and responsible decision-making within teams.
Weak data governance, technical maturity, security
Data Mesh can be a premature approach if your organization lacks basic data governance, sufficient technical maturity, or well-established security measures.
Before considering Data Mesh, preparatory projects must be carried out to set up these fundamental elements. Without these foundations, the implementation of Data Mesh could lead to more complications than benefits.
Data Mesh in the real world
Let’s now explore three examples of use cases in large companies:
Netflix’s Data Mesh
Netflix, the streaming provider renowned for its films, series and original content, needs no introduction. As the company grew, so did its complexity: every stage of content creation, from presentation to post-production, became harder to manage. Despite its innovative use of microservices (Netflix OSS), traditional data management was no longer sufficient.
Faced with this challenge, Netflix launched a data mesh platform. This facilitates decentralized data management, with easily configurable pipelines via a drag-and-drop interface.
By capturing changes with CDC sources, using GraphQL and Apache Iceberg tables, data becomes consistent and interoperable. End-to-end auditing enhances reliability, creating a reliable and discoverable environment.
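To illustrate the general idea behind change data capture (CDC), the sketch below replays a stream of change events onto an in-memory copy of a table. It is not Netflix's implementation: the event shape (`op`, `key`, `row`) and the sample rows are assumptions made for the example.

```python
from typing import Any, Dict


def apply_change(table: Dict[Any, dict], event: dict) -> None:
    """Apply one change event to a keyed copy of the source table."""
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):
        table[key] = event["row"]  # upsert the latest version of the row
    elif op == "delete":
        table.pop(key, None)       # tolerate deletes for unknown keys
    else:
        raise ValueError(f"unknown operation: {op}")


# Hypothetical change events emitted by a source database, in commit order.
events = [
    {"op": "insert", "key": 1, "row": {"title": "Pilot", "status": "draft"}},
    {"op": "update", "key": 1, "row": {"title": "Pilot", "status": "final"}},
    {"op": "insert", "key": 2, "row": {"title": "Episode 2", "status": "draft"}},
    {"op": "delete", "key": 2},
]

replica: Dict[Any, dict] = {}
for event in events:
    apply_change(replica, event)

print(replica)  # {1: {'title': 'Pilot', 'status': 'final'}}
```

Because consumers receive every change in order, downstream copies converge on the same state as the source, which is what makes the resulting data consistent across domains.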
The company also developed a semantic layer that provides a self-service data platform. Here, data is treated as self-descriptive products. Accessibility and understanding are emphasized in order to grasp the scale of Netflix’s data.
Netflix has adopted a technical approach aligned with its development culture, with a keen awareness of areas for improvement. At the heart of these challenges is data governance.
By moving from the app-centric approach, embodied by domain-driven design, to the data-centric approach, represented by Data Mesh, the company has been able to adapt and modernize its data management.
BP’s Data Mesh
BP, an energy giant with hundreds of subsidiaries, faced a major challenge in data management. The old centralized model no longer met the data sharing needs of this huge company. Its sprawling nature was particularly well suited to the federated data governance of Data Mesh.
The company then chose to adopt the Data Mesh model, characterized by global governance but with decentralized teams managing their own data products.
Accenture played a key role in this transformation, as highlighted by Teresa Tung of Accenture, Abeth Go and Liam Donohoe of BP, in the podcast AI Leaders Podcast: Data Mesh from Theory to Practice.
BP’s first applications focused on data products related to tracking CO2 emissions and optimizing their reduction through digital twins. Liam Donohoe affirmed that these use cases aligned with BP’s sustainable development strategy.
Abeth Go mentioned that the implementation of the Data Mesh at BP required a significant effort to explain the model to all stakeholders, leading to strengthening data governance processes and the role of data stewards.
The benefits of this transformation are beginning to emerge. Liam Donohoe notes improvements in terms of interoperability between existing systems and new platforms thanks to the Data Mesh approach. Abeth Go also sees potential in sharing data across the entire company, despite its complex organization.
Vistaprint’s Data Mesh
Vistaprint sought to optimize the use of its data and analytical capabilities. Sebastian Klapdor, EVP and Chief Data Officer at Vistaprint, shared his experience at Corinium’s CDAO Fall conference in Boston, reported by Business of Data.
The problem lay in a centralized data architecture that prevented scaling.
Comparing data to water and not to oil, Klapdor decided to implement the Data Mesh.
To achieve this, he set up autonomous teams, such as Marketing, to move towards a logic of data as products.
Thanks to the cloud, he set up a self-service data infrastructure. These changes allowed teams to work faster.
In addition, Klapdor shared five tips for those looking to implement a Data Mesh architecture:
- Create a domain map.
- Develop a solid data platform.
- Create a data catalog.
- Promote the idea of a “data product”.
- Develop effective communication processes.
He also emphasized federated computational data governance. In the long term, he plans to further extend data products to improve Vistaprint’s performance.
Data Mesh Architecture
Most of the time, when we talk about data mesh architecture, we are talking about an organizational model or data governance rather than software architecture.
However, mastery of data architectures at the enterprise level is essential to concretely implement the Data Mesh.
Data Mesh vs. Data Lake
Originally, a Data Lake was typically built on a distributed file system such as Apache Hadoop HDFS. It meets the needs of the 3Vs of Big Data, and its simplicity can make it attractive for storing huge amounts of raw data.
The Data Mesh organizes these data lakes and makes functional domains autonomous in their use.
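One simple way to picture this organization is a lake whose storage is partitioned by domain, where each team writes only under its own prefix. The layout, the `/lake` root, and the write rule below are illustrative assumptions, not a prescribed Data Mesh standard.

```python
from pathlib import PurePosixPath

LAKE_ROOT = PurePosixPath("/lake")  # hypothetical root of the data lake


def product_path(domain: str, product: str) -> PurePosixPath:
    """Location of a data product: one prefix per domain, one folder per product."""
    return LAKE_ROOT / domain / product


def can_write(team_domain: str, path: PurePosixPath) -> bool:
    """A team may write only below its own domain prefix."""
    prefix = LAKE_ROOT / team_domain
    return prefix == path or prefix in path.parents


path = product_path("marketing", "campaign_results")
print(path)                          # /lake/marketing/campaign_results
print(can_write("marketing", path))  # True
print(can_write("finance", path))    # False
```

In practice such a rule would be enforced by the storage platform's access controls rather than by application code. The sketch only shows the shape of the rule.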
Data Mesh vs. Data Fabric
Data Fabric, a term attributed to NetApp around 2016, is an enterprise-level unified architecture. Close to the Data Hub or Data Platform concept, it integrates a range of services, from data ingestion to final use.
Data Fabric goes beyond simple storage and offers benefits such as data visualization, access and control, protection and security. The Data Lake can be part of this architecture, as a central repository.
Again, the Data Mesh allows decentralized and autonomous use of this data architecture. Its implementation offers maximum flexibility and efficiency in managing data at scale.
Going Further
Books on the Data Mesh
- Data Mesh: Delivering Data-Driven Value at Scale by Zhamak Dehghani, 2022, O’Reilly Media, ISBNs: 1492092398, 978-1492092391. In this practical book, the originator of Data Mesh introduces this decentralized sociotechnical paradigm drawn from modern distributed architecture. It offers a new approach to managing large-scale analytical data. The book guides practitioners on their journey from traditional big data architecture to multidimensional and distributed analytical data management.
- Data Mesh: Designing a Decentralized Data Architecture by Zhamak Dehghani, 2023, O’Reilly, ISBNs: 3960092075, 978-3960092070. Written in German, this book revisits the concept of Data Mesh, while introducing new thoughts.
- Data Mesh: Reinventing Data Architecture for Decentralized and Autonomous Data Teams by Brian Murray, 2023, independent publisher, ISBNs: 979-8393243128. This book offers a comprehensive guide to understanding and implementing the principles of Data Mesh and Data Products in your organization. It provides practical examples, best practices, and common challenges for implementing these principles in your organization.
- Building an Event-Driven Data Mesh: Patterns for Designing & Building Event-Driven Architectures by Adam Bellemare, 2023, O’Reilly Media, ISBNs: 1098127609, 978-1098127602. This book shows how to design and build an event-driven (streaming) Data Mesh, providing practical guidance, solutions to potential pitfalls, and a clear understanding of event modeling. It also includes best practices for managing large-scale events.
- Data Mesh in Action by Jacek Majchrzak, Sven Balnojan & al., 2023, Manning Publications, ISBNs: 1633439976, 978-1633439979. This practical book teaches how to decentralize your data and organize it into an efficient mesh. It guides readers through the implementation of a Data Mesh in the organization, with workshop techniques and discussions on sociotechnical architecture and domain-driven design.
- Data Products and the Data Mesh: Driving Business Value through Data Modernization by Alberto Artasanchez, 2023, independent publisher, ISBNs: 979-8397010504. This comprehensive guide explores the emerging paradigm of the Data Mesh and its impact on organizations in the data-driven landscape. The book offers a complete roadmap for building a decentralized, innovative, and scalable data ecosystem, and is an essential resource for data professionals, architects, and leaders.
Data Mesh Learning Community on Slack
Data Mesh Learning was founded in February 2021. This global community brings together over 7000 data leaders interested in Data Mesh. It uses Slack for connection and sharing, and organizes biannual meetups to discuss Data Mesh.
Users can also discover implementation examples through the User Journey video series. The community has recently opened up to sponsorship to expand its resources and has also launched a newsletter.
The team, including Melissa Logan, Hugh Lashbrooke, and board members such as Zhamak Dehghani, works to facilitate learning and connection among members via Slack.
You now have the keys to understanding Data Mesh, a concept that has become essential. It revolutionizes the way data is managed and valued. This transformation is not reserved for tech giants: it is just as essential for mid-sized companies and large groups looking to stay competitive in a constantly evolving world.
If you are considering such innovation within your organization, contact Data Éclosion, our consulting firm. Our experts, specialized in this type of transformation, are here to guide you, adapting the best methods to your specific needs. Mastering the new era of big data and artificial intelligence is within your reach: it’s your move!