Data, often compared to the “oil of the 21st century”, defines our digital age. The explosion of Big Data in the 2010s, combined with the emergence of the cloud and artificial intelligence, has transformed our technological landscape. However, centralized structures have failed to fully exploit this potential. Born around 2018, the Data Mesh concept promises to revolutionize the way we manage and exploit data. Discover this essential paradigm through an exploration of its birth, principles, pros and cons, and real-life examples such as Netflix, BP and Vistaprint. Essential reading for anyone wishing to understand the future of data management.
What is Data Mesh?
Reasons for Data Mesh
Over the past few decades, companies have been faced with an exponential increase in the variety, volume and velocity of data – the 3Vs. In response, we have moved from the data warehouse to the data lake. This move was primarily a technical response to the challenges posed by the 3Vs, but it’s important to note that data governance remained centralized.
Let’s take the example of the USA, a federal country where the decentralization of powers seems to have fostered great economic prosperity. What’s more, the fact that the USA has one of the oldest constitutions in the world tends to demonstrate the stability of the federal system. By contrast, the USSR, where all decisions were taken in Moscow, did not enjoy the same boom. This contrast works in favor of decentralization.
On the other hand, imagine that the Internet had been built around a central computer. CompuServe, a pioneer of online services, followed this model, but was superseded by the Internet’s decentralized architecture, which was essential to its worldwide success.
Similarly, in the enterprise, this centralization of data is a limiting factor. It can hamper innovation, limit access to data, and slow down responses to the ever-changing needs of the company’s various functional areas.
To solve these problems, a new approach has emerged: the Data Mesh. This initiative decentralizes data management while strengthening collaboration between the various players. The word “mesh”, evoking the link in a decentralized but interconnected network, aptly symbolizes this new philosophy.
Data Mesh definition
Data Mesh is a decentralized method of data management. Each functional area of the company is responsible for its data, integrating multi-disciplinary teams combining business and technical skills. Data is seen as a product, accessible on a self-service basis by other areas.
Data governance is federal. Each domain has functional and technical freedom, framed by common standards. These standards promote collaboration and global consistency, while remaining minimalist to stimulate agility and innovation.
History of Data Mesh
Although decentralized data management may have existed before Big Data, the concept of the Data Mesh is clearly attributed to computer scientist Zhamak Dehghani.
The birth of the Data Mesh at Thoughtworks in 2018
It was around 2018, when she was Director of New Technology Incubation at Thoughtworks, an American digital services company, that Zhamak Dehghani formulated the Data Mesh concept.
With over 20 years’ experience as an expert in software architectures and a degree from Shahid Beheshti University in Iran and the University of Sydney, she has also written books on the Data Mesh, published by O’Reilly.
A renowned speaker, she eventually left Thoughtworks to set up her own company, Nextdata.
Popularisation on Martin Fowler’s Blog in 2019
Zhamak Dehghani outlined the concept of the Data Mesh in 2019 in her article How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh, published on Martin Fowler’s website; Fowler is also a key figure at Thoughtworks.
Martin Fowler is a software development legend, author and speaker, specializing in object-oriented design, UML and agile development. He has written nine books on software development.
Inspired by Domain-Driven Design (DDD) by Eric Evans
The concept is inspired above all by Eric Evans’ domain-driven design.
Evans is the author of Domain-Driven Design: Tackling Complexity in Software, published in 2004. He has worked on large enterprise systems development projects since the 1990s and currently heads the Domain Language consulting group. DDD has strongly influenced the Data Mesh philosophy.
Other Influences
Data Mesh also draws ideas from Manuel Pais and Matthew Skelton’s theory of team topologies. This theory aligns with the Data Mesh’s decentralized approach, contributing to the emphasis on accountability, interoperability, and team collaboration in data organization.
Data Mesh principles
Data Mesh is based on four principles, which we outline below using the example of a supermarket, where the products on the shelves are data:
Decentralized data ownership by domain
In the Data Mesh, each functional area is responsible (data ownership) and accountable (data stewardship) for its own data.
This can be illustrated by making the comparison with a supermarket where each department represents a different area:
Food department
Data | Description |
---|---|
Label | organic, fair trade, GMO-free |
Expiry date | every product has a sell-by date |
Provenance | place of harvest or manufacture |
DIY department
Data | Description |
---|---|
Type | tools, materials, hardware |
Brand | every tool has a brand |
Technical specifications | size, weight, material |
In this way, each department is best placed to manage its own products: each area is best placed to process its own data.
Data as a Product
In the Data Mesh, data is seen as a product. In our supermarket, each piece of data would therefore be a product on a shelf, ready to be discovered, accessed and used by consumers. To facilitate this, data products must have at least the following properties:
- Discoverable: each data product must be included in a centralized data catalog, as in large hypermarkets, where kiosks equipped with touch screens enable customers to easily locate the shelf of a product they are looking for.
- Addressable: each data product must have a unique address, like a barcode, and a unique shelf for each product in a supermarket.
- Trustworthy: data products must meet strict quality standards, just as a product in a supermarket must provide accurate and verifiable traceability information.
- Self-describing: every data product should have a clear description in the catalog, just as every product in a supermarket would have a label clearly describing it.
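The four properties above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `DataProduct` and `Catalog` classes, the `mesh://` address scheme and the field names are hypothetical and do not come from any particular Data Mesh tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor for one data product."""
    domain: str                 # owning functional area, e.g. "food"
    name: str                   # product name, unique within its domain
    description: str            # self-describing: the "label" on the product
    schema: dict                # column name -> type name
    quality_checks: tuple = ()  # trustworthy: checks the owning team runs

    @property
    def address(self) -> str:
        """Addressable: one unique, stable address per product."""
        return f"mesh://{self.domain}/{self.name}"


class Catalog:
    """Discoverable: a central index of every published product."""

    def __init__(self):
        self._products = {}

    def register(self, product: DataProduct) -> None:
        # Refuse products that cannot describe themselves.
        if not product.description.strip():
            raise ValueError("a data product must describe itself")
        # Refuse duplicate addresses so each address stays unique.
        if product.address in self._products:
            raise ValueError(f"address already taken: {product.address}")
        self._products[product.address] = product

    def search(self, keyword: str) -> list:
        """Return the addresses of products matching the keyword."""
        kw = keyword.lower()
        return [p.address for p in self._products.values()
                if kw in p.name.lower() or kw in p.description.lower()]

    def get(self, address: str) -> DataProduct:
        return self._products[address]


catalog = Catalog()
expiry = DataProduct(
    domain="food",
    name="expiry_dates",
    description="Sell-by date for every food product",
    schema={"sku": "str", "sell_by": "date"},
    quality_checks=("sell_by_not_null",),
)
catalog.register(expiry)
print(catalog.search("sell-by"))
```

In this sketch the catalog plays the role of the touch-screen kiosk: a consumer from another domain searches by keyword, gets an address back, and reads the description and schema without contacting the owning team.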
Not to be confused with Data Product
It’s important not to confuse data as a product with a data product. A data product is created from data and can take the form of a report or analysis, as you can read in our article on BI consultants.
In the context of the Data Mesh, data as a product refers to the way in which data sets are governed and managed.
Self-service data platform
The Data Mesh promotes a self-service data infrastructure. It’s as if supermarket customers could serve themselves without having to ask a sales assistant for help.
This approach promotes greater agility and efficiency, as users can get the data they need faster.
Federated data governance
Federated data governance involves establishing standards and rules for data management across the organization.
In our supermarket, although each department can manage its products as it sees fit, they all have to follow certain standards, such as using the same currency or respecting the same opening hours.
Similarly, in a Data Mesh organization, each domain can manage its data as it sees fit. However, it must follow certain standards and rules to guarantee data quality, veracity, interoperability, consistency and collaboration.
This federated structure enables uniform governance while promoting flexibility and domain-specific expertise.
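One way to picture federated governance is a small set of global rules that every domain's metadata is checked against, while everything else is left to the domain. The sketch below is a minimal illustration; the rule set (`REQUIRED_METADATA`, `ALLOWED_CURRENCIES`), the metadata fields and the sample domains are assumptions made up for the example.

```python
# Global standards: the few rules every domain must follow (illustrative).
REQUIRED_METADATA = {"owner", "description", "currency"}
ALLOWED_CURRENCIES = {"EUR"}


def check_global_standards(metadata: dict) -> list:
    """Return a list of violations; an empty list means compliant."""
    violations = []
    # Every domain must provide the mandatory metadata fields.
    for key in sorted(REQUIRED_METADATA - metadata.keys()):
        violations.append(f"missing required field: {key}")
    # Every domain must use the shared currency, like the supermarket.
    currency = metadata.get("currency")
    if currency is not None and currency not in ALLOWED_CURRENCIES:
        violations.append(f"currency not allowed: {currency}")
    return violations


# "storage" is a domain-specific choice that the global rules ignore.
food = {"owner": "food-team", "description": "Sell-by dates",
        "currency": "EUR", "storage": "parquet"}
diy = {"owner": "diy-team", "currency": "USD", "storage": "postgres"}

for name, meta in {"food": food, "diy": diy}.items():
    print(name, check_global_standards(meta))
```

The design choice worth noting is that the checker only inspects the shared fields: the food and DIY domains pick different storage technologies and neither is flagged for it, which keeps the standards minimal.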
Benefits of Data Mesh
Reducing bottlenecks and strengthening resilience
By decentralizing data management, the Data Mesh helps reduce workflow bottlenecks and boost resilience.
With centralized governance, the risk is that the team managing the data will be too slow or unreliable, or will fail altogether, thus penalizing the entire organization.
On the contrary, in a Data Mesh model, each domain manages its data independently, thus reducing blockages and improving overall reliability.
Improving data quality and customer satisfaction
The adoption of Data Mesh often leads to improved data quality. Each functional area is best placed to understand and guarantee the integrity of its own data. As a result, customer satisfaction increases.
Greater accessibility and new opportunities
Data Mesh encourages wider access to data across the enterprise. This approach enables new opportunities to be discovered from data coming from different departments within the company.
Team autonomy and agility
Data Mesh fosters autonomy at team level, increasing business agility. Indeed, each team can adjust its data management practices according to its specific needs, enabling a more flexible and adaptable system.
Stimulating innovation
By unleashing the potential of each team, the Data Mesh stimulates innovation. Having direct control over data gives teams the opportunity to test new ideas and approaches. The most innovative teams can inspire others, and internal innovation competitions can create a healthy competitive environment.
Transparency, compliance and interoperability
The Data Mesh promotes transparency by making all data available to all stakeholders. It also ensures compliance by providing a comprehensive data governance framework. It also promotes interoperability by establishing common data standards. Last but not least, it is politically easier to audit several domains than a single one on which everything depends…
Disadvantages of Data Mesh
The disadvantages of Data Mesh are essentially due to a lack of consideration and willingness on the part of the management team, which does not grant sufficient time and resources to apply the theory correctly. This neglect then limits the advantages outlined above:
Complex coordination and risk of silos
Data decentralization can increase the difficulty of coordination between teams, and risks creating data silos where certain parts are inaccessible to other teams.
This complexity becomes particularly apparent when the company encounters difficulties in applying the method on a global scale. This is often a problem of resources and management.
Challenges of federated data governance
Implementing federated data governance can be a major challenge, especially for large companies with multiple teams and stakeholders.
Companies with multiple subsidiaries, or in the process of M&A, with divergent political interests and cultures, need to consider these complexities before embarking. These elements can generate additional obstacles that require special attention.
Costs and challenges of cultural change
Implementing Data Mesh often involves a major cultural change, which can be costly and difficult to achieve. It requires changes in work habits, as well as investments in training and support. The services of a consulting firm, such as Data Éclosion, are ideally suited to the task.
Inclusion of technical profiles in business teams
Data Mesh requires close collaboration between technical and business teams. This is not always easy, especially in large companies. Business teams may lack the technical skills to manage the data themselves. What’s more, different mindsets between technical and business teams can create obstacles. Working together, aligning objectives and understanding each other’s needs is vital to success with Data Mesh.
When data mesh isn’t for you
Your company isn’t big enough
In the case of a small or medium-sized enterprise (SME), a Data Mesh can be more complicated than beneficial. The decentralization involved can create data silos and coordination costs that make no sense on a small scale.
In other words, if you’re not a large enterprise, a centralized data team might better serve your needs without the complexity of the Data Mesh.
On the other hand, if your business is going to grow, you can separate data management into two parts: generic and specific to your activity.
In this way, you can later replace the specific part with the various domains, so as to eventually obtain your Data Mesh.
There is a lack of data culture at management level, which does not facilitate bottom-up decisions.
To succeed with Data Mesh, you need a culture that values trust, autonomy and bottom-up decision-making.
If these elements are not part of your corporate culture, setting up a Data Mesh will be a challenge.
You need clear roles, well-defined responsibilities and a structure that encourages experimentation and responsible decision-making within teams.
Weak data governance, technical maturity, security
Data Mesh may be a premature approach if your organization lacks a certain level of data governance, sufficient technical maturity, or if security measures are not well established.
Before considering Data Mesh, preparatory work must be carried out to put these fundamental elements in place. Without these solid foundations, implementing Data Mesh could lead to more complications than benefits.
Data Mesh in the real world
Let’s take a look at three examples of use cases in large companies:
Data Mesh from Netflix
Netflix, the streaming provider renowned for its films, series and original content, needs no introduction. But as Netflix grew, so did its complexity. Every stage of content creation, from the initial pitch to post-production, became more complex. Despite its innovative use of microservices (Netflix OSS), traditional data management was insufficient.
In response to this challenge, Netflix has launched a data mesh platform. This facilitates decentralized data management, with pipelines easily configured via a drag-and-drop interface.
By capturing changes with CDC (change data capture) sources and using GraphQL and Apache Iceberg tables, data becomes consistent and interoperable. End-to-end auditing reinforces trust in the data, making the environment reliable and discoverable.
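To give a feel for what change data capture (CDC) means in general, here is a minimal sketch that replays a stream of change events onto a downstream copy of a table. It is not Netflix's implementation: the event format (`op`, `key`, `row`), the `apply_change` function and the sample rows are all assumptions invented for illustration.

```python
def apply_change(table: dict, event: dict) -> None:
    """Apply one change event to an in-memory replica keyed by primary key."""
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):
        # Merge the changed columns into the existing row, if any.
        table[key] = {**table.get(key, {}), **event["row"]}
    elif op == "delete":
        table.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op}")


# Hypothetical change events, in the order the source system emitted them.
events = [
    {"op": "insert", "key": 1, "row": {"title": "Pilot", "status": "draft"}},
    {"op": "update", "key": 1, "row": {"status": "released"}},
    {"op": "insert", "key": 2, "row": {"title": "Teaser", "status": "draft"}},
    {"op": "delete", "key": 2},
]

replica = {}
for event in events:
    apply_change(replica, event)

print(replica)
```

Because the consumer only sees the events, it can keep its copy in step with the source without querying the source system directly, which is what makes CDC a natural fit for decoupled domains.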
The company has also developed a semantic layer that provides a self-service data platform. Here, data is treated as a self-describing product. Accessibility and comprehension are emphasized in order to grasp the breadth of Netflix’s data.
Netflix has adopted a technical approach aligned with its development culture, with a keen awareness of areas for improvement. At the heart of these challenges lies data governance.
By moving from an app-centric approach, embodied by domain-driven design, to a data-centric approach, represented by the Data Mesh, the company has been able to adapt and modernize its data management.
BP Data Mesh
BP, an energy giant with hundreds of subsidiaries, was facing a major data management challenge. The old centralized model no longer met the data-sharing needs of this huge enterprise. Its sprawling nature lent itself particularly well to the federated data governance of the Data Mesh.
The company then chose to adopt the Data Mesh model, characterized by global governance but with decentralized teams managing their own data products.
Accenture has played a key role in this transformation, as highlighted by Teresa Tung of Accenture, Abeth Go and Liam Donohoe of BP, in the AI Leaders Podcast: Data Mesh from Theory to Practice.
BP’s first applications involved data products related to tracking CO2 emissions and optimizing their reduction through digital twins. Liam Donohoe said these use cases aligned with BP’s sustainability strategy.
Abeth Go mentioned that the implementation of the Data Mesh at BP required a major effort to explain the model to all stakeholders, leading to the strengthening of data governance processes and the role of data stewards.
The benefits of this transformation are beginning to emerge. Liam Donohoe notes improvements in interoperability between existing systems and new platforms thanks to the Data Mesh approach. Abeth Go also sees potential in sharing data across the enterprise, despite its complex organization.
Data Mesh from Vistaprint
Vistaprint has been looking to optimize the use of its data and analytical capabilities. Sebastian Klapdor, EVP and Chief Data Officer at Vistaprint, shared his experience at Corinium’s CDAO Fall conference in Boston, reported by Business of Data.
The problem lay in a centralized data architecture that prevented scaling.
Comparing data to water, not oil, Klapdor decided to implement the Data Mesh.
To achieve this, he has set up autonomous teams, such as Marketing, to move towards a logic of data as products.
Thanks to the cloud, he has set up a self-service data infrastructure. These changes have enabled teams to work faster.
In addition, Klapdor shared five tips for those looking to implement a Data Mesh architecture:
- Create a map of domains.
- Develop a solid data platform.
- Create a data catalog.
- Promote “data product” thinking.
- Develop effective communication processes.
He also emphasized federated IT governance of data. Ultimately, he plans to further extend data products to improve Vistaprint’s performance.
Data Mesh Architecture
Most of the time, when we talk about data mesh architecture, we’re talking about an organizational model or data governance rather than software architecture.
However, mastery of enterprise-level data architectures is essential for the practical implementation of Data Mesh.
Data Mesh vs. Data Lake
Originally, a Data Lake is a distributed file system like Apache Hadoop HDFS. It meets the needs of the 3Vs of Big Data, and its simplicity can make it attractive for storing huge quantities of raw data.
The Data Mesh organizes these data lakes and makes functional areas autonomous in their use.
Data Mesh vs. Data Fabric
Data Fabric, a term coined by NetApp around 2016, is a unified architecture at enterprise level. Akin to the Data Hub or Data Platform concept, it integrates a whole range of services, from data ingestion to end use.
The Data Fabric goes beyond simple storage, offering benefits such as data visualization, access and control, protection and security. The Data Lake can be part of this architecture, acting as a central repository.
Once again, the Data Mesh enables decentralized, autonomous use of this data architecture. Its implementation offers maximum flexibility and efficiency in large-scale data management.
Going further
Books on Data Mesh
- Data Mesh: Delivering Data-Driven Value at Scale by Zhamak Dehghani, 2022, O’Reilly Media, ISBNs: 1492092398, 978-1492092391. In this practical book, the author of Data Mesh introduces this decentralized socio-technical paradigm derived from modern distributed architecture. It offers a new approach to large-scale analytical data management. The book guides practitioners on their journey from traditional big data architecture to multidimensional, distributed analytical data management.
- Data Mesh: Eine dezentrale Datenarchitektur entwerfen by Zhamak Dehghani, 2023, O’Reilly, ISBNs: 3960092075, 978-3960092070. Written in German, this book revisits the concept of Data Mesh, while introducing new ideas.
- Data Mesh: Reinventing Data Architecture for Decentralized and Autonomous Data Teams by Brian Murray, 2023, independent publisher, ISBN: 979-8393243128. This book offers a comprehensive guide to understanding and implementing the principles of Data Mesh and Data Products in your organization. It provides practical examples, best practices and common challenges for implementing these principles in your organization.
- Building an Event-Driven Data Mesh: Patterns for Designing & Building Event-Driven Architectures by Adam Bellemare, 2023, O’Reilly Media, ISBNs: 1098127609, 978-1098127602. This book shows how to design and build an event-driven (streaming) Data Mesh, providing practical advice, solutions to possible pitfalls and a clear understanding of event modeling. It also includes best practices for managing large-scale events.
- Data Mesh in Action by Jacek Majchrzak, Sven Balnojan et al., 2023, Manning Publications, ISBNs: 1633439976, 978-1633439979. This practical book teaches how to decentralize your data and organize it into an efficient mesh. It guides readers through the implementation of a Data Mesh in the organization, with workshop techniques and discussions on socio-technical architecture and domain-driven design.
- Data Products and the Data Mesh: Driving Business Value through Data Modernization by Alberto Artasanchez, 2023, independent publisher, ISBN: 979-8397010504. This comprehensive guide explores the emerging Data Mesh paradigm and its impact on organizations in the data-driven landscape. The book offers a comprehensive roadmap for building a decentralized, innovative and scalable data ecosystem, and is an essential resource for data professionals, architects and executives.
Data Mesh Learning community on Slack
Data Mesh Learning was founded in February 2021. This global community brings together over 7,000 data leaders interested in Data Mesh. It uses Slack for connecting and sharing, and holds bi-annual meetups to discuss Data Mesh.
Users can also discover implementation examples through the User Journey video series. The community has recently opened up to sponsorship to expand its resources, and has also launched a newsletter.
The team, including Melissa Logan, Hugh Lashbrooke and board members such as Zhamak Dehghani, is working to facilitate learning and connection between members via Slack.
Conclusion
You now have the keys to understanding the Data Mesh, a concept that has become indispensable. It’s revolutionizing the way data is managed and exploited. This transformation is not just the preserve of technology giants, but is essential for mid-sized companies and large corporations seeking to remain competitive in a constantly changing world.
If you’re considering such an innovation within your organization, contact Data Éclosion, our consulting firm. Our experts, specialized in this type of transformation, are here to guide you, adapting the best methods to your specific needs. Mastering the new era of big data and artificial intelligence is within your grasp: it’s up to you!