Distributed Data, Not Apps, Build the Foundation for Web3
Every business appreciates the value of data as it relates to competitive advantage. What is less appreciated is the value of relating through data to drive competitive advantage. Companies that understand this distinction, such as Amazon, Netflix, and Airbnb, are quickly displacing the leaders across many industries.
It is tempting to point to the more visible artifacts that have led to their success, such as adopting new tools, API and microservice infrastructures, and the DevOps practices that enable rapid iterative change. But underneath all of this lies a different relationship to data. For these companies, data is not orthogonal to application development; it is at the center of it.
In most companies, data infrastructure is built to manage the sharing and collaboration of data across different applications, services, analytics, and AI models. Companies talk about harnessing data exhaust through expensive new staging platforms such as data warehouses, data lakes, and even data lakehouses. The fundamental problem with this approach is that data is treated almost as an afterthought that must be integrated, secured, cleaned, and wrangled to create value.
Some of the signs of this misalignment are the significant costs incurred on integration, security, and data engineering. IDC estimates enterprises spend about a third of their AI lifecycle time on data integration. Companies are burdening developers with more security-related development in response to the rise in API-related security incidents. One survey found that 91% of companies experienced an API security incident in 2020. Finally, companies are discovering they often need five data engineers for each data scientist to get the data into the form and location required for good data science.
Making the leap to data as collaboration requires a significant shift in mindset. However, organizations can ease this transition by addressing three key pillars of data as collaboration:
Factor in the effects of time through immutability;
Enforce identity and trust directly on the data; and
Adopt a shared vocabulary around data.
Factor in the Effects of Time Through Immutability
Time is an ephemeral quality that gets woven into data either on purpose or by accident. It is easy for developers to ignore this fact in the rush to get a new service running. But when time is not considered, applications can break in surprising ways, data can get lost, and enterprises end up investing in complex data integration efforts.
Developers often gloss over how data might be used outside of their application context. For example, standard practice is to perform create, read, update, and delete operations on data. But important information about time is lost when data is simply updated or deleted. Ensuring data reuse requires recording changes to data so they remain traceable across all applications.
For example, if a bank discovers an error on your account, they do not simply update the balance to address the error. Instead, they submit a new transaction indicating an adjustment to correct the mistake. This corrects the balance and provides clarity for other applications that collaborate on this data.
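The bank example above can be sketched as an append-only ledger. This is a minimal illustration of the pattern, not a production design; the class and field names are hypothetical:

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: an entry can never be mutated after creation
class Transaction:
    account: str
    amount: int  # cents; negative for debits
    note: str
    at: datetime


class Ledger:
    """Append-only log: corrections are new entries, never updates or deletes."""

    def __init__(self):
        self._entries: list[Transaction] = []

    def append(self, account: str, amount: int, note: str) -> None:
        self._entries.append(
            Transaction(account, amount, note, datetime.now(timezone.utc))
        )

    def balance(self, account: str) -> int:
        # The current balance is derived from history, not stored and overwritten.
        return sum(t.amount for t in self._entries if t.account == account)

    def history(self, account: str) -> list[Transaction]:
        return [t for t in self._entries if t.account == account]


ledger = Ledger()
ledger.append("alice", 10_000, "deposit")
ledger.append("alice", -2_500, "erroneous fee")
ledger.append("alice", 2_500, "adjustment: reverse erroneous fee")
print(ledger.balance("alice"))  # prints 10000
```

The mistake and its correction both remain visible in `history()`, so audit reports, analytics, and other collaborating applications can all reconstruct the same view of the account.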
Enforcing data immutability ensures that other applications have a consistent view of the individual account balance and bank portfolio reflected in audit statements, regulatory reports, and business analytics. Variations of this problem surface as challenges in microservice orchestration, intermittent failures, inconsistent report layouts, and customer service frustrations when communicating about price changes. Data has many tentacles. If we do not understand the traceability of data, we will be at a disadvantage.
Enforce Identity and Trust Directly on the Data
Most security architectures evolved from an era when one web application server sat in front of one database. Web Application Firewalls added an additional layer of protection after hackers discovered how deliberately malformed requests could break through the application server.
But this approach increases development overhead, application complexity, and the attack surface for modern applications woven across microservices, APIs, and software-as-a-service (SaaS) applications. A more direct approach lies in finding a way to connect identity and trust directly to the data rather than the applications that process the data.
This kind of architecture could follow the lead of browser developers that created the infrastructure for putting a green lock icon on the address bar to indicate when data is coming directly from a site like your bank. They had to create a web of trust for cryptographically certifying who controls a domain name and verifying that the information was sent directly from that domain without tampering.
Machines need a similar kind of green lock for the data in order to trust each other. Early efforts are often tied into the blockchain space because that is the logical place to manage identity. For example, the U.S. Department of Education is exploring a new set of standards from the World Wide Web Consortium (W3C) called decentralized identifiers (DIDs). Down the road, new frameworks for self-sovereign identity could give individuals complete control of their digital identities in a way that connects data silos across trust boundaries.
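The idea of attaching trust to the data itself, rather than to the transport or the application, can be sketched with a simple signed envelope. This sketch uses a shared-secret HMAC from the Python standard library purely for illustration; real decentralized-identity systems use public-key signatures and verifiable credentials, and the key and field names here are hypothetical:

```python
import hashlib
import hmac
import json

# Hypothetical shared secret for the demo; real systems would use
# public-key cryptography so the verifier never holds the signing key.
SECRET = b"demo-shared-secret"


def sign(record: dict) -> dict:
    """Attach a signature to the data itself, not to the connection carrying it."""
    payload = json.dumps(record, sort_keys=True).encode()
    tag = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return {"data": record, "sig": tag}


def verify(envelope: dict) -> bool:
    """Any downstream service can check integrity without trusting the sender's app."""
    payload = json.dumps(envelope["data"], sort_keys=True).encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, envelope["sig"])


msg = sign({"account": "alice", "balance": 10_000})
assert verify(msg)

msg["data"]["balance"] = 999_999  # tampering anywhere along the way
assert not verify(msg)            # ...breaks verification
```

Because the signature travels with the record, the data remains verifiable as it crosses microservices, APIs, and SaaS boundaries, which is the "green lock for data" the section describes.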
Adopt a Shared Vocabulary Around Data
In everyday conversations, humans do a reasonably good job of filling in the context when someone uses a word in an unfamiliar way. But applications are sensitive to even the slightest variations in meaning between them. The challenge is that many parties create data types across the organization, resulting in data silos across applications, services, reports, analytics, and AI models.
In the book Software Wasteland, David McComb estimates that different ways of describing similar data can increase the cost of software development more than tenfold. Developers spend time coding logic to get data into and out of the application securely instead of focusing on writing the business logic that creates real business value.
Every organization must be able to adopt the appropriate vocabulary to survive. Standards are emerging for these kinds of vocabularies that can dramatically simplify these data conversations. For example, many in the healthcare industry are adopting Fast Healthcare Interoperability Resources (FHIR), while many in banking are adopting the Financial Industry Business Ontology (FIBO). These efforts can provide a helpful starting point for this process. McComb asserts that some discipline is required to standardize not just the starting point but also the process of extending these data types to new use cases.
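The payoff of a shared vocabulary can be sketched as a translation layer that maps each system's local field names onto one canonical set. The system and field names below are invented for illustration; real vocabularies such as FHIR and FIBO define far richer types than simple field renames:

```python
# Hypothetical mapping from each system's local field names
# to a single shared (canonical) vocabulary.
CANONICAL = {
    "crm":     {"cust_nm": "customer_name", "ph": "phone"},
    "billing": {"customerName": "customer_name", "phoneNumber": "phone"},
}


def to_canonical(system: str, record: dict) -> dict:
    """Translate a record into the shared vocabulary; unmapped fields pass through."""
    mapping = CANONICAL[system]
    return {mapping.get(field, field): value for field, value in record.items()}


# Two systems describe the same customer differently...
a = to_canonical("crm", {"cust_nm": "Ada", "ph": "555-0100"})
b = to_canonical("billing", {"customerName": "Ada", "phoneNumber": "555-0100"})

# ...but agree once translated into the shared vocabulary.
assert a == b == {"customer_name": "Ada", "phone": "555-0100"}
```

Each new application then writes one mapping to the shared vocabulary instead of point-to-point translations to every other system, which is where the cost savings McComb describes come from.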
Staying Nimble Requires a New Viewpoint
The next generation of industry leaders will find new ways to blow away the competition by becoming more sophisticated in managing and collaborating around data. Many older companies have created massive anchors that hold them back because their data is spread across hundreds of systems. Developers are incentivized to create new applications that meet requirements rather than to facilitate data reuse.
Data centricity will be key on the road ahead. Today, many organizations are application-centric. They build capabilities by adding major systems for customer relationship management (CRM), finance, and human resources (HR) and then adding various best-of-breed capabilities from smaller vendors. Although it is easy to add these capabilities in the cloud, it also creates integration debt, making it challenging to create new business value.
Data is like bags of agricultural seed. It does nothing for an organization when it is just sitting in bags. What is more important is how you leverage it and what you can make out of it with the right fertilizer, water, and sunlight.
Leaders are waking up to the reality that the application is not the center of the universe. Instead, data as collaboration starts by treating the data as the center, with applications built around it. This accelerates the process of generating value from that information. Companies that struggle to adopt a mindset of data as collaboration will be outmaneuvered by nimble competitors that do.