
Distributed Data, Not Apps, Build the Foundation for Web3

7 Sep 2021 3:00am, by Brian Platz
Brian is the Co-founder and Co-CEO of Fluree, PBC, a North Carolina-based Public Benefit Corporation focused on transforming data security, ownership, and access with a scalable blockchain database. Platz was an entrepreneur and executive throughout the early internet days and SaaS boom, having founded the popular A List Apart web development community, along with a host of successful SaaS companies. Before establishing Fluree, Brian co-founded SilkRoad Technology, which grew to over 2,000 customers and 500 employees in 12 global offices.

Every business appreciates the value of data as it relates to competitive advantage. What is less appreciated is the value of relating through data to drive competitive advantage. Companies that understand this distinction, such as Amazon, Netflix, and Airbnb, are quickly displacing the leaders across many industries.

It is tempting to point to the more visible artifacts that have led to their success, such as adopting new tools, API and microservice infrastructures, and the DevOps practices that enable rapid iterative change. But underneath all of this lies a different relationship to data. Data is not something orthogonal to application development; it is actually at the center of it.

In most companies, data infrastructure is built to manage the sharing and collaboration of data across different applications, services, analytics, and AI models. Companies talk about harnessing data exhaust through expensive new staging platforms such as data warehouses, data lakes, and even data lakehouses. The fundamental problem with this approach is that data is treated almost as an afterthought that must be integrated, secured, cleaned, and wrangled to create value.

Some of the signs of this misalignment are the significant costs incurred on integration, security, and data engineering. IDC estimates enterprises spend about a third of their AI lifecycle time on data integration. Companies are burdening developers with more security-related development in response to the rise in API-related security incidents. One survey found that 91% of companies experienced an API security incident in 2020. Finally, companies are discovering they often need five data engineers for each data scientist to get the data into the form and location required for good data science.

Making the leap to data as collaboration requires a significant shift in mindset. However, organizations can ease this transition by addressing three key pillars of data as collaboration:

  • Factor in the effects of time through immutability;

  • Enforce identity and trust directly on the data; and

  • Adopt a shared vocabulary around data.

Factor in the Effects of Time Through Immutability

Time is an ephemeral quality that gets woven into data either on purpose or by accident. It is easy for developers to ignore this fact in the rush to get a new service running. But when time is not considered, applications can break in funny ways, data can get lost, and enterprises need to invest in complex data integration efforts.

Developers often gloss over how data might be used outside of their application context. For example, standard practice is to perform create, read, update, and delete operations on data. But important information about time is lost when data is simply updated or deleted. Enabling data reuse requires capturing changes to data so that it remains traceable across all applications.

For example, if a bank discovers an error on your account, they do not simply update the balance to address the error. Instead, they submit a new transaction indicating an adjustment to correct the mistake. This corrects the balance and provides clarity for other applications that collaborate on this data.
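The bank-ledger pattern above can be sketched as an append-only log in which the balance is derived state rather than a mutable field. This is a minimal illustration of the idea, not any particular ledger API:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class Transaction:
    """An immutable ledger entry; corrections are new entries, never edits."""
    timestamp: datetime
    amount: float
    note: str

ledger: list[Transaction] = []

def record(amount: float, note: str) -> None:
    # Append only -- past entries are never mutated or deleted,
    # so every historical balance remains reconstructible.
    ledger.append(Transaction(datetime.now(timezone.utc), amount, note))

def balance() -> float:
    # The balance is derived from the full history, not stored in place.
    return sum(t.amount for t in ledger)

record(100.0, "deposit")
record(-30.0, "withdrawal")
record(30.0, "adjustment: withdrawal posted in error")  # a correction, not an update
```

Because the erroneous withdrawal and its correction both remain in the log, an audit report, a regulatory extract, and a customer statement can each replay the same history and agree.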

Enforcing data immutability ensures that other applications have a consistent view of the individual account balance and bank portfolio reflected in audit statements, regulatory reports, and business analytics. Variations of this problem surface as challenges in microservice orchestration, intermittent failures, inconsistent report layouts, and frustrated customer-service conversations about price changes. Data has many tentacles. If we do not understand the traceability of data, we will be at a disadvantage.

Enforce Identity and Trust Directly on the Data

Most security architectures evolved from an era when one web application server sat in front of one database. Web Application Firewalls added an additional layer of protection after hackers discovered how deliberately malformed requests could break through the application server.

But this approach increases development overhead, application complexity, and the attack surface for modern applications woven across microservices, APIs, and software-as-a-service (SaaS) applications. A more direct approach lies in finding a way to connect identity and trust directly to the data rather than the applications that process the data.

This kind of architecture could follow the lead of the browser developers who created the infrastructure for putting a green lock icon in the address bar to indicate when data is coming directly from a site such as your bank. They had to create a web of trust for cryptographically certifying who controls a domain name and for verifying that the information was sent directly from that domain without tampering.
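A data-level analogue of that lock is a signature that travels with each record, so any consumer can check integrity without trusting the application that delivered it. The sketch below uses a shared-secret HMAC purely for brevity; real systems would use public-key signatures tied to a verified identity:

```python
import hashlib
import hmac
import json

# Hypothetical shared secret for illustration only -- not for production use.
SECRET = b"demo-key-not-for-production"

def sign(record: dict) -> str:
    # Canonical JSON (sorted keys) so signer and verifier hash identical bytes.
    payload = json.dumps(record, sort_keys=True).encode()
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()

def verify(record: dict, signature: str) -> bool:
    # Constant-time comparison guards against timing attacks.
    return hmac.compare_digest(sign(record), signature)

record = {"account": "123", "balance": 100.0}
sig = sign(record)

ok = verify(record, sig)                            # untampered data passes
tampered = verify({**record, "balance": 999.0}, sig)  # altered data fails
```

The trust decision is attached to the record itself, so it holds no matter how many microservices or SaaS tools the data flows through.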

Machines need a similar kind of green lock for data in order to trust each other. Early efforts are often tied into the blockchain space because that is the logical place to manage identity. For example, the U.S. Department of Education is exploring a new set of standards from the World Wide Web Consortium called decentralized identifiers. Down the road, new frameworks for self-sovereign identity could give individuals complete control of their digital identities in a way that connects data silos across trust boundaries.
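To make the decentralized-identifier idea concrete, here is a minimal DID document following the shape of the W3C DID Core examples. The identifier and key material are illustrative placeholders, not real values:

```python
# A minimal decentralized identifier (DID) document. A DID resolves to a
# document like this, which lists the cryptographic keys that can prove
# control of the identifier -- the "green lock" anchor for data-level trust.
did_document = {
    "@context": ["https://www.w3.org/ns/did/v1"],
    "id": "did:example:123456789abcdefghi",
    "verificationMethod": [
        {
            "id": "did:example:123456789abcdefghi#key-1",
            "type": "Ed25519VerificationKey2020",
            "controller": "did:example:123456789abcdefghi",
            # Placeholder key material for illustration.
            "publicKeyMultibase": "z6MkPlaceholderKeyMaterial",
        }
    ],
}

method = did_document["id"].split(":")[1]  # the DID "method", here "example"
```

Any party holding data signed with the listed key can verify it against this document without asking the originating application.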

Adopt a Shared Vocabulary Around Data

In everyday conversations, humans do a reasonably good job of filling in the context when someone uses a word in an unfamiliar way. But applications are sensitive to even the slightest variations in meaning between them. The challenge is that many different parties define data types across the organization, resulting in data silos across applications, services, reports, analytics, and AI models.

In the book Software Wasteland, David McComb estimates that different ways of describing similar data can increase the cost of software development by more than 10 times. Developers spend time coding logic to get data into and out of the application securely instead of focusing on writing the business logic that creates real business value.

All the knowledge of the data structure, including its naming, validation, security, integrity, and even the meaning of the data, is locked up in the application code. As a result, developers inadvertently create new data complexity in the process of adding objects and classes, JavaScript Object Notation (JSON) elements and keys, or database tables and columns.

Every organization must be able to adopt the appropriate vocabulary to survive. Standards are emerging for these kinds of vocabularies that can dramatically simplify these data conversations. For example, many in the healthcare industry are adopting Fast Healthcare Interoperability Resources (FHIR) while many in banking are adopting the Financial Industry Business Ontology (FIBO). These efforts can potentially provide a helpful starting point to this process. McComb asserted that some discipline is required to standardize not just the starting point but also the process of extending these data types to new use cases.
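The payoff of a shared vocabulary is that each system maps its local names to common terms once, instead of integrating with every other system pairwise. The sketch below uses a toy vocabulary with hypothetical field names; FHIR and FIBO play this role at industry scale:

```python
# A toy shared vocabulary: local field names from different systems all map
# to one common term. (Field and term names here are illustrative.)
SHARED_VOCAB = {
    "customer_name": "Party.name",      # CRM's local name
    "cust_nm": "Party.name",            # core banking's local name
    "balance": "Account.balance",       # CRM's local name
    "acct_bal": "Account.balance",      # core banking's local name
}

def to_shared(record: dict) -> dict:
    """Translate a record's local field names into shared-vocabulary terms."""
    return {SHARED_VOCAB[field]: value for field, value in record.items()}

crm_record = {"customer_name": "Ada", "balance": 100.0}
core_banking_record = {"cust_nm": "Ada", "acct_bal": 100.0}

# Once expressed in shared terms, records from different silos are comparable.
same = to_shared(crm_record) == to_shared(core_banking_record)
```

With N systems, a shared vocabulary needs N mappings rather than the N×(N−1) point-to-point translations that application-centric integration tends to accumulate.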

Staying Nimble Requires a New Viewpoint

The next generation of industry leaders will find new ways to blow away the competition by becoming more sophisticated in managing and collaborating around data. Many older companies have created massive anchors that hold them back because their data is spread across hundreds of systems. Developers are incentivized to create new applications that meet requirements rather than to facilitate data reuse.

Data centricity will be key on the road ahead, yet today many organizations are application-centric. They build capabilities by adding major systems for customer relationship management (CRM), finance, and human resources (HR) and then adding various best-of-breed capabilities from smaller vendors. Although it is easy to add these capabilities in the cloud, doing so also creates integration debt, making it challenging to create new business value.

Data is like bags of agricultural seed. It does nothing for an organization when it is just sitting in bags. What is more important is how you leverage it and what you can make out of it with the right fertilizer, water, and sunlight.

Leaders are waking up to the reality that the application is not the center of the universe. Instead, data as collaboration starts by approaching the data as the center with the applications built around it. This accelerates the process of generating value from that information. Companies that struggle with adopting a mindset for data as collaboration will be outmaneuvered by nimble competition that does.

The New Stack is a wholly owned subsidiary of Insight Partners.
