
Liberate Your Historical Data to Realize Real-Time Business

22 Jun 2022 3:00am, by Manish Devgan

If digitalization is measured by corporate spending on old-school databases, then it is in rude health.

Manish Devgan
Manish is the chief product officer at Hazelcast, one of the pioneers of real-time data. He has spent more than 20 years in data management and analytics, building industry-leading products. Before Hazelcast, Manish led Software AG's IoT platform product portfolio. Manish is a published author and a featured speaker at industry conferences.

Enterprise customers in 2021 spent a record $80 billion on database management systems (DBMS), an increase of nearly half compared to 2017, according to Gartner. Databases have consistently ranked highly in IT spend, and the cloud is accelerating that: Managed cloud data services accounted for 49% of all DBMS revenue last year, or $39.2 billion. Picking through Gartner’s DBMS spaghetti, you’ll see the most considerable growth among the cloud hyperscalers.

Data may be pouring in with digitalization, but another report from a different analyst suggests it’s not net-new data that is driving database spending; it is the attempt to house existing data. According to IDC, data created in the cloud is not growing as fast as data stored in the cloud. Growth in DBMS spend seems to be coming as organizations grapple with re-housing that older data.

They are holding on to this data — putting it in databases and sinking it into data lakes — because they believe in its “latent potentially un-mined value,” IDC says.

While other surveys back up the idea that many businesses see themselves as data-driven, it seems relatively few are treating data as a capital resource or realizing its potential. One reason is the poor track record of analytics systems that have been relied upon to extract value from that data. It’s a particular problem when it comes to older data, IDC reckons.

This is a hurdle, especially as organizations prepare to leap into real-time business.

Liberate Legacy Data

Real-time is the next shift in digital: It means engaging with customers in the moment, for example, offering tailored sales promotions in the customer’s shopping basket as they’re checking out or spotting fraudulent financial transactions as they are happening and immediately alerting the customer. Real-time personalization means having an accurate and actionable profile of the customer, a profile built on their clicks and other streaming data combined, importantly, with their history. This must be built on analytics capable of jointly processing this historic and streaming data in real time.
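To make the idea of jointly processing historic and streaming data concrete, here is a minimal Python sketch of stream enrichment: each live checkout event is looked up against a historical customer profile before an offer is chosen. Everything in it is hypothetical and for illustration only: the `Profile` fields, the `HISTORY` table, the `enrich` function and the ten-order loyalty threshold are assumptions, not part of any product or of the sources cited here.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass
class Profile:
    """Historical view of a customer (hypothetical fields)."""
    customer_id: str
    lifetime_spend: float
    orders: int


# Stand-in for historical data that would normally live in a database or data lake.
HISTORY: Dict[str, Profile] = {
    "c1": Profile("c1", 1200.0, 14),
    "c2": Profile("c2", 40.0, 1),
}


def enrich(events: Iterable[dict], history: Dict[str, Profile]) -> Iterator[dict]:
    """Attach an offer decision to each event as it arrives.

    The decision uses both the live event and the stored profile.
    Customers with no history get no offer.
    """
    for event in events:
        profile = history.get(event["customer_id"])
        loyal = profile is not None and profile.orders >= 10  # assumed threshold
        yield {**event, "offer": "loyalty_discount" if loyal else "none"}


# Stand-in for a live stream of checkout events.
stream = [
    {"customer_id": "c1", "action": "checkout"},
    {"customer_id": "c2", "action": "checkout"},
    {"customer_id": "c9", "action": "checkout"},
]

decisions = list(enrich(stream, HISTORY))
for decision in decisions:
    print(decision["customer_id"], decision["offer"])
```

In this toy version the history is a dictionary in the same process, so the lookup is effectively free. In practice, the cost of that lookup is what decides whether the offer arrives while the customer is still checking out.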

Much of the attention in the conversation about digital migration is on streaming data, and it’s easy to see why. Streaming is the fastest-growing category of data, according to IDC’s report: a superabundance of data from connected enterprises including devices and applications. Streaming data offers the prospect of understanding what people and machines are doing, as they’re doing it.

Historical data, meanwhile, incorporates information such as sales and marketing stats and records of customer interactions from customer relationship management and enterprise resource planning systems. It ranges in age from days to decades, sits in database systems or data lakes, and tends to be processed in batches. And there’s a lot of it around, in myriad locations around the enterprise. According to Gartner, the majority of today’s BI reports, dashboards and even machine learning projects use it.

Uniting historical and streamed data is vital to achieving what Gartner has called “an enterprise nervous system that provides connected, contextual, continuous intelligence across multiple locations.” In other words, real-time business.

A Shared Event

The time has come to liberate this old data from legacy systems, according to O’Reilly, which rightly believes liberation will allow organizations to build new, decoupled products and services for digital.

O’Reilly reckons there’s a “full spectrum” of data-liberation strategies. At one end is the reactive approach of using frameworks that pull data from sources, which does not scale well. At the other end is a strategic approach built on an event-driven architecture (EDA), which O’Reilly seems to endorse.

EDAs describe actions that take place on your network as events. These are captured as they’re created for applications to act on, rather than being collected and handled later as in the traditional batch-processing approach. Digitalization has seen EDAs grow in popularity and be used in real-time marketing, website monitoring, fraud detection and more. Forrester reckons 35% of enterprises will focus on EDAs this year.
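The difference from batch processing is easier to see in code. The following Python sketch is a deliberately tiny, in-process publish/subscribe bus, assuming a hypothetical `EventBus` class, a `payment` event type and a 1,000 threshold for flagging payments. It is not how any particular EDA product works; it only shows handlers reacting the moment an event is published.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class EventBus:
    """Tiny in-process publish/subscribe bus (illustrative only)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        """Register a handler to run whenever event_type is published."""
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        """Deliver the event to every subscriber immediately."""
        for handler in self._handlers.get(event_type, []):
            handler(payload)


alerts: List[str] = []


def flag_large_payment(event: dict) -> None:
    """React to a single payment as it happens (assumed threshold)."""
    if event["amount"] > 1000:
        alerts.append(f"review payment {event['id']}")


bus = EventBus()
bus.subscribe("payment", flag_large_payment)

# Each publish triggers the handler right away; nothing waits for a nightly job.
bus.publish("payment", {"id": "p1", "amount": 50})
bus.publish("payment", {"id": "p2", "amount": 5000})

print(alerts)
```

A batch system would instead write both payments to storage and scan them on a schedule, so the suspicious one would be found only after the fact.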

The liberation play for old historical data is that EDAs should be the architectural underpinning of the analytics needed for real-time business. EDAs should, therefore, be built to unite historical and streamed data.

There are two ways to achieve this.

The first is to build your own event-streaming architecture and integrate custom logic with infrastructure components, a route being taken by 70% of users, according to a joint report from Swim and Virtual Intelligence Briefing.

However, this will not deliver the analytics needed for real time. Why? It entails bolting together streaming engines and databases to unite their processing, requiring developers to build integrations of thousands of code lines that are complex to implement and maintain. Such integrations introduce bottlenecks that erode the real-time performance of an EDA and, therefore, of the analytics.

A better approach is to work at a strategic, platform level to unify stream and data-management systems using a high-level, API-driven architecture built on a shared query engine. This has three positive outcomes: It reduces development complexity; it lays the foundation for a declarative model of operating your EDA, thereby creating a low-code EDA environment for IT and business teams to build, maintain and operate; and it means that connectivity, object mapping and event handling are capable of spanning databases and data sources.

Any real-time analytics will, of course, be judged by its performance. A unified streaming engine can ingest, transform, distribute and synchronize data, but processing is a problem because historic and streaming data live in different places. Legacy data languishes on slow-performing disk-based systems, while streamed data is born, and must be processed, on the network’s edge or deep inside the cloud IT infrastructure.

Investing in more hardware only adds to the cost and complexity of analytics projects. A far better approach is to process data using the one resource already resident on your network: memory. Clustered pools of memory in local servers — an in-memory grid — can deliver the required sub-millisecond responses, with millions of complex transactions performed per second for both streaming and old data.
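As a rough illustration of how a clustered pool of memory can hold data, the Python sketch below spreads keys across several in-memory partitions by hashing each key. The `PartitionedMap` class and its methods are hypothetical names invented for this example, and the "nodes" are just dictionaries in one process. A real in-memory grid adds networking, replication and failover, none of which is modeled here.

```python
import zlib
from typing import Any, Dict, List


class PartitionedMap:
    """Key-value map split across simulated nodes (illustrative only)."""

    def __init__(self, node_count: int) -> None:
        if node_count < 1:
            raise ValueError("node_count must be at least 1")
        # Each dictionary stands in for the memory of one server.
        self._nodes: List[Dict[str, Any]] = [{} for _ in range(node_count)]

    def _owner(self, key: str) -> int:
        """Pick the node that owns a key using a stable hash."""
        return zlib.crc32(key.encode("utf-8")) % len(self._nodes)

    def put(self, key: str, value: Any) -> None:
        self._nodes[self._owner(key)][key] = value

    def get(self, key: str, default: Any = None) -> Any:
        return self._nodes[self._owner(key)].get(key, default)

    def sizes(self) -> List[int]:
        """Number of entries held by each simulated node."""
        return [len(node) for node in self._nodes]


grid = PartitionedMap(node_count=3)

# Load "historical" records once; later lookups are served from memory.
for i in range(100):
    grid.put(f"customer-{i}", {"orders": i})

print(grid.get("customer-42"))
print(grid.sizes())
```

Because every key maps to exactly one owner, a lookup touches a single partition, and capacity grows by adding partitions rather than by buying a bigger machine.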

Real-time business is the next phase in businesses’ digital evolution, but realizing real-time demands a 360-degree understanding of the customer. Event streams bring us closer to that understanding, but context is king and to that end, it’s vital we liberate old business data, not simply re-house it in the cloud. Uniting these data sources through an integrated EDA will deliver the intelligence where it’s needed at the speed it’s needed.

The New Stack is a wholly owned subsidiary of Insight Partners, an investor in the following companies mentioned in this article: Real.
