Building an Integrated Infrastructure for Real-Time Business
Digitalization is leading the charge in post-Covid IT spending with a focus on real-time customer engagement — and that’s a challenge.
The majority of enterprises have — for the first time — devised organization-wide digital transformation strategies; 55% of IT spending will be allocated to digitalization by 2024, according to IDC.
But it’s far from plain sailing. IDC also reckons many will “struggle to navigate” the digital-first world as enterprises try to figure out what digital actually means to them.
The winners will be a group IDC calls “digital-first aficionados,” defined as those who innovate in customer engagement with data management and analytics technologies at their core.
That means shifting to a type of customer engagement that goes beyond simply being online. It means engagement in real time, such as tailored offers when the customer is shopping, eliminating fraudulent charges during the transaction, replacing a part before the machine breaks or supporting the customer while they’re banking.
Achieving this level of engagement demands a 360-degree view of the customer and real-time personalization — and that’s the challenge because it requires an accurate and actionable customer profile. Such profiles are built using a combination of streaming data from events — which are clicks on a site, machine-to-machine communications, transactions and so forth generated in milliseconds — and static, historical data for a contextual understanding of the customer. It takes a system of streaming analytics to build this holistic picture, but while businesses may believe in the power of real-time action and decision-making, many are too overwhelmed by customer data to act in real time.
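To make the idea of a profile that blends streaming and historical data concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the `CustomerProfile` fields, the event dictionary shape and the `pick_offer` rule are all hypothetical and are not taken from any particular product.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CustomerProfile:
    """Hypothetical profile: static history plus a rolling view of live activity."""
    customer_id: str
    lifetime_orders: int       # static, historical data (assumed field)
    favorite_category: str     # static, historical data (assumed field)
    # Only the five most recent views are kept; older ones drop off automatically.
    recent_views: deque = field(default_factory=lambda: deque(maxlen=5))


def apply_event(profile: CustomerProfile, event: dict) -> None:
    """Fold one streaming event (assumed shape: {'type', 'category'}) into the profile."""
    if event["type"] == "view":
        profile.recent_views.append(event["category"])


def pick_offer(profile: CustomerProfile) -> str:
    """Prefer what the shopper is viewing right now; fall back to historical context."""
    if profile.recent_views:
        return profile.recent_views[-1]
    return profile.favorite_category
```

With no live events the offer falls back to the historical favorite; as soon as a click arrives, the offer follows the shopper's current behavior.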
Fewer than a third of executives can obtain the insight they need from their data, according to one Forrester report. In a separate Forrester survey with CSG Systems, just over half (51%) said they could offer personalized or customized interactions, and 46% said they could orchestrate real-time actions.
McKinsey reinforces the point and suggests why this is the case: Just a fraction of data from connected devices is being ingested, processed, queried and analyzed. Translated: It’s not the data that’s causing problems but the way it’s being processed.
There are three reasons for this problem.
First is the increasingly decentralized nature of data generation. Data is being created by applications, devices, servers and websites, with customer transactions taking place at all corners of a digital enterprise. For those attempting to understand and act on this in real time using streaming analytics, it creates a strategic architectural challenge of where and how to process the data and run the analytics. Should they process data where it’s created or transfer that data to a centralized data store? The former might have limited processing, but the latter almost certainly means data must make round trips across the network, meaning delayed analysis and, equally important, action.
Next is the prevailing approach to mastering streaming data. More than a third of organizations are embracing streaming applications and environments for data pipelines, data integration and stream processing, according to Swim’s State of Streaming Data report. The problem is that 70% are building their own streaming environments and infrastructure, forcing them to address issues of data storage, platform optimization and system integration. Getting any of these wrong creates inefficiencies and performance overheads that hobble data processing and analysis.
Finally, there’s the presence of legacy data storage and analysis architectures. Databases and data warehouses store static, historic customer data but have not been architected or optimized to capture, ingest, process or analyze fast-moving event data streams. They must be integrated with streaming engines with the attendant risk of performance inefficiencies. Databases come with overhead, too — the need to invest in additional hardware for performance.
Architecture by Design
What does it take to increase the pace at which data is ingested, processed, queried and analyzed? A real-time data architecture founded on seven critical capabilities:
- An event broker and messaging tier. This layer ingests data from different sources and moves it to consumers by queuing messages, brokering them and supporting communication patterns such as publish-subscribe.
- A real-time data integration layer providing capabilities like data pipelines and streaming ETL — data collection from sources (extract), conversion to desired format (transform) and finally storing into a data store (load).
- A fast data-management layer to store and quickly access data. This layer should be based on a storage medium and format considered “right” for your SLA needs. Memory-first tiered storage models and SQL-based access are key enablers in this tier.
- Event and stream processing supporting timely action and engagement based on the latest event data. Advanced capabilities include grouping incoming event information into continuously moving time windows for analysis, the ability to join data streams with data stored in the data-management tier, and the scale to handle millions of events per second.
- Real-time analytics for analytical workloads that may contribute insights to downstream operational applications. This tier provides value by accelerating legacy batch jobs and speeding time to insight through open formats like Parquet and better compute engines, among other things.
- Real-time machine learning (ML). We know ML is reshaping the way businesses are able to tailor content and personalized services by having models adapt to user preferences, which often change in real time. Historically, ML has been developed on batch-based data as data scientists built and tested models using historical data offline. Real-time engagement, however, means feeding the model live data for continuous improvement. The core capabilities for accomplishing real-time ML include online prediction and continual learning, which include updating ML models in real time and incorporating new incoming data for accurate prediction.
- Applications: software and services optimized for real-time architecture and streaming analytics. Examples include commerce carts that make recommendations based on a shopper’s clicks and past behavior, or fraud-detection systems that learn what normal behavior looks like on a person’s credit card and alert them to potential rogue transactions.
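The moving time windows mentioned under event and stream processing can be sketched in a few lines of Python. This is a simplified, single-process illustration rather than how any particular streaming engine works: the `SlidingWindowCounter` class is hypothetical, and it assumes events for each key arrive in timestamp order.

```python
from collections import defaultdict, deque


class SlidingWindowCounter:
    """Counts events per key over a continuously moving time window."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        # One deque of timestamps per key (for example, per card or per customer).
        self._timestamps = defaultdict(deque)

    def add(self, key: str, timestamp: float) -> int:
        """Record an event and return how many events for `key` are in the window.

        Assumes timestamps for a given key are non-decreasing.
        """
        window = self._timestamps[key]
        window.append(timestamp)
        # Evict events that have slid out of the window.
        cutoff = timestamp - self.window_seconds
        while window and window[0] <= cutoff:
            window.popleft()
        return len(window)
```

A fraud-detection application might, for instance, flag a card when `add` returns a count above some threshold within a 60-second window; the threshold itself would be a business decision.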
Blueprint for a Better Way
These elements already exist in information architectures. What matters when building real-time applications is how they’re implemented. They must be integrated. That means an architecture capable of streaming, querying and analyzing these events, but also of querying and analyzing stored data.
Further, that architecture must reconcile the strategic challenge of where to place your computational chips. In a distributed computing model, memory-first is your ally, so it pays to use the resources at your disposal by clustering the pools of memory and other low-latency storage tiers in servers that are available locally.
This means data does not need to return to the data center, nor must you beef up servers on the edge. This real-time data architecture will deliver low-latency streaming analytics that leverages access to fast contextual data and engagement with the customer in real time.
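As a rough illustration of the memory-first idea, the Python sketch below serves reads from a small in-memory tier and falls back to a slower tier only on a miss. The `TieredStore` class, its capacity setting and the use of a plain dictionary as the slow tier are assumptions made for the example; a real deployment would cluster memory across servers rather than run in one process.

```python
from collections import OrderedDict


class TieredStore:
    """Read-through store: a small memory tier in front of a slower tier."""

    def __init__(self, slow_tier: dict, memory_capacity: int):
        self._memory = OrderedDict()   # fast tier, kept in least-recently-used order
        self._slow = slow_tier         # stand-in for disk or a remote data store
        self._capacity = memory_capacity
        self.slow_reads = 0            # how often the slow tier had to be consulted

    def get(self, key):
        if key in self._memory:
            # Memory hit: mark the key as most recently used.
            self._memory.move_to_end(key)
            return self._memory[key]
        value = self._slow[key]        # raises KeyError if the key is unknown
        self.slow_reads += 1
        # Promote the value into memory, evicting the least recently used entry.
        self._memory[key] = value
        if len(self._memory) > self._capacity:
            self._memory.popitem(last=False)
        return value
```

Frequently used customer context stays in the fast tier, so repeated lookups avoid the slower round trip.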
Beyond just having an online presence, the challenge for organizations is engaging with customers in real time. Becoming a “digital-first aficionado” means working from a 360-degree view of those customers — something that can only be assembled with the help of streaming analytics founded on an integrated real-time data architecture.