The Pinnacle of Real-Time Data Analysis: Stream Processing
Contrary to popular belief, the pinnacle of real-time data analysis is not the capability to rapidly analyze data at scale once it's comfortably located inside a database. A superior form of real-time data analysis is the capacity to ingest, enrich, transform, process, and act on data before storing it.
A comprehensive streaming data platform facilitates each of these advantages. Top options couple a stream processing engine with a fast data store to support a host of functionality for real-time data analysis, including machine learning inferences, distributed computations, and messaging services.
According to Hazelcast Chief Product Officer Manish Devgan, these capabilities are integral to “a consolidated platform to process streams before they even get stored. A lot of data designers say ‘we can process streams’, but they’re actually storing and then processing, whereas we have the ability to process data in-flight, or after storing it.”
Depending on the use case, the ability to process data in flight, rather than after it has been written to a database, makes a tremendous difference. The former is essential for real-time trading, logistics, fraud detection, and a wealth of other applications requiring low-latency data analysis.
Consequently, this functionality allows organizations to respond to data events as they happen, while furthering the business value derived from real-time data transmissions.
Fast Data Store
Hazelcast’s fast data store is responsible for several advantages when deploying its platform, including enriching real-time data with older data and reference data. For example, when generating real-time financial offers at the moment customers withdraw funds from an ATM, it’s necessary to augment the low latency transaction data with amounts of previous transactions, credit reports, and other germane data. Such data would be contained within Hazelcast’s data store, which can also store incoming data streams.
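The enrichment pattern described above can be sketched in a few lines. This is an illustrative example only, not Hazelcast's actual API; the reference-store contents, field names, and offer rule are all hypothetical:

```python
# Hypothetical reference data keyed by customer ID, standing in for the
# platform's fast data store of prior transactions and credit information.
REFERENCE_STORE = {
    "cust-42": {"avg_withdrawal": 80.0, "credit_tier": "gold"},
}

def enrich(event):
    """Join the fresh ATM event with stored reference data."""
    ref = REFERENCE_STORE.get(event["customer_id"], {})
    return {**event, **ref}

def offer(event):
    """Act on the enriched event: make an offer for unusually large withdrawals."""
    avg = event.get("avg_withdrawal", float("inf"))
    if event.get("credit_tier") == "gold" and event["amount"] > 2 * avg:
        return f"premium-loan-offer for {event['customer_id']}"
    return None

enriched = enrich({"customer_id": "cust-42", "amount": 200.0})
print(offer(enriched))  # the withdrawal is well above this customer's average
```

The key point the sketch illustrates is that the decision depends on both the fresh event and the stored history, joined at the moment the event arrives.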
“Then you’re taking action based on not only the fresh data that came in, but also the data you have stored,” Devgan remarked. Beyond its speed, Hazelcast’s high-speed reference store also provides global state store capabilities, which reduce the operational complexity of creating data pipelines, transformations, and more.
Thus, Hazelcast users “can write 10 pipelines, but if you need a shared global state for your customer, you won’t have to wire 10 different instances of RocksDB together,” Devgan observed. Hazelcast’s data store also provides a messaging service to which different entities can publish or subscribe, so real-time data events are routed to the right place.
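The publish/subscribe routing described here boils down to topics and handlers. The following is a minimal conceptual sketch of that pattern, not a reproduction of Hazelcast's messaging API; the topic names and event shapes are hypothetical:

```python
from collections import defaultdict

class Broker:
    """Toy topic-based broker: handlers subscribe to topics, events fan out."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        # Route the event only to handlers registered for this topic.
        for handler in self._subscribers[topic]:
            handler(event)

broker = Broker()
received = []
broker.subscribe("fraud-alerts", received.append)
broker.publish("fraud-alerts", {"tx": "tx-99", "score": 0.97})
broker.publish("other-topic", {"tx": "tx-07"})  # not delivered to our handler
print(received)
```

Because subscribers declare interest by topic, producers never need to know who consumes an event, which is what lets the platform route real-time events to the right place.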
Machine Learning Inferences
According to Devgan, the fast data store also doubles as a feature store for machine learning models and the inferences they provide on real-time data. The store is able to update a model’s features in real time so it has the most recently available data from which to create the most accurate predictions. “A lot of large customers, high-value customers, use real-time inferencing,” Devgan mentioned. “They use real-time features when they are doing the predictions. You can imagine, whether the pizza is going to be available in 20 or 22 minutes. That is the level of sophistication these guys can push to their customer experience.”
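The feature-store pattern described here can be sketched as follows. This is a conceptual illustration under assumed names and a toy delivery-time model, not Hazelcast's feature-store API:

```python
import time

class FeatureStore:
    """Toy feature store: feature values are overwritten as events arrive,
    so inference always reads the freshest available data."""
    def __init__(self):
        self._features = {}

    def update(self, entity_id, **features):
        row = self._features.setdefault(entity_id, {})
        row.update(features, updated_at=time.time())

    def get(self, entity_id):
        return dict(self._features.get(entity_id, {}))

def predict_eta_minutes(features):
    # Hypothetical model: base prep time plus a per-order backlog penalty.
    return 15 + 0.5 * features.get("open_orders", 0)

store = FeatureStore()
store.update("store-7", open_orders=10)
store.update("store-7", open_orders=14)   # real-time refresh of the feature
print(predict_eta_minutes(store.get("store-7")))  # 22.0
```

The second `update` call models the real-time refresh Devgan describes: the prediction (the pizza's 20-versus-22-minute ETA) changes because the feature changed moments before inference.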
The overall speed of Hazelcast’s platform, particularly that attributed to the fast data store, is advantageous for this particular use case. In addition to updating features for the model with low latency, the solution also implements the models — and their predictions — with the rapidity necessary for real-time predictive or prescriptive analytics. According to Devgan, “Models need access to real-time attributes in less than 10 milliseconds, because they make a better decision. It’s a real-time machine learning workload.”
Stream Processing Engine
Several aspects of Hazelcast’s platform are underpinned by its stream processing engine. It’s involved in the computation distribution for everything from transformations (like basic joins and stream-to-stream joins) to windowing capabilities. For instance, in a security camera use case, windowing might entail processing data so that “in the last five minutes tell me if this door opened twice and a light went off, because I need to take action if so,” Devgan commented. Hazelcast’s computational model is designed to minimize data movement once it’s been ingested into the platform from an event source. As such, it “intelligently distributes”, as Devgan termed it, the computational logic to where the data is.
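Devgan's security-camera rule is a sliding-window check, which can be sketched in plain Python. The event shapes and timestamps are hypothetical, and a real engine would evaluate this incrementally rather than rescanning the event list:

```python
WINDOW_SECONDS = 300  # "in the last five minutes"

def alarm_triggered(events, now):
    """events: iterable of (timestamp, kind) pairs, where kind is
    'door_open' or 'light_off'. Flags the rule: door opened twice AND
    a light went off, all within the trailing five-minute window."""
    recent = [kind for ts, kind in events if now - ts <= WINDOW_SECONDS]
    return recent.count("door_open") >= 2 and "light_off" in recent

events = [(0, "door_open"), (60, "light_off"), (200, "door_open")]
print(alarm_triggered(events, now=250))   # True: all three events in window
print(alarm_triggered(events, now=600))   # False: the events have expired
```

The same events yield different answers at different moments, which is the essence of windowed stream processing: the result is a property of the window, not of the stream as a whole.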
“The last thing you want to do in a distributed system is move data, so we minimize that,” Devgan revealed. “Instead of the application saying, ‘give me the last five numbers so I can compute the average,’ we take the average function and push it to the data.” Aspects of the transformation processes, some of which can also involve Hazelcast’s enrichment capabilities, effectively function as a means of integrating data. “A lot of people say that that capability of you being able to transform is actually what they categorize as real-time data integration,” Devgan commented.
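Devgan's average example can be made concrete. The sketch below is illustrative only (the partition layout is invented): instead of shipping every value to the application, each partition computes a small partial result locally, and only those partials travel:

```python
# Data partitioned across nodes; in a real cluster these lists live on
# different machines, and only the tiny (sum, count) summaries move.
partitions = [
    [10, 20, 30],   # data living on node A
    [40, 50],       # data living on node B
]

def local_partial(values):
    """Pushed to where the data lives; returns a compact (sum, count) pair."""
    return sum(values), len(values)

def global_average(partials):
    """Combines the small partial results at the coordinator."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

partials = [local_partial(p) for p in partitions]
print(global_average(partials))  # 30.0, with far less data moved
```

Five values never leave their nodes; two small tuples do. That trade, shipping the function instead of the data, is what the quoted remark describes.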
The transformation and real-time data integration capabilities of Hazelcast are typified by the solution’s support for data pipelines. Oftentimes, it’s necessary to filter, enrich, and transform data in real time before it can be processed with low latency. The tandem of the platform’s stream processing engine and fast data store is pivotal for fulfilling these objectives. Because Hazelcast is what Devgan called an “open platform”, users can program the pipeline themselves. “You write these things called pipelines, which are like functions,” Devgan said. “They are little snippets of code that tell the system what it needs to do.”
Implementing this logic in pipelines takes real-time data through the desired rigors of contextualizing that data with historical data from the data store, transforming the data as desired, and operationalizing machine learning models for predictions based on data-driven events. Oftentimes, the final step in the pipeline is taking action by generating real-time alerts. Devgan referenced an airline use case with real-time data inputs about arrivals and departures, with a join in the pipeline to determine how long it takes to get between gates. “You want to join all those things, which include static data plus fresh data, and take action,” Devgan explained. “Send a text to this person because they’re going to miss their flight.”
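The airline scenario above can be sketched as a small pipeline of composed stages. This is a conceptual illustration in Python, not Hazelcast's pipeline API; the gate-transit table, field names, and alert rule are all hypothetical:

```python
# Static reference data: minutes to walk between gate pairs (hypothetical).
GATE_TRANSIT_MINUTES = {("A1", "B7"): 18}

def enrich(event):
    """Join the fresh arrival event with static gate-transit data."""
    walk = GATE_TRANSIT_MINUTES.get(
        (event["arrival_gate"], event["departure_gate"]))
    return {**event, "transit_minutes": walk}

def decide(event):
    """Final pipeline stage: take action by generating an alert."""
    walk = event["transit_minutes"]
    if walk is not None and event["minutes_to_departure"] < walk:
        return f"Text {event['passenger']}: you may miss your connection"
    return None

def run_pipeline(events):
    alerts = []
    for event in events:            # fresh data flows through the stages
        action = decide(enrich(event))
        if action:
            alerts.append(action)
    return alerts

alerts = run_pipeline([{"passenger": "p-1", "arrival_gate": "A1",
                        "departure_gate": "B7", "minutes_to_departure": 12}])
print(alerts)
```

As in the quote, static data (the gate-transit table) plus fresh data (the arrival event) flow through a join and end in an action: the text message.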
The Right Combination
Ultimately, combining each of the capabilities Hazelcast provides for real-time data analysis in a single platform is the chief benefit of its approach. Its tooling involves a global state fast data store, a stream processing engine, a feature store, machine learning inferences, intelligently distributed computations, and mechanisms for transformation and enrichment. The result is a credible confluence for “real-time messaging, real-time data integration, a fast data store, stream processing, and real-time machine learning,” Devgan concluded.