
Now Is the Time to Define ‘Real-Time’

12 Nov 2020 9:00am, by Dinesh Chandrasekhar

Dinesh Chandrasekhar is the Head of Product Marketing, Data-in-Motion, at Cloudera. He is a technology evangelist and thought leader with over 23 years of industry experience and an impressive track record of taking new mobile, IoT and big data products to market with a clear go-to-market (GTM) strategy.

There’s no doubt that cloud has become ubiquitous, and thank goodness for that in 2020. We wouldn’t have survived the challenges of this year without cloud. It’s supported everything, from the sudden changes in the way we work to the way we access healthcare and even shop for vital goods. The cloud has become a natural extension to large-scale streaming architectures or IoT architectures for data analytics or storage.

While the cloud can hold massive volumes of data and scale up as needed, the more important requirement for such architectures is deriving real-time insights from the data captured from various types of streaming sources. This is what enables business leaders to make key decisions at the right time, based on actionable intelligence obtained in real time.

The definition of “real-time” always varies based on whom you ask in the organization and how far removed they are from the impact on the value chain. If you ask a senior executive on the business side, their “real-time” could be stats from last night — sales figures across regions, for example. But if you ask a machine operator, their “real-time” could be about a machine failure that is about to happen, or is happening right now — so their window is a matter of seconds.


The definition will also vary based on the actual use case itself. It is not uncommon to hear from IT managers that they collect data from various sources into a database, run an overnight ETL process and then look at “real-time” data the next morning. While we understand the contradiction in that statement, for that particular use case and the acceptable SLAs of the personas in that organization, it might be okay. A use case such as “nightly inventory” might fit that category. However, in today’s digital world where split-second decisions make or break revenue numbers or new customer markets, “real-time” needs to be true to the word.

So, depending on which operator or IT leader you ask, you’re going to get a different definition of “real-time data” — and that’s a huge problem, especially as they’re working on applications for autonomous driving, fraud prevention, airplane navigation, healthcare monitoring, contact tracing, and more. There are far more use cases that require the ability to analyze data and make educated decisions on-the-fly.

Analytics is delivered to different stakeholders in different formats at different levels of the data lifecycle. In a traditional data lifecycle, data is captured from traditional sources and pumped into a data lake and operational stores, from which various stakeholders derive insights via dashboards and other visualizations. This type of analytics has built-in latencies that make it unsuitable for real-time decisions.

When you bring data-in-motion, or streaming data, into the picture, the data lifecycle suddenly needs an extension: a way to quickly ingest that streaming data, process it in real time and produce predictive analytics from it even before the data is sent to a data lake. This is the fundamental difference that makes streaming analytics more suitable for real-time use cases.

The top hurdles that enterprises face in deploying and managing applications for these differing views of real-time are:

  1. A common data platform: one that lets you manage both sides of the data lifecycle seamlessly, and even connect the different streams together so that real-time insights from streaming sources can be fed into operational stores, while ML models derived from historical data are fed back to the edge for more robust intelligence there.
  2. Skills and expertise: implementing streaming analytics can require deep technical expertise and specialized tools. It shouldn’t be this way, but the associated costs lead many enterprises to shy away from implementing something this useful.
  3. Three Vs: the classic volume, velocity and variety challenges become especially acute when streaming data must be processed in real time for analytics.
  4. Security and Governance: two very important challenges that often get overlooked. Data security is paramount today, and understanding what that data means to different stakeholders matters even more. Protecting PII (personally identifiable information) as data traverses the lifecycle should be a key characteristic of a good streaming data platform.
  5. Operational simplicity: as enterprises take on ever-larger data loads, their operational challenges grow complex as well, and managing analytics becomes extremely difficult when operations are unmanageable. Data platforms need to provide good operational monitoring and management tools to offset these issues and make real-time analytics easier to consume.
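
To illustrate the PII point above with a minimal sketch (the field names and hash truncation are assumptions for illustration, not a specific platform’s API): a streaming pipeline might mask identifying fields at ingest, so records are already safe by the time they reach downstream stores.

```python
import hashlib

PII_FIELDS = {"name", "email"}   # hypothetical PII columns in the stream

def mask_pii(record):
    """Replace PII values with a one-way hash so downstream consumers
    can still join on them without ever seeing the raw values."""
    masked = {}
    for key, value in record.items():
        if key in PII_FIELDS:
            masked[key] = hashlib.sha256(str(value).encode()).hexdigest()[:12]
        else:
            masked[key] = value
    return masked

event = {"name": "Alice", "email": "alice@example.com", "temp": 71.5}
safe = mask_pii(event)
print(safe["temp"], safe["name"] != "Alice")  # non-PII passes through; PII is masked
```

Hashing rather than dropping the fields preserves their usefulness as join keys, which is one reason masking at the platform layer beats ad hoc redaction in each consumer.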

Tackling these hurdles is no easy feat. I’ve worked with many companies that are trying to untangle their data webs to finally get to real-time insights. The good news is that it’s entirely possible to architect data flow from the edge all the way to the data center (or cloud), and up through the application stack. Understanding your data-in-motion, or streaming data, can help you and your teams align on what “real-time” means for your operations.
