AI Will Drive Streaming Data Use — But Not Yet, Report Says
Seventy-two percent of people knowledgeable about streaming data believe artificial intelligence and machine learning (AI/ML) will drive its adoption over the next one to two years, according to a new report.
Yet AI/ML is neither the top use case nor the top benefit cited by current users of streaming data. Instead, real-time analytics is by far the most common use case, and improved productivity is the top motivation for using the technology.
The survey by Redpanda, a data streaming platform, was completed in July and August by 300 U.S. respondents, most of whom hold IT roles.
Overall, 59% of survey participants said they currently use streaming data; the remaining 41% are assumed to have plans to adopt the technology sometime in the future. Due to self-selection among respondents, overall adoption levels in the marketplace are likely lower.
Even among current users, slightly less than half (46%) have finished implementing their streaming data solution. Many implementations are incomplete because a large share of the organization's applications do not yet use streaming.
Another report, by Confluent, found that 47% of streaming data organizations have at least 10 critical systems that rely on the technology. This group may not be done making their transition to streaming data, but their implementations are relatively mature.
Data Volume Will Drive Infrastructure Changes
As the number of systems that use data streaming increases, data volume will certainly rise as well. Per the Redpanda study, 82% of organizations with streaming data generate at least 10GB a day from analytical workloads, and almost as many (76%) handle that volume from transactional streaming data workloads.
To understand how much data infrastructure will need to adapt in the near future, the first step is to determine how many more systems will rely on streaming data. The next step is to estimate how much each application’s use of data will increase.
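The two-step estimate described above can be sketched as simple arithmetic. This is a minimal illustration, not a method from the Redpanda report: the function name, its parameters and the sample figures are all hypothetical, and it assumes every system produces the same average volume and grows at the same rate.

```python
def project_daily_volume_gb(systems_now, systems_added, gb_per_system, growth_per_system):
    """Rough projection of daily streaming volume (hypothetical helper).

    systems_now:       systems that rely on streaming data today
    systems_added:     additional systems expected to adopt streaming (step 1)
    gb_per_system:     assumed average daily volume per system, in GB
    growth_per_system: assumed fractional growth per system (step 2), e.g. 0.5 for +50%
    """
    # Step 1: how many systems will rely on streaming data
    total_systems = systems_now + systems_added
    # Step 2: how much each system's data use is expected to grow
    projected_gb_per_system = gb_per_system * (1 + growth_per_system)
    return total_systems * projected_gb_per_system


# Illustrative numbers only: 10 systems today, 5 more planned,
# 1 GB/day each, with each expected to grow by 50%.
print(project_daily_volume_gb(10, 5, 1.0, 0.5))  # 22.5
```

In practice, volume and growth vary widely between applications, so a per-system breakdown would give a more realistic estimate than a single average.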
Although AI/ML may require a lot of data to create a model, it often doesn’t need to be “fast data.” The velocity at which the data is accessed is not always a critical metric, because only a select group of AI/ML use cases actually require real-time, streaming data. The rest can use more traditional batch processing techniques.
Top Reasons Streaming Data Is Used
While AI/ML gets a lot of attention, it is not the most common use case. Real-time analytics is used by 71% of data streamers in the Redpanda survey, followed by 64% supporting e-commerce transactions with streaming data. Internet of Things (IoT), fraud detection and personalization are also commonly supported.
A still-impressive 47% of survey participants use streaming data for AI/ML. For context, a 2019 study that The New Stack conducted with Lightbend found that 33% of streaming data users used the technology for AI/ML.
Improved productivity was cited by 60% of Redpanda respondents as a reason to use streaming data technologies. At 49%, business agility was the second most cited motivation. Support for AI/ML was also cited by 43% as a motivation.
In terms of business goals, 64% of users believe streaming data is benefiting cybersecurity, and the same percentage are broadly looking for anomalies in their data streams.
Streaming Options Are Complex
When asked about the perceived technical challenges of working with streaming data, security and data privacy were cited by 42% of participants in the Redpanda survey. Other key findings:
- Data consistency (35%) and complexity (29%) were cited as other leading challenges in working with streaming data.
- People who haven’t started using streaming data are almost twice as likely as current users (40% versus 21%) to be concerned about complexity.
In terms of business challenges, cost/price is cited most often. Attaining necessary in-house skills is also problematic, as data engineers are in high demand.
Streaming Data Analytics: Tools
Streaming analytics tools are used by 66% of current users, according to the Redpanda survey, followed by 4% deploying streaming databases (such as Materialize), 51% using an operational database (such as Apache Cassandra) and 41% using a data lake or warehouse.
Only a third (34%) are actually using a stateful stream processing framework. Google’s Dataflow is the framework used by the largest number of survey respondents, followed by Apache Flink and NiFi. Although they are lower on the list, emerging vendors DeltaStream and Quix are used more often than several of their competitors.
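The report does not define "stateful," but the general idea is that the processor remembers information between events instead of handling each one in isolation. The sketch below is a minimal plain-Python illustration of that idea, counting events per key in fixed (tumbling) time windows. The class and method names are hypothetical, and it does not represent the API of Dataflow, Flink, NiFi or any other framework named in the survey.

```python
from collections import defaultdict


class TumblingWindowCounter:
    """Hypothetical example: counts events per key within fixed time windows."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        # The state: running counts that persist from one event to the next
        self.counts = defaultdict(int)

    def process(self, timestamp, key):
        """Handle one event and return (window_start, count so far in that window)."""
        # Assign the event to the window that contains its timestamp
        window_start = (timestamp // self.window_seconds) * self.window_seconds
        self.counts[(window_start, key)] += 1
        return window_start, self.counts[(window_start, key)]


counter = TumblingWindowCounter(window_seconds=60)
print(counter.process(5, "login"))   # (0, 1)
print(counter.process(30, "login"))  # (0, 2)  same window, count carried over
print(counter.process(65, "login"))  # (60, 1) new window starts a fresh count
```

Production frameworks add what this sketch leaves out, such as fault-tolerant state storage, handling of late or out-of-order events, and scaling across machines.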
Differentiating between these options is confusing because many data platforms bundle several of these capabilities together. For example, while 77% of current users in the Redpanda survey said they have a Kafka-compatible platform, only 7% of users are specifically using Kafka Streams.
Data platforms like Aiven and Redpanda are compatible with a range of stream processing frameworks and databases. Instead of making users decide on a specific technology, data platforms are now competing based on how many data sources and types they can integrate with.