Enterprises are adopting commercial stream processing offerings from both their cloud providers and more specialized vendors. In fact, at least 54% of companies that use stream processing in an application are using a vendor’s stream processing product or service in production, according to The New Stack and Lightbend’s recent survey. Among the vendors asked about, a company with live stream processing use cases is, on average, using in production or actively evaluating/piloting about three (2.8) of them. As more applications that utilize stream processing move into production, we expect a wide array of vendors to compete based on which open source technologies they support and how they bundle stream processing into a larger platform. With the caveat that this study should not be used to calculate market share, let’s dig into the data.
Open source technologies are central to the stream processing stack. In fact, 83% of organizations with stream processing applications are either utilizing or actively evaluating/piloting an Apache Software Foundation project for this type of functionality. Apache Kafka leads the pack, with two-thirds of respondents with stream processing applications either using or evaluating it, but that figure covers functionality that includes publish/subscribe and storage as well as stream processing. Since Kafka is usually just one part of the stream processing stack, there is a lot of room for other technologies to be utilized based on use case requirements.
Given their heritage supporting a specific open source technology, it is not surprising that Lightbend (Akka), Cloudera/Hortonworks (Hadoop) and Databricks (Spark) figure prominently on users’ roadmaps. Of those using or actively evaluating Akka Streams, 66% are also using or considering Lightbend; for Spark Streaming, 35% of the technology’s users/evaluators are also at least evaluating or piloting Databricks’ product. For companies that integrate Hadoop as a source for data streaming, 38% are using or evaluating Cloudera. Note that several companies that provide managed or supported versions of Kafka were not included in the questionnaire. Given its prominence, we expect that Confluent will also see strong consideration by those with existing Kafka deployments.
With varying degrees of success, the big cloud providers are also gaining customers by offering a combination of managed open source and proprietary technologies. AWS is the leading vendor, with 22% using one of its offerings, which include Kinesis and a recently added managed Kafka service. This lead is partly due to the company’s dominance in the cloud computing market; two-thirds of companies said at least one of their applications with stream processing has AWS as part of its stack. Of this group, 37% are actually using the company’s stream processing offerings. Google also does well; although 25% of stream processing users have Google as part of their stack, 45% of this group is deploying an offering like Dataflow or Dataproc for a production application. Azure is less successful, with “only” 18% of these cloud customers opting for a Microsoft service like Event Hubs or Stream Analytics. On a positive note, Azure production usage is particularly strong among those who have IoT use cases and those expecting the integration of multiple data streams to become front-and-center in the next year.
The following charts were also used to write this article. You are encouraged to read the full “Streaming Data And The Future Tech Stack” for more insight and analysis. As always, The New Stack will explain how open source technologies are being deployed, whether they are self-hosted or delivered as a managed service.
Feature image via Pexels.