The wide-scale adoption of the data-streaming platform Apache Kafka reflects the skyrocketing need for a data-management platform that can underpin data operations across a number of sources, often at a global scale. According to a recent Gartner report, “Understanding Cloud Data Management Architectures: Hybrid Cloud, Multicloud and Intercloud,” for example, almost half of all organizations with data-management operations manage data in both on-premises and cloud environments (typically multicloud), and Gartner says more than 80% of these organizations rely on multicloud environments.
In practical terms, multimedia giants such as Netflix need to provide tailor-made video streaming experiences to millions of viewers in different geographic locations around the world. A mom-and-pop online retailer requires real-time data streaming for its supply chain spread across continents and for online data transaction storage and management. Both use cases typically involve streaming data from on-premises and multicloud servers.
Kafka serves as a “central nervous system” to connect disparate data sources, often located in different corners of the globe, as Ben Stopford, Confluent lead technologist for the office of the Chief Technology Officer, and Addison Huddy, Confluent group product manager, described in a blog post. Confluent offers a platform that extends Kafka’s features.
Without Kafka, “a big unintended consequence of all the hybrid multiclouds out there is in this effort to create these streams and break down the silos and pile the data together, you can actually end up often creating new data silos as you create different sets of data streams,” Dan Rosanova, head of product management for Confluent Cloud, told The New Stack. “They’re just not tied together and you kind of created a new problem to solve.”
As organizations seek to manage data through a single platform or “central nervous system,” Kafka is deployed through clusters running on servers across the different data environments. However, Kafka’s reach and features have shortcomings that Confluent says its Confluent Platform 6.0 helps to solve.
As part of its Project Metamorphosis (the literary reference to the famous Franz Kafka novel “The Metamorphosis,” in which the protagonist becomes a giant cockroach, was probably intended), Confluent Platform 6.0 was created to solve several issues operations teams face when managing clusters with Kafka. These issues include, for example, having to manually link clusters together. Remediating single-cluster failures, as well as linking clusters together, has been one of the more time-consuming and resource-draining tasks for operations teams working with Kafka. The new cluster link feature addresses this by automating the pooling of the different clusters into a “global mesh of Kafka.”
“Connecting different clusters from data centers and cloud environments together in a real-time manner that is cost-effective, done in real-time and is easy to operate is a really hard problem to solve. But Kafka now does a really good job of this by using the Confluent Platform 6.0 cluster link feature,” Huddy said. “You can connect all clusters directly together, almost creating a federated single cluster that can be described as a ‘Kafka mesh’ if you will, that goes around the globe. And with that, you can unlock a ton of use cases.” Those use cases might include video distribution or sharing trading information between brokerage houses in London and New York.
Other features that Confluent says Platform 6.0 offers include:
- ksqlDB: replaces subsystems needed to build event streaming applications.
- Self-balancing: automates the work of moving partitions around a cluster, a task that is time-consuming and error-prone when done manually. “One large customer and a heavy Kafka user said they had an entire team that spent their entire time moving these partitions around,” Huddy said. “With self-balancing, you don’t have to worry about it: you press the switch and the system self-balances itself.”
- Tiered Storage: allows organizations to retain data in a second layer of storage, which Huddy compared to Dropbox, at a much lower cost. “Essentially the data comes in, and after a certain period of time, it automatically moves to that second tier of data, such as Amazon Web Services (AWS) S3, so that Kafka clients don’t have to go to another system to read the data. This is because the data never leaves Kafka: all your data is in one place.”
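To make the tiered storage idea concrete, here is a toy Python model of the behavior Huddy describes: records land on a fast tier, move to a cheaper tier after a set period, and readers keep using a single read call either way. It is only an illustrative sketch. The `TieredLog` class, its method names and the `hot_retention` parameter are hypothetical and are not Confluent's or Kafka's API, and the two dictionaries merely stand in for broker disks and an object store such as S3.

```python
from dataclasses import dataclass, field


@dataclass
class TieredLog:
    """Toy log with a hot (local) tier and a cold (object-store-like) tier.

    Hypothetical illustration only; not Confluent's implementation.
    """
    hot_retention: float                      # seconds a record stays on the hot tier
    hot: dict = field(default_factory=dict)   # offset -> (timestamp, value)
    cold: dict = field(default_factory=dict)  # offset -> (timestamp, value)
    next_offset: int = 0

    def append(self, value, now):
        """Write a record to the hot tier and return its offset."""
        offset = self.next_offset
        self.hot[offset] = (now, value)
        self.next_offset += 1
        return offset

    def tier(self, now):
        """Move records older than hot_retention to the cold tier."""
        expired = [o for o, (ts, _) in self.hot.items()
                   if now - ts >= self.hot_retention]
        for o in expired:
            self.cold[o] = self.hot.pop(o)
        return len(expired)

    def read(self, offset):
        """Readers use one call; the log decides which tier to serve from."""
        if offset in self.hot:
            return self.hot[offset][1]
        if offset in self.cold:
            return self.cold[offset][1]
        raise KeyError(offset)


log = TieredLog(hot_retention=60.0)
log.append("order-1", now=0.0)
log.append("order-2", now=50.0)
moved = log.tier(now=70.0)   # order-1 is 70s old and moves; order-2 is 20s old and stays
print(moved, log.read(0), log.read(1))
```

The point of the sketch is the `read` method: the caller never needs to know which tier holds a record, which mirrors the claim that clients do not have to go to another system to read older data.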
Amazon Web Services and Confluent are sponsors of The New Stack.