Kafka Drops ZooKeeper for ‘Real-Time’ KRaft
The veritable Kafka real-time data processing platform is replacing its distributed-system consensus engine for a snappier “real time” quorum-based controller. Enhanced Kafka distributors such as Confluent are migrating users over to the new technology, promising a seamless transition, thanks to the hard work of the open source Kafka team.
“I think Kafka has outgrown Zookeeper, in the sense that we need high scalability,” said Chase Thomas, Confluent lead product manager for KRaft, ZooKeeper’s replacement. “We needed simple deployment, we need to address certain corner cases, correctness issues. We want to basically simplify the design and make it more scalable.”
No More Batch Processing
In many cases, real-time data processing is a much larger undertaking than it was back in 2011, when Kafka was released. Once the purview only of web-scale companies such as LinkedIn, real-time data processing is becoming increasingly important for a wide range of enterprises, whose customers expect instant up-to-date responses.
Enterprises “have their data in many different services. And they really need a central nervous system that can unify that data and have it go where it needs to go,” McCabe explained. Currently, there are 150,000 organizations are using Kafka, according to Confluent.
A major part of the problem is how ZooKeeper operates.
ZooKeeper elects a server to be a leader of each Kafka cluster. Each one of the “ensemble” of ZooKeeper servers maintain a complete state of the system in memory, in a hierarchical namespace. The leader server dispatches changes to all the other servers via the ZooKeeper Atomic Broadcast (ZAB) protocol, offering sequential consistency and atomicity for all updates, as according to the Java expertise site Baeldung.
ZooKeeper operates completely independently of Kafka, which adds to the system administrator’s management headaches and can slow the responsiveness of the system as a whole.
Other distributed systems, such as Elasticsearch, keep the synchronization as an internal matter. But Kafka has no way to monitor the event log, so changes always lagged between the controller memory and the state in ZooKeeper itself.
“Core functionality of Kafka, such as replication, is deeply tied into the controller,” explained Colin McCabe, principal engineer at Confluent, who is leading the KRaft development for Confluent.
ZooKeeper, not Kafka, holds the metadata about the system itself, such as what partitions are set up, and so on. When Kafka was introduced, the number of partitions users managed was quite low but over time that number has increased (The upper limit seemed to be about 200,000 partitions or so). So every time it was necessary to elect a new controller, the amount of time it took to get that partition metadata out to the nodes also grew.
So this is why Kafka is moving users to the more real-time oriented Apache Kafka Raft (KRaft, pronounced “Kraft” rather than “K-Raft”), an implementation of the Raft (Replicated & Fault Tolerant) quorum protocol first developed at Stanford University in 2014.
Using KRaft, a Kafka deployment can keep hot standbys, where there is no need for a controller to load in all the partition data.
A Kafka Improvement Proposal (KIP 500) kicked off the work on a self-managed metadata quorum, which would act as an internal service. The resulting KRaft is a Kafka implementation of the Raft protocol, one based on an event-based architecture. The name KRaft is basically short for Kafka Raft.
As McCabe explained, “Kafka itself is based on this sort of stream metaphor, where you have a stream of changes coming in. And by monitoring that stream, you always know where you are. And if you fall behind, you know the offset that you’re at, and you can catch up. And so the stream metaphor is very powerful for our users.”
“Minimizing metadata divergence is very important to us,” Confluent’s Colin McCabe noted.
So the idea is to use the stream process to manage the metadata itself. State is stored as a series of events, where each event gets an offset number, allowing a server to synchronize by playing all the events subsequent to its current offset.
In other words, “We use Kafka to store streaming data, so why not use a log to store streaming changes to our metadata?” Thomas said.
The log establishes a clear ordering between events, and ensures a single timeline for users.
As a result, KRaft has lowered the latency of metadata reads by a factor of 14. So if there is a problem, Kafka can recover 14 times as quickly. And now, metadata on as many as 2 million partitions can be stored, and kept up-to-date.
Dipping KRaft into the Streaming Waters
The first full release of Kraft came with Kafka 3.3, released in October.
KIP-833 designated Kafka 3.5 to be the first of the bridge releases that would provide an easy way to move from ZooKeeper, without any downtime. Although it is available now as an early release, it is not ready yet for a full production release, as the team are still working out features necessary for some edge cases.
Basically, the upgrade process involves adding new controller nodes, as well as adding additional functionality to existing nodes. The ZooKeeper nodes will get their marching orders from a new KRaft controller.
“The benefit of this is that we can keep the cluster running in the old mode for a while until we’re ready to roll over every broker. Enrolling brokers takes time. And when the brokers are rolled in, they’ll be in Kraft mode,” McCabe said.
At this point, the system will be in dual write-mode, and so the admins can roll back to ZooKeeper fairly easily if need be.
Kafka will drop ZooKeeper altogether for its version 4 release. Users who are still relying on ZooKeeper by that point will need to upgrade to a Bridge Release first.
These ripples are spreading down to the commercial providers. Data streaming platform provider Confluent has given its users the option to move to KRaft.
Confluent releases can arrive within a few months of the master on-premises Kafka release, and even sooner for the cloud services. In fact, Confluent Cloud is already running on KRaft, the Confluent folks told TNS.