Aiven Adds Kafkawize for Data Governance of Kafka
Aiven recently announced the addition of Kafkawize to its open source cloud data platform. Kafkawize, which Aiven has renamed Klaw, is an open source tool for governing data in Apache Kafka, a framework for managing low-latency streaming data.
Kafka is widely used for real-time data generated in e-commerce, Internet of Things, and Industrial Internet applications. Klaw enables Kafka users to govern that data at the level of individual topics, which are among the fundamental units for organizing data in Kafka.
“When companies have a big cluster in Kafka, the first thing they encounter is: how can I define policies around who can access what, and what comes into which topic, and who can read and write from those?” said Josep Prat, Aiven Open Source Engineering Director. “This quickly becomes a nightmare.”
Klaw is designed to reduce the complexity of these and other data governance tasks. It functions as a governance layer through which organizations can access their Kafka data. The solution is a central management plane for managing access control lists, implementing data governance policies, and defining schemas for individual topics.
“To manage topics in Kafka is cumbersome,” Prat noted. “It’s complicated. How can we make this thing accessible for everyone? How can we define certain processes so if you follow those, most people will have a nice experience with Kafka?”
Centralized Governance Capabilities
Implementing Klaw helps with these issues and more. Granted, Klaw users can access Kafka without going through Klaw. However, by accessing Kafka data through Klaw (either via REST APIs or Klaw’s user interface), they can reduce much of the complexity of governing Kafka. Klaw automates calls to Kafka from one place. It’s also integral for managing different connectors, determining which ones can talk to one another, and understanding which topics they have governance clearance to access.
Without Klaw, organizations utilizing Kafka would commonly encounter situations in which, before implementing connectors, “I need to request three, four, five things in five different ways; each thing works differently,” Prat said. “Forget about that. The tool is a central place that will help you make those decisions quicker and more secure.” Klaw contains what Prat termed an “admin-task dashboard” for standardizing the management of topics and connectors. It’s a locus for assigning and understanding ownership of topics and implementing governance based on enterprise rules. “Let’s say we have a topic that has sensitive data,” Prat said. “The owners of that topic are the ones who can define who can write or read to that topic.”
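The ownership model Prat describes can be pictured with a small sketch. This is not Klaw's API or data model; the class and method names (`GovernanceLayer`, `register_topic`, `approve`, `is_allowed`) are hypothetical and exist only to illustrate the idea that a topic's owning team is the one that grants read or write access.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    name: str
    owner_team: str
    readers: set = field(default_factory=set)
    writers: set = field(default_factory=set)


@dataclass
class AccessRequest:
    topic: str
    team: str
    operation: str  # "read" or "write"


class GovernanceLayer:
    """Hypothetical central registry of topics, owners, and granted access."""

    def __init__(self):
        self.topics = {}

    def register_topic(self, name, owner_team):
        # Assumption for this sketch: the owning team can read and write its own topic.
        self.topics[name] = Topic(
            name, owner_team, readers={owner_team}, writers={owner_team}
        )

    def approve(self, request, approver_team):
        topic = self.topics[request.topic]
        # Only the team that owns the topic may grant access to it.
        if approver_team != topic.owner_team:
            raise PermissionError(
                f"{approver_team} does not own topic {topic.name}"
            )
        if request.operation == "read":
            topic.readers.add(request.team)
        elif request.operation == "write":
            topic.writers.add(request.team)
        else:
            raise ValueError(f"unknown operation: {request.operation}")

    def is_allowed(self, team, topic_name, operation):
        topic = self.topics.get(topic_name)
        if topic is None:
            return False
        granted = topic.readers if operation == "read" else topic.writers
        return team in granted
```

In this sketch, a team that wants to read a sensitive topic submits an `AccessRequest`, and the request only takes effect once the owning team approves it. A real deployment would translate an approved request into Kafka access control lists rather than in-memory sets.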
Kafka is a distributed platform for high-throughput systems generating numerous types of messages, many of which directly pertain to events in low-latency data transmissions. Prat described an e-commerce use case in which different events might be triggered by customers opening a certain web page, placing items in their shopping carts, or initiating the checkout process. The sensitivity of the payment card information that follows this final event is readily apparent.
All these events are categorized as different topics, which Prat defined as “a semantic collection of messages that will be written and then consumed.” Topics can contain data from multiple producers or sources (such as sensors emitting temperature data) and are consumed by any number of clients. Consequently, it’s critical to understand who owns those topics, who can read from and write to them, and what their schema is to facilitate data governance.
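As a rough mental model of a topic with several producers and independent consumers, consider the following in-memory sketch. It does not use a Kafka client and leaves out partitions, replication, and persistence; `InMemoryTopic` and its methods are hypothetical names chosen for illustration.

```python
from collections import defaultdict


class InMemoryTopic:
    """Toy append-only log with a separate read position per consumer."""

    def __init__(self, name):
        self.name = name
        self._log = []                    # messages in arrival order
        self._offsets = defaultdict(int)  # next position to read, per consumer

    def produce(self, producer_id, value):
        # Any producer may append; the message's position in the log is returned.
        self._log.append((producer_id, value))
        return len(self._log) - 1

    def consume(self, consumer_id, max_messages=10):
        # Each consumer advances its own offset, so consumers do not affect each other.
        start = self._offsets[consumer_id]
        batch = self._log[start:start + max_messages]
        self._offsets[consumer_id] = start + len(batch)
        return batch


temperatures = InMemoryTopic("temperature-readings")
temperatures.produce("sensor-a", 21.5)
temperatures.produce("sensor-b", 19.0)

dashboard_batch = temperatures.consume("dashboard")
alerting_batch = temperatures.consume("alerting")
```

Both consumers receive the same two readings because each tracks its own position in the log, which is why the question of who may read a given topic matters for governance.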
The notion of schema, which frequently arises in data modeling and data integration, is essential for properly governing Kafka’s data. Klaw helps define schema and account for what Prat called “schema evolutions” related to the changing of data sources or business requirements. “Schema is how the message looks like, so then everyone consuming that knows exactly what they’re getting out of that topic,” Prat explained. “I’m getting that message; it looks like that. It has this shape and I know what I’m getting.”
As business requirements change, a topic’s schema can evolve to reflect new or changed elements in its messages. “All that stuff, who can read, who can write, the schema definitions, that’s the management of the data governance of that topic,” Prat remarked.
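One common way to reason about schema evolution is a compatibility check between the old and new versions of a schema. The sketch below is a deliberately simplified illustration and not how Klaw or any particular schema registry implements it: schemas are plain dictionaries, and the function name `is_backward_compatible` is an assumption. It treats a new schema as backward compatible when a consumer using it can still read messages written with the old one.

```python
def is_backward_compatible(old_schema, new_schema):
    """Return True if a reader using new_schema can read old_schema messages.

    Simplified rules assumed for this sketch:
    - a field present in both versions must keep the same type;
    - a field added in the new version must declare a default;
    - a field dropped from the new version is simply ignored by the reader.
    """
    for name, spec in new_schema.items():
        if name in old_schema:
            if old_schema[name]["type"] != spec["type"]:
                return False
        elif "default" not in spec:
            return False
    return True


checkout_v1 = {
    "order_id": {"type": "string"},
    "amount": {"type": "float"},
}

# Adds a field with a default: old messages can still be read.
checkout_v2 = {
    "order_id": {"type": "string"},
    "amount": {"type": "float"},
    "currency": {"type": "string", "default": "USD"},
}

# Adds a field with no default: old messages lack it, so readers would break.
checkout_v2_broken = {
    "order_id": {"type": "string"},
    "amount": {"type": "float"},
    "currency": {"type": "string"},
}
```

Production schema formats such as Avro or JSON Schema have richer rules (type promotion, nested records, optional fields), so this check only conveys the general shape of the decision.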
Real-Time Data Governance
Klaw is a viable means of governing data from Kafka. It’s useful for ensuring that even high-volume, real-time data adheres to the data privacy and regulatory compliance requirements characteristic of many data governance programs. By packaging these capabilities in an open source framework, Aiven seeks to make them available to organizations of varying size, financial backing, and sophistication.