KUDO Automates Kubernetes Operators

Kubernetes Operators simplify the experience of automating complex applications in containers — for example, deploying Kubernetes-native stateful Cassandra clusters that can scale alongside your stateless containers — but creating those operators can be anything but simple.
At Snyk’s All The Talks virtual conference last month, Matt Jarvis, director of community at D2IQ, offered the example of an unnamed but “major noSQL vendor” who created a “very feature-rich” operator that comes in at over 40,000 lines of code and took more than a year of labor to build. Even the operator for etcd, which has “a fairly simple set of lifecycle states,” takes more than 9,000 lines of code, he said.
“Building operators can be very complex and requires a lot of knowledge about the internals of Kubernetes, requires Go expertise, as well as requiring that domain-specific knowledge about your application,” explained Jarvis. “The engineering effort which goes into high-quality operators can be very considerable.”
The Kubernetes Universal Declarative Operator (KUDO) makes these Kubernetes operators easier to build by using declarative YAML, and it goes beyond mere deployment to automate tasks such as configuration updates, failure recovery, and binary upgrades. It does this by providing a universal operator that helps automate the process of building Kubernetes operators.
Jarvis offers a more succinct definition, saying that KUDO “basically, kind of defines this framework for operational sequencing and actioning, and allows you to define run books, and ship that with your application,” while the website of the project further distills the purpose as having the ability to “deploy your applications, have the tools needed to operate them, and understand how they’re behaving — all without a Ph.D. in Kubernetes.”
Comparing KUDO to other operator building frameworks, such as the Operator Framework currently offered by Red Hat and originally created by CoreOS, or the Kubernetes SIG API Machinery sub-project Kubebuilder, Jarvis explains that the primary differences lie in KUDO’s ability to manage the entire software lifecycle, while existing implementations often do not, and the creation of operators using YAML instead of potentially thousands of lines of Go. He also points out that KUDO is polymorphic, meaning that it “will become an operator for any application for which you write a KUDO operator — so it has a single controller, and it can create pretty complete operators without needing either deep knowledge of Kubernetes or necessarily writing any code.”
As for how KUDO works, everything is broken down into three nested parts.
First, an “operator” in KUDO terminology is the description of a deployable service, represented as a custom resource definition (CRD) object in your cluster. Inside of that lies the “operator version,” which is a specific version and implementation of that service, with its own plans, objects, and parameters. Finally, the “instance” is the actual deployment or instantiation of the application being controlled by KUDO. A single operator version can have many instances, and the larger, overarching operator can in turn have multiple operator versions.
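The nesting described above can be pictured with a small data model. This is only a conceptual sketch in Python, not KUDO's actual API or resource schema: the class names, fields, and the Cassandra versions and instance names are all hypothetical, chosen to mirror the three terms.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """One running deployment of a specific operator version."""
    name: str
    parameters: dict[str, str] = field(default_factory=dict)


@dataclass
class OperatorVersion:
    """A concrete implementation of the service, with its own plans."""
    version: str
    plans: list[str] = field(default_factory=list)
    instances: list[Instance] = field(default_factory=list)


@dataclass
class Operator:
    """High-level description of a deployable service."""
    name: str
    versions: list[OperatorVersion] = field(default_factory=list)

    def all_instances(self) -> list[str]:
        """List every instance as 'operator/version/instance'."""
        return [
            f"{self.name}/{v.version}/{i.name}"
            for v in self.versions
            for i in v.instances
        ]


# Hypothetical names and versions, for illustration only.
cassandra = Operator(
    name="cassandra",
    versions=[
        OperatorVersion("1.0.0", plans=["deploy"], instances=[Instance("dev")]),
        OperatorVersion(
            "1.1.0",
            plans=["deploy", "upgrade"],
            instances=[Instance("staging"), Instance("prod", {"NODE_COUNT": "5"})],
        ),
    ],
)

print(cassandra.all_instances())
```

One operator holds several operator versions, and each version holds its own instances, which is the relationship the three terms are meant to capture.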
From here, the plans provide the runbook for KUDO to execute. Each plan contains instructions, organized into phases and steps, that define tasks such as deployment, upgrading, backup, and restoration, and these can be run either serially or in parallel. In more concrete terms, imagine a setup where you are running Kafka: the overarching operator is for Kafka, the operator versions are for different versions of Kafka, and an instance is a deployment of one of those versions.
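To make the idea of plans, phases, and steps more tangible, here is a minimal Python sketch of a runner that executes a plan's phases, and each phase's steps, either serially or in parallel. It is an illustration of the sequencing concept only and is not how KUDO is implemented. The `strategy` field, the `run_plan` helper, and the phase and step names are assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    action: Callable[[], None]  # the work this step performs


@dataclass
class Phase:
    name: str
    strategy: str  # "serial" or "parallel" (hypothetical field name)
    steps: list[Step]


@dataclass
class Plan:
    name: str
    strategy: str  # how the phases themselves are ordered
    phases: list[Phase]


def run_all(items, run_one, strategy: str) -> None:
    """Run items one after another, or concurrently, per the strategy."""
    if strategy == "serial":
        for item in items:
            run_one(item)
    elif strategy == "parallel":
        with ThreadPoolExecutor() as pool:
            # list() waits for completion and re-raises any failure.
            list(pool.map(run_one, items))
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")


def run_plan(plan: Plan) -> None:
    """Execute every phase of a plan, and every step of each phase."""
    def run_phase(phase: Phase) -> None:
        run_all(phase.steps, lambda step: step.action(), phase.strategy)

    run_all(plan.phases, run_phase, plan.strategy)


log: list[str] = []

# Hypothetical deploy plan: a preparation phase, then brokers in parallel.
deploy = Plan(
    name="deploy",
    strategy="serial",
    phases=[
        Phase("prepare", "serial", [Step("create-config", lambda: log.append("create-config"))]),
        Phase("brokers", "parallel", [
            Step("broker-0", lambda: log.append("broker-0")),
            Step("broker-1", lambda: log.append("broker-1")),
        ]),
    ],
)

run_plan(deploy)
print(log[0])           # the serial plan finishes "prepare" before "brokers"
print(sorted(log[1:]))  # parallel steps may complete in either order
```

Because the plan itself is serial, the preparation phase always completes before any broker step starts, while the two broker steps within the parallel phase carry no ordering guarantee relative to each other.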
For the curious, Jarvis offers a full demo of KUDO in action, which can be viewed in the embedded video below (about 12 minutes in).
Looking ahead at the roadmap for KUDO, Jarvis lays out several features currently in the pipeline, including the “piping” of information from one task to another, dynamic CRDs that can be “created on the fly, by an Ops person who wants to modify how a particular plan is executed,” and the use of something other than YAML, such as Starlark or CUE. Also listed are support for dependencies and the ability to extend existing artifacts, such as Helm charts, into KUDO operators.