Stateful Workloads on Kubernetes Are a Thing, but There Is a Twist

Kubernetes is becoming an industry standard: Cockroach Labs and Red Hat found that 94% of the organizations they surveyed deploy services and applications on Kubernetes. Standardization means being able to run any stack, in any environment, for any workload, including stateful ones.
Running stateful workloads on Kubernetes used to be a no-go, yet the Data on Kubernetes 2021 Report found that 90% of surveyed companies believe Kubernetes is ready for stateful workloads. Even more surprising, a large majority (70%) are already running stateful workloads in production. So while it is definitely possible to run data on Kubernetes, there is a twist: operators.
Kubernetes operators, introduced in 2016, are programmable extensions that perform operations Kubernetes cannot handle natively. By extending the Kubernetes API, operators provide intelligent, dynamic management capabilities. While some, such as the founder of Docker, see them as programmable microplatforms, their biggest impact is on running stateful workloads.
Day 2 Operations
Why is that? Kubernetes now provides rock-solid foundations for running data-oriented workloads, but some application-specific tasks, more specifically day 2 operations, can’t be handled natively. Taking databases as an example, day 2 operations include performing a backup, taking a snapshot, performing a failover, applying a patch, or indexing a column. Because every database does these things slightly differently, it would be hard for Kubernetes to handle this application-specific know-how natively.
That is why, while running a stateful application on Kubernetes is definitely possible, how well it runs will largely depend on how good its operators are. The Data on Kubernetes Community found that 42% of operator users complained about operators’ varying degrees of quality. Understanding the ins and outs of an application is challenging enough; encapsulating that knowledge so it works in a highly distributed and dynamic environment like Kubernetes is harder still.
And while the Operator Framework provides a solid set of developer tools and Kubernetes components, there is room for improvement for complex use cases such as multicluster deployments, as DataStax Product Manager Christopher Bradford highlighted in a video interview at KubeCon EU.
It can be a daunting task for end users to pick suitable operators from the more than 250 available. Their varying quality can be explained by the fact that they are built by a wide range of organizations, such as service companies, vendors, OSS communities, and individuals. And while there are best practices to follow, there isn’t strong guidance on how to build operators and, obviously, no oversight. Finally, because there can be several valid ways of performing the same operation, operators may take different technical approaches, whose trade-offs their users have to weigh.
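A minimal sketch of the pattern, in plain Python with no Kubernetes client, assuming a hypothetical `Database` resource: the user declares a desired state, and the operator’s reconcile loop compares it with what it observes and runs the engine-specific day 2 operations needed to converge. All names here (`DatabaseSpec`, `DatabaseStatus`, `DayTwoOps`, `reconcile`, `RecordingOps`) are illustrative, not the Operator Framework’s API; a real operator would watch and update resources through the Kubernetes API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Protocol


@dataclass
class DatabaseSpec:
    """Desired state a user declares in a hypothetical Database custom resource."""
    replicas: int
    version: str
    backup_interval: timedelta


@dataclass
class DatabaseStatus:
    """State the operator observes in the cluster."""
    ready_replicas: int
    version: str
    last_backup: datetime


class DayTwoOps(Protocol):
    """Application-specific know-how that Kubernetes cannot provide natively.
    Each database engine would implement these methods differently."""
    def scale_to(self, replicas: int) -> None: ...
    def backup(self) -> None: ...
    def upgrade(self, version: str) -> None: ...


def reconcile(spec: DatabaseSpec, status: DatabaseStatus,
              ops: DayTwoOps, now: datetime) -> list[str]:
    """One pass of a control loop: compare desired vs. observed state and run
    whatever day 2 operations are needed to converge. Returns actions taken."""
    actions: list[str] = []
    if status.ready_replicas != spec.replicas:
        ops.scale_to(spec.replicas)
        status.ready_replicas = spec.replicas
        actions.append(f"scale to {spec.replicas}")
    if status.version != spec.version:
        # Back up first so a failed upgrade could be rolled back.
        ops.backup()
        status.last_backup = now
        actions.append("backup")
        ops.upgrade(spec.version)
        status.version = spec.version
        actions.append(f"upgrade to {spec.version}")
    if now - status.last_backup >= spec.backup_interval:
        # Scheduled backup is due.
        ops.backup()
        status.last_backup = now
        actions.append("backup")
    return actions


class RecordingOps:
    """Stand-in for an engine-specific implementation; it only records calls."""
    def __init__(self) -> None:
        self.calls: list[str] = []

    def scale_to(self, replicas: int) -> None:
        self.calls.append(f"scale_to({replicas})")

    def backup(self) -> None:
        self.calls.append("backup()")

    def upgrade(self, version: str) -> None:
        self.calls.append(f"upgrade({version})")
```

The split is the point: `reconcile` is generic control-loop logic, while `DayTwoOps` holds the application-specific how-to that each operator author has to get right for their database. Ordering choices, such as backing up before an upgrade, are the kind of judgment call where operators can legitimately differ. Calling `reconcile` again once the state has converged does nothing, which is what makes it safe to run in a loop.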
The Operator Framework provides a “Capability Level” chart that can help end users gauge an operator’s maturity. It is broken down into five levels:
- Level 1 – Basic install: automated application provisioning and configuration management.
- Level 2 – Seamless upgrades: patch and minor version upgrades supported.
- Level 3 – Full lifecycle: application lifecycle, storage lifecycle (backup, failure recovery).
- Level 4 – Deep insights: metrics, alerts, log processing, and workload analysis.
- Level 5 – Autopilot: horizontal/vertical scaling, auto-configuration, tuning, abnormality detection, scheduler tuning.
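Treating the levels as cumulative, as the chart’s progression suggests, an operator’s rating is the highest level for which it covers every capability up to that point. A rough sketch, assuming a simplified, hypothetical checklist: the feature names in `CAPABILITY_LEVELS` are illustrative shorthand for the chart’s descriptions, not the Operator Framework’s official criteria.

```python
# Hypothetical, simplified checklist; feature names are illustrative shorthand,
# not official Operator Framework criteria.
CAPABILITY_LEVELS = [
    (1, "Basic install", {"provisioning", "configuration"}),
    (2, "Seamless upgrades", {"patch_upgrades", "minor_upgrades"}),
    (3, "Full lifecycle", {"backup", "failure_recovery"}),
    (4, "Deep insights", {"metrics", "alerts", "log_processing"}),
    (5, "Autopilot", {"auto_scaling", "auto_tuning"}),
]


def capability_level(features: set[str]) -> int:
    """Return the highest level reached, treating levels as cumulative:
    a level counts only if it and every level below it are fully covered.
    Returns 0 if not even level 1 is met."""
    reached = 0
    for level, _name, required in CAPABILITY_LEVELS:
        if not required <= features:  # subset check: are all required features present?
            break
        reached = level
    return reached
```

Under this rule, an operator that installs, upgrades, and exports metrics but cannot back up or recover its data would still sit at level 2: deep insights do not compensate for a missing full lifecycle.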
Another promising way to address the operator quality issue is to return to what has worked for Kubernetes itself: community-led governance, more specifically through the Cloud Native Computing Foundation (CNCF). PostgreSQL specialist EDB recently decided to open source its PostgreSQL operator, CloudNativePG, and to submit it to the CNCF with the goal of eventually making it a graduated project.
The three maturity stages of CNCF projects are sandbox, incubation, and graduation. To move up the ladder, a project needs to prove that it is credible, sustainable, and widely adopted, has a healthy rate of change, and is developed by contributors from multiple organizations. The process takes years, but it would ensure that operators that make it through are production-ready. I predict more will follow this path.
We at the DoK Community recently announced a landscape that indexes and documents existing products, solutions, and consultancies that enable data on Kubernetes. It currently lists databases, storage, data managers, and operators: yet another way for end users to find their way.
“We’ve officially crossed the chasm,” said Kelsey Hightower about running stateful workloads on Kubernetes, while announcing an operator for the Oracle Database. Now that the practice is no longer only for early adopters and we are starting to ride the mass adoption wave, it’s time for the industry to ramp up with more standards and robustness.