
I’ve Got 99 Workloads, But Data Ain’t One

30 Sep 2021 7:42am, by Michelle Gienow
A frontend web developer and recovering journalist, Michelle slings technical content for Cockroach Labs by day, recreational JS by night. She is co-author of 'Cloud Native Transformation: Practical Patterns for Innovation' from O'Reilly Media.

These days so many applications are being deployed in containers that it’s probably easier to ask who isn’t using Kubernetes than who is. A frustrating problem for both sides of DevOps, though, is not being able to use Kubernetes for the whole stack; databases have been lagging behind.

Developers want to unify their data infrastructure and application stacks. Operators want to apply the same tools to both databases and applications. Everybody wants to capture the same benefits for the data layer that Kubernetes grants the application layer: rapid spin-up and repeatability across environments in a self-healing, horizontally scalable distributed system.

It has been hard to actually achieve this, though. Why?

OG Kubernetes offered the ability for users to define their requirements component-style from the simple building blocks of pods on up to deployments. Then they could use control loops to propel the orchestrated components toward a desired end state (restarting a pod, creating a DNS entry), ensuring that the running state remains consistent with the target state. Whenever failures are detected, control loops also help to automatically fix them.
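As a rough illustration of the control-loop idea, here is a minimal Python sketch. It is a toy in-memory model, not the Kubernetes API: the function names (`reconcile`, `apply_actions`, `control_loop`) and the workload names are hypothetical, and the "cluster" is just a dictionary of replica counts. It only shows the pattern of comparing desired state with observed state and acting on the difference until the two match.

```python
from __future__ import annotations


def reconcile(desired: dict[str, int], observed: dict[str, int]) -> list[tuple[str, str, int]]:
    """Compare desired and observed replica counts; return corrective actions."""
    actions = []
    for name in sorted(set(desired) | set(observed)):
        diff = desired.get(name, 0) - observed.get(name, 0)
        if diff > 0:
            actions.append(("start", name, diff))   # too few pods running
        elif diff < 0:
            actions.append(("stop", name, -diff))   # too many pods running
    return actions


def apply_actions(actions: list[tuple[str, str, int]], observed: dict[str, int]) -> None:
    """Toy 'cluster': every action succeeds and mutates the observed state."""
    for verb, name, count in actions:
        delta = count if verb == "start" else -count
        observed[name] = observed.get(name, 0) + delta


def control_loop(desired: dict[str, int], observed: dict[str, int], max_rounds: int = 5) -> int:
    """Reconcile repeatedly until nothing is left to fix; return rounds used."""
    for rounds in range(max_rounds):
        actions = reconcile(desired, observed)
        if not actions:
            return rounds
        apply_actions(actions, observed)
    return max_rounds


if __name__ == "__main__":
    desired = {"web": 2, "db": 3}
    observed = {"web": 2, "db": 1}  # two hypothetical db pods have failed
    print(reconcile(desired, observed))
    control_loop(desired, observed)
    print(observed)
```

A real controller watches the API server for changes rather than polling a dictionary, and its actions can fail or take time, which is why the loop keeps re-checking instead of assuming one pass is enough.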

The myriad possible combinations of the Kubernetes components of pods, nodes, clusters and deployments are what makes K8s horizontally scalable, and control loops grant K8s its self-healing superpowers. However, distributed databases require more complex sequences of actions than can be readily assembled from these built-in resources.

This is why for a long time the path of sanity for many was to just run the database alongside Kubernetes, rather than on it.

But now, thanks to the maturing of the K8s ecosystem, we can have nice things, like all of our workloads in one place.

Database, Meet K8s

Can you really run a database on Kubernetes now? With complex operations and the requirements of persistent, consistent data and guaranteeing ACID transactions across a distributed system? Yes.

We are at last reaching an event horizon where enough people have figured out how to solve for persistence and consistency in distributed databases, and the Kubernetes ecosystem has sufficiently evolved and expanded to allow reasonably low-pain integration of the data and application layers.

What got us to the point where running a database on Kubernetes is not just possible, but also increasingly even optimal?

  1. Advanced stateful components. StatefulSets, which reached general availability in Kubernetes v1.9, are API objects that manage the deployment and scaling of a set of pods while also guaranteeing the order and uniqueness of these pods. They make it possible to deploy stateful applications in a K8s cluster. StatefulSets decouple the database application from the persistent storage. Data can be stored on persistent volumes so that when a pod gets recreated, all the data is still there. Moreover, there is a consistent endpoint to connect to, because pods recreated in a StatefulSet simply keep the same name.
  2. Availability. An important milestone was overcoming availability concerns. Persistent volumes mean that we now have the capability to make sure data is replicated, and K8s is smart enough to deal with shuffling data around as peers come and go.
  3. Operators. The Kubernetes API has matured to include custom resources, and OSS frameworks like kubebuilder and operator-sdk emerged to simplify creating custom resource definitions and their controllers. Operators let users extend beyond the K8s built-in basics to domain-specific logic, by defining new resource types and controllers, so they are able to create the complex actions that distributed databases require.
  4. CSI. Combined with the persistent volume API, the container storage interface (CSI) allows for compute and storage to be loosely coupled to the point where it’s sometimes possible to define data storage that follows the application as it is rescheduled around the cluster.
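To make the "same name, same data" point about StatefulSets concrete, here is a small Python sketch of the naming conventions Kubernetes documents for them: pods are named `<statefulset>-<ordinal>`, each pod gets a stable DNS name under its governing headless service, and each claim created from a volume claim template is named `<template>-<pod>`. The StatefulSet, service, and template names used here (`cockroachdb`, `datadir`) are illustrative assumptions, and the helper functions are hypothetical, not part of any Kubernetes client library.

```python
from __future__ import annotations


def pod_names(statefulset: str, replicas: int) -> list[str]:
    """Pods in a StatefulSet get ordinal names: <statefulset>-0, -1, ..."""
    return [f"{statefulset}-{ordinal}" for ordinal in range(replicas)]


def pod_dns_name(pod: str, service: str, namespace: str = "default",
                 cluster_domain: str = "cluster.local") -> str:
    """Stable per-pod DNS name under the governing headless service."""
    return f"{pod}.{service}.{namespace}.svc.{cluster_domain}"


def pvc_name(claim_template: str, pod: str) -> str:
    """Claims created from a volumeClaimTemplate are named <template>-<pod>."""
    return f"{claim_template}-{pod}"


if __name__ == "__main__":
    # Hypothetical three-replica database StatefulSet.
    for pod in pod_names("cockroachdb", 3):
        print(pod, pod_dns_name(pod, "cockroachdb"), pvc_name("datadir", pod))
```

Because every name is derived only from the StatefulSet name and the ordinal, a recreated pod resolves to the same DNS name and reattaches to the same claim, which is what gives clients a consistent endpoint.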

So, the necessary ingredients are in place. Are organizations actually using them to cook up runbooks for databases on Kubernetes?

Workloads Tell the Tale

Clearly there has been massive growth in all workloads on Kubernetes across the spectrum, both from the broader application ecosystem and an ever-increasing number of independent vendors. But what’s harder to see is exactly how companies are deploying their workloads in production day to day. Ultimately, though, Kubernetes is a platform for getting applications built and deployed, meaning we can look at workloads to get the real story of how organizations are using Kubernetes these days.

A new study, the 2021 Kubernetes Adoption Trends Report, takes a first-hand look at how organizations across every sector are working with Kubernetes in day-to-day production deployments. What does it say about orgs running databases on Kubernetes?

Well, participants said their No. 1 challenge is effectively architecting for and deploying data-intensive transactional workloads. This was chosen by the highest number of participants (46%), over other challenges like migrating legacy workloads and hiring/retaining skilled engineers. In fact, a significant number of respondents named deploying distributed transactional workloads as their only concern.

The numbers show that running a database on Kubernetes in production is a priority for many of these organizations right now.

Extrapolating one step further, we see that of the survey participants who named architecting for data workloads as their primary challenge, not one of them indicated migrating legacy workloads was also a secondary concern.

The data seems to indicate that these respondents — 25% of the entire survey population! — have identified their transactional database as a bottleneck and are seeking to modernize. It would appear that they’re already successfully deploying distributed applications with Kubernetes and now are investing in ways to continue evolving and improving their stack.

There’s a lot more in the report, including:

  • Types of workloads these organizations are running.
  • The distributions they use as a production environment.
  • What types of teams they dedicate to building and managing all of it.

The report also offers analysis and insights into how companies are strategizing for the future around multicloud and hybrid, serverless and, yes, bringing their databases into the current century by at last uniting their data and application layers so they can live happily ever after, together in K8s.

To learn more about Kubernetes and other cloud native technologies, consider coming to KubeCon+CloudNativeCon North America 2021 on Oct. 11-15.

Photo by Luis Quintero from Pexels.
