
How PlanetScaleDB Deploys Vitess to Run Production Databases on Kubernetes

14 Aug 2020 10:07am, by

Jiten Vaidya
Jiten Vaidya is co-founder and CEO of PlanetScale. Jiten has over 25 years of experience as a software engineer and manager. He was one of the first engineers to transfer from Google to YouTube. He carried a pager for YouTube, and built stability and automation around MySQL. He has also worked at the United States Digital Service and Dropbox. Jiten enjoys reading and hiking in his spare time.

We are all familiar with the term “container” as it is used in the context of OS virtualization. But the term originally described the standard-shaped containers that revolutionized shipping. Standardized containers have many benefits: they can be efficiently loaded, unloaded, and stacked across multiple modes of transport, and they can be handled by mechanized systems such as cranes and forklifts. Although containers existed before WWII, the logistical requirements of the war made their utility even more apparent and led to their widespread use.

The containers that provide OS virtualization, and which form the building blocks of orchestration frameworks such as Kubernetes, were already ubiquitous before COVID-19. However, COVID-19 is proving to be a watershed moment for Kubernetes, much as WWII was for shipping containers. The need for social isolation and the resulting acceleration of digital transformation are driving increased adoption of Kubernetes.

Kubernetes has emerged as the de facto operating system for compute resources, whether in your own data center or in the public cloud. Kubernetes abstracts away cloud-specific requirements and allows you to focus on feature development and uniform deployment. Developing and deploying on Kubernetes also gives you access to a whole ecosystem of tools that have been developed for the Kubernetes platform. Thus, Kubernetes has become the chief enabler of the movement from data centers to the cloud.

Until now, companies have been reluctant to deploy databases in Kubernetes because running a stateful service such as a database in a container within an orchestration framework is challenging: you cannot take for granted the longevity of the pod in which your master databases are running. Vitess solves these problems.

Vitess, an open source project developed at YouTube and now a graduated Cloud Native Computing Foundation (CNCF) project, was built to scale YouTube’s databases horizontally during YouTube’s years of hypergrowth. YouTube needed to migrate its MySQL databases to run under Borg, the internal container orchestration system that Google uses to manage its own data centers and the blueprint for Kubernetes. Vitess met these requirements by providing fast reparenting from a master to a replica, transparent service discovery and excellent observability.

At PlanetScale, when we were considering our options for building our own hosted database-as-a-service on Vitess, we made an early decision to run it on Kubernetes because Vitess was natively built to work on Kubernetes. One technology choice we made was to use the operator pattern invented by CoreOS: we developed our own operator for Vitess, which provides the operational scaffolding to deploy, manage, and monitor Vitess database clusters. This allows us to abstract away the differences between hosted Kubernetes services such as EKS, GKE, and AKS and to treat regions across multiple cloud providers homogeneously, providing our customers a true multicloud experience.
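
The core idea of the operator pattern is a reconcile loop: compare the state you declared with the state actually observed in the cluster, then act on the difference. The following is a minimal sketch of that comparison in plain Python. It is not the PlanetScaleDB operator; the `reconcile` function and the dictionary shapes are hypothetical, chosen only for illustration, and a real operator would talk to the Kubernetes API rather than compare in-memory dictionaries.

```python
def reconcile(desired, observed):
    """Return the actions needed to move `observed` toward `desired`.

    Both arguments map a resource name to its spec (any comparable value).
    This is an illustrative stand-in for one pass of an operator's
    reconcile loop, not code from any real operator.
    """
    actions = []
    # Create anything that is declared but missing; update anything that drifted.
    for name, spec in sorted(desired.items()):
        if name not in observed:
            actions.append(("create", name))
        elif observed[name] != spec:
            actions.append(("update", name))
    # Delete anything that exists but is no longer declared.
    for name in sorted(observed):
        if name not in desired:
            actions.append(("delete", name))
    return actions


if __name__ == "__main__":
    desired = {"shard-0": {"replicas": 2}, "shard-1": {"replicas": 2}}
    observed = {"shard-0": {"replicas": 1}, "shard-old": {"replicas": 2}}
    for action in reconcile(desired, observed):
        print(action)
```

Because each pass only looks at the difference between declared and observed state, the same loop can run unchanged against any conforming Kubernetes cluster, which is what makes the approach a natural fit for hiding provider differences.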

As we were developing the operator, we realized that this ability to abstract away the differences between Kubernetes clusters would also allow us to use our control plane with customers’ own Kubernetes clusters, and PlanetScaleDB for Kubernetes was born.

PlanetScaleDB for Kubernetes allows customers to configure custom regions pointing to their own Kubernetes clusters and then use the PlanetScale control plane to deploy MySQL-compatible Vitess databases into those clusters. This ensures that the data never leaves a company’s network perimeter and that security policies are met. PlanetScaleDB for Kubernetes offers all the benefits of running databases in Kubernetes with none of the hassle of managing them.

To enable PlanetScaleDB for Kubernetes, we needed to solve two problems. The first is understanding and dealing with the differences between Kubernetes environments, such as storage classes, availability zones, and heterogeneous hardware configurations. For example, for high availability we detect availability zones and spread the masters and replicas of a given database across them. The second and more difficult challenge is the ongoing management of the database: ensuring high availability in the face of pod evictions, Kubernetes host management, and the application of patches at both the database and Kubernetes layers.
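
To make the availability-zone example concrete, here is a small sketch of one way such a placement could work. It is an assumption for illustration, not the operator's actual logic: the function names `detect_zones` and `place_tablets` and the node dictionaries are hypothetical, and the placement is a simple round-robin. The only real-world detail used is `topology.kubernetes.io/zone`, the well-known label Kubernetes uses to record a node's zone.

```python
from collections import defaultdict
from itertools import cycle

# Well-known Kubernetes node label that records the node's availability zone.
ZONE_LABEL = "topology.kubernetes.io/zone"


def detect_zones(nodes):
    """Group node names by zone, skipping nodes that carry no zone label.

    `nodes` is a list of dicts with "name" and "labels" keys (a hypothetical
    shape standing in for node objects read from the Kubernetes API).
    """
    zones = defaultdict(list)
    for node in nodes:
        zone = node["labels"].get(ZONE_LABEL)
        if zone is not None:
            zones[zone].append(node["name"])
    return dict(zones)


def place_tablets(zones, replicas):
    """Assign one master and `replicas` replicas to zones in round-robin order."""
    if not zones:
        raise ValueError("no availability zones detected")
    zone_cycle = cycle(sorted(zones))
    placement = [("master", next(zone_cycle))]
    for _ in range(replicas):
        placement.append(("replica", next(zone_cycle)))
    return placement


if __name__ == "__main__":
    nodes = [
        {"name": "n1", "labels": {ZONE_LABEL: "zone-a"}},
        {"name": "n2", "labels": {ZONE_LABEL: "zone-b"}},
        {"name": "n3", "labels": {ZONE_LABEL: "zone-c"}},
    ]
    print(place_tablets(detect_zones(nodes), replicas=2))
```

With at least as many zones as database instances, round-robin puts every instance in a different zone, so losing a single zone leaves the other copies available. With fewer zones, instances wrap around and share zones as evenly as possible.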

We built the solutions to these challenges in the PlanetScaleDB operator, which we use to power PlanetScaleDB Cloud. The same operator runs in the customer’s Kubernetes clusters and powers PlanetScaleDB for Kubernetes.

As we transition toward a world where remote work is common and almost all IT workloads move to the cloud, Kubernetes brings the efficiency of standardization and common infrastructure. It is more important than ever to allow companies to take their transactional databases with them at scale.

