Strategies for Running Stateful Applications in Kubernetes: Volumes
One of the key challenges in running containerized workloads is dealing with persistence. Unlike virtual machines that offer durable and persistent storage, containers come with ephemeral storage. Right from its inception, Docker encouraged the design of stateless services. Persistence and statefulness are an afterthought in the world of containers. But this design works in favor of workload scalability and portability. It is one of the reasons why containers are fueling cloud-native architectures, microservices, and web-scale deployments.
Having realized the benefits of containers, the industry is now working to containerize stateful applications so that they can run seamlessly alongside stateless ones. Docker volumes and plugins are a major step toward making stateful applications first-class citizens of Docker. The recent ebook from The New Stack covered various aspects of container storage along with use cases.
Mesosphere DC/OS emphasizes running transactional workloads alongside cloud-native applications. Robin Systems, a container management company, aims to containerize Oracle and other enterprise databases. The Kubernetes container orchestration engine is gearing up to run stateful workloads through a new concept called Pet Sets, which manages a group of stateful pods, each with a stable identity. Pet Sets was introduced as an alpha feature in Kubernetes 1.3, released in July.
Kubernetes abstracts the underlying infrastructure building blocks into compute, storage and networking. When developers and operations teams get started with Kubernetes, they typically get exposed to objects such as pods, labels, services, deployments and replica sets, which provide a mechanism to deal with compute and networking. When it comes to persistence in Kubernetes, users should get familiar with the concepts of volumes, persistent volumes, persistent volume claims (PVC) and the upcoming Pet Sets.
This article is the first in a series that discusses the strategies and use cases for each of the storage choices available in Kubernetes. In this part, we take a closer look at volumes, which provide the easiest migration path to Kubernetes.
Docker volumes bypass the union file system to access the storage available on the host directly. Depending on how a volume is created, its lifetime may be limited to that of the container: when the container is removed, the volume is deleted. With volume plugins, Docker started to offer storage volumes that outlive the container.
Kubernetes has slightly different storage requirements than Docker. Because a pod, the logical unit of deployment, can package multiple containers, all the containers in a pod must be able to share data. Containers in a pod may be restarted occasionally, and that shouldn't affect storage. Unlike Docker volumes, which are tied to a specific container, Kubernetes volumes are tied to the lifecycle of a pod. Even if the containers running within a pod are terminated or restarted, the associated volume continues to exist; the volume itself is removed only when the pod is deleted.
In scenarios where a volume's data must remain available after its pod is deleted, the volume can be backed by a durable block storage service, such as Amazon EBS or Google Compute Engine Persistent Disks (GCE PD), or by a network or distributed file system, such as NFS or Gluster. Either way, the key takeaway is that all the containers packaged in a pod share the same volume.
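To make pod-level sharing concrete, here is a minimal sketch that builds a Pod manifest as a Python dictionary and prints it as JSON, a format `kubectl apply -f` accepts. It assumes an NFS export already exists; the server name `nfs.example.internal`, the export path, and the pod, container and volume names are hypothetical placeholders.

```python
import json

# Hypothetical NFS server and export path; replace with real values.
NFS_SERVER = "nfs.example.internal"
NFS_EXPORT = "/exports/app-data"


def nfs_backed_pod(name: str) -> dict:
    """Build a Pod manifest whose containers share an NFS-backed volume.

    The data lives on the NFS server, so it outlives this pod.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # The volume is declared once, at pod level...
            "volumes": [
                {
                    "name": "shared-data",
                    "nfs": {"server": NFS_SERVER, "path": NFS_EXPORT},
                }
            ],
            # ...and mounted by name in every container that needs it.
            "containers": [
                {
                    "name": "app",
                    "image": "busybox",
                    "command": ["sh", "-c", "date >> /data/log.txt; sleep 3600"],
                    "volumeMounts": [{"name": "shared-data", "mountPath": "/data"}],
                },
                {
                    "name": "reader",
                    "image": "busybox",
                    "command": ["sh", "-c", "sleep 3600"],
                    "volumeMounts": [
                        {"name": "shared-data", "mountPath": "/data", "readOnly": True}
                    ],
                },
            ],
        },
    }


if __name__ == "__main__":
    # For example: python nfs_pod.py | kubectl apply -f -
    print(json.dumps(nfs_backed_pod("nfs-demo"), indent=2))
```

The volume is declared once under `spec.volumes` and referenced by name from each container's `volumeMounts`; making the reader's mount `readOnly` is a choice made in this sketch, not a requirement.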
Kubernetes volumes can be classified into host-based and non-host-based storage types. Host-based storage is similar to Docker volumes: a portion of the host's storage is made available to the pod, so the data is bound to that node. With emptyDir, the data is deleted along with the pod; with hostPath, it stays behind on the node. Non-host-based storage doesn't rely on a specific node. Instead, the volume is created from an external storage service, and the data in it remains available even after the pods are deleted.
Two specific volume types that depend on host-based storage are emptyDir and hostPath.
An emptyDir volume is created when a pod is assigned to a node, and it lasts as long as the pod runs on that node. As its name suggests, it starts out as an empty directory on the node. All the containers in the pod can read and write files in the emptyDir volume, and each container may mount it at a different path. When the pod is removed from the node for any reason, the data in the emptyDir volume is permanently deleted. Even if a replacement pod is scheduled on the same node, it starts with an empty volume. However, an emptyDir volume survives individual container crashes and restarts, which makes it a reasonable place to keep data shared among the pod's containers, such as configuration settings.
Popular use cases for emptyDir volumes include:
- The creation of a scratch disk for storing intermediary data.
- A common storage area for sharing configuration settings and metadata across multiple containers of the same pod.
- A well-known storage location where containers store and forward data. For example, a crawler container might populate the volume periodically while a web server container serves requests from it.
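The store-and-forward case can be sketched as a pod with two containers sharing one emptyDir volume, each mounting it at its own path. This is an illustrative sketch, not a production setup: a busybox shell loop stands in for the crawler and rewrites `index.html` every 60 seconds, while the stock `nginx` image serves it. The container names, mount paths and refresh interval are assumptions.

```python
import json


def crawler_web_pod(name: str = "crawler-web") -> dict:
    """Pod in which a crawler refreshes content that a web server serves.

    Both containers mount the same emptyDir volume, each at its own path.
    The content is lost when the pod is removed from the node.
    """
    shared = "content"
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # An empty map requests a default emptyDir on the node's disk.
            "volumes": [{"name": shared, "emptyDir": {}}],
            "containers": [
                {
                    # Stand-in for a real crawler: rewrites index.html every 60 s.
                    "name": "crawler",
                    "image": "busybox",
                    "command": [
                        "sh",
                        "-c",
                        "while true; do date > /output/index.html; sleep 60; done",
                    ],
                    "volumeMounts": [{"name": shared, "mountPath": "/output"}],
                },
                {
                    # The official nginx image serves this directory by default.
                    "name": "web",
                    "image": "nginx",
                    "ports": [{"containerPort": 80}],
                    "volumeMounts": [
                        {
                            "name": shared,
                            "mountPath": "/usr/share/nginx/html",
                            "readOnly": True,
                        }
                    ],
                },
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(crawler_web_pod(), indent=2))
```

Until the crawler writes its first file, the web server has nothing to serve, which reflects the "starts as an empty directory" behavior described above.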
While emptyDir volumes are created and cleaned up entirely by Kubernetes, hostPath volumes point to storage that Kubernetes does not manage. emptyDir volumes are analogous to Docker's implicit, per-container storage: they are sandboxes managed by the kubelet. hostPath volumes, on the other hand, mount a file or directory from the host node's filesystem directly into the pod. This is generally not a recommended strategy, but some applications that need direct access to existing datasets on the node will find it useful.
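As a sketch of the existing-dataset use case, the manifest below mounts a host directory read-only. The path `/srv/datasets/reference` and the pod and container names are hypothetical; the directory must already exist on whichever node runs the pod, and setting `type` to `Directory` makes the kubelet check that it exists before mounting it.

```python
import json

# Hypothetical location of a pre-existing dataset on the node.
DATASET_DIR = "/srv/datasets/reference"


def host_dataset_pod(name: str = "dataset-reader") -> dict:
    """Pod that reads an existing host directory through a hostPath volume."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "volumes": [
                {
                    "name": "dataset",
                    # "Directory": a directory must already exist at this path.
                    "hostPath": {"path": DATASET_DIR, "type": "Directory"},
                }
            ],
            "containers": [
                {
                    "name": "analyzer",
                    "image": "busybox",
                    "command": ["sh", "-c", "ls -l /dataset; sleep 3600"],
                    "volumeMounts": [
                        # Read-only: the container only consumes the data.
                        {"name": "dataset", "mountPath": "/dataset", "readOnly": True}
                    ],
                }
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(host_dataset_pod(), indent=2))
```

Mounting read-only sidesteps the write-permission issues of hostPath entirely when the application never needs to modify the data.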
hostPath volumes come with caveats. For example, directories created on the underlying host are writable only by root, so either the container must run in privileged mode or the host file permissions must be modified explicitly before the container can write to a hostPath volume.
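When a container does need to write to a root-owned hostPath directory, privileged mode is one of the two options the caveat describes. The sketch below toggles `securityContext.privileged`; the host directory `/var/log/demo-app` and the names are hypothetical. Whether privileged mode is actually required depends on the image's user and the node's security settings (for example SELinux), and it grants far more than write access to one directory, so adjusting host permissions is often the narrower fix.

```python
import json

# Hypothetical host directory the container should write to.
HOST_LOG_DIR = "/var/log/demo-app"


def host_writer_pod(privileged: bool = True) -> dict:
    """Pod whose container appends to a hostPath directory on the node."""
    container = {
        "name": "writer",
        "image": "busybox",
        "command": ["sh", "-c", "date >> /host-logs/writer.log; sleep 3600"],
        "volumeMounts": [{"name": "host-logs", "mountPath": "/host-logs"}],
    }
    if privileged:
        # Broad escalation: the container gets extensive access to the host.
        container["securityContext"] = {"privileged": True}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "host-writer"},
        "spec": {
            "volumes": [
                {
                    "name": "host-logs",
                    # DirectoryOrCreate: the kubelet creates the directory
                    # (mode 0755, owned like the kubelet) if it is missing.
                    "hostPath": {"path": HOST_LOG_DIR, "type": "DirectoryOrCreate"},
                }
            ],
            "containers": [container],
        },
    }


if __name__ == "__main__":
    print(json.dumps(host_writer_pod(), indent=2))
```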
For non-host-based storage, Kubernetes supports a variety of block storage services, distributed file systems and hosted file systems, including Amazon EBS, GCE PD, Ceph, Gluster, NFS, Azure Files, Flocker and vSphere volumes. Refer to the Kubernetes documentation for the list of supported drivers and how to configure them.
Volumes are well suited to migrating containers from Docker to Kubernetes. Because they work much like Docker volumes, applications designed for host-based persistence can adopt them with little change.
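To illustrate that migration path, here is a hypothetical helper, `docker_volume_to_k8s`, that translates a Linux `docker run -v` volume spec into a Kubernetes volume and volume mount. It is a sketch under simplifying assumptions: bind mounts map to `hostPath`; anonymous volumes map to `emptyDir`, which is only an approximation because Docker can keep anonymous volumes after the container exits; named Docker volumes are rejected; and Windows paths are not handled.

```python
def docker_volume_to_k8s(spec: str, name: str) -> tuple[dict, dict]:
    """Translate a Linux `docker run -v` spec into a (volume, volumeMount) pair.

    Hypothetical helper. Supported forms:
      /ctr/path                -> emptyDir (Docker anonymous volume)
      /host/path:/ctr/path     -> hostPath (bind mount)
      /host/path:/ctr/path:ro  -> hostPath mounted read-only
    """
    parts = spec.split(":")
    if len(parts) == 1:
        source, target, options = None, parts[0], ""
    elif len(parts) == 2:
        source, target = parts
        options = ""
    elif len(parts) == 3:
        source, target, options = parts
    else:
        raise ValueError(f"unsupported volume spec: {spec!r}")

    if not target.startswith("/"):
        raise ValueError(f"container path must be absolute: {target!r}")

    if source is None:
        volume = {"name": name, "emptyDir": {}}
    elif source.startswith("/"):
        volume = {"name": name, "hostPath": {"path": source}}
    else:
        # Named Docker volumes have no host-based equivalent in this sketch.
        raise ValueError(f"named volume {source!r} is not handled here")

    mount = {"name": name, "mountPath": target}
    # Docker mode options are comma-separated, e.g. "ro" or "ro,Z".
    if "ro" in options.split(","):
        mount["readOnly"] = True
    return volume, mount


if __name__ == "__main__":
    print(docker_volume_to_k8s("/var/lib/app:/data:ro", "app-data"))
```

For example, `docker_volume_to_k8s("/var/lib/app:/data:ro", "app-data")` yields a `hostPath` volume for `/var/lib/app` and a read-only mount at `/data`; the volume goes under the pod's `spec.volumes` and the mount under the container's `volumeMounts`.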
In the next part of this series, we will look at how to use Kubernetes persistent volumes and persistent volume claims to build a robust storage infrastructure for microservices.