Kubernetes Sets the Stage for Container-Native Storage

With the release of Kubernetes 1.6 last month came a number of production-ready features that could streamline storage for containers. Already, efforts from Red Hat, CoreOS and Quantum aim to leverage these features to make it easier to manage large-scale container-native storage systems from within Kubernetes itself.
Containers are, by design, stateless entities. They are meant to be spun up and shut down, leaving no trace. But the applications they contain need persistent (“stateful”) storage for data such as configuration settings and databases.
One approach would be to establish a single namespace for all the storage, using the Network File System (NFS), but that lacks flexibility in terms of managing storage resources. Another approach gaining popularity is to use a distributed file system, such as Gluster or Ceph. But these can be difficult to manage on their own.
Containers, and Kubernetes, make it very easy for developers to deploy a distributed containerized application; and distributed file systems are, at heart, distributed applications. A controller somewhere needs to handle resource management, allocation and rebalancing. So why not have Kubernetes do this work of managing a distributed file system, with all its inherent difficulties? “Those are the same kinds of problems people face when they run any application in a cloud-native environment,” argued Bassam Tabbara, chief technology officer of Quantum.
“It’s a natural fit to run a storage cluster on Kubernetes. Wouldn’t it make sense to bring it into the fold and keep the unified management interface, and keep everything like a bunch of pods?” Tabbara said. “I’d argue the storage cluster becomes more resilient with an orchestrator like Kubernetes.”
In fact, Kubernetes could be so handy at provisioning distributed storage, just as it is at provisioning other distributed apps, that this new crop of Kubernetes provisioning tools could make a go at streamlining (or disrupting) not just container storage, but the overly complex world of software-defined storage.
Just-in-Time Storage
The key to this new capability is a number of new features first introduced in Kubernetes 1.4 that are now production ready with Kubernetes 1.6.
One is the dynamic storage provisioning module, which could be “very useful for automating the lifecycle and management of storage,” said Aparna Sinha, who leads the product management team at Google for Kubernetes, speaking at the KubeCon Europe 2017 conference.
Kubernetes provisioning relies on another new production-ready feature, storage classes, which provide a format for describing storage attributes so they can be shared across different functions to aid in automation.
Kubernetes uses the concept of a “persistent volume” to encapsulate the back-end storage. Kubernetes can then provision storage volumes in two ways, either dynamically or statically, Sinha explained. In static provisioning, volumes must be created beforehand so they can later be tied to persistent volume claims.
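In practice, static provisioning means an administrator pre-creates a PersistentVolume object pointing at an existing share before any application asks for it. A minimal sketch, with a hypothetical NFS server and export path, could look like this:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-data-001            # hypothetical name chosen by the admin
spec:
  capacity:
    storage: 10Gi              # size of the pre-created volume
  accessModes:
    - ReadWriteMany            # an NFS share can be mounted by many pods
  nfs:
    server: 10.0.0.5           # hypothetical NFS server address
    path: /exports/data        # hypothetical export path
```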
A speedier and more efficient way to allocate storage is the newly available dynamic provisioning. In this approach, Kubernetes receives a request from a newly launched application to create a storage volume for it, which Kubernetes can provision using plug-ins for different storage types.
With dynamic storage provisioning, the developer doesn’t even have to send requests to the cluster admin for storage allocation ahead of time. The whole system can be automated.
To help the app developer understand the storage options available, the cluster administrator uses storage classes to describe the different storage options on hand. In a persistent volume claim (PVC), the application simply specifies the storage setup it needs by naming the appropriate storage class.
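As a sketch of that division of labor, the cluster admin might publish a class for SSD-backed disks, and the developer’s claim then names it. The provisioner shown here (GCE persistent disks) and the resource names are one hypothetical choice, not a prescription:

```yaml
# Defined once by the cluster admin.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: kubernetes.io/gce-pd   # hypothetical choice of back end
parameters:
  type: pd-ssd
---
# Submitted by the developer alongside the application.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: wordpress-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd        # picks the class by name
  resources:
    requests:
      storage: 20Gi
```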
Kubernetes then provisions a new volume and binds it back to the PVC. The app can then use the PVC to mount the newly provisioned volume. Call it just-in-time provisioning.
“Kubernetes will mount the volume to the pod,” Sinha said. “The pod can die and move to a different node and Kubernetes will automatically take care of remounting the storage to the new node. The data will still be there because the PVC remains. The PVC is constant and holds that claim to the storage.” Once the claim is deleted, the volume is also deleted.
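A minimal pod sketch shows the last step: the pod refers only to the claim, so wherever the pod lands, Kubernetes mounts the same volume into it (the image and paths here are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: wordpress
spec:
  containers:
    - name: wordpress
      image: wordpress:4.7                 # illustrative image
      volumeMounts:
        - name: data
          mountPath: /var/www/html         # where the volume appears in the container
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: wordpress-data          # the claim from the earlier sketch
```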
Heketi
But Kubernetes still requires tighter integration with distributed file systems to make this whole process seamless.
One effort under way is Heketi, a REST-based service that can be used to dynamically provision and manage the full lifecycle of the Gluster distributed file system, using Kubernetes, OpenStack Manila, or Red Hat’s OpenShift.
In effect, Heketi works as a high-level service interface that sits atop Gluster to simplify volume creation. Through a RESTful interface, Heketi can take requests from Kubernetes. The Kubernetes Gluster provisioner will “talk to Heketi which will talk to Gluster to create the volumes,” explained Michael Adam, a Red Hat architect who presented a talk about Heketi at the Linux Foundation’s Vault storage conference, held in Boston earlier this month.
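In Kubernetes terms, that wiring is expressed as a storage class whose provisioner is the built-in GlusterFS plug-in and whose parameters point at Heketi’s REST endpoint; the URL and secret names below are hypothetical:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gluster-heketi
provisioner: kubernetes.io/glusterfs
parameters:
  resturl: "http://heketi.example.com:8080"   # hypothetical Heketi endpoint
  restuser: "admin"                            # Heketi user allowed to create volumes
  secretNamespace: "default"                   # where the Heketi credential lives
  secretName: "heketi-secret"                  # hypothetical secret holding that credential
```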
Heketi can manage multiple Gluster deployments and can keep track of how much room is left on each cluster. You can feed the API requests for the amount of storage needed, and perhaps the type or speed of storage (e.g., SSD). The developer shouldn’t have to know which disks are used, or which cluster, so all of that is abstracted away.
Once the new volume is created, Heketi returns the volume information back to the provisioner, which relays that information to the caller.
When a storage administrator puts in a request to create a Gluster volume, Heketi will allocate the needed amount of storage in a Gluster volume spread out over a cluster of servers, even allocating replicas across multiple failure domains. The software formats, mounts and starts the newly created Gluster volume (try the demo here).
Gluster handles replication and distribution, so these duties need not concern either the storage admin or the developer. Gluster volumes are made up of bricks (which, in their simplest form, are local directories on the nodes). Gluster can be accessed through a variety of interfaces, including standard POSIX file system calls, NFS, SMB, iSCSI and, for object storage, the OpenStack community’s Swift interface, which is compatible with Amazon Web Services’ S3.
So while it is possible to use an external Gluster cluster for Kubernetes provisioning, why not just go the next step and containerize Gluster itself? This is an approach Red Hat, among other companies in the storage space, has branded “hyperconverged infrastructure.”
Gluster has its own FUSE-based mount command, included as part of the Gluster client installed on each storage node and on the server. Rather than bundle the unwieldy Gluster fat client into each container, the required volume for the container is mounted on the Kubernetes host, and then bind-mounted into the container. Inside the container, the user sees only the directory of the mounted file system.
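From inside a pod, the result looks like any other volume. With Kubernetes’ built-in glusterfs volume type, for example, the pod names a Gluster volume and an Endpoints object listing the Gluster nodes, and the kubelet performs the host-side mount (the names below are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gluster-client
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: gluster-data
          mountPath: /mnt/gluster        # the only thing visible inside the container
  volumes:
    - name: gluster-data
      glusterfs:
        endpoints: glusterfs-cluster     # hypothetical Endpoints object listing Gluster node IPs
        path: demo-volume                # hypothetical Gluster volume name
        readOnly: false
```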
Heketi keeps track of the state of the cluster through a database (which is why, once a Gluster volume is set up through Heketi, its configuration can’t be changed outside of Heketi without the tracking information getting out of sync).
Adam noted that Gluster containers are not typical containers in a number of ways. They are privileged containers, and they communicate with one another across the host network, not the virtual network. They are also tied to specific nodes, a decision that, Adam admitted, not everyone likes.
To ease the process of deploying a Gluster cluster through Kubernetes, a Red Hat team headed up a project called gluster-kubernetes. This software package can create a containerized Gluster cluster by using a topology file describing the physical nodes. It sets up Heketi and passes the topology along to the newly created (also containerized) database, so Heketi can then provision the cluster.
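The project’s actual topology file is JSON, and its exact schema lives in the gluster-kubernetes repository; purely as a sketch of the kind of information it carries (hypothetical hostnames, addresses and devices), it maps each node to the raw block devices Heketi may carve bricks from:

```yaml
# Illustrative sketch only; see the gluster-kubernetes repo for the real JSON format.
clusters:
  - nodes:
      - node:
          hostnames:
            manage: ["gluster-node-1"]   # hypothetical Kubernetes node name
            storage: ["10.0.0.11"]       # hypothetical storage-network address
          zone: 1                        # failure domain used when placing replicas
        devices:
          - /dev/sdb
          - /dev/sdc
      - node:
          hostnames:
            manage: ["gluster-node-2"]
            storage: ["10.0.0.12"]
          zone: 2
        devices:
          - /dev/sdb
```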
Rook
A week later, at the KubeCon conference in Berlin, Quantum’s Tabbara demonstrated yet another Kubernetes-based storage control plane, called Rook, this one based on the Ceph distributed file system. “We wanted to create a software-defined storage cluster that could run really well in modern cloud-native environments,” Tabbara said.
Tabbara points to a clean separation of concerns with Kubernetes, one that helps clarify the roles of the app developer and the storage administrator. The cluster admin deals with the different hardware resources and should be able to set the policies for how those resources are used, independent of the application developers. The app developers, in turn, should be able to easily create volumes and use storage in a way that follows the storage admin’s policies.
In the demo, Tabbara took a bare Kubernetes cluster with no external storage and, using nothing but kubectl, deployed a Ceph cluster. He then created storage classes and deployed MySQL and WordPress pods with volumes provisioned through Rook. (One attendee called for Tabbara to kill the WordPress pod, which he eventually did, kicking off a process in Kubernetes that automatically restored it with a new copy.)
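Rook’s own manifests aren’t reproduced here, but the recovery the audience saw is standard Kubernetes behavior: if the WordPress pod runs under a Deployment and mounts a claim, killing the pod just causes a replacement to be scheduled, and the same claim (and thus the same data) is reattached. A hedged sketch, with hypothetical names:

```yaml
apiVersion: apps/v1beta1               # Deployment API group available in Kubernetes 1.6
kind: Deployment
metadata:
  name: wordpress
spec:
  replicas: 1                          # the controller keeps one copy alive
  template:
    metadata:
      labels:
        app: wordpress
    spec:
      containers:
        - name: wordpress
          image: wordpress:4.7
          volumeMounts:
            - name: data
              mountPath: /var/www/html
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: wordpress-rook-data   # hypothetical claim backed by the Rook-provisioned volume
```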
Rook can work independently of Kubernetes; the software does have a stand-alone version. It is, however, tied pretty heavily to Ceph. “We think that Ceph is the Linux of storage,” Tabbara said.
Where Rook comes in is that “it is abstracting some of the complexity [out of] Ceph. The concepts that are exposed out of Rook are concepts that a cluster admin would understand: storage node, devices, SSDs. That abstraction requires a fair amount of work.”
Rook relies on Kubernetes to figure out where the various parts of Ceph run and to handle resource management. Ceph itself has the smarts to keep the data safe and replicated.
Quartermaster
Kubernetes is very much like an OS for a data center. Just as you don’t worry about which CPU core runs an application on your computer, thanks to the OS, so too you shouldn’t have to worry about which nodes are running your distributed computing job; that is the role of Kubernetes, argued CoreOS engineer Luis Pabón, who gave a talk about the container storage technology at the Vault conference.
“This is a new way of thinking, where Kubernetes is in charge of the entire cluster,” Pabón admitted. “As the container moves from one node to another, Kubernetes makes sure the connection [to the storage] follows the container,” he said.
Still, for all the forthcoming benefits, stuffing a file system into a container so it can be run by Kubernetes requires a lot of fiddling and knob-tuning, even with tools like Heketi, Pabón said. With this in mind, Pabón created Quartermaster, a framework for deploying storage systems into Kubernetes.
Quartermaster is powered by a new feature that CoreOS helped contribute to Kubernetes, called Operators. An Operator is a container running on the system that understands how to deploy and manage an application. CoreOS has already created a Prometheus Operator and an etcd Operator. Now the company wants to standardize the model of deploying storage in Kubernetes with Quartermaster.
Quartermaster is a container that listens to Kubernetes for calls to user-defined Kubernetes storage objects. In effect, the administrator creates a storage cluster object defining the attributes of the cluster they want, and Quartermaster carries out all the requests needed to make it happen.
Since the attributes are defined in YAML/JSON, they can be provided not only by an administrator but programmatically through an API.
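Quartermaster’s actual resource kinds and fields are defined in its repository and aren’t reproduced here; purely to illustrate the idea of a declarative, user-defined storage object that an Operator acts on, a hypothetical sketch might read:

```yaml
# Hypothetical illustration only; Quartermaster defines its own kinds and fields.
apiVersion: example.com/v1alpha1
kind: StorageCluster
metadata:
  name: gluster-demo
spec:
  type: glusterfs               # which storage system to deploy
  storageNodes:
    - node: node-1              # Kubernetes node that should run a storage pod
      devices: ["/dev/sdb"]     # raw devices handed to the storage system
    - node: node-2
      devices: ["/dev/sdb"]
```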
“By simplifying the deployment of storage systems, Quartermaster makes it possible to easily and reliably deploy, upgrade, and get the status of the desired storage system in a Kubernetes cluster,” the GitHub page reads. “Once deployed, a Quartermaster managed storage system could be used to fulfill persistent volume claim requests. Quartermaster can also be used to help the setup and testing of Persistent Volumes provided by containerized storage systems deployed in Kubernetes.”
While Heketi and Rook each have their own deployment patterns, Quartermaster is designed to make such deployments uniform across different file systems. Initially, Quartermaster supports Gluster, but over time Pabón hopes to extend it to Ceph, NFS-Ganesha, Rook and others.