A Guide to Running Stateful Applications in Kubernetes
Kubernetes is an open source orchestration platform for deploying, managing, and automating containerized applications. In its early years, Kubernetes primarily ran stateless applications, meaning applications that do not need to retain data from one session to the next when their containers are restarted. This early focus created the misconception that Kubernetes is suitable only for stateless applications. With growing adoption and community contribution, however, Kubernetes has evolved into one of the leading platforms for every type of application, including scalable and highly available stateful applications.
However, even with full orchestration support for stateful applications, Kubernetes intentionally leaves various gaps for vendors to fill, especially storage-related ones. The growing use of Kubernetes for stateful applications is driving the advancement of cloud native storage solutions that can deliver the availability and resiliency these applications require.
Kubernetes Storage Evolution
Initially, Kubernetes provided basic support for statically attaching volumes to pods, using network storage such as NFS and iSCSI and the block storage of specific cloud providers such as Google Cloud and AWS. These volume plugins lived in the core Kubernetes code repository, so adding any new storage solution required checking code into that repository, which made storage integration a complicated process.
Diamanti contributed the FlexVolume plugin to Kubernetes, which opened a new era of volume provisioning. It enabled storage vendors to create custom storage plugins without adding them to the Kubernetes repository. This, in turn, gave Kubernetes users the flexibility to choose among storage solutions with better management and data services. Over the years, the Kubernetes community has introduced several rich storage features, such as:
- PersistentVolumes (PVs), which represent storage resources available to the cluster.
- PersistentVolumeClaims (PVCs), which are requests by workloads to use a PV.
- Dynamic volume provisioning, which creates volumes on demand through StorageClasses instead of requiring administrators to pre-create them.
- StatefulSets, which give each pod a stable identity and its own persistent storage, making stateful applications easier to scale and manage.
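To make the PV/PVC relationship concrete, here is a simplified Python model of how a claim might be matched to a volume: the claim binds to the smallest unbound PV that has the same storage class, offers the requested access modes, and has at least the requested capacity. The classes, the `bind` function, and the volume names are hypothetical illustrations, not the Kubernetes API; the real controller also considers volume mode, label selectors, and node affinity, and can fall back to dynamic provisioning.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PersistentVolume:
    """Hypothetical, simplified stand-in for a PV (capacity in GiB)."""
    name: str
    capacity_gi: int
    access_modes: frozenset
    storage_class: str
    bound_claim: Optional[str] = None


@dataclass
class PersistentVolumeClaim:
    """Hypothetical, simplified stand-in for a PVC."""
    name: str
    request_gi: int
    access_modes: frozenset
    storage_class: str


def bind(claim: PersistentVolumeClaim, volumes: list) -> Optional[PersistentVolume]:
    """Bind the claim to the smallest unbound PV that satisfies it (simplified model)."""
    candidates = [
        pv for pv in volumes
        if pv.bound_claim is None
        and pv.storage_class == claim.storage_class
        and claim.access_modes <= pv.access_modes  # PV must offer every requested mode
        and pv.capacity_gi >= claim.request_gi
    ]
    if not candidates:
        return None  # in a real cluster, a dynamic provisioner could create a PV here
    best = min(candidates, key=lambda pv: pv.capacity_gi)  # avoid wasting a large PV
    best.bound_claim = claim.name
    return best


volumes = [
    PersistentVolume("pv-small", 5, frozenset({"ReadWriteOnce"}), "fast"),
    PersistentVolume("pv-large", 100, frozenset({"ReadWriteOnce"}), "fast"),
    PersistentVolume("pv-medium", 20, frozenset({"ReadWriteOnce"}), "fast"),
]
# A StatefulSet-style claim name (hypothetical): template "data", StatefulSet "db", ordinal 0.
claim = PersistentVolumeClaim("data-db-0", 10, frozenset({"ReadWriteOnce"}), "fast")
print(bind(claim, volumes).name)  # pv-medium
```

Picking the smallest sufficient volume keeps larger volumes free for claims that actually need them.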
FlexVolume in turn paved the way for a more advanced storage interface, the Container Storage Interface (CSI). CSI standardizes how Kubernetes integrates with any third-party storage solution and supports data services, such as volume snapshots, natively within Kubernetes.
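A CSI driver implements a small set of gRPC services (Identity, Controller, and Node). The toy Python class below sketches the order in which a volume typically moves through the core calls (CreateVolume, ControllerPublishVolume, NodeStageVolume, NodePublishVolume) before a pod can use it. It is an in-memory illustration only: the class, its snake_case method names, the volume ID format, and the mount path are hypothetical, the publish and stage steps are optional capabilities in the CSI specification, and a real driver serves these RPCs over gRPC to the kubelet and CSI sidecar containers.

```python
class InMemoryCSIDriver:
    """Hypothetical toy driver. Method names mirror CSI RPCs, but this is not a gRPC server."""

    def __init__(self):
        self.volumes = {}    # volume_id -> capacity in bytes (Controller service state)
        self.published = {}  # volume_id -> node_id the volume is attached to
        self.staged = set()  # (volume_id, node_id) pairs staged at a node-global path
        self.mounts = {}     # pod target_path -> volume_id

    # Controller service
    def create_volume(self, name, capacity_bytes):
        """Like CreateVolume: repeated calls with the same name return the same volume."""
        volume_id = f"vol-{name}"
        self.volumes.setdefault(volume_id, capacity_bytes)
        return volume_id

    def controller_publish_volume(self, volume_id, node_id):
        """Like ControllerPublishVolume: make the volume available to a node (e.g. attach)."""
        if volume_id not in self.volumes:
            raise KeyError(f"unknown volume {volume_id}")
        self.published[volume_id] = node_id

    # Node service (runs on each node)
    def node_stage_volume(self, volume_id, node_id):
        """Like NodeStageVolume: prepare the device once per node (e.g. format and mount)."""
        if self.published.get(volume_id) != node_id:
            raise RuntimeError("volume must be published to this node before staging")
        self.staged.add((volume_id, node_id))

    def node_publish_volume(self, volume_id, node_id, target_path):
        """Like NodePublishVolume: expose the staged volume at the pod's target path."""
        if (volume_id, node_id) not in self.staged:
            raise RuntimeError("volume must be staged on this node before publishing")
        self.mounts[target_path] = volume_id


# Walk one volume through the lifecycle a pod on node "node-a" would trigger.
driver = InMemoryCSIDriver()
vol = driver.create_volume("pgdata", 10 * 1024**3)
driver.controller_publish_volume(vol, "node-a")
driver.node_stage_volume(vol, "node-a")
driver.node_publish_volume(vol, "node-a", "/mnt/pod-1/pgdata")
print(driver.mounts)  # {'/mnt/pod-1/pgdata': 'vol-pgdata'}
```

Because every vendor implements the same calls, Kubernetes can drive any CSI-compliant storage system through the same lifecycle.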
Considerations for Running Stateful Kubernetes Applications in Production
Even with these community-driven improvements, organizations still face storage-related challenges when working with stateful applications in Kubernetes. As organizations adopt containerized databases and other applications that contain important intellectual property, they realize they need a more advanced set of storage features, including:
- Easy-to-use persistent storage
Kubernetes provides full support for persistent volumes and stateful applications. However, the volumes themselves must still be provisioned by third-party CSI drivers. For example, most public clouds integrate their Kubernetes services with their existing storage architecture, but those architectures usually offer limited choice and performance. In the private cloud, there are many third-party options, but it’s difficult to find a storage solution that offers a strong feature set, optimal performance, and cost-effectiveness. Most importantly, many existing storage solutions were designed for virtualized workloads, yet running containers on bare metal has proven to be more scalable and performant because it removes unnecessary layers of abstraction. It’s very hard to find a good storage solution that works with bare-metal containers.
- High availability options
The nature of stateful applications requires more thought about how applications recover from different failure modes, but not all third-party Kubernetes storage solutions are highly available. To overcome this problem, many vendors provide shared storage, but those solutions often fall short on performance and manageability. Storage availability is usually measured with two metrics: the Recovery Point Objective (RPO) and the Recovery Time Objective (RTO). RPO measures how much data can be lost, expressed as how far back in time the recovered data is; RTO measures how long it takes to recover the data. The following table compares data services that are essential for stateful applications by their typical RPO and RTO.
|Data service|RPO|RTO|Description|
|---|---|---|---|
|Backup|Hours to days|Hours to days|Tape backup is one of the oldest ways to back up data. Even though tape backup is reliable, it might take days to back up and recover the data, especially if the tapes are stored offsite. Backups to the cloud, hard drives, or optical drives are more common today but still require a full recovery of the workload, which can take several hours (or days).|
|Snapshots|Minutes to hours|Minutes to hours|Snapshots can be taken more frequently than tape backups, resulting in a lower RPO. Recovery time from a snapshot depends on the underlying storage architecture and on whether data is copied during the restore. Also, because a snapshot is usually stored locally, it is good for rewinding data but not for protecting it against the loss of the underlying storage.|
|Replication|Seconds to minutes|Minutes|Asynchronous replication happens either at the volume level or at the application level. It can recover data with an RPO and RTO of minutes or even seconds.|
|Mirroring|0|0|Mirroring is synchronous volume replication across storage devices, which can be in the same cluster or in different availability zones.|
|DR replication|Minutes to hours|Minutes|Disaster recovery (DR) volume replication asynchronously creates a clone of a volume at the DR cluster to protect against primary site failure and to make it easy to move the application to the DR site for quick failover.|
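The RPO column follows from how copies are made. For periodic protection such as backups and snapshots, the worst case is a failure just before a new copy completes, so the data loss window is roughly the copy interval plus the time a copy takes. Mirroring reaches an RPO of 0 because a write is acknowledged only after the replica has it, whereas asynchronous replication can lose writes that were acknowledged but not yet shipped. The Python sketch below models both points; `worst_case_rpo_seconds` and `ReplicatedVolume` are hypothetical illustrations that ignore network latency, partial writes, and failover time (RTO).

```python
from collections import deque


def worst_case_rpo_seconds(interval_s, copy_duration_s=0):
    """Worst-case data loss window for periodic copies (backups, snapshots).

    Assumes a copy starts every `interval_s` seconds and captures data as of its
    start time. If the primary fails just before the next copy completes, only the
    previous copy is usable, so up to interval + copy duration of writes is lost.
    """
    return interval_s + copy_duration_s


class ReplicatedVolume:
    """Toy model contrasting synchronous mirroring with asynchronous replication."""

    def __init__(self, mode):
        if mode not in ("sync", "async"):
            raise ValueError("mode must be 'sync' or 'async'")
        self.mode = mode
        self.primary = []       # blocks written on the primary site
        self.replica = []       # blocks present on the replica
        self.pending = deque()  # acknowledged writes not yet replicated

    def write(self, block):
        self.primary.append(block)
        if self.mode == "sync":
            self.replica.append(block)  # mirroring: replica commits before the ack
        else:
            self.pending.append(block)  # async: ack now, ship on the next cycle
        return "ack"

    def ship(self):
        """One asynchronous replication cycle: send everything queued so far."""
        while self.pending:
            self.replica.append(self.pending.popleft())

    def acknowledged_writes_lost_on_failure(self):
        """Acknowledged writes that would be lost if the primary disappeared now."""
        return len(self.primary) - len(self.replica)


# Hypothetical hourly snapshot that takes 5 minutes to complete.
print(worst_case_rpo_seconds(3600, 300))  # prints 3900

for mode in ("sync", "async"):
    vol = ReplicatedVolume(mode)
    for i in range(5):
        vol.write(i)
        if i == 2:
            vol.ship()  # a replication cycle runs after the third write
    print(mode, vol.acknowledged_writes_lost_on_failure())  # sync 0, then async 2
```

The trade-off is latency: synchronous mirroring adds the replica round trip to every write, which is one reason a low-latency storage network matters for it.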
- Hybrid cloud data portability
In today’s cloud native world, organizations are adopting a hybrid cloud approach to combine the benefits of public and on-premises clouds. While it’s easy to migrate stateless applications across clouds, it’s very hard to migrate their data from one cloud to another.
- High performance
Many storage solutions on the market simply lack the performance that today’s data-heavy applications need, which makes storage a major bottleneck to serving applications. Newer storage technologies, such as solid-state drives that use NVMe (Non-Volatile Memory Express), mean that storage no longer has to be the bottleneck. Raw speed is not the whole story, though: when selecting a storage vendor, consider worst-case latencies in addition to throughput.
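Averages can hide the latency spikes that stall a database. The Python sketch below compares two hypothetical devices whose latency samples (made-up numbers, not measurements) have the same mean but very different 99th percentiles, which is why tail latency is worth asking a vendor about alongside throughput.

```python
import statistics


def latency_summary(samples_ms):
    """Return (mean, p99) for a list of latency samples in milliseconds."""
    # quantiles(n=100) returns the 99 cut points between percentiles; index 98 is p99.
    p99 = statistics.quantiles(samples_ms, n=100)[98]
    return statistics.mean(samples_ms), p99


# Hypothetical samples: device A is steady; device B is usually faster but has rare stalls.
device_a = [1.0] * 1000
device_b = [0.5] * 980 + [25.5] * 20

print(latency_summary(device_a))  # (1.0, 1.0)
print(latency_summary(device_b))  # (1.0, 25.5)
```

Both devices report a 1 ms average, but one in a hundred requests on device B waits more than 25 times longer than on device A.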
- Quality of Service (QoS) guarantees
Kubernetes lets you reserve CPU and memory for containers through resource requests and limits. However, it does not support reserving storage or network bandwidth, which remains a concern because one busy workload can degrade the I/O of others sharing the same disks or network links (the noisy neighbor problem). To guard critical stateful applications against this shortcoming, it is important to have QoS guarantees for storage resources.
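One common building block for storage QoS is a per-volume rate limiter, so that a noisy workload cannot consume more than its share of IOPS. The Python sketch below is a classic token bucket; the class, its parameters, and the explicit `now` argument (used instead of a real clock to keep the example deterministic) are illustrative assumptions, not any vendor's implementation. Capping each tenant only becomes a guarantee when the sum of all reservations stays within what the device can actually deliver.

```python
class TokenBucket:
    """Hypothetical per-volume IOPS limiter: `rate` ops/second on average, bursts up to `burst`."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start full so a short burst is allowed immediately
        self.last = 0.0             # timestamp (seconds) of the previous call

    def allow(self, now):
        """Return True if one I/O may be issued at time `now` (seconds)."""
        # Refill for the elapsed time, never exceeding the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# A noisy neighbor tries to issue 1,000 I/Os within one second against a 100 IOPS cap.
noisy = TokenBucket(rate=100, burst=10)
allowed = sum(noisy.allow(now=i / 1000) for i in range(1000))
print(allowed)  # far fewer than 1,000: about the 100 IOPS rate plus the 10-op burst
```

In a real system the rejected I/Os would be queued or delayed rather than dropped, and the limiter would read a monotonic clock.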
- Shared network for storage and data
Most traditional storage solutions rely on the host network for storage data traffic. This traffic competes with regular network traffic and poses a security risk. Look for solutions that isolate storage traffic from container traffic.
- Security and encryption
It’s important to consider the security and encryption aspects of storage solutions. Most enterprises require safeguards such as self-encrypting disks, volume-level encryption, and key management to protect themselves against data loss and security breaches.
A Unique Approach to Storage
A cloud native storage environment requires the best of all these worlds:
- Simple to use
- Highly available
- Hybrid-cloud ready
- High performance
- Consistent performance
And it also needs to be cost-effective and built for Kubernetes.
Diamanti is a bare-metal, hyper-converged Kubernetes platform that brings everything you need to run containers, including software-optimized NVMe-based SSD storage, under a single umbrella. Diamanti pioneered storage capabilities for Kubernetes with the FlexVolume plugin and continues to innovate in this area with support for backup, DR, snapshots, and mirroring. Its low-latency architecture supports stretched clustering across nodes, zones, or even data centers. Its patented I/O offload architecture also helps deliver best-in-class performance, achieving a 10x to 30x improvement over standard server or HCI environments. Finally, Diamanti provides true hybrid cloud capability, allowing stateful applications to be migrated across clouds.
Kubernetes has evolved to become the best platform to orchestrate stateful applications. You can easily manage and scale stateful applications with Kubernetes constructs such as StatefulSets and persistent volumes. Still, these constructs alone cannot unlock the full potential of Kubernetes without the right underlying storage infrastructure. When choosing a third-party storage vendor, keep these challenges in mind and look for vendors that solve the issues most critical to your application deployment.