Kubernetes, Data Portability, and the Rise of Portable Stateful Applications

A primer on getting Kubernetes to work with stateful applications, using the container storage interface.
Aug 20th, 2019 8:58am by
Niraj Tolia
Niraj Tolia is the CEO and Co-Founder at Kasten and is interested in all things Kubernetes. He has played multiple roles in the past, including the Senior Director of Engineering for Dell EMC's CloudBoost family of products and the VP of Engineering and Chief Architect at Maginatics (acquired by EMC). Niraj received his PhD, MS, and BS in Computer Engineering from Carnegie Mellon University.

One of the greatest benefits of Kubernetes stems from its proliferation. Because of its functionality and popularity, every major cloud provider has a managed Kubernetes offering, and, for private data centers, there are a variety of on-premises solutions that enterprises can choose from. This availability of common infrastructure across diverse operating environments is both unprecedented and empowering for all application developers. One of the greatest benefits that has resulted from this ubiquity is application portability. For users of Kubernetes, it has now become easy to deploy the same application everywhere.

In fact, developers are already leveraging these portability benefits for stateless applications, but two large hurdles remain before stateful applications can achieve the same portability: storage infrastructure abstraction and portability of the actual data (or state). In this article, we will explain that, when designed correctly, these same portability benefits can also apply to stateful applications, including relational databases and NoSQL systems.

Leveraging Storage Interface Portability

Even for applications that don’t have data portability requirements, the diversity of underlying storage infrastructure can be an issue. This diversity can be abstracted away, however, through the use of Storage Classes and the Container Storage Interface (CSI).

In particular, the infrastructure-specific storage configuration can be abstracted away within a StorageClass and, as long as a StorageClass with the requested name is present, the application will use the administrator-provided mapping for the selected storage infrastructure. Common examples of storage configuration include defining the underlying storage performance type (e.g., SSD vs. spinning disks), QoS features such as IOPS requirements, and at-rest encryption options.
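As a minimal sketch of this mapping, the Python below builds the manifests as plain dictionaries. The class name `fast`, the claim name `db-data`, and the parameter values are illustrative assumptions, not taken from any particular deployment; the provisioner strings are the driver names of the AWS EBS and GCE Persistent Disk CSI drivers. The point is only that the application's claim references a StorageClass by name, while the administrator decides what that name maps to in each environment.

```python
import json


def storage_class(name, provisioner, parameters):
    """Build a StorageClass manifest as a plain dict (administrator side)."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": provisioner,
        "parameters": parameters,
    }


def volume_claim(name, storage_class_name, size):
    """Build a PersistentVolumeClaim manifest as a plain dict (application side)."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class_name,
            "resources": {"requests": {"storage": size}},
        },
    }


# Same class name in both environments; only the infrastructure mapping differs.
# Parameter values are illustrative assumptions.
aws_fast = storage_class("fast", "ebs.csi.aws.com", {"type": "gp3", "encrypted": "true"})
gcp_fast = storage_class("fast", "pd.csi.storage.gke.io", {"type": "pd-ssd"})

# The application's claim is identical everywhere: it only names the class.
claim = volume_claim("db-data", "fast", "20Gi")

print(json.dumps(claim, indent=2))
```

Because the claim carries no vendor-specific fields, the same manifest can be applied unchanged to either cluster, provided a StorageClass named `fast` exists there.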

CSI, used to add block and file storage to containerized applications, similarly abstracts away the management interface of different storage providers by providing a common cross-platform API. The common CSI storage abstractions supported by almost all storage vendors include volume lifecycle operations such as creation, deletion, and mounting, as well as the ability to take and restore volume snapshots.
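The snapshot and restore operations mentioned above can be sketched as Kubernetes manifests, again built as plain Python dictionaries. This sketch assumes a cluster that serves the `snapshot.storage.k8s.io/v1` API (older clusters expose alpha or beta versions instead); the object names `db-data`, `db-data-snap`, `csi-snapclass`, and `fast` are hypothetical.

```python
import json


def volume_snapshot(name, source_pvc, snapshot_class):
    """Build a VolumeSnapshot manifest that snapshots an existing claim."""
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",  # assumes the GA snapshot API
        "kind": "VolumeSnapshot",
        "metadata": {"name": name},
        "spec": {
            "volumeSnapshotClassName": snapshot_class,
            "source": {"persistentVolumeClaimName": source_pvc},
        },
    }


def restore_claim(name, snapshot_name, storage_class_name, size):
    """Build a PersistentVolumeClaim that is pre-populated from a snapshot."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "storageClassName": storage_class_name,
            "dataSource": {
                "apiGroup": "snapshot.storage.k8s.io",
                "kind": "VolumeSnapshot",
                "name": snapshot_name,
            },
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
        },
    }


# Hypothetical names, used for illustration only.
snapshot = volume_snapshot("db-data-snap", "db-data", "csi-snapclass")
restored = restore_claim("db-data-restored", "db-data-snap", "fast", "20Gi")

print(json.dumps([snapshot, restored], indent=2))
```

Neither manifest names a storage vendor: the CSI driver behind the snapshot class and the storage class carries out the vendor-specific work.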

When used together, these two abstractions will allow developers to build and deploy applications on heterogeneous storage infrastructure without needing to know about either the storage vendor or the underlying storage architecture.

This API portability approach is what we have found works best for the majority of Kubernetes users because it follows the KISS principle: most systems work best when they are kept simple rather than made complicated. In particular, using API abstractions has a number of advantages over adopting storage “overlay” solutions that introduce a new storage layer on top of whatever might be present in the deployment environment:

  • Best of Breed: Use the native storage technology (e.g., EBS in AWS or NetApp on-premises) that will always be the best suited for your deployment environment. This allows your applications to leverage the underlying storage provider’s optimized performance, reliability, and deep hardware integrations instead of a multi-platform storage overlay that could be limited to lowest common denominator APIs.
  • Performance and Cost Benefits: Directly allowing applications to use the deployment infrastructure’s native storage stack also delivers cost and performance benefits that accrue over time. In contrast, layering one storage system on another (e.g., a storage overlay over EBS) reduces performance and adds the management overhead and budgetary costs of running two separate storage layers.

Delivering Data Portability

While the use of the above storage abstractions can deliver deployment portability for your stateful applications, data portability support is also required for improved resiliency, disaster recovery and even test/dev workflows. In fact, what you really need is application portability, but that is a topic for another article — we will just focus on the data components here.

Given a stateful Kubernetes application, its data not only has to be backed up within the cluster; you must also retain the ability to move the entire application stack and its data across multicluster, multiregion, and multicloud environments. This must be done for a number of reasons:

  • Disaster Recovery: Having a hot standby across failure domains (clusters, zones, regions)
  • Hybrid Setups: Ability to support data movement across hybrid environments given the on-premises footprint seen in enterprises
  • Avoiding Vendor Lock-In: Apart from cost benefits, retain the ability to re-deploy on the infrastructure of choice
  • Test/Dev Workflows: Support copy data management workflows to frequently bring production data into test or CI environments

In our experience working with customers that needed data portability, we learned that the following characteristics were very important to them:

  • Transparency: No application changes should be required for this. Slowing developers down is just not a viable option.
  • API-Driven Automated Workflows: The workflow should be API-driven and completely automated. There should be no manual actions required.
  • Infrastructure Independence: Just like with the storage interface portability described above, you should not be forced to use the same storage system everywhere to benefit from data portability. You should be able to pick the best storage system for each environment and still move data transparently across infrastructure (e.g., from AWS EBS disks to Google Cloud persistent disk (GPD) volumes).
  • Data Security: All data transfers must be secure, and orchestration APIs RBAC-enabled. Apart from in-flight and at-rest encryption, a solution should support data masking and policy-based GDPR compliance.
  • Environment Isolation: The ability to support data movement across different accounts, resource groups, and clusters so as to not impact the performance of your primary workload.
  • Performance and Efficiency: Given data gravity, the ability to use advanced deduplication and data transfer techniques for both cost and performance reasons.

Through the growth of Kubernetes, developers across industries are now able to create portable stateful applications for flexibility of deployment and upgrades, and improved application availability and agility. As you engage the myriad products that exist today to help develop, deploy and manage your Kubernetes applications, look for a data management solution that supports data portability while providing choice for your underlying storage options.

For a deeper look at delivering both interface and data portability, we put together the video below. It shows how a complex application like GitLab can not only be captured within a single cloud environment (Google GKE) using interface portability abstractions but also be migrated to a completely different cloud provider (AWS EKS) using data portability features.

Feature image via Pixabay.
