How to Make Up for Kubernetes’ Disaster Recovery Shortfalls
Before Kubernetes, companies had to be experts in infrastructure and maintain complex integrations to deliver applications consistently and reliably. Kubernetes offers tremendous advantages for application development and deployment in many organizations. Its abstractions, for example, reduce complexity and operational burden while letting application developers deploy applications into the system in a declarative and repeatable way. With its self-monitoring and self-healing capabilities, Kubernetes helps ensure that applications keep running in their desired configuration. All this is accomplished on a consistent platform, whether on-premises, in the cloud or across hybrid infrastructure.
But the same dynamic behavior that provides so much advantage by decoupling applications from the infrastructure they run on becomes a significant challenge when dealing with stateful applications. And stateful systems account for most applications running in production. The external storage systems that support these applications are seldom portable and are frequently tied to a specific cloud provider. Kubernetes eases this problem by providing a broad range of volume options for storage integration. A Kubernetes persistent volume, for example, can be viewed as mounted storage backed by Amazon Elastic Block Store (EBS).
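As a minimal sketch of that EBS-backed persistent volume idea, the commands below create a StorageClass and a claim through the AWS EBS CSI driver. This assumes the driver is installed in the cluster; the class name, claim name and size are illustrative, not prescribed by the article.

```shell
# Sketch: a PersistentVolumeClaim dynamically backed by Amazon EBS via the
# EBS CSI driver. Names and sizes here are illustrative assumptions.
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: ebs-gp3
  resources:
    requests:
      storage: 20Gi
EOF
```

A pod that references `app-data` triggers provisioning of the underlying EBS volume; the volume itself still lives outside the cluster, which is exactly why it needs its own backup plan.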
What Kubernetes does not provide, however, is a data protection or migration solution to enable data to be recovered and brought back online or moved to a different provider when disaster strikes.
A Case for Backups
In the cloud native space, one thing that does not change is the need to back up data and other critical components necessary to enable fast recovery or migration of an application. Here are some of the reasons to maintain running backups:
- Human error can delete or corrupt data.
- Hackers prey on vulnerable systems.
- Natural or human-made disasters can destroy infrastructure.
- Outages and data loss cost companies customers.
- Legal standards or compliance regulations often require data retention.
Also, it may be useful to have a strategy for migrating an application from one system to another. For example, a company might want to switch cloud providers for any of these reasons:
- Changes in terms and conditions.
- Differences in costs or benefits.
- Compliance with a regulatory standard.
Finally, periodic data archival might be necessary to retire old data from expensive primary storage while retaining it for compliance or future analytics.
What to Back Up in Kubernetes
Native objects that represent the state of a cluster in a Kubernetes platform are stored in an etcd database. Periodically backing up the etcd cluster data is important for recovering Kubernetes clusters in disaster scenarios, such as losing all of the control plane nodes. Etcd is also sometimes used to hold state for network plugins, CRDs and other essential components.
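As a sketch of what such a periodic etcd backup looks like, the commands below take and verify a snapshot on a control plane node. The endpoint and certificate paths shown are typical kubeadm defaults and are assumptions; adjust them for your cluster.

```shell
# Sketch: snapshot etcd on a kubeadm-style control plane node.
# Certificate paths are assumed kubeadm defaults.
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  snapshot save /var/backups/etcd-$(date +%Y%m%d%H%M).db

# Verify the snapshot is readable before trusting it.
ETCDCTL_API=3 etcdctl snapshot status /var/backups/etcd-*.db --write-out=table
```

Snapshots like this cover only what etcd holds; everything discussed below still needs its own backup path.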
Beyond what etcd holds, some components necessary to run Kubernetes exist outside of it. These are critical items related to data storage and infrastructure that also need to be backed up for a full application migration or a protection and recovery plan:
- Persistent volumes.
- Certificates, key pairs and the certificate authority.
- Service account signing keys.
- LDAP or other authentication details.
- State associated with any CRDs and CNI plugins not stored in etcd.
- Network resources (configuration allowing recreation of DNS records, IP address and subnet assignment, switch, firewall, routing, load balancing, proxies, etc.).
- Cloud provider-specific account and configuration data.
- Credentials for the underlying infrastructure (access keys, tokens, passwords, etc.).
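A few of the items above can be gathered with ordinary tooling; the sketch below is one hedged example for a kubeadm-managed control plane node, where the paths shown are kubeadm defaults and purely illustrative. Note that exporting PersistentVolume objects captures their definitions, not the data inside them.

```shell
# Sketch: collecting some non-etcd items on a kubeadm control plane node.
# Paths and filenames are illustrative assumptions.
tar -czf pki-backup.tar.gz /etc/kubernetes/pki             # certs, keys, CA, SA signing keys
kubectl get pv -o yaml > persistent-volumes.yaml           # PV definitions (not the data)
kubectl get crd -o yaml > crds.yaml                        # CRD definitions
kubectl get svc,ingress -A -o yaml > network-objects.yaml  # in-cluster network objects
```

Cloud provider configuration, DNS records and infrastructure credentials live outside the cluster entirely and must be exported from those systems directly.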
Backing up a Kubernetes Cluster
Companies that are reaping the benefits of a cloud native architecture need to take an additional step to enable the continuous availability of their applications: adopting a backup, recovery and migration strategy for their Kubernetes clusters, data storage and infrastructure.
Velero, an open source project, integrates with Kubernetes clusters and data storage to back up and restore native Kubernetes objects from the etcd database. It can also back up and restore an application's persistent data alongside its configuration, either through the storage platform's native snapshot capability or with the integrated file-level backup tool restic.
Because Velero runs as a server process inside a Kubernetes cluster, it works both on-premises and in public cloud environments. And because it uses the Kubernetes API to capture or restore the state of cluster resources, it offers several advantages over backing up etcd directly:
- Backups can capture subsets of the cluster’s resources, selecting them by namespace, resource type and/or label selector, which provides a high degree of flexibility around what’s backed up and restored.
- Users of managed Kubernetes offerings often do not have access to the underlying etcd database, so direct backups and restores of it are not possible; an API-based approach works regardless.
- Resources exposed through aggregated API servers can easily be backed up and restored even if they’re stored in a separate etcd database.
For storing its backup files, Velero requires access to an object store, such as Amazon S3 or any S3-compatible storage. This storage does not need to be in the same provider as the Kubernetes cluster being backed up.
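A minimal sketch of wiring Velero to such an object store is shown below, using the AWS plugin against an S3 bucket. The bucket name, region, plugin version and credentials file path are assumptions for illustration, not values from the article.

```shell
# Sketch: install Velero pointing at an S3 (or S3-compatible) bucket.
# Bucket, region, plugin version and credentials path are illustrative.
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.9.0 \
  --bucket velero-backups \
  --backup-location-config region=us-east-1 \
  --snapshot-location-config region=us-east-1 \
  --secret-file ./credentials-velero
```

For an S3-compatible store outside AWS, the backup location config additionally takes an `s3Url` pointing at the alternative endpoint.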
Velero Plugin System
Velero has offered support for performing persistent volume snapshots since its inception. And to extend support beyond the handful of providers the team can maintain, Velero provides a plugin API that lets the community create plugins for any provider that is not already supported. The Velero team maintains object-store plugins for Amazon S3, Microsoft Azure Blob Storage and Google Cloud Platform (GCP) Storage, and volume snapshot plugins for Amazon EBS, Microsoft Azure Managed Disks and GCP Compute Engine Disks.
Here are some key features provided by the Velero command-line interface (CLI):
- Create, delete and list all backups.
- Schedule a backup.
- Download the backup file locally.
- Describe and inspect a backup like any other Kubernetes object.
- View logs on a per-backup basis.
- Trigger a bug report directly from the CLI.
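The features above map directly onto CLI invocations. The sketch below shows one plausible session; the backup name, namespace and cron schedule are illustrative assumptions.

```shell
# Sketch of the CLI features above; names and schedule are illustrative.
velero backup create app-backup --include-namespaces my-app   # ad hoc backup
velero schedule create daily-app --schedule="0 3 * * *" \
  --include-namespaces my-app                                 # scheduled daily backup
velero backup describe app-backup                             # inspect like any K8s object
velero backup logs app-backup                                 # per-backup logs
velero backup download app-backup                             # fetch the backup tarball locally
velero restore create --from-backup app-backup                # restore from a backup
velero bug                                                    # open a prefilled bug report
```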
Because all Velero operations are done using the CLI, DevOps teams and platform operators can trigger ad hoc backups, configure scheduled backups and perform restores quickly and easily. And because the project is open source, any organization — large or small — can start creating backups of their Kubernetes clusters at any point in their cloud native journey with minimal effort and cost.
Different Times, Same Needs
In the context of business, a disaster is any event that disrupts the continuity of a business. When it comes to Kubernetes, it is important to understand how far the platform goes to enable this continuity and where it needs to be complemented.
Because disaster recovery is not a set of operations Kubernetes provides, companies need to find a backup and recovery solution that works for their needs. Tools like Velero, which integrate directly with the Kubernetes API, offer the most flexibility and the most streamlined workflow. Because Velero is open source and simple to use, companies can implement a backup and recovery workflow from the moment they shift their operations to Kubernetes. Having backups has never ceased to be crucial, and with Kubernetes and Velero, it can also be simple and affordable.