Kubernetes Apps Are Snowflakes: Find an Extensible Protection Framework

As developers and application teams choose containers and Kubernetes as their preferred technologies for building, deploying, running, and scaling applications, more applications with data land on Kubernetes every day. Increasingly, independent software vendors (ISVs) distribute their applications packaged as containers to run on Kubernetes, spreading its use across all industry verticals. Many of these applications are business-critical, and an outage can cause serious loss of revenue, productivity, and reputation.
As enterprises realize the business need for protecting their Kubernetes applications, they turn to do-it-yourself open source or commercial solutions available in the market. Protecting a Kubernetes application usually entails the ability to recover it after a disaster, a cyberattack, or a malicious or accidental user action, into one of the following targets:
- The same cluster
- A different cluster in the same region
- A separate cluster in a different region or geography
Applying these solutions to protect and move Kubernetes applications often does not work out of the box. The flexibility and extensibility of Kubernetes (through custom resource definitions), which make it so popular with developers, also make Kubernetes applications harder to protect, because there is no standard way to determine what needs to be backed up to enable a successful recovery after a disaster.
Real-world Kubernetes Applications Are Snowflakes
Because no standard Kubernetes specification defines what comprises an application, identifying which resources to back up and clone is particularly challenging. Consequently, solutions that focus on Kubernetes application protection, providing backup/restore and disaster recovery (DR), make assumptions about how developers will architect their applications and do their best to figure out what constitutes an actual application and how best to protect it.
These solutions therefore tend to automate Kubernetes application discovery, which is desirable in environments that can be highly dynamic. However, automatic discovery usually works only for the subset of cases where applications are simple and confined to a namespace. If applications span multiple namespaces, have cluster-scoped resources, or use external fully managed databases, data stores, or messaging services, such simplistic discovery and the protection built on it are not adequate for most Kubernetes workloads.
What Constitutes a Kubernetes Application?
Most real-world Kubernetes applications do not follow a standard recipe for how they are developed and deployed. How developers define a Kubernetes application can be any combination of the following (not an exhaustive list):
- App = 1..N resources in 1 namespace
- App = 1..N namespaces
- App = 1..N namespaces + 1..N cluster scoped resources
- App = 1..N resources in 1..N namespaces + 1..N cluster scoped resources
- App = 1..N resources in 1..N namespaces + 1..N cluster scoped resources + external resources (for example, a fully-managed database in a public cloud)
As is evident from the above, using namespaces as an application separator does not cover all the different ways developers define their Kubernetes applications. Automatically inferring the components that comprise a Kubernetes application can also miss some of the application's resources.
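The patterns above can be made concrete with a small data model. The following is only a minimal sketch, not the format of any particular product: the `Resource` and `AppDefinition` classes, the sample inventory, and the `managed-postgres/orders-db` reference are all hypothetical names introduced for illustration. It shows how a namespace-only definition can silently miss resources that an explicit definition picks up.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Resource:
    """One Kubernetes object; namespace=None marks a cluster-scoped resource."""
    kind: str
    name: str
    namespace: Optional[str] = None


@dataclass
class AppDefinition:
    """Hypothetical application definition covering the patterns listed above."""
    name: str
    namespaces: set = field(default_factory=set)   # whole namespaces owned by the app
    resources: set = field(default_factory=set)    # individually named resources
    external: list = field(default_factory=list)   # references to services outside the cluster

    def select(self, inventory):
        """Return the in-cluster resources from `inventory` that belong to this app."""
        return [
            r for r in inventory
            if r.namespace in self.namespaces or r in self.resources
        ]


# Illustrative cluster inventory (made-up names).
inventory = [
    Resource("Deployment", "web", "shop"),
    Resource("Service", "web", "shop"),
    Resource("StatefulSet", "queue", "shop-infra"),
    Resource("ClusterRole", "shop-operator"),
    Resource("Deployment", "unrelated", "other"),
]

# What namespace-based discovery would assume.
namespace_only = AppDefinition("shop", namespaces={"shop"})

# What the application team actually considers "the app".
explicit = AppDefinition(
    "shop",
    namespaces={"shop", "shop-infra"},
    resources={Resource("ClusterRole", "shop-operator")},
    external=["managed-postgres/orders-db"],
)

# Resources a namespace-only backup would leave out.
missed = set(explicit.select(inventory)) - set(namespace_only.select(inventory))
```

In this sketch, `missed` contains the StatefulSet in the second namespace and the cluster-scoped ClusterRole. The external database appears only as a reference in `explicit.external`, since it lives outside the cluster and would need its own protection mechanism.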
Custom Application Definition with User Input
To back up and restore an application effectively, users must be able to specify what constitutes the Kubernetes application that provides the critical “service” whose business continuity must be ensured. Ideally, users should be able either to define their application using Kubernetes-native mechanisms or to define a custom application that conforms to one of the application definition patterns enumerated above.
In some cases, it is equally important to identify resources that should be excluded from the app definition as they may not be relevant in the target/destination cluster or namespace, preventing a successful restore. An effective Kubernetes application protection solution must either offer users the ability to specify these details or intelligently facilitate discovering such complex applications. Once an application is defined with all the resources that are necessary, protecting an application is no longer guesswork, leading to much more predictable results.
Hooks, Hooks, and More Hooks
Hooks, or pre-scripts and post-scripts, have been used by enterprise backup software for ages. They let users insert custom actions to take application-consistent backups, such as quiescing a database before a snapshot so that in-memory tables are flushed to disk. The ability to insert custom actions with “execution hooks” has taken on a whole new meaning with Kubernetes applications. Predictably controlling the end-to-end backup and recovery of a Kubernetes application, so that the application starts successfully after recovery, often requires custom actions on several of the application's resources before and after a backup as well as before and after a restore.
Execution hooks in a Kubernetes backup and recovery workflow can accomplish many objectives, such as applying network security policies dynamically before or after a backup or restore, altering the registered path for an ingress controller before a clone, refreshing IP addresses, and performing other application-specific actions. Kubernetes application protection solutions must make backup and DR workflows extensible through execution hooks that let users insert such custom actions.
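As a rough illustration of the idea, the sketch below models the four hook points described above (before and after backup, before and after restore) as a simple registry. The `Phase`, `HookRegistry`, and `backup` names are hypothetical and do not correspond to any specific product's API; real hooks would typically run commands inside containers rather than Python callables.

```python
from collections import defaultdict
from enum import Enum
from typing import Callable


class Phase(Enum):
    PRE_BACKUP = "pre-backup"
    POST_BACKUP = "post-backup"
    PRE_RESTORE = "pre-restore"
    POST_RESTORE = "post-restore"


# A hook receives the application name; this signature is an assumption.
Hook = Callable[[str], None]


class HookRegistry:
    """Keeps hooks per phase and runs them in registration order."""

    def __init__(self) -> None:
        self._hooks = defaultdict(list)

    def register(self, phase: Phase, hook: Hook) -> None:
        self._hooks[phase].append(hook)

    def run(self, phase: Phase, app: str) -> None:
        for hook in self._hooks[phase]:
            hook(app)


def backup(app: str, hooks: HookRegistry, take_snapshot: Hook) -> None:
    """Run pre-backup hooks, take the snapshot, then always run post-backup hooks."""
    hooks.run(Phase.PRE_BACKUP, app)
    try:
        take_snapshot(app)
    finally:
        # Post hooks run even if the snapshot fails, so a quiesced
        # database is not left frozen.
        hooks.run(Phase.POST_BACKUP, app)


# Usage: record the order in which actions happen.
events = []
hooks = HookRegistry()
hooks.register(Phase.PRE_BACKUP, lambda app: events.append(f"quiesce {app}"))
hooks.register(Phase.POST_BACKUP, lambda app: events.append(f"resume {app}"))
backup("shop", hooks, lambda app: events.append(f"snapshot {app}"))
```

A restore workflow would wrap its own steps with `Phase.PRE_RESTORE` and `Phase.POST_RESTORE` in the same way, for example to rewrite an ingress path before the restored application starts serving traffic.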
What Not to Back Up and Restore
Identifying which application resources need to be protected is critical, but deciding what not to back up or restore is equally essential. Certain resources, such as secrets, IP addresses, and node ports, may not be relevant in a target cluster used for disaster recovery. A bulk backup of every resource that comprises a Kubernetes application, followed by a restore, can fail because a subset of those resources is simply invalid in the target cluster. Conceptually, there are usually two ways to address this situation.
- Use filters to exclude specific resources that do not make sense in the target cluster from the backup or the restore, so that Kubernetes or operators (a popular design pattern for deploying and managing Kubernetes applications) can recreate them.
- Transform resources before they land on the target cluster.
Kubernetes application protection solutions must allow users to do both to ensure correct application behavior after a restore.
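The two approaches can be sketched on plain manifest dictionaries. This is a minimal example under stated assumptions: the exclusion list, the helper names (`keep`, `strip_cluster_specific`, `prepare_for_restore`), and the sample manifests are hypothetical, and whether a given kind such as Secret should really be excluded depends on the application. The `clusterIP`, `clusterIPs`, and `nodePort` fields are standard Service fields that the target cluster can assign again when they are omitted.

```python
import copy

# Assumption for this sketch: Secrets are recreated in the target cluster
# by some other mechanism, so they are filtered out of the restore.
EXCLUDED_KINDS = {"Secret"}


def keep(manifest: dict) -> bool:
    """Filter: decide whether a manifest should be restored at all."""
    return manifest["kind"] not in EXCLUDED_KINDS


def strip_cluster_specific(manifest: dict) -> dict:
    """Transform: return a copy of the manifest without source-cluster IPs and node ports."""
    out = copy.deepcopy(manifest)  # never mutate the backup copy
    if out["kind"] == "Service":
        spec = out.get("spec", {})
        spec.pop("clusterIP", None)
        spec.pop("clusterIPs", None)
        for port in spec.get("ports", []):
            port.pop("nodePort", None)
    return out


def prepare_for_restore(manifests: list) -> list:
    """Apply the filter first, then the transform, to everything that is kept."""
    return [strip_cluster_specific(m) for m in manifests if keep(m)]


# Illustrative backup contents (made-up values).
backup_set = [
    {"kind": "Secret", "metadata": {"name": "db-creds"}},
    {
        "kind": "Service",
        "metadata": {"name": "web"},
        "spec": {
            "type": "NodePort",
            "clusterIP": "10.0.0.12",
            "ports": [{"port": 80, "nodePort": 31080}],
        },
    },
    {"kind": "Deployment", "metadata": {"name": "web"}},
]

restorable = prepare_for_restore(backup_set)
```

Here the Secret is dropped, the Service is restored without its source-cluster IP and node port, and the Deployment passes through unchanged. Working on a deep copy keeps the backup itself intact, so the same backup can still be restored unmodified into the original cluster.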
Summary
As enterprises realize the need to protect their Kubernetes applications, it is important to evaluate protection solutions for their customizability and extensibility before adopting them at scale. It is also important to give developers a set of best practices that guide them toward a common recipe when architecting their Kubernetes applications, including a recommended standard set of in-cluster and external services. Following such a recipe makes applications easier to protect as business-critical Kubernetes applications become ubiquitous in the enterprise.