3 Reasons We Need Data Protection in Kubernetes

To gain an appreciation for how increasingly important it is to protect Kubernetes applications, it’s instructive to compare container environments today with the way they looked when Kubernetes started out just eight years ago.

Back then, data protection in container environments was an afterthought. Containers were envisioned as stateless, lightweight constructs that could be spun up quickly to launch applications. Since these simplistic applications had no data dependencies and could be terminated or restarted without any appreciable side effects, enterprises didn’t put a high priority on protection strategies.
That changed as Kubernetes evolved into a ubiquitous, enterprise-ready project. With users running hundreds of nodes inside clusters, it soon became clear that any significant Kubernetes application that addresses business functions requires data to persist beyond the initial container launch (for example, a customer’s shopping cart or bank transactions).
Kubernetes evolved quickly to add capabilities for handling state, including constructs such as StatefulSets and the Container Storage Interface (CSI). In other words, as Kubernetes matured, applications became stateful, with databases as one of the most popular workloads today. This evolution has made data-protection initiatives like backup and disaster recovery (DR) an imperative and a priority in organizations.
Let’s dive into some of the drivers behind Kubernetes native data protection:
- The rise of cloud native applications.
- The proliferation of stateful applications.
- The changing roles and scopes in IT.
Cloud Native Applications
While architectures (servers, virtual servers, containers) have evolved and become more dynamic and distributed, the core requirement for protecting data has remained an imperative.
With cloud native applications in a Kubernetes operating environment, the underlying application architecture is completely different from hypervisor-based environments, so a new, Kubernetes native data-protection approach is needed. Two examples highlight the change. First, Kubernetes constantly reschedules pods to different physical nodes, so using the virtual machine as the unit being backed up does not work. Second, Kubernetes brings an order-of-magnitude increase in the number of metadata objects (Secrets, ConfigMaps, etc.) that need to be backed up in addition to the storage volume data, which makes hypervisor-based backups unsuitable.
As a result, a Kubernetes native solution that uses cloud native applications as the unit of atomicity for backup and recovery operations should be the objective of every organization that is looking to modernize its infrastructure and applications.
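To make the "application as the unit of atomicity" idea concrete, here is a minimal Python sketch. The inventory, the application names and the node names are all hypothetical, and a real tool would discover these objects through the Kubernetes API rather than a hard-coded list. It only illustrates why grouping by node loses objects that grouping by application keeps.

```python
from collections import defaultdict

# Hypothetical inventory of cluster objects. Every name, label and node
# here is made up for illustration.
RESOURCES = [
    {"kind": "Pod", "name": "cart-0", "app": "cart", "node": "node-a"},
    {"kind": "Pod", "name": "cart-1", "app": "cart", "node": "node-b"},
    {"kind": "PersistentVolumeClaim", "name": "cart-data", "app": "cart", "node": None},
    {"kind": "Secret", "name": "cart-db-creds", "app": "cart", "node": None},
    {"kind": "ConfigMap", "name": "cart-settings", "app": "cart", "node": None},
    {"kind": "Pod", "name": "billing-0", "app": "billing", "node": "node-a"},
    {"kind": "PersistentVolumeClaim", "name": "billing-data", "app": "billing", "node": None},
]


def group_by_application(resources):
    """Group objects by application label, ignoring where pods are scheduled."""
    groups = defaultdict(list)
    for res in resources:
        groups[res["app"]].append((res["kind"], res["name"]))
    return {app: sorted(items) for app, items in groups.items()}


def group_by_node(resources):
    """Group objects by node, the way a machine-centric backup would."""
    groups = defaultdict(list)
    for res in resources:
        if res["node"] is not None:  # Secrets, ConfigMaps and PVCs have no node
            groups[res["node"]].append((res["kind"], res["name"]))
    return {node: sorted(items) for node, items in groups.items()}


if __name__ == "__main__":
    print("Per application:", group_by_application(RESOURCES))
    print("Per node:", group_by_node(RESOURCES))
```

In the per-node view, the Secret, ConfigMap and volume claims never appear, and a single node mixes pods from both applications. The per-application view keeps everything that is needed to restore `cart` together, regardless of where its pods happen to be running.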
Stateful Applications
While the origins of Kubernetes-based applications were simplistic, ephemeral workloads that did not contain state, much has changed since. Applications that solve serious business functions need state. It was not optimal from a development or an operations perspective to run your stateless constructs in a Kubernetes environment and your stateful database in a legacy environment.
So Kubernetes itself evolved to include constructs that allow cloud native applications to contain state that persists across individual pods. These constructs included the introduction of StatefulSets in 2017, which enabled handling distributed database clusters in a highly available environment. Operator frameworks started gaining popularity in 2018, enabling applications to control their life cycle operations and define dependencies of individual microservices, including ones that contain state. In that same year, the Container Storage Interface (CSI) was made generally available to enable storage vendors to expose standard block and file interfaces to applications. In 2020, volume snapshots became generally available in the Kubernetes v1.20 release, allowing you to restore or clone data from a previous snapshot. Many more capabilities have since been added, and others are being worked on, to make stateful applications a snap to work with in your favorite Kubernetes environment.
The net result of all these advancements is that databases are among the most popular workloads on Kubernetes today. Redis, Postgres and MySQL are some of the top technologies running on containers. This has brought immense productivity gains and simplified operations. However, it also makes it even more important to protect your environment with Kubernetes native backup and DR tools that are simple to operate.
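As an illustration of the volume snapshot capability, the following Python sketch builds a `VolumeSnapshot` manifest and a `PersistentVolumeClaim` that restores from it. The API groups and field names follow the upstream `snapshot.storage.k8s.io/v1` API; the object names, namespace, class names and size are hypothetical, and the sketch assumes a CSI driver with snapshot support and the snapshot controller are installed in the cluster.

```python
import json


def volume_snapshot(name, namespace, pvc_name, snapshot_class):
    """Manifest for a point-in-time snapshot of an existing PVC."""
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshot",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "volumeSnapshotClassName": snapshot_class,
            "source": {"persistentVolumeClaimName": pvc_name},
        },
    }


def pvc_from_snapshot(name, namespace, snapshot_name, storage_class, size):
    """Manifest for a new PVC pre-populated from a snapshot (restore or clone)."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "storageClassName": storage_class,
            "dataSource": {
                "name": snapshot_name,
                "kind": "VolumeSnapshot",
                "apiGroup": "snapshot.storage.k8s.io",
            },
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # All names below are placeholders, not values from a real cluster.
    snap = volume_snapshot("cart-data-snap", "shop", "cart-data", "example-snapclass")
    restored = pvc_from_snapshot(
        "cart-data-restored", "shop", "cart-data-snap", "example-storageclass", "10Gi"
    )
    print(json.dumps(snap, indent=2))
    print(json.dumps(restored, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, since kubectl accepts JSON as well as YAML. Note that a snapshot on its own captures only volume data; the Secrets, ConfigMaps and other objects that make up the application still have to be captured separately.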
Changing Roles
One way Kubernetes is making application development and delivery faster and better is by bridging the gap between infrastructure and application teams. Infrastructure teams are typically responsible for building and delivering the tools that manage a secure cloud native infrastructure — let’s call them the providers. Application teams are the consumers of these tools and are focused on building business applications.
Kubernetes allows the infrastructure teams to create flexible environments that can span deployments across on-premises data centers and clouds. These environments can be augmented with a platform that provides common capabilities like security, backup and DR to protect the applications introduced into a Kubernetes cluster. Application teams, on the other hand, do not need to open service tickets and wait for a long, drawn-out process to perform functions like data recovery or rollbacks. Instead, they can leverage self-service capabilities to perform these functions if they have been authenticated and authorized to do so.
This is where Kubernetes native role-based access control (RBAC) comes in. A Kubernetes native data-protection tool is cognizant of these RBAC constructs and can ensure that application teams gain visibility into, and can operate on, only the applications and namespaces that their Kubernetes administrator has configured. This, coupled with container-optimized operating systems like Bottlerocket or Red Hat Enterprise Linux CoreOS, ensures that the attack surface is contained while maintaining the agility of operations and separation of concerns.
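As a rough sketch of what such namespace-scoped self-service could look like, the Python below builds a `Role` and a `RoleBinding` that let one team manage volume snapshots and volume claims in a single namespace. The `rbac.authorization.k8s.io/v1` fields are the standard Kubernetes RBAC ones; the role name, namespace, group name and the exact set of verbs are assumptions chosen for illustration, and a real data-protection tool will define its own resources and permissions.

```python
import json

ROLE_NAME = "snapshot-operator"  # hypothetical role name


def snapshot_role(namespace):
    """Role allowing snapshot and restore operations inside one namespace only."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": ROLE_NAME, "namespace": namespace},
        "rules": [
            {
                "apiGroups": ["snapshot.storage.k8s.io"],
                "resources": ["volumesnapshots"],
                "verbs": ["get", "list", "create", "delete"],
            },
            {
                "apiGroups": [""],  # "" is the core API group
                "resources": ["persistentvolumeclaims"],
                "verbs": ["get", "list", "create"],
            },
        ],
    }


def bind_role_to_group(namespace, group):
    """RoleBinding granting the Role above to a group of users."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": ROLE_NAME + "-binding", "namespace": namespace},
        "subjects": [
            {"kind": "Group", "name": group, "apiGroup": "rbac.authorization.k8s.io"}
        ],
        "roleRef": {
            "kind": "Role",
            "name": ROLE_NAME,
            "apiGroup": "rbac.authorization.k8s.io",
        },
    }


if __name__ == "__main__":
    # "shop" and "cart-team" are placeholder names.
    print(json.dumps(snapshot_role("shop"), indent=2))
    print(json.dumps(bind_role_to_group("shop", "cart-team"), indent=2))
```

Because both objects are namespaced, members of the hypothetical `cart-team` group get no access to snapshots or claims in any other namespace. After applying the manifests, an administrator could confirm the effective permissions with `kubectl auth can-i`.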
Conclusion
As organizations embrace Kubernetes as their operating environment, data-protection initiatives like backup and disaster recovery have become an imperative and a priority. Meeting that need requires choosing a Kubernetes native data-protection tool that gives both infrastructure and application teams the ability to innovate at DevOps speed while ensuring that cloud native applications can scale and operate smoothly.