Building Trust Among Teams with Cloud Native Data Protection
Enterprises that depend upon software innovation and massive scalability for success are inevitably among the most advanced adopters of cloud native computing infrastructure like Kubernetes. These same firms are also often considered to be the “best in class” at adopting DevOps practices.
Kubernetes and DevOps go hand in hand at these leading-edge organizations for three common reasons:
- They need to release new application features and remediate issues faster, without compromising productivity.
- They must balance a risk-taking, agile, collaborative team mindset with risk-averse security and data protection practices.
- They relentlessly pursue automation to maintain the rate of innovation and customer growth they are committed to.
A core tenet of DevOps is empathy. In strong engineering organizations, teams already have a culture of continually improving performance. But when team members truly strive to understand each other’s needs and trust each other to act in the best interest of the customer, the total impact of the team becomes much more than the sum of its parts.
Data protection solutions for DevOps and Kubernetes environments include secure archival, backup and restore, and disaster recovery, functions that were once considered the domain of data center operators. As software moves to a distributed environment of public and private clouds, a host of new threats, including service interruptions and ransomware, brings data protection into the realm of developers as well.
DevOps practices and Kubernetes orchestration are pushing data protection left in the delivery life cycle. Through this evolving process, data protection allows trust to be earned among innovative product design, engineering and ops teams.
Recovering Forward Progress
Application delivery teams are migrating existing servers and VMs into containers and microservices to release new features and fixes faster for customers. Kubernetes abstracts all the details of physical infrastructure, making application workloads highly portable and stateless.
Developers no longer need to worry about IP address or naming conflicts, or about changing network and firewall settings, when all of that is configured as code with each release. Great care must be taken at the persistence layer of these applications to ensure that customer data is neither exposed nor corrupted. But who is backing up the release process itself?
With multiple DevOps teams auto-deploying and autoscaling apps within cloud and on-premises clusters, traditional version control systems and repositories can’t keep up with rapid changes. Releases and versions can get out of sync with underlying data structures, and misconfigurations appear within infrastructure code.
No dev team likes the prospect of losing their work or having to backtrack and refactor everything based on missing data.
Take a distributed AI development team working to fold complex proteins for a healthcare research project in a multicluster environment. As different teams sequence different parts of the model based on extensive machine learning data, one of the ML training teams has a service interruption. By some fluke of configuration, their cluster gets discarded — and losing one part of the model invalidates other project teams’ work up and down the sequence.
Snapshots are needed at regular intervals and at every change. Time and sequence matter: the team must recover the ML training data in its correct order prior to the failure, while also reconstituting the exact configuration state of the Kubernetes instances deployed at that time.
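As a minimal sketch of that ordering requirement (hypothetical names throughout; this is not any specific vendor's API), a restore point can pair the data snapshot with the configuration state captured at the same moment, so recovery always picks the latest consistent pair from before the failure:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

@dataclass(frozen=True)
class RestorePoint:
    """Pairs an ML data snapshot with the Kubernetes config captured alongside it."""
    taken_at: datetime
    data_snapshot: str    # hypothetical ID of the persistent-data snapshot
    config_snapshot: str  # hypothetical ID of the captured deployment configuration

def latest_before(points: List[RestorePoint], failure: datetime) -> Optional[RestorePoint]:
    """Return the most recent restore point strictly before the failure,
    so data and configuration are reconstituted together, in correct order."""
    candidates = [p for p in points if p.taken_at < failure]
    return max(candidates, key=lambda p: p.taken_at, default=None)
```

Because data and configuration travel together in one restore point, the team never recovers training data against a cluster configuration from a different moment in time.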
Balancing Trust and Control Tradeoffs
Like a team trust exercise: can an organization trust peer DevOps teams to catch it when it falls? IT executives may feel a high degree of uncertainty during the move from centrally managed data center backups to a highly distributed cloud native data protection posture.
Despite the perceived statelessness of Kubernetes, the cloud native applications it orchestrates are anything but stateless. While an ephemeral microservice may perform a workload without the equivalent of local storage or memory on a machine, it could be receiving data from Kafka and Redis, or writing to an instance of Cassandra or Postgres on either end of its brief existence.
Enterprises need to trust cloud native developers to understand, from their own application’s perspective, which persistent data must be protected. Get out of their way and offer them self-service platforms for provisioning their own environments, including data protection.
But self-service doesn’t mean do-it-yourself. The wider organization should set agreed-upon key performance indicators and service-level objectives for data integrity, backup and disaster recovery.
Shared policies give developers an abstracted, prepackaged way to add a data protection posture that can be ready to execute on Day 2 and beyond.
Rather than coding custom routines, developers can call on a shared platform populated with an approved library of policies, complete with compliance regimes and controls for backup intervals, redundant architectures and record immutability. This raises the organization’s overall release velocity while making delivered applications more resilient.
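One way to picture such a shared policy library is as a small set of approved, prepackaged tiers that applications opt into rather than reimplement. The tier names, intervals and defaults below are illustrative assumptions, not a real product’s schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProtectionPolicy:
    """An approved, org-wide data protection policy (hypothetical fields)."""
    name: str
    backup_interval_hours: int  # how often backups run
    retained_copies: int        # how many backup copies to keep
    immutable: bool             # whether backups are locked against deletion

# Hypothetical approved library of organization-wide policies
POLICY_LIBRARY = {
    "gold":   ProtectionPolicy("gold", 1, 168, True),
    "silver": ProtectionPolicy("silver", 6, 56, True),
    "bronze": ProtectionPolicy("bronze", 24, 14, False),
}

def attach_policy(app_labels: dict) -> ProtectionPolicy:
    """Resolve an app's protection tier from its labels, falling back to the
    most conservative tier when the app is unlabeled."""
    tier = app_labels.get("protection-tier", "gold")
    return POLICY_LIBRARY[tier]
```

The point of the design is the default: an application that declares nothing still inherits the strictest approved policy, so protection is opt-down, never opt-in.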
Pre-Crash Application-Centric Automation
Cloud native environments contain so many inputs and outputs to every changing microservice that a data-centric view of data protection would be far too granular for most organizations to implement successfully; protection problems take longer to discover amid a swarm of incoming data.
Application-centric automation is required to carry out the data protection requirements of the organization before anything crashes!
To increase its globally distributed development team’s release velocity, Zenseact, an automotive technology company developing self-driving and assisted-driving AI solutions, planned to move hundreds of VMs supporting more than 30 of its critical visual analysis and inferencing applications to cloud native containers by 2023.
The company leaned on Kasten K10, a platform that auto-scans its Kubernetes instances from an application-centric point of view, discovering discrete applications and capturing them along with their accompanying environment infrastructure, persistent data, storage resources and development artifacts.
Upon setting up a project, developers also get role-based access control authorization within Kasten K10 to track the data protection posture of all containers within their project, which automatically inherits the company’s global backup policies.
Developers can customize their own retention policies in the platform from there, deciding which data is important enough to save frequently, or when to make quick recoveries if a build fails or a data service crashes. More important for productivity, developers don’t need to spend time writing their own backup and recovery routines.
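A developer-tuned retention rule can be as simple as keeping the N most recent snapshots and marking everything older for cleanup. This is a hypothetical sketch of the idea, not Kasten K10’s actual API:

```python
from datetime import datetime

def prune_snapshots(timestamps, keep):
    """Split snapshot timestamps into (kept, pruned): the `keep` most recent
    are retained (newest first); everything older is eligible for cleanup,
    listed oldest first so deletions happen in order."""
    ordered = sorted(timestamps, reverse=True)
    return ordered[:keep], sorted(ordered[keep:])
```

Teams with critical data would set a high keep-count on short intervals; teams with easily regenerated data could keep only a handful of recent snapshots.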
“Kasten K10 empowers our developers with the freedom to perform any infrastructure task in an intuitive and safe way. They can decide for themselves if data is critical and how often to create backups. Our team is lean, so it’s important that we have smart automations. Standardizing on customizations and providing self-service through automation saves us a lot of time and accelerates development,” said Johan Jansson, Zenseact scrum master and service owner.
The Intellyx Take
Developers don’t deliver value to the business by simply checking in more code. Neither do IT teams reduce operational risk by simply backing up more data.
An application-centric approach means the Kubernetes application delivered on Day 1 — along with all of its cloud native infrastructure code, services and data — is pulled together as a whole for data protection and disaster recovery purposes in production, on Day 2 and beyond.
A truly agile and collaborative DevOps team is incentivized to deliver great applications that continually scale and improve to meet customer needs, while standing the test of time and remaining resilient despite malicious threats and volatile production conditions.