Kasten: Data Management for Kubernetes
Like many startups, Kubernetes data management vendor Kasten grew out of pain.
Its founders, Niraj Tolia and Vaibhav Kamra, had joined EMC through its acquisition of Maginatics in 2014. Still a product within EMC, Maginatics provides a secure global namespace across storage platforms that makes content accessible from any device or location.
As Tolia tells it, moving the control plane of the product he was responsible for at EMC into Google Cloud and running on Kubernetes in 2015 was difficult — and set developers off complaining.
“We were discovering integration issues much earlier in the lifecycle of product development than what we’ve traditionally seen. This is five years ago, so that’s when the lightbulb went off in my head that this is going to be the future,” he said.
“At EMC, we were part of the data protection group building out data protection products. We didn’t have anything to save us. We had to hand-script stuff, and that was just not scalable or feasible as things grew for us.”
So Kasten was founded in 2017, its name coming from the German word for box or container.
In a blog post, Tolia maintains that traditional data management platforms just can’t keep up with microservice-based systems.
“There is so much that these traditional products cannot handle because there literally is a 100x explosion in the number of components you need to capture,” he said.
“There’s frequent rescheduling. A lot of these things are locked down. … And in fact, one of the very large legacy vendors’ proposals is, poke holes in your firewall … open up your network policies to allow us to get inside your cluster. And that’s the only way we can back up.
“So it doesn’t really jibe with the kind of requirements there are, especially as apps are constantly changing. There’s no VM-to-application mapping anymore, so they can’t figure out what’s happening. All of those combined from an infrastructure point of view makes these solutions hard.”
Kasten’s K10 platform uses Kubernetes to auto-discover containerized apps, their components, startup processes and policies.
By using Kubernetes Container Storage Interface (CSI) abstractions, it doesn’t need to understand specific array interfaces to provide data protection, migration and disaster recovery for block and object storage systems.
Because a backup is a snapshot of a container’s entire state, it can be moved between systems or used at a remote site for disaster recovery.
“All your configuration, your secrets, your config, your container images, your service accounts and networking information. We gather all of that for the customer, so that it is independent of the infrastructure. You can take this to another cluster, another region, another cloud, hybrid environment across prod and test, dev,” Tolia said.
It has also addressed things that are not portable across environments. If, say, your TLS certificates are from Let’s Encrypt, you can allow the system to regenerate certificates on a restore.
“We ourselves are a cloud native application. We are what sometimes people will refer to as a critical cluster, where we deploy as an application and then behave as infrastructure. So we’ll install in our own namespace, a project within Kubernetes. And then we use a service account to hook into Kubernetes, and be able to orchestrate and discover applications running on the platform. We also have hooks into the underlying physical infrastructure,” Tolia said.
If you need to synchronize across 10 services, you can use an agentless extension mechanism that Kasten calls blueprints.
K10 was built from the ground up to balance the needs between ops and dev, Tolia said.
“That is, how do we simplify compliance management for the ops team policymakers? Automation, people are generally running multiple clusters. How do you give them global visibility, alerting, monitoring?
“On the developer side, … developers know we’re there, but we’ve made zero changes to the application or the deployment pipeline. But if they want hooks, they can extend what data management means for them. But then they say, ‘No, I don’t want you to back up this way. I wanted a slightly different way.’ So all of that also works in our environment.”
Sopra Steria, a multibillion-dollar France-based IT consultancy, recently used Kasten as it moved nearly 200 applications from OpenShift 3.11 to OpenShift 4.3.
Kasten does not see the backup of your application as just a snapshot of your data, but a snapshot of your data with all the configuration at the same time — all the Kubernetes resources within the namespace. This constitutes an atomic unit that Kasten calls a restore point, which is a dependency tree.
It enables you to make transformations across infrastructure. For instance, you can specify within the migration process where DNS names need to change.
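To make the idea of a transformation during migration concrete, here is a minimal sketch in Python. It is not Kasten's API: the function name `rewrite_hosts`, the host names, and the shape of the resource are all hypothetical, and it only assumes that captured Kubernetes resources can be treated as nested dictionaries and lists.

```python
def rewrite_hosts(resource, host_map):
    """Return a copy of a captured resource with DNS names replaced.

    Hypothetical helper: walks nested dicts and lists and swaps any string
    value that exactly matches a key in host_map. The input is not modified.
    """
    if isinstance(resource, dict):
        return {key: rewrite_hosts(value, host_map) for key, value in resource.items()}
    if isinstance(resource, list):
        return [rewrite_hosts(item, host_map) for item in resource]
    if isinstance(resource, str):
        return host_map.get(resource, resource)
    return resource


# Illustrative resource captured on the source cluster (made-up names).
ingress = {
    "kind": "Ingress",
    "spec": {"rules": [{"host": "shop.old.example.com"}]},
}

migrated = rewrite_hosts(ingress, {"shop.old.example.com": "shop.new.example.com"})
print(migrated["spec"]["rules"][0]["host"])
```

The point of the sketch is that the change is applied to the captured copy on its way to the destination, so the source application and its deployment pipeline stay untouched.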
You start with a namespace, then launch a backup action, which is going to create a restore point. Then you create an export action, which is going to export the restore point, in this case to S3. You do all that with a backup policy, and you can do it weekly, daily or hourly.
Then you create an import policy on the destination cluster. The import action creates an import restore point, which creates a restore action, which recreates the namespace with the data and configuration.
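The sequence above (backup, restore point, export, import, restore) can be sketched as a toy in-memory model in Python. Everything here is an assumption made for illustration: the class and function names are hypothetical, a plain dictionary stands in for the S3 bucket, and none of it reflects K10's actual interfaces.

```python
import copy
from dataclasses import dataclass


@dataclass
class Namespace:
    name: str
    resources: dict   # Kubernetes objects, e.g. {"Deployment/web": {...}}
    volumes: dict     # volume name -> data


@dataclass
class RestorePoint:
    namespace: str
    resources: dict
    volumes: dict


def backup(ns):
    """Backup action: capture data and configuration together as one unit."""
    return RestorePoint(ns.name, copy.deepcopy(ns.resources), copy.deepcopy(ns.volumes))


def export_restore_point(rp, object_store, key):
    """Export action: copy the restore point to external storage."""
    object_store[key] = copy.deepcopy(rp)


def import_restore_point(object_store, key):
    """Import action on the destination cluster: fetch the restore point."""
    return copy.deepcopy(object_store[key])


def restore(rp):
    """Restore action: recreate the namespace with its data and configuration."""
    return Namespace(rp.namespace, copy.deepcopy(rp.resources), copy.deepcopy(rp.volumes))


bucket = {}  # stand-in for the S3 bucket
source = Namespace("shop", {"Deployment/web": {"replicas": 2}}, {"pgdata": b"orders"})

export_restore_point(backup(source), bucket, "shop/daily")
source.volumes["pgdata"] = b"corrupted"   # later damage on the source cluster

recovered = restore(import_restore_point(bucket, "shop/daily"))
print(recovered.volumes["pgdata"])
```

Because the restore point holds its own copy of both resources and volume data, later changes on the source cluster do not affect what the destination cluster restores.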
In the middle, you can introduce blueprints, which allow you to orchestrate the backup. The blueprint can handle the preparation necessary for the backup. You can use blueprint transform when restoring the application, which enables you to add some changes during the restoration.
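One way to picture the preparation step a blueprint handles is as a pair of hooks wrapped around the snapshot. The following Python sketch is only an analogy under that assumption: `quiesced`, `backup_with_hooks` and the log messages are hypothetical and are not how Kasten defines or runs blueprints.

```python
from contextlib import contextmanager


@contextmanager
def quiesced(app, log):
    """Hypothetical hook pair: prepare the app before backup, undo it after."""
    log.append(f"pre-backup: flush and pause writes on {app}")
    try:
        yield
    finally:
        # Runs even if the snapshot fails, so the app is never left paused.
        log.append(f"post-backup: resume writes on {app}")


def backup_with_hooks(app, take_snapshot, log):
    """Run the snapshot between the preparation and cleanup hooks."""
    with quiesced(app, log):
        log.append(take_snapshot(app))
    return log


events = backup_with_hooks("postgres", lambda app: f"snapshot: {app}", [])
for event in events:
    print(event)
```

The `try`/`finally` matters in this sketch: the cleanup hook still runs when the snapshot raises an error.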
It is storage agnostic — it is not a storage system itself, but integrates with a variety of databases.
It uses metadata to automate migrations of workloads and data transfers between Kubernetes clusters and to ensure that backups have been successfully completed.
Kasten most recently added what it calls the Cloud Native Transformation Framework as part of its Application Transformation Engine. It captures both underlying data and additional metadata, including data transfers, lock-free algorithms, pluggable encryption and compression, advanced deduplication, and smaller fault domains to improve backup efficiency and reliability.
K10 runs on-premises or in public clouds. It is designed to scale to the workload, which can reduce the overall size of a customer’s IT infrastructure footprint.
Tolia wrote about leveraging Kubernetes’ CSI in a post for The New Stack.