The New Features in Google’s Anthos Are about Operationalizing Kubernetes
Google is bringing service mesh management, serverless and automated policy and security management to its Anthos hybrid Kubernetes platform. The idea is to make it easier to operationalize containers and microservices by taking advantage of Google’s experience managing and running Kubernetes, Google Cloud product management director Jennifer Lin told The New Stack.
“One of the complaints we often hear about open source is ‘I don’t want to have to build it and package it and deploy it myself, I want all of those day one operations to be really easy.’ So that’s what we focused on in the product initially.” But she says the new features like Anthos Service Mesh go beyond just providing infrastructure to getting a managed service on your own hardware that will deliver on Service Level Objectives.
“Now we have a console-based topology graph view, that allows you to do SLO service level objectives, SLO management. So now we’re really managing services, as opposed to worrying about how to set up the infrastructure and set up DNS. If you’re a developer, you want to understand the source code, and everything you can tweak, whereas if you’re a user, you want someone else to give you the best practices.”
Kubernetes was initially popular with teams she calls “technically savvy” digital natives, and while enterprises increasingly have at least some internal expertise in Kubernetes, she argues that they are never going to be as good as Google at managing and scaling it.
“There is at least one and often several engineering teams who are already doing lifecycle management in a Kubernetes environment. But as they scale, they’re looking to have others manage that. They want an environment that just works and for the operational lifecycle management to be managed by SREs who understand that domain. At Google, we launch billions of containers a week, we have SREs that have to keep the stability and resiliency of those global clusters. What we’re trying to also really stress with both Istio and Kubernetes, it’s one thing to understand the components in the development stack, it’s another thing to actually operate it at scale, and make sure that we’re managing and accountable for an SLO to the customers.”
This is part of Google’s usual argument for Anthos: that Kubernetes is available across clouds (which organizations that fear vendor lock-in believe is key) but that Google’s experience of running it gives it an advantage. “Very few of the people that talk about a Kubernetes distribution, actually have the day-to-day operational domain experience to actually operate it at scale, including our competitors in the cloud,” she claims.
Google will also need to square the circle of presenting Anthos as a managed version of a ubiquitous infrastructure layer that avoids lock-in with the desire to see more adoption of PaaS as the “next phase” of cloud. “Many customers today have been using cloud almost like another server; they’re using raw compute and storage, but they’re very wary of tying into higher layer services,” Lin notes.
Unlike Microsoft’s Azure Stack, which brings a number of PaaS services on premises, Anthos is, so far, a solution for the infrastructure side. Anthos Migrate, meanwhile, is being used as a first step to containerize legacy applications; it builds on the technology from the Velostrata acquisition, which initially migrated VMs from other clouds into Google Cloud Platform (GCP).
Now, Lin says, thousands of workloads that are “primed to move” to Anthos include not just already containerized apps but also “VM workloads that don’t have a lot of other dependencies, that are easy to move into a managed environment.” The ability to take an existing VM and put it under the orchestration control of Kubernetes in Anthos is in beta and she calls it a “leapfrog path.”
“We have a lot of customers who have seen that as an opportunity to skip the lift and shift phase, and get an existing VM under management of the Kubernetes orchestration layer on day one. You’re not going to get all the benefits as if you wrote it as a container, but for many customers, this is extremely practical. We didn’t write it initially as a container, but we can take a VM, manage it as if it’s a container, and over time slim it down so we get all the benefits of the containerization,” Lin said.
Anthos has also proved interesting to software vendors who want to offer a product on Kubernetes “either to SaaSify their services, or go to market with a more modern on-prem environment and they’re asking us to also help them manage the ongoing lifecycle operations,” Lin said. One time series database vendor who has already built their SaaS service on Kubernetes and is managing that wants to move to Anthos “so they can be in the Anthos marketplace and allow customers to click to deploy into an outsourced environment.”
The Google-run Anthos environment now offers more configuration management; the idea isn’t to be more self-service or to push users back to the Kubernetes API server, but to simplify working with multiple environments by offering declarative policy management as a managed service. “Say we have a declarative config for setting up identity and access management across the on-prem environment and GCP. In that scenario, the resource model is not based on GCP resources, but the policy is still the same, like ‘the finance team is not allowed to have access to this storage bucket because it contains this type of data.’”
“The high-level intent of the policy doesn’t change, but obviously, the scripts that are required to configure that in my on-prem Kubernetes cluster versus Amazon Web Services or GCP are different. So what we’re talking about with the config and policy management is that second stage in terms of turning a YAML file into the Python scripts that actually configure the GCP resources versus the Azure resources. The way we do that is we sync the namespaces from the on-prem resource model to the GCP resource model. That’s an automated process, I don’t have to reconfigure or rewrite the policies for anything whether it’s stored on-premise, in a third party storage target or GCS, I still have the same policy that has to be enforced.”
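To make the idea concrete, here is a minimal Python sketch of one high-level policy being rendered into environment-specific settings through a namespace-to-project mapping. It is not Anthos code: the `AccessPolicy` class, the two `render_for_*` functions, the `sync` helper and every field name are hypothetical, chosen only to illustrate the "same intent, different configuration" pattern Lin describes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPolicy:
    """High-level intent, independent of any one environment (hypothetical model)."""
    team: str
    resource: str
    allowed: bool
    reason: str


def render_for_kubernetes(policy: AccessPolicy, namespace: str) -> dict:
    """Illustrative only: express the intent as a namespaced on-prem rule."""
    return {
        "target": "on-prem-kubernetes",
        "namespace": namespace,
        "subject": f"group:{policy.team}",
        "resource": policy.resource,
        "effect": "allow" if policy.allowed else "deny",
    }


def render_for_cloud(policy: AccessPolicy, project: str) -> dict:
    """Illustrative only: express the same intent as a cloud project rule."""
    return {
        "target": "cloud",
        "project": project,
        "member": f"group:{policy.team}",
        "resource": policy.resource,
        "effect": "allow" if policy.allowed else "deny",
    }


def sync(policy: AccessPolicy, namespace_to_project: dict) -> list:
    """Render the policy for each namespace and for the project it maps to."""
    rendered = []
    for namespace, project in namespace_to_project.items():
        rendered.append(render_for_kubernetes(policy, namespace))
        rendered.append(render_for_cloud(policy, project))
    return rendered


if __name__ == "__main__":
    finance_rule = AccessPolicy(
        team="finance",
        resource="storage-bucket",
        allowed=False,
        reason="contains restricted data",
    )
    for rule in sync(finance_rule, {"finance-ns": "finance-project"}):
        print(rule)
```

The policy object is written once; only the rendering step differs per environment, which is the part Lin says is automated.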
That fits with the popular “GitOps” approach. “A lot of our customers are using GitHub to version control the configs that have already been blessed by the security team or the cloud platform team. Increasingly, as opposed to having an IT admin run CLI commands against a device or a virtual appliance, the trend is to have more of an automated system so as soon as I authenticate a new resource model or a new system component into the architecture, it has to comply with that declared policy.”
The new Binary Authorization capability is part of this managed, automated approach, with security checks to ensure you deploy only trusted workloads. “How do you arm each granular component of the architecture to ensure that anything that joins this environment is really what it says it is and that it has the attestations required to make sure that it can participate in the service mesh, or in the Kubernetes environment? So here we’re establishing trust for any image that comes into the managed environment: at deploy time, I can only put a trusted workload into the environment, and it will check that workload is assessed and validated before it goes into day two operations in a deployed environment.”
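The deploy-time gate Lin describes can be sketched in a few lines of Python. This is an illustration of the general admission-check pattern, not the Binary Authorization API: the `Image` class, the `admit` function and the attestor names are all hypothetical, and a real system would verify signed attestations rather than compare plain strings.

```python
from dataclasses import dataclass, field


@dataclass
class Image:
    """A container image plus the attestations attached to it (hypothetical model)."""
    digest: str
    attestations: set = field(default_factory=set)


# Hypothetical attestor names; a real deployment policy would define its own.
REQUIRED_ATTESTORS = frozenset({"built-by-ci", "vulnerability-scan-passed"})


def admit(image: Image, required=REQUIRED_ATTESTORS):
    """Return (admitted, missing): admit only if every required attestation is present."""
    missing = set(required) - image.attestations
    return (not missing, sorted(missing))


if __name__ == "__main__":
    trusted = Image("sha256:aaa", {"built-by-ci", "vulnerability-scan-passed"})
    untrusted = Image("sha256:bbb", {"built-by-ci"})
    print(admit(trusted))
    print(admit(untrusted))
```

The check runs before the workload is scheduled, so an image that lacks a required attestation never reaches day two operations.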
Bringing Cloud Run to Anthos is about getting control of where serverless workloads run; that is in slight tension with the fundamental idea of serverless, that you don’t care about the infrastructure, but Lin points out that developers and administrators have different needs here.
“I should, as a developer, be able to write a function or fire off an event trigger that doesn’t really care anything about the infrastructure. If I am, however, the cluster administrator, I want to be able to say, well, within this cluster environment, a developer can do whatever they want, but make sure it’s on this cluster in this region. We can take an existing container and make it a service endpoint in an existing Anthos environment within a couple of seconds. A developer worked on something cool, they want to expose it to the internet, or over an HTTPS session. And that endpoint could be defined as anything that the internet can reach — or it could be defined, as many of our enterprise customers are thinking about it, as a private endpoint. Don’t expose it to the public Internet, make sure that it works within my private network environment.”
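The split between what a developer asks for and what a cluster administrator permits can be sketched as a simple placement check. This is a hypothetical Python illustration, not the Cloud Run or Anthos API: `ServiceRequest`, `AdminPolicy`, `check_placement` and the cluster and region names are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceRequest:
    """What the developer asks for (hypothetical model)."""
    name: str
    cluster: str
    region: str
    public: bool


@dataclass(frozen=True)
class AdminPolicy:
    """What the cluster administrator permits (hypothetical model)."""
    allowed_cluster: str
    allowed_region: str
    allow_public_endpoints: bool


def check_placement(request: ServiceRequest, policy: AdminPolicy) -> list:
    """Return a list of violations; an empty list means the request is allowed."""
    violations = []
    if request.cluster != policy.allowed_cluster:
        violations.append(f"cluster must be {policy.allowed_cluster}")
    if request.region != policy.allowed_region:
        violations.append(f"region must be {policy.allowed_region}")
    if request.public and not policy.allow_public_endpoints:
        violations.append("public endpoints are not allowed")
    return violations


if __name__ == "__main__":
    policy = AdminPolicy("prod-cluster", "region-a", allow_public_endpoints=False)
    private_ok = ServiceRequest("demo", "prod-cluster", "region-a", public=False)
    public_elsewhere = ServiceRequest("demo", "prod-cluster", "region-b", public=True)
    print(check_placement(private_ok, policy))
    print(check_placement(public_elsewhere, policy))
```

The developer's request carries no infrastructure detail beyond a target; the administrator's policy decides whether that target and endpoint visibility are acceptable.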
When we asked about compatibility for Anthos and GCP services like Cloud Run, Lin said that the goal was compatibility but noted that “there’s a little bit of lag in terms of the things that we’re doing on-premises.”
Anthos isn’t a way to get all the Google services for your data center either. “Everything that’s supported in the Kubernetes API server working on-prem is a reasonable expectation,” she said, but “We’re not saying everything that works in GCP suddenly works on-prem. Some of the other GCP services – we have no plans to put, for instance, Spanner on-prem.”
Feature image by Ionas Nicolae from Pixabay.