Google Anthos from the Eyes of a Kubernetes Developer

Google Cloud Platform (GCP) formally launched Anthos at Cloud Next 2019. Anthos is one of the rare technologies from Google to attain general availability (GA) status in such a short time. Despite becoming generally available, Anthos was not a self-service GCP offering; it took almost a year for it to show up in the GCP console. Unlike other GCP services, the documentation was sparse and incomplete.
Apart from the obvious fact that Anthos is based on the Google Kubernetes Engine (GKE), the community didn’t know much about the platform. But things have started to change significantly in 2020. Anthos is now available in the Google Cloud Console. The documentation provides detailed information about the platform architecture, installation, and configuration of various components.
Even with the effort that Google has put into educating customers and the community, Anthos can be confusing to the average Kubernetes user.
Here is an attempt to introduce Anthos to any developer or user familiar with Kubernetes.
From VM Sprawl To Cluster Sprawl
During the last decade, enterprise IT faced a challenge with virtual machine (VM) sprawl. Any user, developer, or administrator with access to the VMware environment could launch a new virtual machine. Soon, enterprise IT realized that there were hundreds of VMs running across multiple departments that were neither visible to IT nor managed by it. This led to a loss of control and fragmentation of resources. Enterprise IT introduced a workflow that required approval from a departmental IT head to launch a VM. It also enforced a policy to ensure that VMs were created from pre-approved images centrally managed by IT. The pre-approved images acted as templates containing the mandatory security policies and patches, ensuring that every VM was based on a hardened, tested, and trusted image.
Fast forward to 2020, and enterprise IT is now experiencing Kubernetes cluster sprawl. Users at a departmental level are spinning up clusters in on-premises, private cloud, and public cloud environments. Each department runs multiple clusters provisioned through tools such as Kubespray and Kops or managed CaaS offerings such as Google Kubernetes Engine and Azure Kubernetes Service.
Enterprise IT is facing the same challenges with cluster sprawl that it once saw with VM sprawl. Kubernetes clusters have become the new deployment boundaries for applications. Though namespaces provide the required isolation and boundaries, customers find it easier to isolate applications by running them on different clusters.
Each department has multiple clusters running across different environments: on-premises, private cloud, self-provisioned clusters in the public cloud, and managed clusters in the public cloud. Enumerating and managing these clusters poses a huge challenge to IT and DevOps teams.
To bring all the clusters to the desired state, a set of kubectl commands needs to be run against each cluster to ensure that they have a consistent configuration. Applying configurations, policies, quotas, and RBAC roles to each cluster is laborious and error-prone.
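To make the repetition concrete, consider a quota that every departmental cluster is expected to carry. The manifest below is purely illustrative (the name, namespace, and limits are placeholders); without a higher-level control plane, it has to be applied to every cluster individually, one kubectl context at a time, and reapplied whenever it drifts on any single cluster.

```yaml
# Illustrative example only: a quota every participating cluster is expected
# to carry. Without a meta control plane, this manifest must be applied to
# each cluster separately (kubectl --context <cluster> apply -f quota.yaml),
# and a manual edit on any one cluster silently breaks consistency.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-compute-quota   # hypothetical name
  namespace: team-a          # hypothetical namespace
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 64Gi
    pods: "100"
```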
The Meta Control Plane — A Control Plane of the Control Planes
What enterprise IT needs is a meta control plane: an overarching control plane for all Kubernetes clusters launched within an organization. Through the meta control plane, IT can ensure that each cluster complies with a set of predefined policies. The meta control plane can also enforce strict rules that detect drift in a cluster's configuration and bring the cluster back to the desired state.
We call it the meta control plane because it manages the control plane of each Kubernetes cluster – the master nodes. A command sent to the meta control plane is automatically applied to the control plane of each cluster.
Since the meta control plane has visibility into each cluster, it can collect and aggregate relevant metrics related to the infrastructure and application health. The meta control plane becomes the single pane of glass for both configuration and observability.
Similar to how the Kubernetes controllers maintain the desired state of deployments, statefulsets, jobs, and daemonsets, the meta control plane ensures that each participating cluster maintains the desired state of its configuration.
For example, if a participating cluster is expected to run a role and a rolebinding, the meta control plane can detect when the role gets deleted and automatically reapply the configuration. This is similar to how the Kubernetes controller maintains the desired count of replicas in a deployment.
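As a minimal sketch of what such an object might look like (the names and namespace are hypothetical), the role and rolebinding below would be part of the desired configuration; if either is deleted on one cluster, the meta control plane detects the drift and reapplies it.

```yaml
# Hypothetical RBAC objects held in the desired state by the meta control
# plane. Deleting either one on a participating cluster triggers an automatic
# reapply, much like a Deployment controller restoring a missing replica.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader            # hypothetical name
  namespace: team-a           # hypothetical namespace
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: team-a
subjects:
- kind: Group
  name: team-a-developers     # hypothetical group
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```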
So, the meta control plane is to a Kubernetes cluster what the controller is to a deployment.
There is another commonality between the Kubernetes master and the meta control plane. When a workload is scheduled, its placement can be influenced through a combination of labels/selectors or node affinity. A nodeSelector or nodeAffinity clause in the deployment spec will ensure that the workload lands on one of the nodes that match the criteria. Similarly, the meta control plane can be instructed to push a deployment, configuration, or policy only to a subset of the participating clusters. This mechanism closely mimics the nodeSelector pattern of Kubernetes scheduling. Just as we label nodes and use a selector in the deployment to target specific nodes, we label each participating cluster and use a selector at the meta control plane to shortlist or filter the target clusters.
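The two levels of targeting look remarkably similar on paper. The sketch below contrasts a familiar nodeSelector in a deployment spec with a cluster selector of the kind Anthos Config Management uses; the workload name, labels, and image are placeholders for illustration.

```yaml
# Node-level targeting: pods of this Deployment land only on nodes
# labeled disktype=ssd.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                   # hypothetical workload
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      nodeSelector:
        disktype: ssd
      containers:
      - name: web
        image: nginx:1.19
---
# Cluster-level targeting: a ClusterSelector (as used by Anthos Config
# Management) matches registered clusters by label, so resources annotated
# with it are pushed only to clusters labeled environment=prod.
apiVersion: configmanagement.gke.io/v1
kind: ClusterSelector
metadata:
  name: selector-env-prod
spec:
  selector:
    matchLabels:
      environment: prod
```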
As the number of clusters grows within an organization, customers need a meta control plane to take charge of the cluster management. The meta control plane will ensure that the participating clusters are normalized and consistent with their configuration.
Anthos — A Meta Control Plane from Google
What is Anthos? Simply put, it’s the meta control plane of Kubernetes from Google. Though this definition is technically correct, it doesn’t do justice to the platform. Apart from being a meta control plane, Anthos plays other roles that are critical to managing infrastructure and workloads in hybrid and multi-cloud environments.
The core component of Anthos is the meta control plane that is firmly grounded in GCP. In most cases, it is invisible to users. Similar to how Google hides the master nodes of a GKE cluster, it hides the Anthos control plane. This control plane exposes the API for managing the lifecycle of Kubernetes clusters and registering external clusters with Anthos.
Anthos — The Hybrid and Multicloud Control Plane
Though Anthos’ control plane runs in the context of GCP, it can launch managed Kubernetes clusters in a variety of environments, including on-premises data centers, AWS, and Azure. The managed Kubernetes clusters launched via Anthos have the same reliability and stability as a typical GKE cluster running in GCP.
For Anthos to launch a managed Kubernetes cluster on-premises, it relies on vSphere 6.5, vCenter, vSphere storage, and either F5 BIG-IP or a bundled software load balancer based on a Google open source project called Seesaw. Anthos first provisions an admin cluster in vSphere, which can then spawn multiple user clusters. Think of the admin cluster as the local Anthos control plane that handles the lifecycle of managed clusters running in vSphere.
Anthos for Amazon Web Services, which became generally available in May 2020, can run managed Kubernetes clusters in the context of AWS. Taking advantage of Amazon EC2, Amazon EBS, AWS VPC, and Amazon ELB, Anthos can launch highly available Kubernetes clusters that span multiple availability zones. Similar to the admin cluster in vSphere, Anthos first launches a management cluster in an AWS VPC, which is responsible for launching additional user clusters.
When Anthos for Azure becomes available, it will leverage Azure VMs, Azure Premium Storage, Azure Virtual Networks, and Azure Load Balancer for running HA Kubernetes clusters. Technically speaking, Anthos can launch a managed cluster in any programmable infrastructure that supports running Kubernetes in high availability mode.
Apart from managing clusters launched through Anthos, the platform supports connecting external, unmanaged clusters to the control plane. The key difference between the two, managed vs. unmanaged, is lifecycle management. While Anthos owns everything from creation to termination for managed clusters, it has only partial control over external, unmanaged clusters.
Key Components of Anthos
Apart from being a hybrid, multicloud control plane for Kubernetes clusters, Anthos can manage the network policies, routing, security, and configuration of workloads deployed across clusters.
Let’s take a look at the key components of Anthos:
- Anthos Control Plane: This component is the meta control plane of Anthos. It’s responsible for managing the lifecycle of managed clusters and the registration and unregistration of external, unmanaged clusters. Anthos exposes the API for this through the Hub and Connect services.
- Anthos Service Mesh: This component is a commercially available implementation of the Istio service mesh that’s optimized for Anthos. It delivers three capabilities: 1) secure communication among microservices, 2) network and routing policies, and 3) observability.
- Anthos Config Management: This GitOps-based component provides a centralized mechanism to push deployments, configuration, and policies to all the participating clusters, both managed and unmanaged. A centrally accessible Git repository acts as the single source of truth for all the clusters. An Anthos Config Management agent running in each cluster monitors the state of that cluster. When it deviates from what is defined in Git, the agent automatically reapplies the configuration to bring the cluster back to the desired state (see the configuration sketch after this list).
- Cloud Run for Anthos: Cloud Run is a serverless, clusterless environment for running containers in GCP. It’s a layer above Knative that delivers an optimal developer experience for deploying and running containers without the need to launch a GKE cluster or define a pod specification. Cloud Run for Anthos brings the same developer experience to the managed clusters (see the service sketch after this list).
- Ingress for Anthos: This component routes the traffic to the microservices in conjunction with the Envoy proxy configured through Anthos Service Mesh. Ingress for Anthos becomes the entry point to access workloads running in Anthos clusters. It currently works only for workloads running in GKE clusters launched by Anthos.
- Kubernetes Apps On GCP Marketplace: This acts as the catalog for a variety of stateless and stateful workloads targeting Kubernetes. Customers can deploy applications from the marketplace to Anthos-managed clusters with the push of a button, irrespective of where those clusters are provisioned.
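To tie the Anthos Config Management item above to something tangible, here is a hedged sketch of the ConfigManagement resource that is typically applied to each registered cluster to point it at the shared Git repository. The repository URL, branch, cluster name, and directory are placeholders, and the exact fields may vary by version.

```yaml
# Sketch of an Anthos Config Management configuration pointing a cluster at
# a central Git repository. Repository, branch, and directory are placeholders.
apiVersion: configmanagement.gke.io/v1
kind: ConfigManagement
metadata:
  name: config-management
spec:
  clusterName: team-a-prod                                  # hypothetical cluster name
  git:
    syncRepo: git@github.com:example-org/anthos-config.git  # placeholder repo
    syncBranch: master
    secretType: ssh
    policyDir: "clusters"                                   # placeholder directory
```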
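Similarly, for the Cloud Run for Anthos item above, the developer experience boils down to a Knative-style service definition instead of a full deployment and pod specification. The sketch below uses the public Knative "helloworld-go" sample image; the service name is hypothetical.

```yaml
# Minimal Knative-style service of the kind Cloud Run for Anthos runs.
# No pod spec, replica count, or ingress wiring is required from the developer.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello                 # hypothetical service name
  namespace: default
spec:
  template:
    spec:
      containers:
      - image: gcr.io/knative-samples/helloworld-go   # public Knative sample image
        env:
        - name: TARGET
          value: "Anthos"
```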
In the next part of this series, we will explore the Hub and Connect services of the control plane by registering a GKE cluster, an Azure AKS cluster, and an Amazon EKS cluster with Anthos. Stay tuned!