The way organizations are using Kubernetes has quickly evolved with many users dropping self-managed installations in favor of managed Kubernetes services delivered by public cloud providers, including Microsoft’s Azure Kubernetes Service (AKS), Amazon Elastic Kubernetes Service (EKS) and Google Kubernetes Engine (GKE).
Traditionally, self-managed Kubernetes had been the dominant approach to container orchestration, but it now lags behind EKS usage. Findings in StackRox’s State of Container and Kubernetes Security Report, Winter 2020, indicate that usage of EKS has grown 37% in the last six months, while usage of AKS increased by 31%. GKE experienced even greater growth, with usage increasing by 75% over those six months.
All the major cloud service providers offer managed Kubernetes services, so it’s important for users to understand the variations among them. This guide can help map out which service will best accommodate your needs.
The versions of Kubernetes currently offered by the three cloud providers can vary to a large degree. All three currently use a default version of Kubernetes that no longer receives official updates. However, AKS has increasingly taken the lead for adding preview access of new Kubernetes releases and promoting newer releases to general availability.
All three providers are comparable for recently added Kubernetes features, like Windows containers and GPUs. The major difference has to do with the administrative services each provides. GKE generally takes the lead by offering automated upgrades for masters and nodes while also enabling the automated detection and fixing of unhealthy nodes.
Both AKS and EKS require some degree of manual work (or customer-programmed automation) for updates. Neither provider offers specialized node health monitoring or repair. While AWS customers can create custom health checks for some level of health monitoring and customer-automated replacement for EKS clusters, AKS does not offer any comparable features.
Differences in service level agreements for Kubernetes master control planes present another area to compare. EKS is the only provider to charge for its masters ($0.10/cluster/hour), although Google Cloud has announced that it will start charging for the control plane in June, at the same rate as EKS.
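To put the control-plane fee in perspective, the arithmetic can be sketched as follows. This is a rough illustration only: the rate is the $0.10/cluster/hour figure quoted above, while the 30-day month and the `control_plane_cost` helper are assumptions made for the example, not part of any provider's tooling. Current pricing should always be confirmed with the provider.

```python
from decimal import Decimal

# Rate quoted in this article; confirm against the provider's current pricing.
CONTROL_PLANE_RATE_PER_HOUR = Decimal("0.10")  # USD per cluster per hour


def control_plane_cost(clusters: int, hours: int = 24 * 30) -> Decimal:
    """Return the control-plane fee in USD for `clusters` clusters over `hours` hours.

    The default of 24 * 30 hours assumes a 30-day month (an illustrative choice).
    """
    if clusters < 0 or hours < 0:
        raise ValueError("clusters and hours must be non-negative")
    return CONTROL_PLANE_RATE_PER_HOUR * clusters * hours


if __name__ == "__main__":
    for n in (1, 5, 20):
        print(f"{n:>2} cluster(s), 30 days: ${control_plane_cost(n)}")
```

Under these assumptions a single cluster comes to $72 for a 30-day month, regardless of how many worker nodes it runs.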
While these costs will likely be negligible for all but the smallest clusters, EKS offers something the other providers don’t — a financially backed service-level agreement (SLA). Refunds for SLA penalties rarely compare to the loss of potential productivity or revenue suffered from an outage, but offering published penalties can give users a higher level of confidence in a provider’s commitment to reliability and uptime.
Another interesting difference is the high availability of AKS master components. Azure documents do not specify if AKS uses cluster control planes with built-in redundancy. Customers with SLAs of their own for applications hosted on AKS generally need confirmation that the services and cloud infrastructure they rely on have been engineered for a similar level of reliability. While pods and nodes running in a Kubernetes cluster can survive outages of the control plane and its components, even short-lived interruptions can be problematic for some workloads.
It’s important to note that service limits are handled differently across providers: limits apply per account in EKS, per subscription in AKS, and per project in GKE. EKS offers 100 clusters per region, per account. AKS, similarly, offers 100 clusters per subscription, and GKE offers 50 clusters per zone plus 50 regional clusters.
While most service limits are fairly straightforward, some are not. In AKS, for example, the maximum number of nodes that a cluster can have depends on a couple of variables: whether the nodes are in a VM Scale Set or an Availability Set, and whether cluster networking uses kubenet or the Azure CNI. Even then, it is still not clear which number takes precedence for certain configurations. In EKS, on the other hand, you can plan for the maximum number of pods that can be scheduled on a Linux node, but this approach requires some research and math because the limit varies by the node’s EC2 instance type. GKE allows a flat 110 pods per node.
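The EKS "research and math" can be sketched roughly as below. This assumes the formula documented for the Amazon VPC CNI plugin in its default mode, where each network interface contributes its IPv4 addresses minus one, plus two for host-networked pods. The per-instance interface and address counts in the table are illustrative values taken from AWS's published limits and should be checked against current documentation; the function and table names are made up for the example.

```python
GKE_PODS_PER_NODE = 110  # flat limit quoted in this article

# (network interfaces, IPv4 addresses per interface) per instance type.
# Illustrative values; verify against current AWS documentation.
INSTANCE_LIMITS = {
    "t3.medium": (3, 6),
    "m5.large": (3, 10),
    "m5.4xlarge": (8, 30),
}


def eks_max_pods(enis: int, ipv4_per_eni: int) -> int:
    """Estimate max pods on a Linux node using the VPC CNI's default mode.

    One address per interface is reserved for the interface itself;
    the +2 accounts for pods that use host networking.
    """
    if enis < 1 or ipv4_per_eni < 1:
        raise ValueError("enis and ipv4_per_eni must be positive")
    return enis * (ipv4_per_eni - 1) + 2


if __name__ == "__main__":
    for instance_type, (enis, ips) in INSTANCE_LIMITS.items():
        print(f"{instance_type}: {eks_max_pods(enis, ips)} pods "
              f"(GKE flat limit: {GKE_PODS_PER_NODE})")
```

With these figures a smaller instance type can land well below the flat GKE number, which is why node sizing and pod density need to be planned together on EKS.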
Networking and Security
Networking and security features and configurations of Kubernetes clusters are often closely intertwined, which is why we discuss them together. Overall, EKS makes Kubernetes security controls standard in every cluster, which simplifies the security process. Conversely, AKS requires RBAC and network policies to be enabled at cluster creation time, which makes managing security more complex. All three providers now deploy with Kubernetes RBAC enabled by default, and EKS goes as far as making RBAC mandatory.
On the other hand, none of the providers currently enables Network Policy support by default. EKS requires the customer to install and manage upgrades for the Calico CNI themselves. AKS provides two options for Network Policy support, depending on the cluster network type, but allows enabling support only at cluster creation time.
All three cloud providers offer a few options for limiting network access to the Kubernetes API endpoint of a cluster. Even with Kubernetes RBAC and a secure authentication method enabled for a cluster, leaving the API server open to the world still leaves it exposed to undiscovered or future vulnerabilities. A recent example is the now-patched Billion Laughs vulnerability, which created the potential for denial-of-service attacks by unauthenticated users.
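One common form of this restriction is an allow-list of source IP ranges. The snippet below is only a conceptual sketch of that check using Python's standard `ipaddress` module. It does not call any provider API, and the `is_allowed` helper and the example ranges (reserved documentation addresses) are hypothetical.

```python
import ipaddress
from typing import Iterable


def is_allowed(client_ip: str, authorized_cidrs: Iterable[str]) -> bool:
    """Return True if client_ip falls inside any of the authorized CIDR ranges."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in authorized_cidrs)


if __name__ == "__main__":
    # Hypothetical allow-list: an office range and a single bastion host.
    allow_list = ["203.0.113.0/24", "198.51.100.7/32"]
    for ip in ("203.0.113.25", "198.51.100.7", "192.0.2.1"):
        verdict = "allowed" if is_allowed(ip, allow_list) else "blocked"
        print(f"{ip}: {verdict}")
```

A request from outside the listed ranges never reaches the API server, so an unauthenticated attacker cannot exercise a vulnerability in it.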
Container Image Services
Similar container image registry services are offered by all three providers. Support for image signing in Azure Container Registry (ACR) allows users to establish and confirm the authenticity and integrity of their images. Similarly, support for immutable image tags in Elastic Container Registry (ECR) helps its users trust that using the same image:tag will result in deployment of the same build of a container image every time.
All three registry services allow some degree of access control, but the degree of control varies among them. ECR and ACR both support scoping access controls to the repository level, which GCR does not support. In addition, because access control to a GCR registry depends directly on the permissions for the Google Cloud Storage bucket backing that registry, limiting access to the storage bucket by using a service perimeter can break access to GCS buckets in other GCP projects, among other side effects.
It’s critical to understand that Kubernetes services change quickly. Even with a solid understanding of the core differences among the services offered by public cloud providers, you should always run tests to compare the compute, storage, and network options of each provider before you invest in a managed Kubernetes service. It is also important to compare the associated costs. Pricing of resources, even for a single provider, can vary between regions and is unique to each configuration. Testing each service’s features and capabilities with your own application stack will ultimately provide the most accurate pricing and performance data for your needs.
For more information and to see detailed comparison charts, see this technical post.