Let’s just say it: Kubernetes was not built with multicluster in mind. The concept of a cluster appears nowhere in the core Kubernetes API; the cluster boundary is the edge of the known universe as far as the control plane or any of the API objects are concerned. Kubernetes introduced the ability to schedule your workloads across the boundaries of VMs or nodes within a single cluster, and this luxury was a radical improvement to workload and configuration management at scale. Now, with evolving jurisdictional issues for data and compute, rising organizational complexity, and the tendency toward multitenant applications (among other reasons), global-scale enterprises are looking beyond the single cluster. Read on to learn how to think about this functionality.
While you can scale your cluster to thousands of nodes, users are finding themselves in situations where they want multiple clusters instead of one big mega-cluster. They might want to:
- Provide global availability over a series of regional node topologies.
- Stretch workloads to remote locations due to data gravity or latency concerns.
- Create hard isolation between workloads for performance or security and privacy reasons.
- Confine the blast radius of any one application or infrastructure problem to a single cluster, where it is naturally self-contained and independently observable.
In many cases, it happens organically. New clusters are provisioned for a new team or project and eventually need to be tied together.
In the multicluster world, we find there are two groups of users with very different perspectives on the situation: the platform teams, who are trying to keep the lights on for what might be hundreds of clusters; and the application teams, who just want to get their app running somewhere with access to all the other services it needs. For those application teams, the hard boundary between clusters can get in the way. Ideally, an application developer would not need to comb through, or even be aware of, the platform teams’ Terraform plans to divine some special multicluster topology that lets them get their work done.
Understanding Application Connectivity
We can split application connectivity into three layers of abstraction:
Top: Pure Application Management
Connecting one application to another, wherever it may live — what application developers need.
Middle: the Service Layer
Where you group and define the backends that make up an application, how they are exposed, and who is allowed to talk to them.
Bottom: the Networking Layer
How individual compute nodes are actually connected to one another.
Your vanilla Kubernetes installation handles the networking layer for you and provides the Service object primitive to configure the service layer within a single cluster. When it comes to making the leap across clusters, the path forward is no longer so clear-cut.
In today’s multicluster deployments, platform admins are solving cross-cluster service discovery in a variety of ways: some with service meshes like Istio and its managed siblings such as Google Cloud Platform’s Anthos Service Mesh, OpenShift’s Service Mesh or VMware’s Tanzu, others with homegrown solutions.
While service meshes are incredibly powerful, it’s not always immediately clear that you need everything they provide, and depending on an organization’s engineering bandwidth, it can be a challenge to get started. If a platform admin wants to split their stateless workloads onto another cluster, scaling and upgrading them differently than their stateful workloads, all that’s needed is some minimal service discovery between the two. If an application developer wants to centralize a few common services to reduce development redundancy but otherwise keep teams on their own clusters, sharing just a few Service objects between the clusters would be fine.
Good news! A Kubernetes-native extension to the Service API is available, giving you a way to stitch your multiple clusters together in a familiar way.
Introducing: the Multi-Cluster Services API
The Multi-Cluster Services API, defined in KEP-1645, describes the minimal properties of a ClusterSet — that is, a set of two or more Kubernetes clusters connected together — and has at its heart two new API objects: the “ServiceExport” and the “ServiceImport”. A user creates a “ServiceExport” mapping to the “Service” that they want to share, which allows them to opt in to publishing only the services they want. A controller watching for “ServiceExport” objects creates corresponding “ServiceImport” objects in the rest of the ClusterSet, transporting the relevant information for consumer workloads.
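To make the export/import flow concrete, here is a minimal Python sketch that builds a ServiceExport manifest and then derives the ServiceImport a controller might create in the other clusters. The `multicluster.x-k8s.io/v1alpha1` API group is the one used by the upstream alpha CRDs; managed offerings may use a different group. The service name `checkout`, the namespace `shop`, the port, and the `derive_service_import` helper are hypothetical, and the helper is a toy stand-in for a real controller, not how any particular implementation works.

```python
import json

# API group/version of the upstream alpha MCS CRDs (managed products may differ).
MCS_API_VERSION = "multicluster.x-k8s.io/v1alpha1"


def service_export(name, namespace):
    """Build a ServiceExport manifest.

    The export carries no spec here: it refers to the Service that has the
    same name and namespace in the exporting cluster.
    """
    return {
        "apiVersion": MCS_API_VERSION,
        "kind": "ServiceExport",
        "metadata": {"name": name, "namespace": namespace},
    }


def derive_service_import(export, ports):
    """Toy stand-in for an MCS controller (hypothetical helper).

    Produces the ServiceImport that would appear, under the same name and
    namespace, in the other clusters of the ClusterSet.
    """
    meta = export["metadata"]
    return {
        "apiVersion": MCS_API_VERSION,
        "kind": "ServiceImport",
        "metadata": {"name": meta["name"], "namespace": meta["namespace"]},
        "spec": {"type": "ClusterSetIP", "ports": ports},
    }


# Hypothetical service "checkout" in namespace "shop".
export = service_export("checkout", "shop")
imported = derive_service_import(export, [{"port": 8080, "protocol": "TCP"}])

# Dump the export as JSON; kubectl accepts JSON manifests as well as YAML.
print(json.dumps(export, indent=2))
```

In a real cluster you would only create the ServiceExport yourself; the ServiceImport is managed by whichever MCS implementation you run.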
The design is purposefully very simple, introducing only the minimal set of new API resources needed to bridge the gap between existing core Kubernetes networking primitives: the “Service” itself in the producing cluster and “EndpointSlices” in the consuming cluster. The controller is responsible for handling any conflicts between Service definitions that may be simultaneously exported from more than one cluster, and creating and maintaining the “EndpointSlices” holding the IPs of backends from every exporting cluster. The MCS API includes a DNS specification that extends the Kubernetes DNS paradigms everyone is already familiar with, adding records predictably named after the service and namespace, but ending in the DNS zone “.clusterset.local”.
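The controller's two jobs described above, resolving conflicts and aggregating backends, can be illustrated with a small simulation. This is a sketch under stated assumptions, not the behavior of any specific implementation: the "oldest export decides the port" policy is one plausible conflict rule chosen for illustration, and the cluster names, timestamps, and IPs are made up.

```python
from dataclasses import dataclass


@dataclass
class ExportedService:
    cluster: str       # hypothetical cluster ID
    exported_at: int   # creation time of the ServiceExport (any comparable number)
    port: int          # port the Service defines in that cluster
    backend_ips: list  # pod IPs backing the Service in that cluster


def merge_exports(exports):
    """Merge same-named exports from several clusters.

    Assumed policy for this sketch: the oldest export decides the port,
    every cluster's backends are kept, and disagreeing clusters are reported.
    """
    if not exports:
        raise ValueError("nothing exported")
    oldest = min(exports, key=lambda e: e.exported_at)
    # One group of endpoints per exporting cluster, like per-cluster EndpointSlices.
    slices = {e.cluster: sorted(e.backend_ips) for e in exports}
    conflicts = [e.cluster for e in exports if e.port != oldest.port]
    return {"port": oldest.port, "endpoint_slices": slices, "conflicts": conflicts}


merged = merge_exports([
    ExportedService("cluster-west", 100, 8080, ["10.0.1.5", "10.0.1.4"]),
    ExportedService("cluster-east", 200, 9090, ["10.8.0.7"]),
])
print(merged)
```

Keeping one endpoint group per source cluster mirrors the idea that consumers see backends from every exporting cluster while the conflict on the port is surfaced rather than silently dropped.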
This means application developers can stay at the top level and continue to build applications the way they always have. Just throw in a few ServiceExports and switch out your application config to reference DNS ending in “.clusterset.local” instead of “.cluster.local”, and you’re done. The Kubernetes Service magic spreads across clusters and you’re seamlessly using a multicluster service! For cases where you need to target a specific sticky or stateful endpoint from another cluster, you can add ServiceExports to your headless services too, making individual backends addressable from across the ClusterSet.
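The configuration switch is mostly a string substitution, as this short Python sketch shows. It assumes the conventional `<service>.<namespace>.svc.cluster.local` form for in-cluster names, and for headless services it assumes the `<hostname>.<cluster-id>.<service>.<namespace>.svc.clusterset.local` layout described in KEP-1645. The function names and example values are hypothetical.

```python
CLUSTER_SUFFIX = ".svc.cluster.local"
CLUSTERSET_SUFFIX = ".svc.clusterset.local"


def to_clusterset_dns(name):
    """Rewrite a single-cluster Service DNS name to its ClusterSet-wide form."""
    if not name.endswith(CLUSTER_SUFFIX):
        raise ValueError(f"not a cluster-local service name: {name}")
    return name[: -len(CLUSTER_SUFFIX)] + CLUSTERSET_SUFFIX


def headless_backend_dns(hostname, cluster_id, service, namespace):
    """Name of one backend of an exported headless Service (assumed layout)."""
    return f"{hostname}.{cluster_id}.{service}.{namespace}{CLUSTERSET_SUFFIX}"


multicluster_name = to_clusterset_dns("checkout.shop.svc.cluster.local")
replica_name = headless_backend_dns("db-0", "cluster-west", "db", "shop")
print(multicluster_name)
print(replica_name)
```

The second helper is what makes a specific stateful replica reachable from another cluster: the cluster ID in the name disambiguates pods that share a hostname across clusters.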
This new API is the culmination of over two years’ worth of work in SIG-Multicluster, featuring thought leadership from Google, Red Hat and others. It is now available with managed products on GKE and OpenShift, or to self-host using an open source implementation called Submariner.io. It’s also serving as a unified API that can extend into the rest of the Kubernetes ecosystem: the Gateway API supports ServiceImport backends for ingress routes, and Istio is following a multi-phased integration plan that culminates in implementing a fully featured MCS controller in istiod, allowing drop-in adoption with minimal effort. Expanding to multicluster is now incredibly easy: start with something as familiar and lightweight as the MCS API and pave the way for additional configuration and features when your needs increase. (To learn more, watch these three short videos on the Kubernetes MCS API: “Introduction”, “Concepts”, and “DNS and Headless”, and subscribe to the Google Open Source YouTube channel to get a notification when we publish the next two episodes on this topic.)
Getting Started with Multicluster
Multicluster isn’t a thing of the future — it’s happening now. With the MCS API bridging the gap for service discovery between clusters, application teams can keep working at the application level without having to worry about which of the many underlying clusters their services live on. Meanwhile, platform teams have flexibility to configure their infrastructure the way they want, adding more clusters as needed to address use cases like high availability, service resiliency and centralized shared services. You don’t need to treat the edge of the cluster like a hard boundary anymore.
In fact, even the cluster itself is finally getting on board: just ahead of KubeCon EU, SIG-Multicluster launched the alpha version of the About API, a flexible CRD for storing arbitrary cluster metadata in an easy-to-use, cluster-local way. Well-known properties like a cluster’s ID or ClusterSet membership are explicitly defined in KEP-2149, but the CRD has room for any implementation to add other cluster-scoped properties that need a centralized place in the Kubernetes API. For the first time, clusters have self-awareness.
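As a rough illustration of what such cluster metadata could look like, the Python sketch below builds two ClusterProperty manifests, one for the cluster's ID and one for its ClusterSet membership. Treat the details as assumptions: the `about.k8s.io/v1alpha1` group, the `spec.value` field, and the property names shown reflect one revision of KEP-2149 and have changed over time, so check the CRD you actually install. The values `cluster-west` and `prod-fleet` are invented.

```python
import json

# Assumed API group/version for the About API alpha CRD.
ABOUT_API_VERSION = "about.k8s.io/v1alpha1"


def cluster_property(name, value):
    """Build a cluster-scoped ClusterProperty manifest (no namespace)."""
    return {
        "apiVersion": ABOUT_API_VERSION,
        "kind": "ClusterProperty",
        "metadata": {"name": name},
        "spec": {"value": value},
    }


# Property names are assumptions taken from one revision of KEP-2149.
properties = [
    cluster_property("cluster.clusterset.k8s.io", "cluster-west"),  # this cluster's ID
    cluster_property("clusterset.k8s.io", "prod-fleet"),            # ClusterSet membership
]
print(json.dumps(properties, indent=2))
```

An implementation-specific property would follow the same shape with its own suffixed name, which is the "room to add other properties" the KEP leaves open.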
Want to give it a try yourself? It’s easy! If you want to start making your own ServiceExports right away, you can spin up a few clusters and turn on GKE’s managed implementation of the Multi-Cluster Services API. If you want to host it yourself, try deploying Submariner.io using their quickstart guides for kind, k3s, GKE, Rancher or OpenShift.
Photo by Scott Webb from Pexels.