
Manage Multicluster Kubernetes with Operators

1 Feb 2021 12:51pm, by Sascha Haase

Sascha Haase is VP Edge at Kubermatic. His career is focused on bringing value through the right technology and the right people. Prior to Kubermatic, he worked with the Kubermatic Kubernetes Platform at different cloud service providers and built a company that continues to add value in its market. Sascha works and lives in Berlin.

At Kubermatic, we have been helping our customers deliver Kubernetes clusters and other cloud native solutions since before they were buzzwords. We helped customers build clusters using Ansible, Terraform, and a variety of other non-cloud-native tools… and we helped them rebuild those clusters when we ran into the limits of these tools.

In those early days, two things quickly became clear to us: (1) Kubernetes is not a single-large-cluster solution; it calls for a larger number of smaller clusters, and (2) Kubernetes multicluster management needs cloud native tools built for a declarative, API-driven world. Since then, these ideas have largely been validated by a variety of organizations around the world, including the Cloud Native Computing Foundation, Twitter, USA Today, Zalando, and Alibaba. Knowing that every company running Kubernetes at scale would need an effective approach to multicluster management, we created the open source Kubermatic Kubernetes Platform. This blog post covers why you need multicluster management, how Kubermatic Kubernetes Platform leverages Kubernetes Operators to automate cluster life cycle management across multiple clusters, clouds, and regions, and how you can get started with it today.

Why You Need Multicluster Management

Kubernetes lacks hard multitenancy: the ability for users, organizations, or operators to let untrusted tenants safely share infrastructure resources, or to strictly separate different pieces of software from one another. This is both a security and an operational problem. When operators want to separate workloads by type (sensitive vs. nonsensitive data processing), or even just production from non-production, there is no way to enforce that separation within a single cluster, which creates a security nightmare. On the operational side, deploying too many applications into the same cluster can lead to version conflicts, configuration conflicts, and problems with software life cycle management. Finally, without proper isolation there is an increased risk of cascading failures.

Without hard multitenancy within a cluster, separate clusters must be used to provide adequate separation for workloads with different security requirements. Having multiple clusters to deploy applications into also allows operators to deploy similar applications together while segregating those with different life cycles from each other. Applications deployed into the same cluster can be upgraded together to reduce the operational load while applications that require different versions, configurations, and dependencies can run in separate clusters and be upgraded on their own.
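The grouping rule described above can be sketched as a small placement function. This is a minimal illustration, assuming that environment, data sensitivity, and required Kubernetes version are the only criteria that matter; the `Workload` type, the `plan_clusters` function, and the cluster naming convention are hypothetical and not part of Kubermatic Kubernetes Platform.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    environment: str   # e.g. "production" or "staging"
    sensitive: bool    # does it process sensitive data?
    k8s_version: str   # Kubernetes minor version the app depends on


def plan_clusters(workloads):
    """Group workloads that may safely share a cluster.

    Workloads land together only if they match on environment, data
    sensitivity, and required Kubernetes version, so every resulting
    cluster can be secured and upgraded as a single unit.
    """
    groups = defaultdict(list)
    for w in workloads:
        groups[(w.environment, w.sensitive, w.k8s_version)].append(w.name)
    # Hypothetical naming convention: <environment>-<data class>-<version>.
    return {
        f"{env}-{'sensitive' if sens else 'general'}-{ver}": sorted(names)
        for (env, sens, ver), names in groups.items()
    }
```

With a billing service that handles sensitive data, two general production services, and a staging service pinned to an older version, the billing service gets a sensitive production cluster of its own, the two general services share one cluster that can be upgraded together, and the staging service runs separately.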

If running multiple clusters is the only solution to meeting these workload and infrastructure requirements, the operational burden of this model must also be considered. Running a multitude of clusters is a massive operational challenge if done manually. For this reason, any operator considering running Kubernetes at scale should carefully evaluate their multicluster management strategy. At Kubermatic, we have chosen to do multicluster management with Kubernetes Operators.

What Is a Kubernetes Operator?

An Operator is a piece of software that knows how to run another piece of software and automates its operation. CoreOS (since acquired by Red Hat), which introduced the Kubernetes Operator in 2016, described the concept at the time:

An Operator is a method of packaging, deploying and managing a Kubernetes application. A Kubernetes application is an application that is both deployed on Kubernetes and managed using the Kubernetes APIs and kubectl tooling. An Operator has its custom controller watching the custom resources specifically defined for the applications.

This lets developers codify life cycle management knowledge for applications that need to maintain state, and thereby automate much of the ongoing management, including deployments, backups, upgrades, logging, and alerting. An Operator does this simply by watching events and leveraging the reconciliation loops built into Kubernetes. In short, a well-built Operator covers the complete life cycle of a containerized application.
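To make the reconciliation idea concrete, here is a minimal, plain-Python model of an Operator for a stateful application such as a database. It assumes an in-memory stand-in for the Kubernetes API; `DatabaseSpec`, `Replica`, `FakeAPI`, `reconcile`, and `run_controller` are hypothetical names, and real Operators are usually written in Go on top of client-go or controller-runtime rather than like this. The point is the shape of the loop: a watch event names a resource, and the controller compares desired and observed state and closes the gap.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DatabaseSpec:
    """Desired state, as a user would declare it in a custom resource."""
    replicas: int
    version: str


@dataclass
class Replica:
    """Observed state of one running database instance."""
    index: int
    version: str


@dataclass
class FakeAPI:
    """In-memory stand-in for the Kubernetes API server (hypothetical)."""
    specs: dict = field(default_factory=dict)     # resource name -> DatabaseSpec
    running: dict = field(default_factory=dict)   # resource name -> [Replica]
    events: deque = field(default_factory=deque)  # names of changed resources

    def apply(self, name, spec):
        """Store a custom resource and emit a watch event for it."""
        self.specs[name] = spec
        self.events.append(name)


def reconcile(api, name):
    """Compare desired and observed state and close the gap.

    Safe to call any number of times: once the state matches,
    it changes nothing (idempotent).
    """
    spec = api.specs[name]
    replicas = api.running.setdefault(name, [])
    while len(replicas) < spec.replicas:          # scale up
        replicas.append(Replica(len(replicas), spec.version))
    del replicas[spec.replicas:]                  # scale down
    for replica in replicas:                      # upgrade stragglers
        if replica.version != spec.version:
            replica.version = spec.version


def run_controller(api):
    """Drain the event queue, reconciling each changed resource."""
    while api.events:
        reconcile(api, api.events.popleft())
```

Applying a spec with three replicas on version 12.4, then one with two replicas on 13.1, and running `run_controller` after each, leaves two replicas on 13.1. Because `reconcile` only acts on differences, calling it again changes nothing, which is what makes this style of reconciliation tolerant of duplicate events.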

How Do We Use Kubernetes Operators to Do Multicluster Management?

With Kubermatic Kubernetes Platform, we extend the Operator paradigm beyond applications to manage the clusters themselves. Yes, we are using Kubernetes to operate Kubernetes. This model has been proven out by multiple organizations, including Alibaba, which uses it to manage tens of thousands of clusters.

On a technical level, the cluster state is declared in custom resources (whose types are defined by Custom Resource Definitions) and stored in etcd. A set of controllers, each running its own reconciliation loop, watches for new or changed cluster state and brings the actual clusters in line with it. All of this state is stored in a “Master Cluster.” When a new user cluster is defined, its control plane (API server, etcd, scheduler, and controllers) is created as containers, managed as Deployments, within a dedicated namespace of the master cluster. The worker nodes of the user cluster are deployed by machine-controller, which implements Cluster API to bring declarative creation, configuration, and management to worker nodes.
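The mapping from one user-cluster resource to the objects that implement it can be sketched as a pure function. This is an illustration under stated assumptions, not Kubermatic's actual code: the `UserCluster` type, the `cluster-<name>` namespace scheme, the trimmed manifests, and placing the MachineDeployment in the user cluster's `kube-system` namespace are all assumptions made for the example. The `cluster.k8s.io/v1alpha1` group is the one machine-controller has used for its Cluster API types; check the project documentation for the version you run.

```python
from dataclasses import dataclass


@dataclass
class UserCluster:
    """Hypothetical custom resource stored in the master cluster."""
    name: str
    version: str          # Kubernetes version of the control plane
    worker_replicas: int  # desired number of worker nodes


# Control-plane components named in the text.
COMPONENTS = ("apiserver", "etcd", "scheduler", "controller-manager")


def desired_objects(cluster):
    """Derive the objects a controller would create for one user cluster.

    Control-plane workloads go into a per-cluster namespace of the
    master cluster; worker nodes are requested through a
    MachineDeployment that lives inside the user cluster itself.
    Manifests are trimmed to the fields relevant here.
    """
    namespace = f"cluster-{cluster.name}"  # hypothetical naming scheme
    master = [{"apiVersion": "v1", "kind": "Namespace",
               "metadata": {"name": namespace}}]
    for component in COMPONENTS:
        master.append({
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "metadata": {"name": component, "namespace": namespace,
                         "labels": {"version": cluster.version}},
            "spec": {"replicas": 1},  # simplified; no HA settings
        })
    user = [{
        "apiVersion": "cluster.k8s.io/v1alpha1",
        "kind": "MachineDeployment",
        "metadata": {"name": f"{cluster.name}-workers",
                     "namespace": "kube-system"},
        # Provider-specific machine settings are omitted.
        "spec": {"replicas": cluster.worker_replicas},
    }]
    return {"master": master, "user": user}
```

A controller would compare this output with what already exists and create, update, or delete objects to match, so adding a user cluster amounts to writing one more custom resource.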

Operators allow Kubermatic to automate not only the creation of clusters but also their full life cycle management. Updating the control plane is simply a rolling update of the Deployments that run it, and the cluster's worker nodes can likewise be updated declaratively in a rolling fashion.
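A rolling node update can be sketched as follows. It follows the same idea as the `maxSurge` and `maxUnavailable` settings of a Kubernetes Deployment's rolling update strategy, here with a configurable surge and zero unavailable nodes. The `Node` type and `rolling_update` function are hypothetical, and new nodes are assumed to become ready instantly.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    version: str


def rolling_update(nodes, target_version, max_surge=1):
    """Replace outdated nodes in batches of at most max_surge.

    Replacements are added before outdated nodes are removed (zero
    unavailable), so the node count never drops below its starting
    size. New nodes are assumed to be ready as soon as they exist.
    Returns the final node list and the node count after every step.
    """
    if max_surge < 1:
        raise ValueError("max_surge must be at least 1")
    nodes = list(nodes)
    counts = []
    created = 0
    while True:
        outdated = [n for n in nodes if n.version != target_version]
        if not outdated:
            return nodes, counts
        batch = outdated[:max_surge]
        for _ in batch:   # surge: bring up replacements first
            created += 1
            nodes.append(Node(f"node-new-{created}", target_version))
            counts.append(len(nodes))
        for old in batch:  # then remove outdated nodes (a real controller drains them first)
            nodes.remove(old)
            counts.append(len(nodes))
```

Updating three nodes with the default surge of one alternates between four and three nodes, so capacity never falls below the original three.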

Leveraging Kubernetes Operators also provides a consistent abstraction across all infrastructure providers. Rather than reinventing the wheel for each one, the same tooling can be carried from one provider to the next, covering hybrid and multicloud setups as well as on-premises infrastructure (virtualized and bare metal).
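The provider abstraction can be sketched as an interface that every backend implements, with the reconciliation logic written once against that interface. `InfraProvider`, `InMemoryProvider`, and `reconcile_workers` are hypothetical names; a real backend would call a cloud, vSphere, or bare-metal API where the toy one edits a dictionary.

```python
from abc import ABC, abstractmethod


class InfraProvider(ABC):
    """Hypothetical interface that each infrastructure backend implements."""

    @abstractmethod
    def list_machines(self, cluster: str) -> list[str]: ...

    @abstractmethod
    def create_machine(self, cluster: str, name: str) -> None: ...

    @abstractmethod
    def delete_machine(self, cluster: str, name: str) -> None: ...


class InMemoryProvider(InfraProvider):
    """Toy backend that records machines in a dictionary."""

    def __init__(self):
        self.machines = {}  # cluster name -> set of machine names

    def list_machines(self, cluster):
        return sorted(self.machines.get(cluster, set()))

    def create_machine(self, cluster, name):
        self.machines.setdefault(cluster, set()).add(name)

    def delete_machine(self, cluster, name):
        self.machines.get(cluster, set()).discard(name)


def reconcile_workers(provider, cluster, desired):
    """Provider-agnostic reconcile: only the provider object differs."""
    wanted = {f"{cluster}-worker-{i}" for i in range(desired)}
    existing = set(provider.list_machines(cluster))
    for name in sorted(wanted - existing):
        provider.create_machine(cluster, name)
    for name in sorted(existing - wanted):
        provider.delete_machine(cluster, name)
```

Supporting a new environment then means writing one new `InfraProvider` subclass, while `reconcile_workers` and everything built on it stay unchanged.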

What Do Operators Allow our Users to Do?

While creating an elegant solution to a difficult technical problem has been an exciting journey and learning experience, the most gratifying part has been seeing the impact it has on our users every day. As partners on their cloud native journey, we love to see the results of our software speak for themselves.

SysEleven, a managed hosting provider out of Berlin, was our first production user. The company’s engineers wanted to provide Kubernetes-as-a-Service to their customers, but knew they couldn’t scale the operations through people. They chose Kubermatic Kubernetes Platform to scale through software instead and have had it in production for almost three years. Because the Kubernetes Operators behind Kubermatic Kubernetes Platform automate many of their operational tasks, including the classic “turn it off and turn it back on again,” they are able to run and manage hundreds of clusters with just one full-time equivalent (FTE). This has allowed their Kubernetes team to focus on customer demands and deliver the high-quality service they have become known for. You can read about their whole journey with us here.

Beyond the cloud native journey, Operators also let us carry our proven principles and processes over to the edge. In the near future, we will provide edge capabilities built on the principles covered above.

How to Get Started

We recently open sourced Kubermatic Kubernetes Platform to help as many companies as possible accelerate their cloud native journey. You can find the code on GitHub, the documentation on our website, and our community on Slack. We are excited to see you automate your multicluster management with Kubernetes Operators!

Feature image via Pixabay.
